AI models collapse when trained on recursively generated data - Nature
Briefly

The development of LLMs requires massive amounts of training data. If future models are trained on data generated by existing models such as GPT, 'model collapse' can occur, leading to loss of the true data distribution.
In model collapse, models forget the true data distribution over successive generations, converging toward a point estimate with vanishing variance even when the underlying distribution itself does not shift.
Related concepts such as catastrophic forgetting and data poisoning do not fully explain model collapse. Preserving access to the original data distribution is crucial, especially for tasks where the tails of the distribution matter (see the sketch below).
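
As a rough illustration of this convergence (a minimal sketch, not the paper's experiment): each "generation" fits a Gaussian to a finite sample drawn from the previous generation's fitted model. Sampling error compounds across generations and the estimated variance drifts toward zero, a toy analogue of collapsing to a point estimate.

```python
import numpy as np

# Toy analogue of model collapse: repeatedly fit a Gaussian to samples
# drawn from the previous generation's fitted Gaussian. Finite-sample
# estimation error compounds, and the fitted variance drifts toward zero.

rng = np.random.default_rng(0)

mu, sigma = 0.0, 1.0   # generation 0: the "true" data distribution
n_samples = 50          # small sample size makes the collapse visible quickly

for gen in range(201):
    if gen % 20 == 0:
        print(f"generation {gen:3d}: mu = {mu:+.3f}, sigma = {sigma:.4f}")
    # Sample training data from the current model, then refit (MLE).
    data = rng.normal(mu, sigma, n_samples)
    mu, sigma = data.mean(), data.std()
```

Running this, sigma typically shrinks by orders of magnitude within a couple hundred generations, while the tails of the original distribution disappear first.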
Read at Nature