Beware of AI 'model collapse': How training on synthetic data pollutes the next generation
Briefly

Oxford University scholars warn that training generative AI on synthetic data can drastically degrade model accuracy, potentially rendering models useless.
Model collapse, the degenerative process described by Ilia Shumailov's team, occurs when generative models produce data that pollutes the training sets of subsequent models, causing them to misperceive reality.
Over successive generations, models trained on synthetic data first lose track of less-common facts, then grow generic, and eventually produce irrelevant outputs that degrade into gibberish.
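
The dynamic can be seen in a toy simulation (a minimal sketch of the general idea, not the team's experiment; the Gaussian family, sample size, and generation count here are illustrative choices): fit a one-dimensional Gaussian to a finite sample, then train each new "generation" only on samples drawn from the previous generation's fit. Finite samples underrepresent rare tail events, so the fitted spread drifts toward zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generation 0 trains on "real" data: a standard normal distribution.
data = rng.normal(loc=0.0, scale=1.0, size=100)

for gen in range(31):
    # "Train" a trivial generative model: fit a Gaussian by mean and std.
    mu, sigma = data.mean(), data.std()
    if gen % 5 == 0:
        print(f"generation {gen:2d}: mu={mu:+.3f}, sigma={sigma:.3f}")
    # The next generation never sees real data, only the model's output.
    # Finite samples underrepresent rare (tail) events, so sigma tends to
    # shrink generation after generation and the tails disappear.
    data = rng.normal(loc=mu, scale=sigma, size=100)
```

With these toy parameters, sigma typically drifts well below 1.0 within a few dozen generations; the rare events encoded in the tails are the first casualties, mirroring the "less-common facts" the models forget.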
Read at ZDNET