AI Companies Running Out of Training Data After Burning Through Entire Internet
Briefly

Some companies are looking for alternative sources of training data now that the internet is proving too small, turning to options like publicly available video transcripts and even AI-generated synthetic data.
OpenAI and Anthropic are exploring synthetic data for training AI models, aiming to avoid issues like 'model collapse' by producing higher-quality synthetic data.
Anthropic admitted that its Claude 3 model was trained on 'data we generate internally,' indicating a move toward more controlled use of synthetic data.
Concerns that AI firms are facing a data shortage are prompting exploration of novel and sometimes controversial ways of sourcing training data.
Read at Futurism