OpenAI Secretly Trained GPT-4 With More Than a Million Hours of Transcribed YouTube Videos
Briefly

We used publicly available data and licensed data. So, videos on YouTube?
It's yet another data point illustrating how AI companies are relying on massive amounts of murky and possibly copyright-infringing data to train their models.
The practice has already led to a number of lawsuits, with rightsholders accusing companies including OpenAI and Microsoft of misattributing their practices to 'fair use,' a doctrine of US copyright law.
If OpenAI had in fact trained Sora on YouTube videos, that would be a 'clear violation' of the video platform's terms.
Read at Futurism