Apple, Salesforce, Anthropic, and others used "the Pile," a dataset that includes captions from more than 173,000 YouTube videos, without the creators' consent, potentially violating YouTube's terms of service.
Apple obtained the data from a company that scraped YouTube videos, including content from popular YouTubers. Although Apple did not collect the data directly, its use of such data raises ethical concerns, and the implications may persist.