An AI watchdog accused OpenAI of using copyrighted books without permission
Briefly

The AI Disclosures Project has accused OpenAI of training its GPT-4o model on copyrighted works from O'Reilly Media without authorization. Analyzing a dataset of 34 O'Reilly books, the researchers found that GPT-4o showed notable familiarity with non-public material, which points to potential copyright infringement. The study underscores the need for greater corporate transparency and for licensing frameworks covering AI training data. While OpenAI has pursued licensing agreements, concerns about its data sourcing practices continue to grow.
An artificial intelligence watchdog is accusing OpenAI of training its default ChatGPT model on copyrighted book content without permission.
The researchers used a legally obtained dataset of 34 copyrighted O'Reilly books and found that GPT-4o showed 'strong recognition' of the company's paywalled content.
These results highlight the urgent need for greater corporate transparency about pre-training data sources as a step toward formal licensing frameworks for AI content training.
The researchers acknowledged limitations in their study but argued the issue is likely part of a broader systemic problem in how large language models are developed.
Read at Fast Company