Tim O'Reilly alleges that OpenAI trained its GPT-4o model on copyrighted material from his publishing house, O'Reilly Media, without obtaining permission. Against the backdrop of ongoing lawsuits over the use of copyrighted data, he and his co-authors conducted a study of the issue. Their research used a probing technique known as a DE-COP membership inference attack to test whether the model had been trained on text from 34 O'Reilly books. They report evidence that raises serious concerns about copyright practices in AI training.
The study suggests that OpenAI's GPT-4o model may have been trained on O'Reilly Media's copyrighted books without consent, raising concerns over copyright violations.
Tim O'Reilly and his co-authors conducted rigorous testing that indicates GPT-4o had absorbed content from O'Reilly Media books, which raises significant legal and ethical questions.
OpenAI maintains it has done nothing wrong; the dispute highlights broader legal debates over the use of copyrighted material in AI training.
The investigation demonstrated a way to probe what a model was trained on: assessing whether the model can accurately select verbatim excerpts from protected texts.
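The selection test described above can be sketched roughly as follows. This is a minimal illustration and not the study's actual code. It assumes each verbatim excerpt is shuffled among paraphrased alternatives (the summary does not say how the distractors were made), and `choose` is a hypothetical stand-in for a call to the model that returns the index of the option it picks. Accuracy well above the chance baseline would hint that the model has seen the text.

```python
import random
from typing import Callable, List, Sequence, Tuple


def build_quiz(verbatim: str, paraphrases: Sequence[str],
               rng: random.Random) -> Tuple[List[str], int]:
    """Shuffle the verbatim excerpt among its paraphrases.

    Returns the shuffled options and the index of the verbatim one.
    """
    options = [verbatim, *paraphrases]
    rng.shuffle(options)
    return options, options.index(verbatim)


def selection_accuracy(items: Sequence[Tuple[str, Sequence[str]]],
                       choose: Callable[[List[str]], int],
                       seed: int = 0) -> float:
    """Fraction of quizzes in which `choose` picks the verbatim excerpt.

    `choose` is a hypothetical placeholder for querying the model.
    """
    rng = random.Random(seed)
    correct = 0
    for verbatim, paraphrases in items:
        options, answer = build_quiz(verbatim, paraphrases, rng)
        if choose(options) == answer:
            correct += 1
    return correct / len(items)


def chance_baseline(num_options: int) -> float:
    """Expected accuracy of a model that guesses uniformly at random."""
    return 1.0 / num_options
```

With four options per quiz the chance baseline is 0.25, so the comparison of interest is how far the measured accuracy on a set of books sits above that figure.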