A recent study indicates that OpenAI may have trained its AI models on copyrighted material without permission, lending support to ongoing lawsuits from various rights-holders. OpenAI maintains a fair use defense, while the plaintiffs argue that U.S. copyright law makes no allowance for training on their works without authorization. Researchers from the University of Washington, the University of Copenhagen, and Stanford developed a method based on 'high-surprisal' words to identify memorized content in OpenAI's models, and their findings raise questions about how the models learn and about the implications of using copyrighted data in AI training.
The study, co-authored by researchers at the University of Washington, the University of Copenhagen, and Stanford, suggests that OpenAI's models memorized portions of copyrighted content during training.
The method relies on 'high-surprisal' words, which are statistically unlikely to appear in their surrounding context. The researchers removed these words from text excerpts and asked models such as GPT-4 to guess which words had been masked; a model that guesses correctly has likely memorized the excerpt from its training data.
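The core of this probe can be illustrated with a short sketch. The code below is a minimal, hypothetical reconstruction of the idea, not the study's actual implementation: the surprisal threshold, the reference log-probabilities, and the query_model callable are placeholders for whatever scoring model and target model a replication would use.

```python
# Sketch of a high-surprisal masking probe for memorization.
# Assumptions: `logprobs` are per-word log-probabilities from some reference
# model, `query_model` is a caller-supplied function that sends a prompt to
# the model under test, and the 15-bit threshold is illustrative only.

import math
import re


def surprisal_bits(word_logprob: float) -> float:
    """Convert a natural-log probability into surprisal in bits."""
    return -word_logprob / math.log(2)


def pick_high_surprisal_words(words, logprobs, threshold_bits=15.0):
    """Select words whose surprisal exceeds the (assumed) threshold."""
    return [w for w, lp in zip(words, logprobs)
            if surprisal_bits(lp) > threshold_bits]


def mask_words(text: str, targets) -> str:
    """Replace the first occurrence of each target word with [MASK]."""
    for w in targets:
        text = re.sub(rf"\b{re.escape(w)}\b", "[MASK]", text, count=1)
    return text


def memorization_score(excerpt, words, logprobs, query_model) -> float:
    """Fraction of masked high-surprisal words the model guesses exactly."""
    targets = pick_high_surprisal_words(words, logprobs)
    if not targets:
        return 0.0
    prompt = (
        "Fill in each [MASK] in the passage with the single original word, "
        "in order, one per line:\n\n" + mask_words(excerpt, targets)
    )
    guesses = query_model(prompt).strip().splitlines()
    hits = sum(g.strip().lower() == t.lower()
               for g, t in zip(guesses, targets))
    return hits / len(targets)
```

A high score on excerpts from a copyrighted book or article would be evidence, under this probe's assumptions, that the passage was seen and memorized during training rather than merely paraphrased from general knowledge.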