Anthropic reached a proposed class settlement with three authors who alleged that their pirated works were used to train the Claude family of large language models. A judge previously found Anthropic's legal purchases of books for training to be fair use but ruled that some tactics, including use of LibGen, amounted to piracy. The company negotiated to forgo a trial that would have determined damages. The case underscores tensions around where training data originates, how much copyrighted material AI models use, and the risks AI firms face when compiling large training libraries.
The writers claimed that Anthropic used their pirated works to train Claude, its family of large language models (LLMs), to respond to prompts. The AI startup negotiated a "proposed class settlement," Anthropic announced Tuesday, to forgo a trial determining how much it would owe for the infringement.
The preliminary settlement's details are scarce. In June, a judge ruled that Anthropic's legal purchase of books to train its chatbot was fair use -- that is, free to use without payment or permission from the copyright holder. However, the judge also found that some of Anthropic's tactics, like downloading books from a pirate site called LibGen, constituted piracy. Anthropic could have been forced to pay over $1 trillion in damages over the piracy claims.
The settlement highlights one of the many dilemmas AI companies face as they train their models to respond to prompts and queries. To offer up succinct, helpful responses to a user, an AI chatbot must be trained on a vast amount of data. GPT-4, for example, is reported to have on the order of 1 trillion parameters. Anthropic, for its part, accumulated a library of over 7 million works to train Claude, according to Wired.