Elsevier vs. Meta: first science publisher sues over scraped research papers

"Elsevier - which publishes thousands of journals, including Cell and The Lancet - was part of a class-action lawsuit filed on 5 May against technology company Meta and its chief executive Mark Zuckerberg in the Southern District of New York. Also named as plaintiffs on the lawsuit are book-publishing giants Hachette and Macmillan, and the US fiction author and lawyer Scott Turow. The publishers allege that Meta obtained and reproduced copyrighted works in developing its large language model (LLM) Llama."

"This case is the first AI action brought by major publishing houses, who have their own story to tell about Meta's flagrant violation of their rights,"

"The case mirrors those of authors and media companies - including The New York Times - suing AI firms on similar grounds.Some cases have been settled but, overall, they have yet to establish a clear precedent on whether it is legal to use copyrighted works to train an LLM. A Meta spokesperson has said the company would "fight this lawsuit aggressively"."

"To train Llama, the lawsuit alleges that Meta used the Common Crawl data set, a sample of billions of web pages made by trawling the Internet, which the plaintiffs say is likely to have included unauthorized copies of copyrighted works, such as scientific abstracts and paywalled papers. The publishers also allege that Meta downloaded and torrented (sourced using a file-sharing method) works from sites including LibGen, a database of books, research papers and textbooks; and Sci-Hub, a repository"

Elsevier has become the first scientific publisher to join other firms and individuals in suing an artificial intelligence company over the alleged use of copyrighted works to train AI models. Elsevier, which publishes thousands of journals, including Cell and The Lancet, is a plaintiff in a class-action lawsuit filed on 5 May in the Southern District of New York against Meta and its chief executive, Mark Zuckerberg. Other plaintiffs include the book publishers Hachette and Macmillan and the author and lawyer Scott Turow. The publishers allege that Meta obtained and reproduced copyrighted works while developing its Llama large language model (LLM). The lawsuit parallels actions by authors and media companies, including The New York Times. Some of those cases have been settled, but overall they have yet to establish a clear precedent on whether copyrighted works may legally be used to train an LLM. A Meta spokesperson said the company would fight the lawsuit aggressively. The complaint alleges that Meta used the Common Crawl data set and downloaded and torrented works from sites such as LibGen and Sci-Hub.

#ai-litigation #copyright-infringement #large-language-models #publishing-industry #training-data

Read at Nature
