Authors Take Page from Anthropic in Alleging Apple Infringed Works by Training AI on Pirated Books
Briefly

"Taking their cue from the recent Bartz v. Anthropic saga, the authors of a neuroscience book and professors at the State University of New York filed a class action complaint on October 9 with the U.S. District Court for the Northern District of California, alleging that Apple Inc. committed mass copyright infringement by using pirated books to train its artificial intelligence systems."
"Apple infringed upon Martinez-Conde, Macknik, and Class members' copyrighted materials by reproducing their registered works without obtaining authorization to build databases of training materials, according to the filing. Central to the allegations was Apple's use of datasets containing Books3, described as a "notorious 'shadow library,' a dataset of pirated, copyrighted books." This dataset, derived from a private tracker called Bibliotik, contained approximately 196,640 books, including Martinez-Conde and Macknik's international bestseller, Sleights of Mind: What the Neuroscience of Magic Reveals About Our Everyday Deceptions."
"According to the lawsuit, Apple's own documentation indicated its use of the infringing materials. Apple's model card and GitHub repository for its Open Efficient Language Models (OpenELM) stated that the pre-training dataset included the Pile and a subset of RedPajama, which included Books3, a well-known component of the Pile, a dataset curated by the research organization EleutherAI. Furthermore, the "Books" component of the RedPajama dataset was described as a direct copy of the Books3 dataset."
Plaintiffs Susana Martinez-Conde and Stephen Macknik filed a class action in the U.S. District Court for the Northern District of California on October 9, alleging Apple engaged in mass copyright infringement. The complaint asserts Apple built its Apple Intelligence platform, including OpenELM and Foundation Models, by reproducing copyrighted works without permission or compensation. The suit identifies the Books3 dataset, sourced from a private tracker called Bibliotik, as a central component, containing about 196,640 books including the plaintiffs' bestseller. The complaint cites Apple's model card and GitHub documentation showing use of the Pile and a RedPajama subset that included Books3.