For the past two years, my colleague Alex Reisner has investigated precisely how tech companies use massive data sets to train their LLMs. He has repeatedly found that the so-called architects of AI have relied heavily on enormous databases of copyrighted work to build chatbots and other programs. He has also found that this work is generally taken without the consent or awareness of its creators: musicians, filmmakers, YouTubers, podcasters, illustrators, and writers.
The price-comparison tool in T-Mobile's T-Life app uses AT&T's password-protected software without permission, AT&T told a Texas federal judge on November 30. AT&T is seeking a temporary restraining order. It accuses T-Mobile of scraping AT&T customer data without authorization and says T-Mobile "violates several prohibitions in AT&T's publicly available Terms of Use." AT&T sent T-Mobile a cease-and-desist letter on November 26, but T-Mobile has refused to comply.
The suit raises a host of questions. Is Perplexity's agent a rogue buyer that poses unacceptable security risks, or is Amazon bullying an insurgent competitor out of the game? Whose interests does a semi-autonomous AI agent represent, the customer's or its maker's, and who is liable when it misbehaves? The next iteration of AI may hinge on the outcome of the suit.
Ben Lee, chief legal officer at Reddit, told The Register in an emailed statement that AI companies are desperate for quality content generated by real people, and that this need is fueling an industrial-scale data-laundering economy. "Scrapers bypass technological protections to steal data, then sell it to clients hungry for training material," said Lee. "Reddit is a prime target because it's one of the largest and most dynamic collections of human conversation ever created."
We fully comply with the GDPR, ensuring your personal data is protected and handled transparently. We collect only publicly available information, and you have the right to access, rectify, erase, and restrict the processing of your data.
"I scraped millions of Google Maps restaurant reviews, and gave each reviewer's profile picture to an AI model that rates how hot they are out of 10," says San Francisco-based website creator Riley Walz. "This map shows how attractive each restaurant's clientele is. Red means hot, blue means not."
Reddit has sued Anthropic, alleging that the company trained its AI model, Claude, on Reddit users' posts without consent. The case highlights a shift toward monetizing access to digital content.