Reddit Dubs Perplexity AI and Data Scraping Companies 'Would-Be Bank Robbers'
Briefly

"Reddit filed a lawsuit yesterday against artificial intelligence (AI) company Perplexity AI and three other defendants for their alleged illegal circumvention of Reddit security measures meant to protect against misuse of its content and data. Reddit, which describes itself in the complaint as "one of the largest repositories of human conversation in existence," likened the actions of Oxylabs UAB, AWMProxy, and SerpApi to those of "would-be bank robbers.""
"Perplexity AI, meanwhile, refused to enter into an agreement with Reddit and is a customer of SerpApi, and allegedly was caught "red-handed by using the digital equivalent of marked bills...to track Reddit data and confirm that Perplexity was using Reddit data acquired through the scraping of Google SERPs," according to the complaint. Reddit said it sent Perplexity a cease-and-desist letter but that Perplexity subsequently only increased its use of Reddit data "forty-fold," the lawsuit added."
"Reddit charges that each of the defendants is profiting by "evading technological control measures to access Reddit data it knows it does not have permission to access or use." Because Reddit has over 100 million active users per day, its data is "widely seen as invaluable to AI companies" and "is particularly well-suited to training" large language models (LLMs) because it is constantly growing."
Reddit alleges that Perplexity AI, Oxylabs UAB, AWMProxy, and SerpApi illegally circumvented Reddit's anti-scraping measures to access and misuse Reddit content and data. The filings state that the defendants developed tools that bypass Google's and Reddit's protections and scraped Reddit content from Google search results. Reddit alleges that Perplexity refused a licensing agreement, is a customer of SerpApi, and increased its use of Reddit data forty-fold after receiving a cease-and-desist letter. Reddit asserts that the defendants profit by evading technological controls to obtain data they know they have no permission to access or use. Reddit points to its more than 100 million daily active users and characterizes its constantly growing data as especially valuable for training large language models.