
"A growing number of websites are taking steps to ban AI bot traffic so that their work isn't used as training data and their servers aren't overwhelmed by non-human users. However, some companies are ignoring the bans and scraping anyway. Online traffic analysis conducted by BuiltWith, a web metrics biz, indicates that the number of publishers trying to prevent AI bots from scraping content for use in model training has surged since July."
"About 5.6 million websites presently have added OpenAI's GPTBot to the disallow list in their robots.txt file, up from about 3.3 million at the start of July 2025. That's an increase of almost 70 percent. Websites can signal to visiting crawlers whether they allow automated requests to harvest information through entries in their robots.txt files. Compliance with these directives is voluntary, but repeated failure to respect these rules may come up in litigation, as it did in Reddit's scraping lawsuit against Anthropic earlier this year."
"The situation is similar for AppleBot, now blocked at about 5.8 million websites, up from about 3.2 million in July. Even GoogleBot - which indexes data for search - faces growing resistance, perhaps because it's also used for the AI Overviews now surfaced atop search results. BuiltWith reports that 18 million sites now ban the bot, which would also mean that those sites could not be indexed in Google Search."
Since July, the number of websites adding AI crawlers to their robots.txt disallow lists has surged. About 5.6 million sites now block OpenAI's GPTBot, up from 3.3 million at the start of July 2025. Anthropic's ClaudeBot and Claude-SearchBot face similar block rates, with ClaudeBot blocked at about 5.8 million sites. AppleBot is likewise blocked at roughly 5.8 million sites, up from about 3.2 million. GoogleBot is banned by about 18 million sites, which would also keep those sites out of Google Search. Robots.txt directives signal crawler permissions but rely on voluntary compliance, and repeated disregard can come up in litigation, as it did in Reddit's suit against Anthropic. About half of news sites had blocked GPTBot as of July.
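To make the robots.txt mechanism concrete, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt contents, the example.com URL, and the `allowed` helper are hypothetical illustrations. They are not taken from BuiltWith or any real site. The sketch also shows why compliance is voluntary: the disallow rule only has an effect if the crawler itself checks it before fetching.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: it disallows OpenAI's GPTBot on every path
# and leaves all other crawlers unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines instead of fetching it over HTTP.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/some-story"
    for agent in ("GPTBot", "Googlebot"):
        # A well-behaved crawler runs this check first; nothing forces it to.
        print(agent, allowed(agent, url))
```

Run as a script, it reports that GPTBot may not fetch the page while Googlebot, which falls through to the `*` group, may. A site that wanted to block Googlebot as well would add its own `User-agent: Googlebot` group with `Disallow: /`. As the quoted article notes, this would also remove the site from Google Search.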
Read at The Register