Anthropic bot crawlers feast on web content and give little back, a new ranking shows

"Since then, I've been looking for reliable data that shows this important and under-discussed part of the AI revolution. While tech companies spend lavishly on data centers, GPUs, and talent, they avoid talking about the other key ingredient of AI success: data. That's because they don't want to pay for the high-quality human data that's needed for AI model training, inference, and AI outputs. Instead, they send out bots to crawl websites and scoop up this information, mostly for free."
"In the past, tech companies would send users to the original sources of this information. This formed the grand bargain of the web. Sites would let their data be taken for free on the understanding that they would get referrals in return, and could pay for their efforts through advertising, subscriptions, and other techniques. In the new generative AI world, this deal is breaking down."
AI companies deploy web crawlers that scrape large volumes of website content while sending little referral traffic back to the original sites. Tech firms invest heavily in infrastructure and talent but often avoid paying for human-created content, instead harvesting publicly available material to train and serve models. Historically, websites allowed content reuse in exchange for user referrals, which let them monetize through ads and subscriptions. Generative AI answer engines reduce visits to source sites by providing direct answers, breaking that bargain and shifting costs onto content creators. Cloudflare, which serves roughly 20% of websites, has begun measuring crawl-to-referral ratios to quantify the resulting traffic and cost impacts.
Read at Business Insider