Here are the biggest misconceptions about AI content scraping
Briefly

AI bots, specifically those used for Retrieval Augmented Generation (RAG), are significantly increasing their scraping activities on publishers' sites, outpacing bots for training large language models. The growth rate for RAG scrapes was 49% compared to 18% for training scrapes from Q4 2024 to Q1 2025. RAG bots provide real-time, factual information for AI queries, which threatens publishers' traffic and revenue. Unlike training scrapes, which are static, RAG scrapes represent a continuous demand, creating both challenges and potential opportunities for publishers.
From Q4 2024 to Q1 2025, per-site bot scrapes for Retrieval Augmented Generation, or RAG, grew 49%. That is nearly 2.5 times the rate of training bot scrapes.
Training scrapes are "one-and-done... to feed a model's general knowledge." RAG scrapes, on the other hand, are continuous and have compounding value.
RAG AI bots, or agents, retrieve factual, current information in real time. They respond to user prompts in AI products like Perplexity and ChatGPT.
Responses include links or citations to the original sources, such as publishers' sites. RAG can surface and summarize articles without storing them in training data.
Read at Digiday