
"If you're in the business of publishing content on the internet, it's been difficult to know how to deal with AI. Obviously, you can't ignore it; large language models (LLMs) and AI search engines are here, and they ingest your content and summarize it for their users, killing valuable traffic to your site. Plenty of data supports this. Creating a content strategy that accounts for this changing reality is complex to begin with."
"That would be hard even if there were clear rules that everyone's operating under. But that is far from a given in the AI world. A topic I've revisited more than once is how tech and media view some aspects of the ecosystem differently (most notably, user agents), leading to new industry alliances, myriad lawsuits, and several angry blog posts. But even accounting for that, a pair of recent reports suggest the two sides are even further apart than you might think."
Publishers are losing traffic because large language models and AI search engines ingest and summarize their content for users, and deciding which pages to expose, block, or optimize for AI involves difficult business trade-offs. Tech and media also disagree on key definitions in the ecosystem, most notably what counts as a user agent, which has fueled new industry alliances, myriad lawsuits, and angry public statements. Common Crawl supplies the large-scale web data used to train models including GPT-3.5, and publishers have asked it to delete their content so it cannot be used for training. According to the reports, Common Crawl kept that content in its archive while hiding it from its online search tool, so publishers' spot checks came up empty.
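As a rough illustration of what such a spot check looks like, here is a minimal sketch that queries Common Crawl's public URL index (the CDX API) for a site's pages. The crawl collection id and the example domain are placeholders, not taken from the reports; and, as the reports note, an empty result from the index's search tools can be misleading, since content may still sit in the underlying archive.

```python
import json
import requests

# Spot-check a site against the Common Crawl URL index (CDX API).
# The collection id below is an example; current collection names
# are listed at https://index.commoncrawl.org/.
INDEX = "https://index.commoncrawl.org/CC-MAIN-2024-10-index"


def spot_check(url_pattern: str, limit: int = 5) -> list[dict]:
    """Return index records matching url_pattern, or [] if none are listed."""
    resp = requests.get(
        INDEX,
        params={"url": url_pattern, "output": "json", "limit": limit},
        timeout=30,
    )
    if resp.status_code == 404:
        # The index server responds 404 when no captures match the pattern.
        return []
    resp.raise_for_status()
    # The API returns one JSON object per line.
    return [json.loads(line) for line in resp.text.splitlines() if line]


if __name__ == "__main__":
    # "example.com/*" is a placeholder pattern for a publisher's pages.
    for rec in spot_check("example.com/*"):
        print(rec.get("timestamp"), rec.get("url"), rec.get("status"))
```

An empty list here only means the index's search interface returns nothing for that pattern; it says nothing about whether the pages remain in the archived crawl data itself.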
Read at Fast Company