
"HTML pages contain navigation, styling and scripts that add little semantic value for LLMs. A simple Markdown heading costs roughly three tokens, but the equivalent HTML markup uses 12-15 tokens. The company says a blog post that requires 16 180 tokens in HTML shrinks to about 3 150 tokens when converted to Markdown."
"Publishers can insert three signals: search, ai‑input and ai‑train into robots.txt comments to declare whether their content may be indexed, used as real‑time AI input or included in model training. A 'yes' allows a use, 'no' forbids it, and absence expresses no preference. Cloudflare acknowledges that the signals are merely preferences, not enforceable rules."
Cloudflare launched 'Markdown for Agents', which lets AI crawlers request Markdown versions of web pages by sending the Accept: text/markdown header, significantly reducing token consumption compared with HTML. By the company's figures, a blog post that requires 16,180 tokens as HTML shrinks to approximately 3,150 tokens as Markdown. At the same time, the company proposed 'Content Signals', a mechanism that lets publishers declare content usage permissions through robots.txt comments using three signals: search, ai-input, and ai-train. For each signal, publishers can specify 'yes' to allow the use, 'no' to forbid it, or omit the signal to express no preference. Cloudflare acknowledges that these signals are preferences rather than enforceable rules, and notes that Markdown responses currently default to permitting all uses.
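To make the two mechanisms concrete, here is a minimal Python sketch, not Cloudflare's implementation. It builds a request that asks for Markdown through the Accept header, computes the token saving from the figures quoted above, and reads the three signals from a string. The function names and the example URL are hypothetical, and the comma-separated `name=yes|no` form of the signal string is an assumption, since the text above does not spell out the exact syntax.

```python
import urllib.request
from typing import Dict, Optional

# The three signal names come from the article; everything else here is illustrative.
SIGNALS = ("search", "ai-input", "ai-train")


def build_markdown_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a request that asks the server for Markdown."""
    return urllib.request.Request(url, headers={"Accept": "text/markdown"})


def token_savings(html_tokens: int, markdown_tokens: int) -> float:
    """Return the fraction of tokens saved by serving Markdown instead of HTML."""
    return 1 - markdown_tokens / html_tokens


def parse_content_signals(value: str) -> Dict[str, Optional[bool]]:
    """Parse a signal string such as 'search=yes, ai-train=no'.

    ASSUMPTION: the comma-separated name=value layout is a guess for
    illustration. True means allowed, False means forbidden, and None
    means the signal is absent, i.e. no preference was expressed.
    """
    result: Dict[str, Optional[bool]] = {name: None for name in SIGNALS}
    for part in value.split(","):
        key, sep, raw = part.partition("=")
        key = key.strip().lower()
        raw = raw.strip().lower()
        # Ignore unknown signal names and values other than yes/no.
        if sep and key in result and raw in ("yes", "no"):
            result[key] = raw == "yes"
    return result


if __name__ == "__main__":
    req = build_markdown_request("https://example.com/blog/post")  # hypothetical URL
    print(req.get_header("Accept"))
    print(f"{token_savings(16180, 3150):.1%} fewer tokens")
    print(parse_content_signals("search=yes, ai-train=no"))
```

Passing the request to `urllib.request.urlopen` would actually send it; a site that does not support the feature would presumably just return HTML. With the quoted figures, the saving works out to roughly 80% fewer tokens. Mapping an absent signal to `None` rather than `False` keeps "no preference" distinct from an explicit 'no'.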
Read at InfoQ