#ai-data-scraping

Digital life
from Fast Company
7 hours ago

The Internet Archive at 30: Can the web's memory bank withstand the AI era?

The Internet Archive preserves vast web and digital history, but rising costs and access restrictions threaten its ability to keep collecting and serving it.
Intellectual property law
from Futurism
6 months ago

Perplexity Just Got Caught Breaking the Rules Red-Handed

Companies plant fake content (mountweazels) to detect unauthorized scraping; Reddit used a Google-crawl-only test post to catch Perplexity displaying scraped content.
Tech industry
from Business Insider
8 months ago

Anthropic bot crawlers feast on web content and give little back, a new ranking shows

AI companies heavily crawl websites for training data while returning minimal referral traffic, undermining the web's traditional data-for-traffic exchange.