News outlets like NYT and USA Today are blocking the Internet Archive's Wayback Machine to prevent AI companies from training models on their content

"Major media outlets, including USA Today and the New York Times, are blocking the Internet Archive's Wayback Machine from saving web pages to prevent AI giants from training models on snapshots of old articles."

"Mark Graham, the director of the Wayback Machine, emphasizes that the digital archive has controls to limit abuse of AI automation and prevent large-scale data extraction."

"Publishers can archive their material, but a third party maintains a more incorruptible version of stories that can hold outlets accountable when they are revised after publication."

"Graham is reportedly in talks to regain access to the material, while more than 100 media workers signed a letter supporting Wayback."

Twenty-three news organizations, including USA Today and the New York Times, are blocking the web crawler of the Internet Archive's Wayback Machine. The aim is to keep AI companies from training language models on archived content. Mark Graham, director of the Wayback Machine, says the archive has controls to limit AI abuse and prevent large-scale data extraction. Publishers can archive their own content, but a third-party archive keeps a more reliable record that can hold outlets accountable when stories are revised after publication. Reddit imposed similar restrictions last year. Graham is reportedly in talks to regain access, and more than 100 media workers have signed a letter supporting the archive.

#internet-archive #wayback-machine #ai-training #media-outlets #copyright-concerns

Read at Fortune
