
"In January, Nieman Lab broke the story that major news publishers - including The New York Times, The Guardian, and USA Today Co. - had started blocking the Internet Archive due to concerns that AI companies might scrape the nonprofit's repositories for training data. No news publisher has confirmed to Nieman Lab that an AI company has already scraped their content from the Wayback Machine. Still, in the five months since we published our story the number of news sites blocking the Internet Archive has continued to rise."
"Our new analysis shows that more than 340 local news sites across the United States are now limiting the Internet Archive's ability to access and preserve their stories. Many sites in our sample are owned by five of the seven largest local news publishers in the country: USA Today Co., McClatchy, Advance Local, MediaNews Group, and Tribune Publishing. The latter two are both subsidiaries of the "vulture hedge fund" Alden Global Capital."
"Researchers, historians, and citizens around the world rely on the web archives of local news sites to do their work. "Blocking the Internet Archive's web crawlers threatens one of the most effective ways that we capture and store news content for the long term," Edward McCain, a journalism librarian at the University of Missouri, said. "In the present we may have some workarounds, but in the long run, it weakens a vital link in primary source materials that we need to understand where we've been and where we want to go.""
"Working journalists are among the most frequent users of the Wayback Machine's local news archives. Over the last month, online petitions have called for news media companies to allow the Internet Archive to preserve their journalism. "I cover news within a larger news desert in New York's Rockland, Sullivan, and Rockland counties. This means I need to heavily rely on archival data of old news articles from now"
Major news publishers began blocking the Internet Archive over concerns that AI companies might scrape its archived material for training data. No publisher has confirmed to Nieman Lab that such scraping has already occurred, but the number of news sites blocking the Archive has continued to grow. Nieman Lab's analysis shows that more than 340 local news sites in the United States now limit the Internet Archive’s ability to access and preserve their stories. Many of these sites are owned by large local news publishers, including MediaNews Group and Tribune Publishing, both subsidiaries of Alden Global Capital. Researchers, historians, and citizens rely on web archives for long-term access to local reporting. Blocking the Archive's web crawlers threatens a key method for capturing and storing news content, and working journalists, especially those covering news deserts, depend on archived articles.
Read at Nieman Lab