The fight over the Wayback Machine has escalated as major news organisations block its crawler, raising concerns about who controls digital history and about the future of accountability journalism amid fears over AI training.
The fight over the Internet Archive’s Wayback Machine has become a dispute about who gets to keep the public record. In an interview for CounterSpin aired on 17 April 2026, Fight for the Future’s Lia Holland said the archive has spent three decades preserving the web and has become a routine tool for reporters checking claims, tracking changes and recovering deleted material. She argued that the current backlash against the service is less about technical necessity than about control over history itself.
That tension sharpened after Wired reported that USA Today had used the Wayback Machine in a story about Immigration and Customs Enforcement while, at the same time, blocking the archive from preserving its own pages. According to the reporting discussed by Janine Jackson and Holland, other major outlets, including The New York Times, have also restricted the Internet Archive’s crawler. The result is an awkward contradiction: news organisations that benefit from archived evidence are also helping to limit the very system that makes that evidence available.
Mark Graham, who directs the Wayback Machine at the Internet Archive, has said the publishers' concern is misplaced. Speaking to PC Gamer, he argued that fears about artificial intelligence should not be used to justify weakening web preservation, and that libraries and archives are not the problem. His point echoes a broader argument made by defenders of the Wayback Machine: if publishers worry about AI companies scraping material, the answer is not to erase the archive that journalists, researchers and the public use to verify what once existed online.
The scale of the dispute is growing. TechRadar reported that 23 major news sites were blocking the Wayback Machine’s crawler over fears that archived material could be used to train large language models without permission. Tom’s Hardware also reported that the list includes USA Today and The New York Times, and that the underlying anxiety is that AI firms may try to lean on archived pages as a route around copyright restrictions. Even so, the Internet Archive has maintained that it works with publishers and aims to preserve content respectfully, not undermine its commercial value.
For journalists, the stakes go well beyond one archive. Holland told CounterSpin that the Wayback Machine is relied on for accountability reporting, from labour disputes to government deletions. Fight for the Future said more than 100 journalists, including Rachel Maddow, Cory Doctorow and Ellen Nakashima, have signed a letter backing the archive’s role in preserving the public record. Their message is straightforward: if digital history can be edited out of existence, journalism loses one of its most important safeguards.
Source Reference Map
Inspired by headline at: [1]
Sources by paragraph:
- Paragraph 1: [2], [6]
- Paragraph 2: [1], [3], [4]
- Paragraph 3: [2]
- Paragraph 4: [3], [4], [5]
- Paragraph 5: [1], [7]
Source: Noah Wire Services
Verification / Sources
- https://fair.org/home/the-wayback-machine-has-been-the-best-archive-for-preserving-our-digital-lives/ - Please view link - unable to access data
- https://www.pcgamer.com/hardware/preserving-the-web-is-not-the-problem-losing-it-is-claims-the-director-of-the-internet-archive/ - Mark Graham, director of the Internet Archive, responds to concerns about AI scraping, noting that major websites like Reddit, The New York Times, and The Guardian have blocked the Wayback Machine from archiving their content. He argues that while fears of AI misuse are understandable, blocking web archives threatens public access to reliable historical records and harms research and journalism. Graham emphasizes that libraries and preservation tools aren't the source of the problem, and preventing archival efforts could have serious unintended consequences.
- https://www.techradar.com/computing/internet/ai-could-mean-the-end-of-the-wayback-machine-as-news-websites-are-increasingly-blocking-it-to-prevent-content-scraping - A growing number of major news websites are blocking the Wayback Machine, a digital archive run by the non-profit Internet Archive, from preserving their content. This trend is driven by concerns over artificial intelligence (AI), specifically that archived material is being used to train large language models (LLMs) without permission, thereby violating copyright laws and creating competition for original content publishers. Notable outlets like The New York Times and USA Today are among the 23 sites restricting the Wayback Machine's web crawler, despite some having benefited from the archive in their own investigative reporting.
- https://www.tomshardware.com/tech-industry/big-tech/news-outlets-are-blocking-wayback-machine-from-archiving-their-pages-23-outlets-concerned-ai-companies-might-abuse-fair-use-and-use-it-to-train-their-models - As of April 2026, 23 major news publications, including USA Today and The New York Times, are blocking the Wayback Machine’s crawler, ia-archiverbot, from archiving their webpages. The primary concern is that AI companies could exploit archived content under fair use to train large language models (LLMs). This move has sparked debates about the implications for public access to historical records and accountability, as online articles are easily altered or deleted. While the legal system supports the Internet Archive's work as fair use—deeming it crucial for research and discovery—critics argue that blocking archives could harm societal access to information, particularly in an era of misinformation and AI hallucinations.
- https://www.morningbrew.com/stories/2026/04/15/news-orgs-are-raging-against-the-wayback-machine - Major media outlets are blocking the Internet Archive’s Wayback Machine from saving web pages to prevent AI giants from training models on snapshots of old articles. Wired reported that 23 news organizations, including USA Today and the New York Times, are among the 241 sites denying Internet Archive’s web crawler access to their articles. It’s not personal—some outlets still use the Archive in their reporting—it’s about the looming threat of AI: Tech companies can skirt copyright laws by using the Wayback Machine as a workaround for training language models on their content (including recipes, probably). Mark Graham, the director of the Wayback Machine, emphasizes that the digital archive has controls to limit abuse of AI automation and prevent large-scale data extraction.
- https://www.dw.com/en/digital-memory-at-stake-why-news-outlets-block-the-wayback-machine/a-76887853 - The 'Wayback Machine,' custodian of digital memory, is fighting for its survival. An increasing number of media outlets are refusing to allow the Web Archive to archive their content. For 30 years, the archive.org internet platform has been archiving digital content. The 'Wayback Machine' contains more than 1 trillion archived web pages and is considered an indispensable tool for journalists, researchers, historians and lawyers who wish to view deleted or modified online content in its original form. However, this unique project, started by a San Francisco-based non-profit, is facing an existential crisis, and the most recent threat comes, of all places, from those who need the archive most urgently: the media themselves.
- https://www.fightforthefuture.org/news/2026-04-13-100-journalists-applaud-the-internet-archives-role-in-preserving-the-public-record/ - Over 100 journalists including Rachel Maddow, Cory Doctorow, and Ellen Nakashima have signed a letter to the Internet Archive celebrating the Wayback Machine as a crucial resource for their work. The letter reads in part: 'We are thankful that the Internet Archive itself proactively partners with news organizations, and does not engage in paywall circumvention or irresponsible scraping. They value the work of journalists, and it shows in the care that they take to preserve it with integrity. We commend the Internet Archive for its commitment to preserving journalism for future generations. We welcome its continued work to ensure that today’s reporting remains available to tomorrow’s journalists, researchers, and the public. Preserving this record is essential to protecting journalism’s legacy.'
Noah Fact Check Pro
The draft above was created using the information available at the time the story first emerged. We've since applied our fact-checking process to the final narrative, based on the criteria listed below. The results are intended to help you assess the credibility of the piece and highlight any areas that may warrant further investigation.
Freshness check
Score: 8
Notes: The article references events up to April 2026, with the latest source dated April 22, 2026. (amp.dw.com) However, similar discussions about the Wayback Machine's challenges have been reported since February 2026. (pcgamer.com) This suggests the narrative has been evolving over several months, with the most recent developments included.
Quotes check
Score: 7
Notes: The article includes direct quotes from Lia Holland and Mark Graham. While these quotes are attributed to specific individuals, their earliest known usage cannot be independently verified through the provided sources. (pcgamer.com) The lack of verifiable sources for these quotes raises concerns about their authenticity.
Source reliability
Score: 6
Notes: The article cites sources such as PC Gamer and DW.com. (pcgamer.com) While these are established publications, the specific articles referenced are not accessible for direct verification. The reliance on these sources without direct access diminishes the overall reliability of the information presented.
Plausibility check
Score: 7
Notes: The claims about major news outlets blocking the Wayback Machine due to AI scraping concerns are plausible and align with reports from other reputable sources. (tomshardware.com) However, the article lacks specific details and supporting evidence, making it difficult to fully assess the accuracy of these claims.
Overall assessment
Verdict (FAIL, OPEN, PASS): FAIL
Confidence (LOW, MEDIUM, HIGH): MEDIUM
Summary: The article presents a narrative about the Wayback Machine's challenges, citing various sources and including direct quotes. However, the inability to independently verify these quotes and the reliance on inaccessible sources significantly undermine the article's credibility. (pcgamer.com)