News Publishers Are Limiting the Wayback Machine

The New York Times, The Guardian, and USA Today have begun restricting the Internet Archive's Wayback Machine from preserving their articles. The stated reason: fear that AI companies will scrape the archive to train commercial models. The response from journalists and digital rights organizations: you are burning the public record to fight a battle that belongs in court.

A petition organized by digital rights nonprofit Fight for the Future has collected signatures from over 120 journalists, including Cory Doctorow, Taylor Lorenz, and Ron Suskind. Rachel Maddow called the Internet Archive "a national treasure." The Electronic Frontier Foundation issued a formal call for publishers to lift their restrictions and instead pursue AI companies through litigation.

The confrontation is real, the stakes are significant, and the outcome is not settled.

What Publishers Actually Did — and Why

The restrictions began quietly. The New York Times implemented a hard block on Wayback Machine access late last year. The Guardian and USA Today followed with their own limitations. Reporting by Nieman Lab in January 2026 broke the story publicly, revealing that the decisions were driven by publishers' concern that the Archive's free library of webpage snapshots could be scraped by AI companies to train their models without compensation.

The concern is not irrational. AI training datasets have been built, in significant part, from publicly accessible web content — including journalism. Publishers arguing that their work has been used to build commercial AI products without consent or payment have a legitimate grievance. Several major publishers have already filed or are pursuing copyright litigation against AI companies on exactly these grounds.

The question is whether restricting the Wayback Machine is an effective or proportionate response to that grievance.

What the Wayback Machine Actually Does — and Who Uses It

The Internet Archive's Wayback Machine has been preserving snapshots of web pages since 1996. It holds over 900 billion pages. Wikipedia links to more than 2.6 million news articles preserved by the Archive across 249 languages. Researchers, journalists, lawyers, historians, and ordinary citizens use it daily to access content that has been edited, paywalled, deleted, or simply lost to link rot.
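For readers who want to see how accessible a given page's history is, the Archive exposes a public availability API that returns the capture closest to a requested date. Here is a minimal sketch in Python using only the standard library; the example.com lookup is a placeholder, and when a publisher blocks archiving, calls like this simply come back empty:

```python
import json
import urllib.parse
import urllib.request

def closest_snapshot(url: str, timestamp: str = "") -> dict | None:
    """Ask the Wayback Machine's availability API for the capture of
    `url` closest to `timestamp` (YYYYMMDDhhmmss). Returns None when
    no capture is accessible."""
    query = urllib.parse.urlencode({"url": url, "timestamp": timestamp})
    with urllib.request.urlopen(f"https://archive.org/wayback/available?{query}") as resp:
        data = json.load(resp)
    return data.get("archived_snapshots", {}).get("closest")

snap = closest_snapshot("example.com", "20200101")
if snap and snap.get("available"):
    print(snap["timestamp"], snap["url"])  # a web.archive.org/web/... URL
else:
    print("no accessible capture")
```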

The use cases are not abstract. Brishti Basu, a reporter at PressProgress who signed the petition, described using the Wayback Machine to prove that the Vancouver Police Department had edited a press release after she published a critical article — and then publicly accused her of fabricating information. The archived original version was her evidence. Without it, her account of events had no documentary support.

This is what the Wayback Machine does that nothing else does at comparable scale: it creates a timestamped, third-party record of what was published, when, and in what form. Edits, retractions, corrections, and deletions all become visible against the archived original. For accountability journalism, that function is foundational.
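That visibility is queryable. The CDX index behind the Wayback Machine records a content hash for every capture, which is precisely what makes an edited press release provable: the hash changes between timestamps. A hedged sketch against the public CDX API follows; the URL queried is a placeholder:

```python
import json
import urllib.parse
import urllib.request

def capture_history(url: str) -> list[list[str]]:
    """List captures of `url` from the Wayback Machine's CDX index.
    `digest` is a hash of the captured content: when the digest changes
    between timestamps, the page was edited."""
    query = urllib.parse.urlencode({
        "url": url,
        "output": "json",
        "fl": "timestamp,digest",
        "collapse": "digest",  # drop consecutive captures with identical content
    })
    with urllib.request.urlopen(f"https://web.archive.org/cdx/search/cdx?{query}") as resp:
        rows = json.load(resp)
    return rows[1:]  # row 0 is the header ["timestamp", "digest"]

# Each distinct row is a version of the page the Archive can attest to.
for timestamp, digest in capture_history("example.com"):
    print(timestamp, digest)
```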

The EFF's Argument: Take AI Companies to Court, Not the Archive

The Electronic Frontier Foundation's position, articulated by senior policy analyst Joe Mullin, draws a clear distinction between the legitimate legal dispute over AI training data and the collateral damage of restricting public archives.

His argument: disputes over AI training must be resolved in court, not by removing journalism from the public record. The two things are not the same fight. AI companies that scraped copyrighted content without authorization can be sued. The Wayback Machine — a nonprofit library that preserves the public record — is not the party that owes publishers compensation.

Mark Graham, director of the Wayback Machine, made the same argument in a February op-ed: "Whatever legitimate concerns people may have about generative AI, libraries are not the problem, and blocking access to web archives is not the solution."

The counterpoint — that restricting archive access reduces the surface area available for scraping — conflates the mechanism of harm with its source. The AI companies doing the scraping are the source. Dismantling public infrastructure to make scraping harder is, as the EFF puts it, a potentially irreversible sacrifice of the public record.

Why This Is an AI Story as Much as a Journalism Story

The news publishers' decision sits at the intersection of two forces that are reshaping information: the AI industry's voracious appetite for training data and the fragility of the open web as a commons.

Publishers are not wrong that their content has been used to train commercial AI products. They are not wrong that this represents a significant economic and legal challenge. But the response of restricting public archiving treats a symptom while potentially causing greater harm. If the archive is degraded — if decades of journalism become inaccessible because publishers chose restriction over litigation — the loss is permanent and asymmetric. The AI companies have already ingested what they ingested. The public loses the record.

For those of us tracking how AI is reshaping media, marketing, and the information ecosystem, this conflict is a clear signal that the industry's rapid scaling has produced externalities that are only now becoming visible. The cost of building AI on top of the open web is being distributed across institutions — libraries, archives, newsrooms — that were not party to the original transaction.

What It Means for Marketers and Content Professionals

For marketing and content teams, the practical implication is worth noting: the open web as a research resource is not as stable as it appears. Articles get edited. Sites go dark. Archives get blocked. The tools you rely on for competitive research, content verification, and historical reference operate on infrastructure that is increasingly contested.
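One practical hedge, if your team cites web sources: capture your own references at publication time. Below is a minimal sketch using the Wayback Machine's public Save Page Now endpoint. Note that unauthenticated requests are rate-limited, the authenticated SPN2 API is the Archive's supported route for automation, and the user agent and URL here are illustrative:

```python
import urllib.request

def archive_now(url: str) -> str:
    """Request a fresh capture of `url` through the Wayback Machine's
    public Save Page Now endpoint, returning the web.archive.org URL
    the request resolves to once the capture completes."""
    req = urllib.request.Request(
        f"https://web.archive.org/save/{url}",
        headers={"User-Agent": "reference-archiver/0.1"},  # illustrative UA
    )
    with urllib.request.urlopen(req) as resp:
        return resp.geturl()  # final URL after the Archive's redirects

print(archive_now("https://example.com/source-you-cite"))
```

The design point is simple: archiving at citation time shifts you from depending on a page's future availability to depending on a capture you triggered yourself.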

More broadly, the tension between AI's training data needs and the public record is a precursor to the regulatory and legal frameworks that will eventually govern how AI companies can use published content. Those frameworks will affect content strategy, intellectual property, and the economics of content creation. Staying informed matters.

At Winsome Marketing, we help growth teams navigate the AI-driven shifts that are reshaping how content works — from strategy to execution. If you want a clear-eyed view of what these changes mean for your business, let's talk.