Anthropic's copyright settlement with U.S. authors and publishers is more than a large payout. It is an early signal that the market for AI training data may be moving from vague legal arguments toward concrete prices, obligations, and limits.
The agreement, which has received preliminary approval from U.S. District Judge William Alsup, centers on allegations that Anthropic used around 500,000 copyrighted books without permission. The company has agreed to pay at least $1.5 billion to resolve the class action lawsuit.
A copyright case built around AI training data
The lawsuit accused Anthropic of “Napster-like” copyright infringement. According to a motion for preliminary approval filed in federal court in California on September 5, 2025, the case focused on the company's mass downloading and storage of books from pirate sites including Library Genesis (LibGen) and Pirate Library Mirror (PiLiMi).
Plaintiffs claimed that Anthropic trained its AI models on hundreds of thousands of works obtained illegally. The dispute was therefore not only about whether copyrighted books can be used to train AI systems, but also about how those books were acquired in the first place.
That distinction matters. In June 2025, Alsup ruled in this same case that training AI models on copyrighted books can qualify as fair use, but only if the books were obtained legally. Copies from pirate sources such as LibGen and PiLiMi did not receive that protection.
In July 2025, the federal court in San Francisco allowed the class action to proceed and described Anthropic's actions as “Napster-style” copyright infringement. The court made clear that fair use does not apply to pirated copies, even when the material is used for transformative AI training. Acquiring the material illegally was treated as a violation on its own.
What Anthropic agreed to pay
Under the settlement, Anthropic will pay at least $1.5 billion into a non-refundable fund. The payments will be spread across four installments over two years.
With about 500,000 works involved, the settlement averages out to $3,000 per book. If more titles are added to the final list of works, Anthropic will pay an additional $3,000 for each one.
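To make the per-work arithmetic concrete, here is a minimal Python sketch. It assumes a baseline of exactly 500,000 works behind the $1.5 billion minimum, treats each work beyond that baseline as adding $3,000, and keeps $1.5 billion as a floor if the final list comes in smaller. The function and constant names are hypothetical and do not come from the settlement documents.

```python
# Hypothetical sketch of the settlement arithmetic as reported, not the
# actual formula in the court filings.
BASE_FUND_USD = 1_500_000_000  # minimum non-refundable fund
BASE_WORK_COUNT = 500_000      # assumed number of works covered by the minimum
PER_WORK_USD = 3_000           # assumed payment for each additional work


def settlement_fund(num_works: int) -> int:
    """Total fund in USD for a Works List of the given size."""
    if num_works < 0:
        raise ValueError("num_works must be non-negative")
    # Only works beyond the assumed baseline add to the fund;
    # a smaller list still pays the $1.5 billion minimum.
    extra_works = max(0, num_works - BASE_WORK_COUNT)
    return BASE_FUND_USD + extra_works * PER_WORK_USD


def average_per_work(num_works: int) -> float:
    """Average payout per work, before any author/publisher split."""
    if num_works <= 0:
        raise ValueError("num_works must be positive")
    return settlement_fund(num_works) / num_works


if __name__ == "__main__":
    for n in (500_000, 510_000):
        print(f"{n:,} works: ${settlement_fund(n):,} total, "
              f"${average_per_work(n):,.0f} per work")
```

Because the fund is computed per work rather than per claimant, the number of people who file claims does not change the total; it only changes how each work's share is divided.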
The agreement is based on the number of works, not the number of claimants. If both an author and a publisher claim the same book, a working group from the Authors Guild and the Association of American Publishers will advise on how to divide the payment.
That structure gives the settlement a practical importance beyond this single lawsuit. There is still no standard price for licensing books or other copyrighted works for AI training. But a settlement of this size begins to create a reference point for future negotiations.
One comparison already exists in the market: Microsoft reportedly reached a licensing deal with HarperCollins at $5,000 per book for AI training, while Anthropic's settlement lands at around $3,000 per title.
What the settlement does not cover
The agreement has important limits. It only covers past infringements through August 25, 2025. It also specifically excludes claims related to AI-generated outputs, whether past or future.
Books that are not included in the final “Works List” are excluded as well. That means the settlement does not settle every possible copyright claim involving Anthropic, AI training, or books.
Anthropic must also delete all files sourced from LibGen and PiLiMi, along with any copies. That deletion must happen within 30 days after the settlement is finalized or after any court-ordered retention ends.
The settlement is also not final yet. Alsup, who granted preliminary approval, called the agreement “fair,” but said he will issue a final ruling only after affected authors are notified.
Plaintiffs Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson said Alsup's decision “brings us one step closer to real accountability for Anthropic and puts all AI companies on notice they can’t shortcut the law or override creators’ rights.”
Why AI companies are watching closely
For AI companies, the settlement points to a central business question: what is the price of training data when the data is copyrighted? The answer has been unclear because many AI systems were built during a period when scraping and large-scale data collection moved faster than licensing markets.
This case does not create a complete pricing system. It does, however, put a large dollar value on allegedly unauthorized use of copyrighted books from pirate sources. That could influence how publishers, authors, and AI companies approach future licensing deals.
The settlement also matters for fair use arguments. One of the four factors courts weigh in a fair use defense is the effect of the use on the potential market for the copyrighted work. As a licensing market for AI training data becomes more established, it may become harder for AI companies to argue that using copyrighted works without permission causes no market harm.
At the same time, the legal boundary is not fully settled. The case makes pirate sources like LibGen and PiLiMi clearly risky for AI training, but a gray area remains around scraping publicly available web content without the consent of authors or site owners.
It is still unclear how much of that data, if any, will need to be licensed in the future. For now, the Anthropic settlement shows that the cost of AI training data can be real, large, and tied directly to how the material was obtained.