The Decoder September 6, 2025 IDIOCRACY

Authors challenge Apple AI training over alleged pirated books

Apple is facing a California lawsuit from authors Grady Hendrix and Jennifer Roberson over claims that their books were used to train AI models. The complaint points to the Books3 dataset, Applebot, OpenELM and Apple Intelligence, and seeks damages plus a court order blocking use of the authors’ works.

WTF Index IDIOCRACY


The story leans mildly toward Idiocracy because it centers on AI training allegedly built on pirated creative works, raising concerns about eroding respect for human authorship and quality.


Apple is now part of the expanding copyright fight around AI training data. Authors Grady Hendrix and Jennifer Roberson have filed a lawsuit in California claiming the company violated their copyrights by using their books to train AI models.

The case centers on a direct question: what material can a technology company use when building AI systems such as OpenELM and Apple Intelligence? The authors say Apple crossed a line by relying on pirated books and copied web content.

What the authors allege

The lawsuit says Apple used the Books3 dataset, described in the complaint as a collection of more than 196,000 pirated books. According to the source article, that dataset includes works by both Grady Hendrix and Jennifer Roberson.

The complaint does not stop with Books3. It also accuses Apple of using Applebot, the company’s web crawler, to copy website content and pull material from so-called shadow libraries.

Those claims matter because they focus on the inputs behind AI models, not simply the outputs those systems produce. The authors are arguing that the training process itself involved copyrighted works they did not authorize Apple to use.

The AI systems named in the dispute

The lawsuit links the alleged use of copyrighted books to AI models including OpenELM and Apple Intelligence. The source article does not describe the technical training process in detail, but it identifies those systems as part of the authors’ complaint.

That makes the case significant for Apple’s AI strategy because the dispute is not framed as a narrow publishing issue. It is about whether the data used to build AI models included protected books and other copied material.

For readers, the important distinction is simple: the case is not just about whether an AI product can mention an author or imitate a genre. It is about whether the material used to create the model was gathered and used lawfully in the first place.

What the plaintiffs want

Grady Hendrix and Jennifer Roberson are seeking damages. They are also asking for a court order that would bar Apple from using their works.

Those two requests point to different stakes. Damages would address alleged past harm. A court order would focus on what Apple may or may not do with the authors’ works going forward.

The source article does not provide the amount of damages sought. It also does not say how Apple has responded to the complaint. Based on the facts available, the dispute remains centered on the allegations made by the authors in California.

Why Books3 is central

Books3 is central because the complaint ties it directly to the plaintiffs: it describes the dataset as a collection of more than 196,000 pirated books that includes works by both authors.

In AI training disputes, datasets can become the key issue because they connect individual works to broader model development. If a dataset contains copyrighted books, the legal question turns to whether using that dataset for training was permitted.

The source article also says the complaint accuses Apple of using Applebot to copy website content and obtain material from shadow libraries. Taken together, the claims describe more than one alleged path by which copyrighted material may have entered Apple’s AI training pipeline.

Part of a wider copyright fight

The case follows a recent lawsuit against Anthropic that ended in a settlement after similar copyright claims. The source article does not give further details about that settlement, but the comparison shows that Apple’s case is part of a broader pattern of author challenges to AI training practices.

For the AI industry, these lawsuits keep returning to the same practical question: who controls the books, articles and web content used to build large AI systems? For authors, the concern is whether their work can be absorbed into training datasets without permission.

For Apple, the California lawsuit brings that question directly to models named in the complaint, including OpenELM and Apple Intelligence. Until the case moves further, the known facts are limited to the authors’ allegations, the datasets and tools named in the complaint, and the remedies the plaintiffs are seeking.