The Decoder February 12, 2025

Why one AI fair use loss may not settle OpenAI’s cases

A US federal court rejected Ross Intelligence's fair use defense after finding that its use of Westlaw legal summaries violated copyright law. The ruling matters for AI copyright disputes, but the court also drew a line between Ross' non-generative system and large language models.

A US federal court has rejected Ross Intelligence's fair use defense in a copyright dispute over AI training data. The decision is important because it shows how courts may examine copyrighted material used to build AI systems, but it does not automatically answer the bigger questions around OpenAI and other generative AI companies.

The key reason is that Ross' system, as described in the ruling, was not treated as the same kind of tool as a large language model. The court focused on how Ross used Thomson Reuters' Westlaw material, what market it entered, and whether its product competed with the original service.

What the court found in the Ross case

Ross Intelligence had obtained roughly 25,000 legal summaries from Thomson Reuters' Westlaw database through indirect means. It then converted those summaries into training data for an AI system.

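The court evaluated the use under the four traditional fair use factors: the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect of the use on the potential market for or value of the work. Ross lost on the two factors the court weighted most heavily, purpose and market effect. The court weighed the second and third factors in Ross' favor, but they did not outweigh the others.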

Two points were especially damaging. The court found that Ross' use was commercial, and it also found a lack of "transformative" value. In plain terms, Ross was not merely using the material to build a broad-purpose system; it was creating a product that operated in the same legal research space as Thomson Reuters' system.

The way the material was processed did not save the defense. Ross had converted legal summaries into numerical data about word relationships, but the court still looked at the practical result: a competing legal research product.

Why market impact mattered

The market effect factor played a central role in the ruling. The court considered both the existing legal research market and a potential market for AI training data.

That potential training-data market matters because it frames copyrighted content as something that may have commercial value beyond its original publication. The source article notes that some major AI labs already purchase training data, which can be read as an acknowledgment that such data has value outside a fair use claim.

The burden also mattered. The ruling placed the burden of proof on Ross to show that these markets would not be affected. That detail could become relevant in future AI copyright cases, especially where a company argues that training use does not harm the rightsholder's business.

Still, the court's reasoning was tied closely to the facts in front of it. Ross was using legal summaries connected to a legal research database and building a system that returned existing court decisions. That made the competitive overlap unusually direct.

Why LLMs may be treated differently

The court explicitly noted that only non-generative AI was before it, so the ruling does not reach generative systems. That distinction is central to understanding why the case may have limited implications for OpenAI and other companies building large language models.

OpenAI and other AI labs frame language models differently in their fair use arguments. Their position is that training data is used to develop general language, coding, musical, artistic, or other skills, rather than to reproduce or compete with the original content itself.

The source article compares this to a student learning from a textbook. The AI system "views" the data during training, while the final model does not simply contain copied training material in the same form. At the same time, the source also notes an important practical point: the data must be stored, at least temporarily, for training to work.

That difference does not guarantee a win for generative AI companies. It does mean the Ross decision is not a simple template for every AI copyright dispute. A system that returns existing legal decisions from a legal research workflow presents a different question from a chatbot that creates new content.

Future AI copyright cases remain unsettled

Courts appear to be moving case by case. A single, definitive ruling on fair use for generative AI training data is not expected anytime soon. Instead, different precedents are likely to develop as courts examine different systems and different uses of copyrighted material.

The Ross dispute was relatively straightforward because one legal database was competing directly with another. Other cases raise harder questions:

Is a chatbot competing with a news website?
Does an AI music generator compete with human musicians?
When does AI-generated output become a substitute for the original work?

The source article points to a recent lawsuit by Raw Story and AlterNet against OpenAI as an example of that complexity. According to the source article, the judge dismissed the lawsuit, accepting OpenAI's argument that ChatGPT creates new content rather than copying articles directly and noting that facts themselves are not protected by copyright.

For Ross Intelligence, however, the result was severe. The company shut down in 2021 after it was unable to raise enough funds to continue operating while fighting what it called an "unfounded lawsuit."

The broader lesson is narrower than the headline might suggest. The court rejected one AI company's fair use defense on specific facts involving Westlaw summaries, a legal research market, and a non-generative system. The larger fight over AI training, copyright, and generative models remains open.