TechCrunch AI, November 29, 2024

Canadian publishers take OpenAI to court over ChatGPT training

A group of Canadian news and media companies has sued OpenAI, alleging copyright infringement and unjust enrichment. The case centers on whether journalism scraped from publisher websites can be used to train large language models without consent or payment.

A group of Canadian news and media companies has filed a lawsuit against OpenAI, accusing the ChatGPT maker of using their journalism without permission to train its large language models. The companies are seeking monetary damages and a ban on further use of their work.

What the Canadian lawsuit alleges

The lawsuit was filed Friday by companies including the Toronto Star, the Canadian Broadcasting Corporation, the Globe and Mail, and others. Their central claim is that OpenAI infringed their copyrights and unjustly enriched itself at their expense.

According to the news companies, OpenAI used content scraped from their websites to train the large language models that power ChatGPT. They describe that content as “the product of immense time, effort, and cost on behalf of the News Media Companies and their journalists, editors, and staff.”

The complaint is not only about whether copyrighted material appeared somewhere in the training process. It is also about value. The publishers argue that their journalism required substantial labor and investment, and that OpenAI converted that work into part of a commercial AI product without consent or consideration.

In the suit, the companies wrote that “rather than seek to obtain the information legally, OpenAI has elected to brazenly misappropriate the News Media Companies’ valuable intellectual property and convert it for its own uses, including commercial uses, without consent or consideration.”

Why publisher consent is the core issue

The Canadian news companies are asking for two broad forms of relief: money and limits on future use. They want monetary damages, and they want OpenAI barred from making further use of their work.

That request makes the case about more than past training. It also raises a forward-looking question for generative AI: when a model is built using news articles, what rights should publishers have over the use of that material?

The companies say they have “never received from OpenAI any form of consideration, including payment, in exchange for OpenAI’s use of their Works.” That position places payment and permission at the center of the dispute.

At the same time, the lawsuit lands in a wider environment in which OpenAI has signed licensing deals with some publishers, including The Associated Press, Axel Springer, and Le Monde.

For the Canadian plaintiffs, those deals may sharpen the contrast. Their argument is that OpenAI was capable of reaching arrangements with media organizations, but did not do so with them.

OpenAI’s response

OpenAI disputes the framing offered by the publishers. An OpenAI spokesperson said ChatGPT is used by “hundreds of millions of people around the world … to improve their daily lives, inspire creativity, and solve hard problems.”

The spokesperson also said OpenAI’s models are “trained on publicly available data, grounded in fair use and related international copyright principles that are fair for creators and support innovation.”

That response points to a different view of the same facts. The publishers focus on ownership, investment, consent, and compensation. OpenAI emphasizes publicly available data, fair use, international copyright principles, and the broader utility of ChatGPT.

OpenAI also says it works with publishers in product features. The spokesperson said, “We collaborate closely with news publishers, including in the display, attribution and links to their content in ChatGPT search, and offer them easy ways to opt-out should they so desire.”

That statement introduces another practical issue: whether opt-out tools and attribution are enough for publishers whose work may already have been used to build AI systems. The Canadian companies’ lawsuit suggests they do not see those measures as a substitute for prior consent or payment.

Part of a larger copyright fight

The Canadian case is not happening in isolation. OpenAI is already facing copyright lawsuits from The New York Times, New York Daily News, YouTube creators, and authors including comedian Sarah Silverman.

Taken together, these cases show how generative AI has become a major legal flashpoint for media, creators, and technology companies. The disputes revolve around a shared question: how should copyright apply when large language models are trained on vast amounts of existing content?

The new lawsuit also came shortly after Columbia University’s Tow Center for Digital Journalism published a study, which found that “no publisher — regardless of degree of affiliation with OpenAI — was spared inaccurate representations of its content in ChatGPT.”

That finding adds another concern for news organizations. Their complaints are not limited to whether AI systems can learn from their work. They also extend to how their content may be represented when users interact with AI tools.

For publishers, accuracy, attribution, payment, and control are closely linked. If a chatbot draws on journalism, displays links, summarizes content, or represents a publication’s reporting, the publisher has a stake in how that process works.

What is at stake for AI and news

The lawsuit highlights a basic tension in the AI economy. News companies produce reporting through the work of journalists, editors, and staff. AI companies build tools that can generate answers, summaries, and other outputs from large language models trained on large bodies of data.

The Canadian publishers argue that their work should not become part of that system without permission or compensation. OpenAI argues that its models are trained on publicly available data under principles that support both creators and innovation.

The suit has only just been filed, so its outcome remains open. What is clear is the legal demand: damages for alleged past use and a ban on future use of the publishers’ work by OpenAI.

For readers, the case matters because it sits at the intersection of journalism, copyright, ChatGPT, AI training, and the economics of online information. The court fight will test competing claims about who benefits from news content when it becomes training material for generative AI systems.