Publishers challenge Cohere over AI training and fake news

Condé Nast and other major publishers allege Cohere used their journalism without permission to train and run its AI service. The lawsuit also says Cohere generated fabricated articles attributed to publishers, while Cohere calls the case misguided and frivolous.

The story mainly highlights AI eroding truth and quality by allegedly fabricating and misattributing news content, with a secondary concern about unauthorized training use.

A group of major news organizations is taking Cohere to court over how the AI startup allegedly used journalism in its products. The case centers on copyright, trademarks, AI training, real-time outputs, and the growing market for licensing publisher content to artificial intelligence companies.

What the publishers allege

Condé Nast and several other media companies sued Cohere in US District Court for the Southern District of New York, accusing the company of “systematic copyright and trademark infringement.” Their complaint says Cohere used scraped copies of articles without permission or payment across training, real-time use, and outputs.

The publishers argue that this use directly supports Cohere’s AI service while competing with publisher offerings and the emerging market for AI licensing. They also claim the service sometimes presents fabricated material as if it came from established news brands.

Condé Nast, which owns Ars Technica, Wired, and The New Yorker, is joined in the lawsuit by The Atlantic, Forbes, The Guardian, Insider, the Los Angeles Times, McClatchy, Newsday, The Plain Dealer, Politico, The Republican, the Toronto Star, and Vox Media.

The complaint seeks statutory damages of up to $150,000 under the Copyright Act for each infringed work. It also seeks damages tied to trademark infringement and “false designations of origin.”

Why the outputs matter

The lawsuit is not limited to the claim that articles were used to train a model. The publishers also say Cohere’s system produced outputs that copied, closely summarized, or misattributed their work.

In Exhibit A, the plaintiffs listed over 4,000 articles as an “illustrative and non-exhaustive list of works that Cohere has infringed.” Other exhibits, according to the source article, include responses to queries and alleged “hallucinations” that the publishers say infringe copyrights and trademarks.

The complaint says Cohere “passes off its own hallucinated articles as articles from Publishers.” In plain terms, the publishers are arguing that an AI system can harm them in two related ways: by using their reporting as raw material without authorization, and by attaching their brands to material they did not publish.

The requested remedies reflect both concerns. The lawsuit asks for an order requiring Cohere to destroy infringing copies of copyrighted works. It also asks that Cohere install a filter or other technology to stop the system “from retrieving or copying Publishers copyrighted works, whether from Publishers’ websites or other locations.”

Cohere’s response

Cohere rejected the complaint in a statement provided to Ars, calling the lawsuit frivolous. The company said it stands by its practices for training enterprise AI and said it has prioritized controls intended to reduce the risk of IP infringement.

The company also said it would have welcomed a conversation about the publishers’ concerns before learning about them in a filing. Cohere said it expects the matter to be resolved in its favor.

Cohere offers AI products for businesses in areas including financial services, health care and life sciences, manufacturing, energy and utilities, and the public sector. The company says its investors include Salesforce, Oracle, Nvidia, SAP, Fujitsu, and AMD, and its customers include Notion and Oracle.

The source article also notes that Cohere was valued at $5.5 billion in a recent funding round. A News/Media Alliance press release described Cohere as “an AI company valued at over $5 billion.”

The licensing fight behind the case

The dispute sits inside a broader commercial question: who gets paid when journalism is used to build or operate generative AI systems? The publishers involved in the suit have licensed content to other AI companies, such as OpenAI. At the same time, OpenAI faces a lawsuit from The New York Times over alleged use of news articles without permission, and that case is proceeding through discovery.

Condé Nast CEO Roger Lynch told staff that the lawsuit against Cohere “is a first for our industry, coming together to protect our rights and assert that creative and journalistic work cannot be taken without permission or fair compensation.”

Vox Media President Pam Wasserstein said the case aims to create a legal precedent and “establish the terms of the playing field for licensed use of journalism for AI, including for training and also real-time uses,” according to The Wall Street Journal.

The publishers are also pointing to Cohere’s own positioning. Cohere, based in Toronto, markets itself as business-friendly AI and has run an advertisement saying it is not just an “ordinary AI.” The lawsuit says that instead of licensing the content it uses, Cohere “helps itself to unlicensed copies of Publishers’ news and magazine articles to build a training dataset.”

A fabricated Guardian example

The complaint describes one example involving The Guardian. The Guardian published an article on October 7, 2024, titled “‘The pain will never leave’: Nova massacre survivors return to site one year on.”

According to the lawsuit, when prompted for that piece with RAG [Retrieval-Augmented Generation] turned off, Cohere produced an inaccurate article that it represented was “published on June 29 2022 in The Guardian by Luke Harding.” The complaint says the output confused the October 7, 2023, massacre at the Nova Music Festival with a mass shooting in Nova Scotia, Canada, in 2020.

The lawsuit also says Cohere manufactured details about the Nova Scotia tragedy and attributed several quotes, including material gathered in The Guardian’s reporting, to Tom Bagley, who was murdered in the 2020 shootings. The complaint says that fictional article never appeared in The Guardian.

That example captures why the case goes beyond copying alone. The publishers are framing the alleged conduct as a threat to their rights, their brands, and public trust in what their newsrooms actually published.