The Decoder, March 7, 2025

AI search engines still stumble over news citations

A study by Columbia University's Tow Center for Digital Journalism found that AI search engines frequently fail to identify news sources correctly. The problems appeared in both free and paid tools and extended to publisher partnerships, syndicated copies, fabricated URLs, and Robots Exclusion Protocol settings.

AI search engines are becoming a routine way to look for information, but a new study raises a basic problem for news: these systems often struggle to say where an article came from.

Research from Columbia University's Tow Center for Digital Journalism tested eight AI search engines, including ChatGPT, Perplexity, and Google Gemini. The researchers asked the systems to identify headlines, sources, publication dates, and URLs from random news articles. According to the study, more than 60% of queries received incorrect answers.

Why news attribution matters

For news, attribution is not a small formatting detail. A reader needs to know which outlet reported a story, when it was published, and where the original article can be found. Without that information, an AI search result can make it harder to evaluate reliability, follow reporting back to its source, or give the publisher proper credit.

The study comes as nearly 25% of Americans now use AI search engines instead of traditional search tools, according to recent data. That shift makes the accuracy of AI-generated citations more important. If users rely on these systems as a gateway to news, flawed attribution can shape where attention, traffic, and trust go.

The Tow Center's test focused on practical details that should be straightforward for a search product: title, source, date, and URL. Yet the results showed that even this basic retrieval task can break down across major services.

The best result still had a high error rate

Among the systems tested, Perplexity performed best, but still had a 37% error rate. At the other end, Grok 3 misattributed 94% of citations.

The problem was not confined to one weak tool. More than 60% of queries across the study produced incorrect answers, so the failures were not limited to a single company, product design, or model.

The errors also took different forms. Some systems misidentified where articles came from. Others pointed users toward syndicated versions rather than originals. In more than half of cases, Grok 3 and Google Gemini created URLs that did not exist.

For users, a fabricated URL is especially damaging because it can look like a precise citation while leading nowhere. A wrong source can be just as confusing, because it may shift credit from the publisher that produced the article to another platform that carried or republished it.

Paid tools were not a simple fix

The study found an unexpected pattern: paid services such as Perplexity Pro and Grok 3 performed worse than their free counterparts. These services attempted to answer more queries, but were more likely to give incorrect information than to admit they did not know.

That distinction matters. In news search, refusing to answer can be more useful than confidently returning a false citation. A user can work around uncertainty; a wrong answer can send them in the wrong direction.

The finding also complicates the idea that paying for a more advanced AI search product automatically means better news attribution. In the study, a tool's willingness to answer did not necessarily match its ability to answer correctly.

Publisher agreements did not solve the issue

The study also looked at a question publishers care about: whether formal deals with AI companies improve attribution. The answer was not reassuring.

Despite Hearst's agreement with OpenAI, ChatGPT correctly identified only one in ten San Francisco Chronicle articles. Perplexity frequently cited syndicated versions of Texas Tribune articles instead of the originals.

Those examples show why licensing or partnership agreements are not the same as accurate attribution in the product experience. A system may have a formal relationship with a publisher and still fail to point users to that publisher's own article in a reliable way.

The study also found that AI search engines often directed users to syndication platforms like Yahoo News rather than original sources. For publishers, that creates a visibility problem. The article may exist in the AI system's answer, but the connection to the outlet that produced it can be weakened or lost.

Robots settings and wider concerns

According to the study, several systems ignored publishers' Robots Exclusion Protocol settings. In one example, Perplexity accessed National Geographic content even though the publisher had explicitly blocked its crawlers.

That adds another layer to the attribution problem. The issue is not only whether an AI search engine cites a publisher correctly after using its content. It is also whether the system respects the publisher's stated access preferences in the first place.
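The Robots Exclusion Protocol only works if a crawler chooses to honor it: a publisher lists rules per user agent in a robots.txt file, and the crawler is expected to check those rules before fetching a page. A minimal Python sketch of that check, using the standard library's urllib.robotparser, follows. The robots.txt contents, the publisher.example URL, and the allowed() helper are hypothetical; PerplexityBot appears only as an illustrative user-agent token, and the file is not National Geographic's actual configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler, allow everyone else.
# Illustrative only; not any real publisher's file.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if ROBOTS_TXT permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    article = "https://publisher.example/news/some-article"  # hypothetical URL
    print(allowed("PerplexityBot", article))  # False: the named crawler is blocked
    print(allowed("OtherBot", article))       # True: falls through to the * rules
```

Nothing in this check stops a crawler from skipping it entirely, which is why a robots.txt block expresses a publisher's preference rather than enforcing access control.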

Time Magazine's COO Mark Howard said AI companies are working to improve their systems, but warned against assuming today's free products are perfectly accurate.

"If anybody as a consumer is right now believing that any of these free products are going to be 100 percent accurate, then shame on them."

A separate BBC study in February identified similar problems with AI assistants handling news queries, including factual errors and poor sourcing.

Taken together, the studies point to a clear caution: AI search engines may be useful for discovery, but their news citations should not be treated as automatically reliable. When the subject is journalism, the details of attribution are part of the information itself.