The Decoder February 15, 2025 IDIOCRACY

Why AI assistants still stumble on basic news fact-checking

A BBC study found significant accuracy and reliability problems in AI assistants answering current news questions. The issues included factual errors, fabricated information, weak source attribution, missing context, and confusion between fact and opinion.


AI assistants are increasingly used as quick guides to current events, but a BBC study found that leading systems still struggle with basic news accuracy. The evaluation tested ChatGPT, Microsoft Copilot, Google Gemini, and Perplexity on current news questions and found problems that could mislead readers.

What the BBC tested

In December 2024, 45 BBC journalists reviewed how the AI systems handled 100 current news questions. The evaluation looked beyond whether an answer sounded plausible. It assessed whether the response was accurate, properly sourced, impartial, and clear about the difference between facts and opinions.

The journalists assessed responses across seven key areas: accuracy, source attribution, impartiality, fact-opinion separation, commentary, context, and proper handling of BBC content. Each answer was rated from "no issues" to "significant issues."

That approach matters because news answers are not just summaries. A useful response must preserve context, attribute information correctly, avoid presenting old material as new, and make clear when a statement is a claim rather than an established fact.
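The rating scheme described above can be sketched in a few lines of Python. This is only an illustration, not the BBC's actual tooling: the article names the seven areas and the two ends of the scale, while the intermediate "some issues" level, the rule that a response is flagged when any one area is rated "significant issues", and all class and function names are assumptions made for the example. The sample reviews are hypothetical and do not reproduce any study data.

```python
from dataclasses import dataclass
from enum import IntEnum

# The seven areas named in the article.
CRITERIA = (
    "accuracy",
    "source attribution",
    "impartiality",
    "fact-opinion separation",
    "commentary",
    "context",
    "handling of BBC content",
)


class Rating(IntEnum):
    NO_ISSUES = 0
    SOME_ISSUES = 1          # assumed intermediate level, not stated in the article
    SIGNIFICANT_ISSUES = 2


@dataclass
class Review:
    """One reviewer's ratings for one assistant response (hypothetical structure)."""
    assistant: str
    ratings: dict[str, Rating]

    def has_significant_issue(self) -> bool:
        # Assumed aggregation rule: a single area rated "significant issues"
        # is enough to flag the whole response.
        return any(r == Rating.SIGNIFICANT_ISSUES for r in self.ratings.values())


def share_with_significant_issues(reviews: list[Review]) -> float:
    """Fraction of reviews flagged under the assumed rule (0.0 for an empty list)."""
    if not reviews:
        return 0.0
    flagged = sum(1 for r in reviews if r.has_significant_issue())
    return flagged / len(reviews)


def clean_review(assistant: str) -> Review:
    """Helper that builds a review with no issues in any area."""
    return Review(assistant, {c: Rating.NO_ISSUES for c in CRITERIA})


# Hypothetical sample: one clean response, one with a serious accuracy problem.
sample = [clean_review("assistant-a"), clean_review("assistant-b")]
sample[1].ratings["accuracy"] = Rating.SIGNIFICANT_ISSUES

share = share_with_significant_issues(sample)
print(f"{share:.0%} of sample responses have a significant issue")
```

Under this assumed rule, a response that is well sourced and impartial is still flagged if it fails on accuracy alone, which is one way a strict standard can produce a high headline share.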

The scale of the problems

The study found that 51 percent of AI responses contained significant issues. Those issues ranged from basic factual errors to completely fabricated information.

When the systems specifically cited BBC content, the problems continued. In those cases, 19 percent of responses contained errors, while 13 percent contained either fabricated or misattributed quotes.

The BBC set a demanding standard. Even a small mistake could count as a significant issue if it might mislead someone reading the response. That standard is important because readers often use AI assistants for quick answers, not as a starting point for a full source-by-source audit.

The findings also point to a deeper problem with trust. If an AI assistant cites a known news source but still misstates what that source said, the appearance of attribution can give a weak answer more authority than it deserves.

Where the systems went wrong

Some errors involved current events. ChatGPT failed to acknowledge the death of a Hamas leader and still described him as an active leader months after he died. Microsoft Copilot presented a 2022 article about Scottish independence as if it were current news.

Other errors touched on health advice. Google Gemini incorrectly claimed that the UK's National Health Service (NHS) advises against vaping. According to the source article, the NHS actually recommends e-cigarettes to help people quit smoking.

Perplexity fabricated details about science journalist Michael Mosley's death. The source article also notes a separate example involving Microsoft's Bing chatbot, which became confused while reading court coverage and accused a journalist of committing the crimes he was reporting on.

The recurring pattern was not limited to one type of mistake. The AI assistants regularly treated outdated information as current news, failed to separate opinions from facts, and left out important context.

Why news is difficult for AI assistants

News questions are especially demanding because they often depend on timing, wording, attribution, and context. A response can be partly correct and still be misleading if it presents old reporting as current, removes a key caveat, or attaches a quote to the wrong source.

The BBC also warned that the larger scale of the problem is still unclear. Its report states: "The scale and scope of errors and the distortion of trusted content is unknown."

That uncertainty is difficult to resolve because AI assistants can answer an almost unlimited range of questions. Different users may also receive different responses when asking the same question, which makes systematic evaluation harder.

The challenge extends beyond individual readers. The source says media companies and regulators lack the tools to fully monitor or measure these distortions. The BBC also suggests that even the AI companies themselves may not know the true extent of their systems' errors.

What happens next

The BBC says it will run this study again in the near future. Future evaluations could become more useful by adding independent reviewers and comparing how often humans make similar mistakes.

That kind of comparison would help clarify the gap between human and AI performance. It would also make it easier to judge whether the problem is improving, staying the same, or becoming harder to measure as these systems are used more widely.

The BBC also points to a possible role for oversight. Its report says: "Regulation may have a key role to play in helping ensure a healthy information ecosystem in the AI age."

For now, the main lesson is straightforward. AI assistants can produce confident answers about the news, but confidence is not the same as reliability. Readers, publishers, and technology companies still face a basic problem: current-events answers need careful verification, especially when they involve sources, quotes, health advice, or fast-changing facts.