The Decoder July 24, 2025 IDIOCRACY

AI drug review tool Elsa faces scrutiny inside the FDA

The FDA is using Elsa, a generative AI system, to help evaluate new drugs. Staff quoted by CNN say the system can invent studies or misrepresent research data, while the FDA's head of AI acknowledges that hallucinations are possible.


Elsa's hallucinated studies and distorted research summaries in FDA drug review point more to eroded truth and overreliance than to the danger of autonomous AI.


The US Food and Drug Administration is using a generative AI system called Elsa to support work on new drug evaluations. According to staff members cited by CNN, the system can produce unreliable outputs, including invented studies and distorted descriptions of research data.

Elsa, short for Efficient Language System for Analysis, is meant to speed up drug approvals. The concern raised by insiders is not that the FDA is exploring AI. It is that this particular tool is already being used in sensitive review work even though its outputs need careful checking.

What Elsa is being used for

According to the report, Elsa is a generative AI system that the FDA uses in the drug approval process. Its tasks include helping review clinical protocols and helping assess risks during inspections.

Those are document-heavy tasks where speed can matter. A system that can summarize, compare, or analyze large amounts of material could appear useful in that setting. But the same tasks also depend on accuracy, because a wrong summary or a false reference can change how a reviewer understands the evidence in front of them.

That is why the reported problem is significant. Staff members say Elsa does not merely make minor mistakes. They say it can fabricate studies or misrepresent research data, which are exactly the kinds of errors that can undermine confidence in an AI-assisted review process.

Staff concerns center on hallucinations

One current FDA employee told CNN:

"Anything that you don’t have time to double-check is unreliable. It hallucinates confidently."

That statement captures the core risk with generative AI in a regulatory workflow. If a reviewer has time to verify every output against the underlying research, the system may function as a rough assistant. If there is not enough time to check, the output can become a liability.

Hallucination is a known problem with large language models. These systems can generate fluent answers that sound authoritative even when the content is wrong. In Elsa's case, staff members reported that the tool frequently invents studies or misrepresents research data.

The confident tone is what makes these errors dangerous. An obvious error is easier to catch. A confident, research-like answer can be harder to challenge, especially when it appears inside a workflow designed to move faster.

The FDA acknowledges the risk

The FDA's head of AI, Jeremy Walsh, acknowledged that Elsa can face the same issue seen in other generative AI systems. He said:

"Elsa is no different from lots of [large language models] and generative AI. They could potentially hallucinate."

That acknowledgement does not remove the concern. It clarifies the tradeoff. The agency is using a tool that can help accelerate review work, while also recognizing that the technology can produce false or misleading material.

For a system used in drug evaluation, that creates a practical question: how much checking is enough? If every AI-generated statement must be verified, the efficiency gain may be limited. If outputs are not checked closely enough, the agency risks relying on material that staff say may be wrong.

A regulatory gray area

According to the report, Elsa operates in a regulatory gray area because there are currently no binding rules for AI in US healthcare. That leaves the FDA using generative AI in a setting where the oversight framework has not caught up with the technology.

This gap is especially important because Elsa is not described as a casual office tool. It is already being used to review clinical protocols and assess risks during inspections. Those uses place the system inside the practical machinery of regulation.

The situation points to several issues that follow directly from the reported facts:

AI-assisted drug review depends on verification, especially when outputs can include fabricated studies.
Inspection risk assessments may be affected if research data is misrepresented.
Staff confidence can be weakened when a tool produces fluent but unreliable answers.
Without binding rules for AI in US healthcare, agencies may be left to manage these risks internally.

Why this matters for AI in drug approval

Elsa shows both the appeal and the danger of generative AI in high-stakes administrative work. The appeal is speed. The danger is that speed can turn into overreliance if people treat an AI-generated answer as dependable before it has been checked.

The staff warning is direct: if there is no time to double-check, the result cannot be trusted. That is a demanding standard, but it fits the type of work described in the report. Drug review, clinical protocol analysis, and inspection risk assessment all depend on careful handling of evidence.

The FDA's use of Elsa therefore raises a broader question for institutions adopting generative AI: where should these systems assist, and where should their outputs be treated only as drafts that require verification? Based on the concerns described by staff, Elsa's usefulness depends less on how confidently it answers and more on whether its answers can be checked against real research.