Hallucinated AI code packages open a supply-chain risk

Research using 16 large language models found that AI-generated code often names package dependencies that do not exist. Because many of those hallucinated names recur, attackers could publish malicious packages under them and wait for developers to install them.

AI coding tools can produce useful scaffolding, but new research highlights a dangerous failure mode: they may invent software packages that look plausible enough to install. In the wrong hands, those made-up dependency names can become a route into the software supply chain.

What the research found

The study prompted 16 of the most widely used large language models to generate 576,000 code samples. Those samples contained 2.23 million package references, and 440,445 of them, or 19.7 percent, pointed to packages that did not exist.

The researchers call this problem "package hallucination." In AI, a hallucination is an output that is factually wrong, nonsensical, or disconnected from the task. In code, the mistake can be especially risky because a dependency name is not just text. It is often something a developer may try to install and run.

The study also found 205,474 unique package names among the 440,445 hallucinated references. That matters because the larger the pool of invented names, the more opportunities there may be for attackers to claim one of them in a package registry.
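The figures above can be sanity-checked with a few lines of arithmetic. The sketch below uses only the numbers reported in the article. Because the 2.23 million total is rounded, it assumes the true total lies between 2.225 and 2.235 million and computes a range rather than a single value.

```python
# Figures as reported in the article; the 2.23 million total is rounded.
hallucinated_refs = 440_445
unique_names = 205_474

# Assumption: "2.23 million" is rounded to three significant figures,
# so the true total lies somewhere between these two bounds.
total_refs_low = 2_225_000
total_refs_high = 2_235_000

# A larger total gives a smaller rate, so the bounds swap.
rate_low = hallucinated_refs / total_refs_high
rate_high = hallucinated_refs / total_refs_low

# Share of hallucinated references that are distinct names.
unique_share = unique_names / hallucinated_refs

print(f"hallucination rate: {rate_low:.1%} to {rate_high:.1%}")
print(f"unique names per hallucinated reference: {unique_share:.1%}")
```

The reported 19.7 percent sits inside that range, and a little under half of the hallucinated references are distinct names.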

Why nonexistent dependencies can become attacks

Modern software depends heavily on third-party libraries. A dependency is a separate code component that another program needs in order to work. Dependencies save developers from rebuilding common functionality, but they also make the software supply chain larger and more fragile.

The risk described in the research connects to dependency confusion, also known as package confusion. In that kind of attack, software is tricked into using the wrong dependency. One route is to publish a malicious package under a name that software might request, sometimes with a version stamp that makes it appear newer than a legitimate option.
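To illustrate the version-stamp trick, here is a toy model of a resolver that picks the highest version it can find across every index it knows about. It is a simplified sketch, not a description of how any particular package manager works. The registry contents, the package name `acme-billing`, and the function names are all hypothetical.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Turn '1.4.2' into (1, 4, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Hypothetical registries: index name -> package name -> published versions.
REGISTRIES: Dict[str, Dict[str, List[str]]] = {
    "internal": {"acme-billing": ["1.4.2"]},
    "public": {"acme-billing": ["99.0.0"]},  # attacker-published lookalike
}


def resolve_naive(package: str) -> Tuple[str, str]:
    """Pick the highest version from any index, ignoring where it came from."""
    candidates = [
        (parse_version(version), index, version)
        for index, packages in REGISTRIES.items()
        for version in packages.get(package, [])
    ]
    if not candidates:
        raise LookupError(f"{package} not found in any index")
    _, index, version = max(candidates)
    return index, version


def resolve_pinned(package: str, index: str) -> Tuple[str, str]:
    """Only consider versions from the one index the project trusts."""
    versions = REGISTRIES[index].get(package, [])
    if not versions:
        raise LookupError(f"{package} not found in {index}")
    return index, max(versions, key=parse_version)


print(resolve_naive("acme-billing"))
print(resolve_pinned("acme-billing", "internal"))
```

In this toy setup the naive resolver selects the inflated public version, while the pinned resolver stays with the internal one. Tying each dependency to an explicit source is one common way to narrow this gap.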

Package confusion was first demonstrated in 2021 in a proof-of-concept exploit that ran counterfeit code on networks belonging to major companies, including Apple, Microsoft, and Tesla. The broader danger is that software supply-chain attacks aim at the source of software creation, where one poisoned component can affect downstream users.

AI-generated package names add a new twist. If a model repeatedly suggests a dependency that does not exist, an attacker can publish a malicious package using that hallucinated name. A developer who trusts the model output and installs the package without carefully checking it could execute the attacker’s payload.

The repetition is the warning sign

The most important detail is not just that models invented package names, but that many of those invented names appeared more than once.

The study found that 43 percent of package hallucinations were repeated over 10 queries. It also reported that 58 percent of the time, a hallucinated package appeared more than once in 10 iterations.

That pattern makes the problem more serious than a stream of isolated mistakes. If a hallucinated name appears again and again, attackers can identify those names, publish packages under them, and wait for the same suggestions to reach developers.

In practical terms, repeatability turns an AI error into a target. A one-time typo is hard to exploit at scale. A recurring invented package name is more useful to someone looking for a predictable path into developer workflows.
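A rough sketch of how repetition might be measured: run the same prompt 10 times, collect the package names each response mentions, and count how many runs each unknown name appears in. This is not the researchers' method. The run data, the `KNOWN` set, and the package names are made-up placeholders.

```python
from collections import Counter
from typing import Iterable, List, Set


def hallucination_counts(runs: Iterable[List[str]], known: Set[str]) -> Counter:
    """For each name not in `known`, count how many runs mention it."""
    counts: Counter = Counter()
    for packages in runs:
        # Use a set so a name mentioned twice in one run counts once.
        for name in set(packages) - known:
            counts[name] += 1
    return counts


# Hypothetical data: packages named in 10 responses to the same prompt.
# The "madeup-" names are placeholders, not real recommendations.
KNOWN = {"requests", "numpy"}
RUNS = [
    ["requests", "madeup-json-helper"],
    ["requests", "madeup-json-helper"],
    ["requests", "madeup-http-client"],
    ["requests", "madeup-json-helper"],
    ["numpy", "madeup-json-helper"],
    ["requests"],
    ["requests", "madeup-json-helper", "madeup-http-client"],
    ["requests", "madeup-parser"],
    ["requests", "madeup-json-helper"],
    ["requests", "madeup-http-client"],
]

counts = hallucination_counts(RUNS, KNOWN)
repeated = sorted(name for name, n in counts.items() if n > 1)
print(counts.most_common())
print("repeated:", repeated)
```

Names that land in the `repeated` list are the predictable ones, which is what makes them attractive to register.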

Which models and languages hallucinated more

The study found differences between model types and programming languages. Open source LLMs hallucinated the most: the average package hallucination rate for open source models such as CodeLlama and DeepSeek was nearly 22 percent, compared with a little more than 5 percent for commercial models.

The researchers also tested Python and JavaScript. They ran 30 tests in total: 16 in Python and 14 in JavaScript. Each test generated 19,200 code samples.

Python code produced fewer hallucinations than JavaScript code. The average was almost 16 percent for Python and a little over 21 percent for JavaScript.

Joseph Spracklen, a University of Texas at San Antonio PhD student and lead researcher, said the causes are difficult to pin down because large language models are complex systems. He attributed the gap between commercial and open source models in part to much larger parameter counts in commercial variants, while also noting that architecture and training details remain proprietary.

Spracklen also pointed to other possible factors, including training data, fine-tuning, instruction training, and safety tuning. For JavaScript, he speculated that the higher hallucination rate may stem from JavaScript having roughly 10 times more packages in its ecosystem than Python, along with a more complicated namespace.

What this means for AI-generated code

The findings underline a simple point: AI-generated code still needs verification before it becomes part of a real project. A model may produce code that looks coherent while quietly including a dependency that does not exist.

That is a different class of problem from a syntax error. A broken import may fail quickly. A hallucinated dependency name can become dangerous if someone else registers that name and attaches malicious code to it.
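One way to act on this is to flag any import in generated code that a team has not already vetted, rather than checking only whether the name exists in a registry, since an attacker may have registered it. The sketch below assumes Python 3.10 or later for `sys.stdlib_module_names`. The `VETTED` allowlist and the module `madeup_json_helper` are hypothetical. Import names can also differ from the package names used at install time, so the result is a prompt for manual review and not a verdict.

```python
import ast
import sys
from typing import List, Set


def imported_top_level_modules(source: str) -> Set[str]:
    """Collect the top-level module names imported by a piece of Python code."""
    modules: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports such as "from . import x".
            if node.level == 0 and node.module:
                modules.add(node.module.split(".")[0])
    return modules


def unvetted_imports(source: str, vetted: Set[str]) -> List[str]:
    """Return imports that are neither standard library nor on the allowlist."""
    stdlib = set(getattr(sys, "stdlib_module_names", ()))
    return sorted(imported_top_level_modules(source) - vetted - stdlib)


# Hypothetical model output and a hypothetical team allowlist.
GENERATED = """
import os
import requests
from madeup_json_helper import loads
"""
VETTED = {"requests"}

flagged = unvetted_imports(GENERATED, VETTED)
print(flagged)
```

Anything in `flagged` would then be checked by hand against the intended registry, publisher, and version before it is installed.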

The research is scheduled to be presented at the 2025 USENIX Security Symposium. It lands as Microsoft CTO Kevin Scott has predicted that 95 percent of code will be AI-generated within five years.

If AI-generated code becomes more common, the handling of dependencies becomes more important, not less. Package names, versions, and sources are part of the security boundary. The study shows that when models invent those details, the mistake can create an opening for package confusion attacks.