The Decoder November 10, 2025 TERMINATOR

New BSI guidance puts LLM evasion attacks in focus

The German Federal Office for Information Security (BSI) says even top AI providers are struggling with evasion attacks against language models. Its new guide lays out filters, secure prompt design, and organizational protections, while warning that no single bullet proof solution currently exists.

WTF Index TERMINATOR

◄ Terminator 3 Idiocracy 0 ►

The story focuses on security risks where language models can be manipulated to bypass safeguards, leak data, or take unintended actions.

New BSI guidance puts LLM evasion attacks in focus

The German Federal Office for Information Security (BSI) has released new guidance for protecting large language models from evasion attacks, a class of threat that can hide malicious instructions inside content that appears harmless.

The warning is direct: even top AI providers are still struggling to defend language models against these attacks. The issue becomes especially serious when AI systems are connected to outside content, user data, or tools that can take action.

What evasion attacks try to do

In the attacks described by the BSI, the harmful instruction is not necessarily presented as an obvious prompt. Instead, it can be embedded inside ordinary material such as websites, emails, or code files.

That matters because language models are often asked to process exactly this kind of content. A system may summarize an email, inspect a file, read a website, or analyze code. If hidden instructions are present, the model may treat them as something to follow rather than something to ignore.

The source article describes several possible outcomes. The AI can be pushed to ignore security safeguards, leak data, or perform actions that were not intended by the user or operator.

In plain terms, the risk is not just that a model gives a bad answer. The risk is that the model can be manipulated while doing a seemingly normal task, especially when the dangerous instruction is buried inside content the user did not write.

Why agentic AI raises the stakes

Agentic AI systems are described as especially at risk, according to recent studies cited in the source article. The reason follows from the role these systems are built to play: they do more than generate text in response to a simple prompt.

When an AI system can process external information and carry out tasks, the difference between reading content and acting on content becomes more important. A hidden instruction inside a calendar entry, email, website, or code file may become part of the model’s working context.

The source article gives two examples. In one case, Google's Gemini leaked data after processing a manipulated calendar entry. In another, ChatGPT's Deep Research was compromised by hidden HTML instructions embedded in an email.

Both examples point to the same core problem. The content looks like an input to be analyzed, but it also contains instructions designed to influence the model’s behavior. That makes evasion attacks difficult to treat as a conventional content-filtering problem.

What the BSI guide recommends

The BSI’s new guide outlines countermeasures rather than presenting a final fix. According to the source article, the guidance includes three broad areas:

Technical filters that can help detect or block risky content before it affects the model.
Secure prompt design strategies that shape how the system handles instructions and external material.
Organizational protections that address the way AI systems are deployed, governed, and monitored.

This mix is important. Evasion attacks are not only a model problem, and they are not only a software problem. They involve how content enters the system, how the system distinguishes trusted instructions from untrusted material, and how much authority the model has once it processes that material.

The BSI also makes clear that defenses should not be treated as complete. The guide is positioned as a set of countermeasures for a persistent and difficult threat, not as a guarantee that evasion attacks can be eliminated.

No single fix for LLM security

The strongest caution in the source article comes from the BSI itself: "However, it must be kept in mind that currently there is no single bullet proof solution for mitigating evasion attacks," the BSI writes.

That statement is central to understanding the guidance. The agency is not saying that filters, prompt design, or organizational controls are useless. It is saying that none of them should be treated as sufficient on its own.

For teams using language models, that means LLM security has to be layered. A filter may reduce exposure to malicious content. Prompt design may help separate user intent from hidden instructions. Organizational protections may limit damage when something gets through.

But the source article is clear that evasion attacks remain hard even for top AI providers. That should shape expectations for any organization deploying AI systems that read emails, websites, code files, calendar entries, or other external content.

What organizations should take from the warning

The practical lesson is that language models should not be treated as neutral readers of all content. When they process external material, they may encounter instructions that were designed specifically to bypass safeguards.

This is especially relevant for systems that connect AI to workflows where data exposure or unintended actions matter. The more an AI system can see, process, and do, the more important it becomes to decide what content it should trust and what permissions it should have.

The BSI guide gives organizations a way to think about the problem through technical, prompt-level, and organizational defenses. At the same time, its warning against a single bullet proof solution keeps the focus on realistic risk management.

LLM evasion attacks are a reminder that AI security is not only about the model’s visible answer. It is also about the hidden instructions the model may encounter before it produces that answer, and the actions it may take afterward.