How one document can turn OpenClaw into a backdoor

Security researchers from Zenity Labs say OpenClaw can be taken over through indirect prompt injection hidden in manipulated documents. Their demonstration shows a path from a seemingly harmless file to a persistent Telegram backdoor, changes to SOUL.md, and installation of a C2 beacon.

WTF Index TERMINATOR
◄ Terminator 4 Idiocracy 0 ►

The story centers on an autonomous AI agent being hijacked via prompt injection to install persistent backdoor control.

How one document can turn OpenClaw into a backdoor

OpenClaw, the popular open-source AI agent formerly known as Clawdbot, is facing a serious security warning. According to security researchers from Zenity Labs, a manipulated document can trick the agent into giving an attacker long-term control of a user’s computer.

The issue is not described as a narrow bug in one integration. The researchers say the weakness comes from how OpenClaw handles instructions, outside content, and system access in the same working flow.

Why OpenClaw is exposed

The core problem is indirect prompt injection. In this kind of attack, the user does not need to type a malicious command. Instead, the harmful instruction is hidden inside content the AI agent reads, such as an email or a shared document.

Zenity Labs says OpenClaw processes material from untrusted sources in the same context as direct user instructions. That means the agent may treat hidden text from a document as something it should obey, while relying mainly on the underlying language model’s security behavior to resist the attack.

That design is especially risky because OpenClaw is not only a chatbot. It is built to act. The agent can execute commands, read files, write files, and operate with whatever permissions it received during setup.

In practical terms, a malicious instruction hidden inside a normal-looking workplace file can become more than a confusing response. If the agent accepts the instruction, the attacker can move from reading content to changing how the system behaves.

The document-to-Telegram attack path

The researchers demonstrate the issue with a typical corporate setup. An employee installs OpenClaw and connects it to Slack and Google Workspace. The attack begins with a document that appears harmless to the user.

Inside that document is a hidden prompt. When OpenClaw processes the file, the prompt directs the agent to create a new chat integration: a Telegram bot with an access key prepared earlier by the attacker.

Once that integration exists, the attacker no longer depends on the original document. OpenClaw begins accepting commands through the new channel, giving the attacker a persistent way to control the agent that the company may not see.

The exact attack prompt has not been released by the researchers. But the demonstration is enough to show the larger risk: when an AI agent has permission to create integrations and perform system actions, a single manipulated input can open a lasting control path.

Persistence through SOUL.md

The researchers also describe how the compromise can survive a basic cleanup. OpenClaw uses a configuration file called SOUL.md to define the agent’s behavior. Through the backdoor, an attacker can modify that file.

In the proof of concept, Zenity Labs set up a scheduled task that runs every two minutes and overwrites SOUL.md. That matters because removing the Telegram integration is not enough if another process keeps restoring the attacker’s control.

This turns the incident from a one-time prompt injection into a persistence problem. The attacker can keep influence over the agent even after the visible entry point is removed.

The researchers then demonstrate installation of a C2 beacon. At that stage, the compromised AI agent becomes a gateway for more traditional hacker activity, including lateral movement through a company network, credential theft, or ransomware deployment.

The weakness is not limited to one model

The attack works across different models, including GPT-5.2, and through various integrations. That detail is important because it points back to the agent architecture, rather than only to one model’s failure to ignore a hostile prompt.

"If personal AI assistants are going to live on our endpoints and inside our workflows, compromising on security is not an option," the researchers write.

The OpenClaw case shows why tool-using AI agents are different from ordinary text assistants. A chatbot that repeats a malicious instruction may create confusion. An agent with file access, command execution, and integration privileges can turn that instruction into a system change.

For companies, the risk is not only that an employee may see a bad answer. The risk is that an agent embedded in daily work can be redirected by content from the same workflow it is supposed to help manage.

Earlier warning signs

The Zenity Labs demonstration is not the only reported concern around OpenClaw security. A developer recently tested OpenClaw with the security analysis tool ZeroLeaks. The system scored 2 out of 100 points, with an 84 percent extraction rate and 91 percent successful injection attacks using common language models.

Only Claude Opus 4.5 performed better in that test, scoring 39 out of 100 points. Even that result was described as far from acceptable given the level of control OpenClaw can have over a user’s computer.

The same reporting also says system prompts, tool configurations, and memory files could be read with almost no effort. A simple scan found 954 OpenClaw instances with open gateway ports, many without any authentication.

Taken together, these details make the backdoor demonstration more than a theoretical concern. They show how weak separation between trusted instructions, outside content, and powerful tools can create a real route from a document to long-term system compromise.

The broader lesson is straightforward: AI agents that live inside workplace tools need security boundaries that match their permissions. If an agent can read, write, execute, and integrate, then untrusted documents cannot safely share the same instruction space as the user’s commands.