Ars Technica AI September 24, 2024 TERMINATOR

How ChatGPT memory turned prompt injection into a lasting risk

Security researcher Johann Rehberger showed that untrusted content could plant false information and malicious instructions in ChatGPT memory. A proof of concept for the ChatGPT app for macOS sent all user input and ChatGPT output to an attacker-controlled server until OpenAI issued a partial fix.

WTF Index TERMINATOR

◄ Terminator 4 Idiocracy 1 ►

The story centers on persistent prompt injection enabling data exfiltration and malicious long-term control of ChatGPT behavior.

How ChatGPT memory turned prompt injection into a lasting risk

ChatGPT memory is meant to make conversations more useful by carrying context from one session to the next. A vulnerability reported by security researcher Johann Rehberger showed how that same persistence could become a security problem when untrusted content is allowed to shape what the system remembers.

According to the source report, the issue allowed false information and malicious instructions to be stored in a user's long-term memory settings. OpenAI initially closed the inquiry as a safety issue rather than a security concern, then acted after Rehberger demonstrated a proof-of-concept exploit that used the weakness to exfiltrate user input over time.

What the memory feature changes

OpenAI began testing long-term conversation memory in February and made it more broadly available in September. The feature stores details from earlier chats and applies them as context in future conversations. That can reduce repetition for users because information such as age, gender, philosophical beliefs, and other personal context does not have to be re-entered each time.

The security tradeoff is persistence. If a model stores the wrong instruction or false detail, the effect is not limited to one exchange. It can influence later sessions because memory is designed to survive beyond the original prompt.

Rehberger found within three months of the rollout that indirect prompt injection could create and permanently store memories. In this kind of attack, the model follows instructions embedded in untrusted material, including emails, blog posts, documents, images, or web pages. The user may think they are asking ChatGPT to summarize or inspect content, while hidden instructions inside that content attempt to steer the system.

How false memories could be planted

The researcher demonstrated that ChatGPT could be tricked into storing fictional details about a target user. In the example described in the source, the system could be made to believe the user was 102 years old, lived in the Matrix, and insisted Earth was flat. Once stored, those false memories could shape future conversations.

The route into memory matters. The source report says planted memories could come from files stored in Google Drive or Microsoft OneDrive, uploaded images, or a site reached through Bing. Each of those inputs can be created or controlled by a malicious attacker, which makes the boundary between helpful context and hostile instruction difficult to manage.

For users, the risk is not simply that ChatGPT might remember an incorrect preference. The broader concern is that memory gives prompt injection a longer life. A malicious instruction that would otherwise disappear after one session can become part of the context used later.

The macOS proof of concept raised the stakes

Rehberger privately reported the finding to OpenAI in May. That same month, the company closed the report ticket. A month later, he submitted a new disclosure statement with a proof of concept aimed at the ChatGPT app for macOS.

That demonstration caused the app to send a verbatim copy of all user input and ChatGPT output to a server chosen by the researcher. The trigger was simple: the target only had to instruct the model to view a web link that hosted a malicious image. After that, the source report says all input and output to and from ChatGPT was sent to the attacker's website.

“What is really interesting is this is memory-persistent now,” Rehberger said in the above video demo. “The prompt injection inserted a memory into ChatGPT’s long-term storage. When you start a new conversation, it actually is still exfiltrating the data.”

That persistence is the core issue. A one-time malicious prompt is serious, but a malicious memory can continue working after the original interaction has ended. The source report says the attack is not possible through the ChatGPT web interface because of an API OpenAI rolled out last year.

What OpenAI fixed, and what remains

OpenAI engineers took notice after the proof of concept and issued a partial fix earlier this month. The source report says the fix prevents memories from being abused as an exfiltration vector. That means the specific path used to send ongoing chat content to an outside server was addressed.

But the researcher said a remaining issue persists: untrusted content can still perform prompt injections that cause the memory tool to store long-term information planted by a malicious attacker. In plain terms, the data-theft channel was limited, but the ability to poison memory through untrusted content was not described as fully eliminated.

The practical distinction is important. A direct exfiltration channel is an immediate security problem. False or attacker-created memories are also dangerous because they can subtly change how an AI assistant behaves in later conversations, especially when the user assumes the model's stored context came from legitimate prior interactions.

What users can do now

The source report gives two user-facing precautions. First, users should watch during sessions for output indicating that a new memory has been added. Second, they should regularly review stored memories for entries that may have come from untrusted sources.

Those steps are not a complete solution, but they match the risk described in the report. If memory can be modified through indirect prompt injection, then the memory list becomes a place users need to inspect, not just a convenience setting.

Watch for new memory notices: unexpected memory updates may signal that untrusted content influenced the system.
Review stored memories: remove entries that do not reflect information the user intended to save.
Treat external content carefully: documents, images, web pages, and connected storage can carry instructions the user did not write.

OpenAI provides guidance for managing the memory tool and specific memories stored in it. Company representatives did not respond to an email asking about efforts to prevent other hacks that plant false memories.

The lesson is straightforward: memory makes AI assistants more personal, but it also makes prompt injection more durable. When a system can carry context forward, attackers do not need to win every conversation. They only need to plant something that lasts.