AI agents are expected to handle longer conversations, more complex tasks, and instructions that may matter much later. The problem is that memory often breaks down as interactions grow, especially when models reach context window limits or lose track of earlier details.
A research team from China and Hong Kong has proposed General Agentic Memory, or GAM, as a way to reduce that loss. The approach keeps compressed summaries, but it also preserves full conversation history so an agent can investigate past context when a specific request demands it.
The memory problem GAM is trying to solve
Current AI agents often struggle when important information is buried far back in a conversation. The source article describes this failure mode as "context rot," where details fade from the usable context even though they may still be relevant to the task.
One common answer is to summarize information in advance. That can make long histories easier to manage, but it creates a tradeoff: once a detail has been compressed out of a summary, the system may not be able to recover it later.
The researchers behind GAM argue that this is a structural weakness. A detail can look minor when stored, then become central when a later question depends on it. GAM is designed around that possibility.
How General Agentic Memory works
GAM uses two specialized components: a "Memorizer" and a "Researcher." The split matters because the system separates the act of recording information from the act of looking for the right information later.
The Memorizer operates in the background during an interaction. It creates simple summaries, but it also stores the full conversation history in a database called the "page store." The conversation is divided into pages and tagged with context so it can be searched more effectively.
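The Memorizer's two outputs, short summaries plus a page store holding the full history, can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the names Page, Memorizer, observe, and flush, the fixed page size, and the string-truncation "summary" are all assumptions made for the example. A real system would call a language model where the placeholder summary is built.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Page:
    page_id: int
    header: str   # context tag: what came before this page (assumed format)
    content: str  # raw, uncompressed conversation text


@dataclass
class Memorizer:
    page_size: int = 2  # turns per page; arbitrary choice for this sketch
    page_store: Dict[int, Page] = field(default_factory=dict)
    memos: List[str] = field(default_factory=list)
    _buffer: List[str] = field(default_factory=list)

    def observe(self, turn: str) -> None:
        """Buffer one conversation turn; write a page once the buffer is full."""
        self._buffer.append(turn)
        if len(self._buffer) >= self.page_size:
            self.flush()

    def flush(self) -> None:
        """Store buffered turns as a full page and record a short memo."""
        if not self._buffer:
            return
        page_id = len(self.page_store)
        # The latest memo stands in for the context tag on the new page.
        header = self.memos[-1] if self.memos else "start of conversation"
        content = "\n".join(self._buffer)
        self.page_store[page_id] = Page(page_id, header, content)
        # Placeholder summary (hypothetical): truncation instead of a model call.
        self.memos.append(f"page {page_id}: {content[:40]}")
        self._buffer.clear()


memorizer = Memorizer(page_size=2)
for turn in ["user: my code is 4417", "agent: noted", "user: book a flight"]:
    memorizer.observe(turn)
memorizer.flush()  # write the final, partially filled page
```

The point of the design is that the memo can be lossy without losing anything for good: the exact text of every turn remains in page_store under its page ID.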
The Researcher works differently. It activates only after the agent receives a specific request. Rather than treating memory as a single lookup, it examines the query, plans how to search, and then uses tools to inspect the page store.
The Researcher can use three retrieval methods:
- vector search for thematic similarity
- BM25 search for exact keywords
- direct access through page IDs
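The three retrieval methods listed above can be illustrated with a small, self-contained Python sketch. Several things here are assumptions rather than details from the source: the sample PAGES data, the function names, and the BM25 parameters k1 and b. The vector search uses word-count vectors with cosine similarity purely as a stand-in; a real system would use learned embeddings, which capture thematic similarity far better than word overlap.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical page store: page ID -> raw page text.
PAGES: Dict[int, str] = {
    0: "the deploy key for staging is rotated every friday",
    1: "we talked about holiday plans and travel to lisbon",
    2: "staging outage traced to an expired deploy key",
}


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def bm25_search(query: str, pages: Dict[int, str],
                k1: float = 1.5, b: float = 0.75) -> List[int]:
    """Rank page IDs by BM25 score; pages with no matching term are dropped."""
    if not pages:
        return []
    docs = {pid: tokenize(text) for pid, text in pages.items()}
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    terms = set(tokenize(query))
    doc_freq = {t: sum(1 for tokens in docs.values() if t in tokens) for t in terms}
    scores: Dict[int, float] = {}
    for pid, tokens in docs.items():
        counts = Counter(tokens)
        score = 0.0
        for t in terms:
            if counts[t] == 0:
                continue
            idf = math.log((n_docs - doc_freq[t] + 0.5) / (doc_freq[t] + 0.5) + 1.0)
            length_norm = k1 * (1.0 - b + b * len(tokens) / avg_len)
            score += idf * counts[t] * (k1 + 1.0) / (counts[t] + length_norm)
        if score > 0:
            scores[pid] = score
    return sorted(scores, key=scores.get, reverse=True)


def vector_search(query: str, pages: Dict[int, str], top_k: int = 2) -> List[int]:
    """Rank page IDs by cosine similarity of word-count vectors (embedding stand-in)."""
    def cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[t] * b[t] for t in a)
        norm_a = math.sqrt(sum(v * v for v in a.values()))
        norm_b = math.sqrt(sum(v * v for v in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query_vec = Counter(tokenize(query))
    ranked = sorted(pages, reverse=True,
                    key=lambda pid: cosine(query_vec, Counter(tokenize(pages[pid]))))
    return ranked[:top_k]


def get_pages(page_ids: List[int], pages: Dict[int, str]) -> List[str]:
    """Direct access: fetch pages by ID, skipping IDs that do not exist."""
    return [pages[pid] for pid in page_ids if pid in pages]


keyword_hits = bm25_search("expired key", PAGES)
similar_hits = vector_search("expired deploy key", PAGES, top_k=1)
direct_hits = get_pages([1, 7], PAGES)
```

The three tools are complementary. Keyword search finds exact strings such as identifiers, similarity search finds related passages that use different wording, and ID lookup lets the agent reopen a page it already knows is relevant.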
This search is iterative. The system checks whether the retrieved information is enough, reflects on the results, and can run additional queries before answering. In that sense, GAM applies a form of "just-in-time compilation" to AI memory: the system does deeper processing when the task actually requires it.
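The search, reflect, and search-again loop can be shown with a toy variable-tracking task. Everything in this sketch is hypothetical: the "VAR name = value" page format, the search_definition tool, and the rule that a numeric result counts as "enough." In a real agent, a language model would make the planning and sufficiency decisions that are hard-coded here.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical page store in which a value is defined through a chain of aliases.
PAGES: Dict[int, str] = {
    0: "VAR alpha = 4417",
    1: "unrelated chatter about lunch",
    2: "VAR beta = alpha",
    3: "VAR gamma = beta",
}


def search_definition(name: str, pages: Dict[int, str]) -> Optional[str]:
    """Search tool: return the right-hand side of the page that defines `name`."""
    pattern = re.compile(rf"VAR {re.escape(name)} = (\S+)")
    for text in pages.values():
        match = pattern.search(text)
        if match:
            return match.group(1)
    return None


def research(name: str, pages: Dict[int, str],
             max_steps: int) -> Tuple[Optional[str], int]:
    """Follow definitions until a number is found or the step budget runs out.

    Returns (answer, steps_used); answer is None if nothing was resolved.
    """
    target = name
    for step in range(1, max_steps + 1):
        result = search_definition(target, pages)  # search
        if result is None:
            return None, step                      # nothing more to find
        if result.isdigit():                       # reflect: is this enough?
            return result, step
        target = result                            # plan the next query
    return None, max_steps


answer, steps = research("gamma", PAGES, max_steps=3)
short_answer, short_steps = research("gamma", PAGES, max_steps=2)
```

With a budget of three steps the loop resolves gamma through beta and alpha to 4417. With a budget of two it stops before reaching the value, and a single lookup would return only the alias "beta." This is the kind of question where one retrieval pass is not enough, and where allowing more search steps trades extra cost for a correct answer.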
Why this differs from RAG
Retrieval-Augmented Generation, or RAG, is a familiar way to bring external information into a model response. But the source article describes GAM as going beyond a standard retrieval step.
The difference is not simply that GAM stores information. It keeps both compressed summaries and full records, then uses the Researcher to dig through those records with a strategy shaped by the current question. That matters for tasks where the answer depends on connecting pieces of information spread across a long interaction.
According to the source, the researchers tested GAM against conventional methods, including RAG, and against long-context models such as GPT-4o-mini and Qwen2.5-14B. GAM beat the competing approaches in every benchmark reported in the article.
The strongest advantage appeared in tasks that required linking information over long periods. In the RULER benchmark, which includes tasks that require tracking variables over many steps, GAM reached over 90 percent accuracy, while conventional RAG approaches and other storage systems largely failed.
What the benchmark results suggest
The reported results point to a practical lesson for AI agent design: bigger context windows alone may not solve memory. If the system cannot identify which past details matter, storing more context may still leave the agent unable to answer reliably.
GAM’s advantage appears to come from preserving raw history while delaying deeper analysis until the moment of need. That avoids depending entirely on summaries created before the final question is known.
The source also notes that the system benefits from additional compute. When the Researcher is allowed more search steps and more time for reflection, answer quality improves. That suggests a tradeoff between response cost and memory accuracy, especially for tasks where hidden or distant details are important.
The project’s code and data are available on GitHub, according to the source article.
A broader shift in context management
GAM is part of a wider push to improve how AI systems handle long-running context. The source article points to several related efforts from other labs.
Anthropic has recently focused on "context engineering," which means actively managing the full context state through compact summaries or structured notes rather than only refining prompts.
DeepSeek introduced a new OCR system that processes text documents as highly compressed images. The source says this could reduce compute and token use, and might support long-term chatbot storage by saving older conversation segments as image files.
Researchers in Shanghai have also proposed a "Semantic Operating System" for lifelong AI memory. That system is described as managing context in a brain-like way, selectively adapting and forgetting information so temporary data can become structured, permanent memory.
Taken together, these approaches show that AI memory is becoming an architecture problem, not just a prompt-writing problem. GAM’s contribution is to keep full history available while using an active Researcher to retrieve what matters when the agent finally needs it.