Why AI memory needs more than bigger context windows

Researchers argue that AI needs a new way to manage context if it is ever going to support durable, human-like memory. Their proposed Semantic Operating System would store, revise and forget information across long periods instead of relying on short-lived context windows.

AI systems are getting better at handling larger inputs, but the researchers behind a new proposal argue that scale alone will not solve the memory problem. Their answer is a Semantic Operating System: a framework for preserving meaning, updating knowledge and deliberately forgetting information over long periods.

The idea reframes context engineering as more than a prompt-writing technique. It treats context as the working material of AI memory, and asks how machines could manage it across conversations, modalities and time.

How context engineering reached this point

The researchers describe context engineering as moving through four eras. The first began in the 1990s, when early context-aware systems relied on sensor readings and rigid, structured inputs. People had to translate their intentions into commands that machines could process.

That changed in 2020 with models like GPT-3. Instead of depending only on explicit instructions, these systems could interpret natural language and infer meaning from unstructured input. Context shifted away from sensor data and toward the forms people already use: conversation, implication and ordinary language.

The term itself later gained wider attention as an extension of prompt engineering. Anthropic helped bring the concept into focus, although prompt engineer Riley Goodside had already used the term in early 2023. By summer 2025, Shopify CEO Tobi Lütke and former OpenAI researcher Andrej Karpathy were discussing it as well.

In the researchers' framework, today's systems are not yet at the destination. They write, "We are currently in Era 2.0, transitioning to Era 3.0." Era 3.0 would involve human-level interpretation, including social cues and emotions. Era 4.0 would go further, imagining systems that can identify connections people may not see themselves.

Why bigger context windows are not enough

The paper points to a practical problem: as context grows, model performance can decline. Many systems start degrading even when their context window is only half full. Adding more room for input therefore does not automatically produce reliable long-term understanding.

Cost is another barrier. In a transformer's self-attention, every token is compared with every other token, so the work grows with the square of the context length: doubling the context does not merely double the work, it quadruples it. The paper gives the scale plainly: about 1 million comparisons for 1,000 tokens and roughly 100 million for 10,000.
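To make the arithmetic concrete, the short Python sketch below counts the query-key pairs scored by full self-attention for one attention head in one layer. It is a simplification: real cost also scales with the number of layers and heads and with the model's hidden size. The function name is ours, not the paper's.

```python
def attention_comparisons(num_tokens: int) -> int:
    """Query-key pairs scored by full self-attention (one head, one layer)."""
    # Each of the n tokens is compared with all n tokens, itself included.
    return num_tokens * num_tokens


for n in (1_000, 2_000, 10_000):
    print(f"{n:>6,} tokens -> {attention_comparisons(n):>11,} comparisons")
```

Going from 1,000 to 2,000 tokens raises the count from 1,000,000 to 4,000,000, and 10,000 tokens gives 100,000,000, matching the figures above.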

This is why feeding an entire PDF into a chat window can be inefficient when only a few pages matter. Models usually work better when the input is narrowed to what is relevant. The difficulty is that most users do not want to manage context manually, and most chat interfaces make it easy to upload everything.
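Narrowing the input can be automated. The Python sketch below scores each page by how often the question's words appear on it and keeps only the best matches before building a prompt. Word overlap is a deliberately crude stand-in for the embedding-based retrieval most production systems use, and `select_relevant_pages` and the sample pages are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def select_relevant_pages(pages: list[str], question: str, k: int = 2) -> list[int]:
    """Return the indices, in reading order, of the k pages that best match the question."""
    query_terms = set(tokenize(question))
    scored = []
    for index, page in enumerate(pages):
        counts = Counter(tokenize(page))
        # Score: total occurrences of question words on this page.
        scored.append((sum(counts[term] for term in query_terms), index))
    # Highest score first; ties go to the earlier page.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return sorted(index for _, index in scored[:k])


# Hypothetical four-page document and question.
pages = [
    "Company history and founders.",
    "Quarterly revenue grew 12 percent.",
    "Revenue by region and the revenue outlook.",
    "Office locations and opening hours.",
]
question = "How did revenue change?"
keep = select_relevant_pages(pages, question)
prompt = "\n\n".join(pages[i] for i in keep) + "\n\nQuestion: " + question
```

Only the two revenue pages reach the prompt; the rest of the document never costs a token.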

The same tension appears in generative AI-powered enterprise search. Such systems help with exploration, but they do not guarantee that exactly the requested information comes back. Prompt engineering still matters, but context engineering becomes the discipline of understanding what material the model actually has to reason from.

What a Semantic Operating System would manage

The proposed Semantic Operating System is meant to store and manage context in a more durable, structured way. The researchers outline four required capabilities:

  • Large-scale semantic storage that captures meaning rather than only raw data.
  • Human-like memory management that can add, modify and forget information intentionally.
  • New architectures that handle time and sequence more effectively than transformers.
  • Built-in interpretability so users can inspect, verify and correct the system's reasoning.
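As a rough illustration of the second and fourth capabilities, the Python sketch below keeps facts in a store that can add, modify and deliberately forget them, and records every change in an audit log a user could inspect. The class, its importance scores and the forgetting rule are all hypothetical; it stores plain strings and does not attempt the semantic representation the paper calls for.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MemoryEntry:
    value: str
    importance: float  # hypothetical 0..1 score of how much the fact matters
    last_used: int     # logical clock tick of the most recent write or read


@dataclass
class SemanticMemory:
    """Hypothetical store that can add, modify and deliberately forget facts."""
    entries: dict[str, MemoryEntry] = field(default_factory=dict)
    audit_log: list[str] = field(default_factory=list)  # human-readable change history
    clock: int = 0

    def _tick(self) -> int:
        self.clock += 1
        return self.clock

    def add(self, key: str, value: str, importance: float = 0.5) -> None:
        self.entries[key] = MemoryEntry(value, importance, self._tick())
        self.audit_log.append(f"add {key}={value!r}")

    def modify(self, key: str, value: str) -> None:
        entry = self.entries[key]
        self.audit_log.append(f"modify {key}: {entry.value!r} -> {value!r}")
        entry.value = value
        entry.last_used = self._tick()

    def recall(self, key: str) -> Optional[str]:
        entry = self.entries.get(key)
        if entry is None:
            return None
        entry.last_used = self._tick()
        return entry.value

    def forget(self, max_idle: int, min_importance: float) -> list[str]:
        """Remove facts that are both unimportant and unused for more than max_idle ticks."""
        stale = [
            key for key, entry in self.entries.items()
            if entry.importance < min_importance and self.clock - entry.last_used > max_idle
        ]
        for key in stale:
            del self.entries[key]
            self.audit_log.append(f"forget {key}")
        return stale


memory = SemanticMemory()
memory.add("user.city", "Berlin", importance=0.9)
memory.add("user.snack", "pretzels", importance=0.1)
memory.modify("user.city", "Hamburg")  # revision is logged, not silently overwritten
memory.recall("user.city")
memory.recall("user.city")
forgotten = memory.forget(max_idle=2, min_importance=0.5)
```

Forgetting here is a policy decision (low importance and long idle time) rather than a side effect of running out of space, and the audit log lets a user check why a fact changed or disappeared.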

The paper also reviews several ways to organize textual context. Timestamping is the simplest method because it preserves order and is easy to use in chatbots. Its weakness is that it lacks semantic structure and does not scale well.
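The timestamp approach is simple enough to sketch in a few lines of Python. The events and the `render_timestamped` helper are hypothetical. The output preserves order but says nothing about what each line is for, which is the weakness described above.

```python
from datetime import datetime, timezone


def render_timestamped(events: list[tuple[datetime, str, str]]) -> str:
    """Render (time, speaker, text) events as one chronologically ordered context string."""
    ordered = sorted(events, key=lambda event: event[0])
    return "\n".join(
        f"[{stamp.isoformat(timespec='minutes')}] {speaker}: {text}"
        for stamp, speaker, text in ordered
    )


def at(hour: int, minute: int) -> datetime:
    """Hypothetical helper: a UTC time on a fixed example day."""
    return datetime(2025, 7, 1, hour, minute, tzinfo=timezone.utc)


events = [
    (at(9, 5), "assistant", "Booked the 14:00 slot."),
    (at(9, 0), "user", "Find me a meeting room for this afternoon."),
    (at(9, 7), "user", "Actually, make it 15:00."),
]
context = render_timestamped(events)
```

Nothing in the result marks the 15:00 message as a revision of the earlier booking; the model has to infer that from position alone.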

Another method sorts information into functional roles such as "goal," "decision," or "action." That can make context clearer, but it can also become too rigid for flexible reasoning. Other approaches transform context into question-answer pairs or build hierarchies from broad concepts to specific ones.
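The structured alternatives can be compared on the same notes. In the Python sketch below, the role names come from the paper's examples, while the notes, the question templates and the `Concept` tree are hypothetical. The fixed role set shows the rigidity problem directly: a note that fits no role is rejected.

```python
from dataclasses import dataclass, field

ROLES = ("goal", "decision", "action")  # functional roles named in the paper


def tag_by_role(notes: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (role, text) notes under the fixed functional roles."""
    grouped: dict[str, list[str]] = {role: [] for role in ROLES}
    for role, text in notes:
        if role not in grouped:
            # Rigidity: anything outside the schema has nowhere to go.
            raise ValueError(f"unknown role: {role}")
        grouped[role].append(text)
    return grouped


QUESTIONS = {  # hypothetical templates for question-answer reformulation
    "goal": "What are we trying to achieve?",
    "decision": "What was decided?",
    "action": "What happens next?",
}


def to_qa_pairs(grouped: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Rewrite role-tagged notes as (question, answer) pairs."""
    return [(QUESTIONS[role], text) for role, texts in grouped.items() for text in texts]


@dataclass
class Concept:
    """Node in a broad-to-specific concept hierarchy."""
    name: str
    children: list["Concept"] = field(default_factory=list)

    def outline(self, depth: int = 0) -> list[str]:
        lines = ["  " * depth + self.name]
        for child in self.children:
            lines.extend(child.outline(depth + 1))
        return lines


notes = [
    ("goal", "Ship the beta by Friday"),
    ("decision", "Use PostgreSQL"),
    ("action", "Write the migration scripts"),
]
grouped = tag_by_role(notes)
qa_pairs = to_qa_pairs(grouped)
tree = Concept("Project", [
    Concept("Schedule", [Concept("Beta release: Friday")]),
    Concept("Infrastructure", [Concept("Database: PostgreSQL")]),
])
```

The hierarchy records that PostgreSQL was chosen but not why or when, which mirrors the paper's point that hierarchies can miss logical relationships and changes over time.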

Each option has a cost. Question-answer reformulation can interrupt the flow of thought. Hierarchies can clarify concepts while missing logical relationships or changes over time. The Semantic Operating System is presented as a way to manage these trade-offs more deliberately.

Memory must cross text, images, audio and more

Modern AI does not work with text alone. It must combine text, images, audio, video, code and sensor data. These inputs are different by nature: text is sequential, images are spatial and audio is continuous.

The researchers describe three broad strategies for multimodal processing. One places all data into a shared vector space so related concepts can cluster together. Another feeds multiple modalities into a single transformer so they can attend to one another at every layer. A third uses cross-attention, allowing one modality to focus on selected parts of another.
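The core operations behind these strategies can be sketched with toy vectors in plain Python. The two-dimensional "embeddings" below are invented for illustration; real models use learned, high-dimensional encoders and learned query, key and value projections, which this sketch omits. One `attend` function covers the second and third strategies: applied to a single concatenated text-plus-image sequence it is self-attention across modalities, and with text queries against image keys and values it is cross-attention.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: list[float], b: list[float]) -> float:
    """Similarity of two vectors in a shared embedding space."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def softmax(scores: list[float]) -> list[float]:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: list[list[float]], keys: list[list[float]],
           values: list[list[float]]) -> list[list[float]]:
    """Scaled dot-product attention: each query takes a weighted mix of the values."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for query in queries:
        weights = softmax([dot(query, key) / scale for key in keys])
        outputs.append([sum(w * value[j] for w, value in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


# 1) Shared vector space: related concepts from different modalities lie close together.
text_dog, image_dog, image_car = [0.9, 0.1], [0.8, 0.2], [0.1, 0.9]
dog_closer_than_car = cosine(text_dog, image_dog) > cosine(text_dog, image_car)

# 2) Single transformer: text and image tokens form one sequence that attends to itself.
text_tokens = [[4.0, 0.0]]
image_patches = [[1.0, 0.0], [0.0, 1.0]]
sequence = text_tokens + image_patches
fused = attend(sequence, sequence, sequence)

# 3) Cross-attention: text queries attend only to image keys and values.
crossed = attend(text_tokens, image_patches, image_patches)
```

In the cross-attention case the text token puts most of its weight on the first image patch, the one its vector points toward, which is the "focus on selected parts of another modality" behavior described above.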

The challenge is that technical systems still rely on fixed mappings, unlike the human brain, which shifts between sensory channels fluidly. A central concept in the proposal is "self-baking," which means turning temporary impressions into stable, structured memories.

In that model, short-term memory holds current information. Long-term memory captures repeated or important patterns. Learning happens as data moves between the two.
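A minimal Python sketch of that movement, assuming a simple promotion rule: an impression is "baked" into long-term memory once it has recurred a set number of times or is flagged as important. The class name, the thresholds and the rule itself are hypothetical; the paper describes the concept, not this mechanism.

```python
from collections import Counter, deque


class TwoTierMemory:
    """Hypothetical 'self-baking' sketch: recurring or important impressions get promoted."""

    def __init__(self, short_term_size: int = 5, promote_after: int = 3) -> None:
        self.short_term: deque[str] = deque(maxlen=short_term_size)  # oldest items fall off
        self.seen: Counter[str] = Counter()  # how often each impression has occurred overall
        self.long_term: set[str] = set()
        self.promote_after = promote_after

    def observe(self, item: str, important: bool = False) -> None:
        self.short_term.append(item)
        self.seen[item] += 1
        # Promotion: repeated patterns or explicitly important items become stable memories.
        if important or self.seen[item] >= self.promote_after:
            self.long_term.add(item)


mem = TwoTierMemory(short_term_size=3, promote_after=3)
for impression in ["prefers tea", "weather rainy", "prefers tea", "meeting at 10", "prefers tea"]:
    mem.observe(impression)
mem.observe("allergic to nuts", important=True)
```

The rainy weather drops out of short-term memory and is never promoted, while the repeated preference and the flagged allergy persist.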

Early signs and deeper stakes

The paper points to early examples that resemble pieces of a Semantic Operating System. Anthropic's LeadResearcher agent can retain its long-term research plan even after processing more than 200,000 tokens. Google's Gemini CLI uses the file system as a lightweight database, storing project background, roles and conventions in a central file and compressing them with AI-generated summaries.
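The central-file pattern can be sketched with the standard library. The Python below appends notes about a project to a single context file and, once a hypothetical budget of four notes is exceeded, folds the older ones into one summary line. The file name, the budget and the `summarize` stand-in (which here just trims each note instead of calling a model) are assumptions; this does not reproduce Gemini CLI's actual implementation.

```python
import tempfile
from pathlib import Path

MAX_NOTES = 4  # hypothetical budget before older notes are compressed


def summarize(notes: list[str]) -> str:
    """Stand-in for an AI-generated summary: keep the text before the first comma of each note."""
    return "Summary: " + "; ".join(note.removeprefix("Summary: ").split(",")[0] for note in notes)


def append_note(path: Path, note: str) -> None:
    """Append a note to the context file, compressing older notes when over budget."""
    lines = path.read_text().splitlines() if path.exists() else []
    lines.append(note)
    if len(lines) > MAX_NOTES:
        # Keep the newest note verbatim; fold everything older into one summary line.
        lines = [summarize(lines[:-1]), lines[-1]]
    path.write_text("\n".join(lines) + "\n")


with tempfile.TemporaryDirectory() as workdir:
    context_file = Path(workdir) / "PROJECT_CONTEXT.md"  # hypothetical file name
    for note in [
        "Role: code reviewer for the payments service",
        "Convention: snake_case, four-space indents",
        "Background: service handles card refunds",
        "Convention: every change needs a test",
        "Background: migrating to Python 3.12",
    ]:
        append_note(context_file, note)
    final_context = context_file.read_text()
```

Because the file lives on disk, it survives between sessions, and the compression step keeps it from growing without bound.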

Alibaba's Tongyi DeepResearch takes another route by condensing information into a "reasoning state." Future searches can then build on summaries rather than carrying entire histories forward.
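The difference between carrying a reasoning state and carrying a full history can be sketched in Python. Here the "summary" is a hypothetical stand-in that keeps only the latest finding per topic, and `fake_search` replaces a real search tool; neither reflects how Tongyi DeepResearch actually builds its state. The point is the data flow: each step sees only the condensed state, so its input stays bounded while the history keeps growing.

```python
def fake_search(query: str, step: int, state: dict[str, str]) -> list[tuple[str, str]]:
    """Stand-in for a search tool. A real agent would use `state` to plan the next query."""
    return [(query, f"{query}: finding from step {step}")]


def condense(state: dict[str, str], findings: list[tuple[str, str]]) -> dict[str, str]:
    """Stand-in summarizer: one line per topic, newer findings replace older ones."""
    new_state = dict(state)
    for topic, finding in findings:
        new_state[topic] = finding
    return new_state


state: dict[str, str] = {}
full_history: list[str] = []
queries = ["context windows", "memory decay", "context windows", "memory decay"]
for step, query in enumerate(queries, start=1):
    findings = fake_search(query, step, state)  # only the condensed state is passed on
    full_history.extend(finding for _, finding in findings)
    state = condense(state, findings)
```

After four steps the history holds four entries, but the state carried forward still has just one line per topic.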

The researchers also suggest that brain-computer interfaces could eventually change context collection by recording focus, emotional intensity and cognitive effort. That would move AI memory beyond external actions and toward internal signals.

The paper ends with a broader claim about identity. Drawing on Karl Marx's idea that people are shaped by social relationships, the researchers argue that digital traces now play a similar role. They write, "The human mind may not be uploaded, but the human context can—turning context itself into a lasting form of knowledge, memory, and identity."

In that view, conversations, decisions, communication styles and patterns of thinking could persist, evolve and produce new insights beyond a person's lifetime. The Semantic Operating System is proposed as the technical foundation for making that kind of durable AI memory possible.