Why AI agents may need experience to make the next leap

Richard S. Sutton and David Silver argue that future AI progress should come from agents that learn through action and feedback. Their view shifts attention from static human-written data toward reinforcement learning, world models and long-running adaptation.

The next phase of AI may depend less on absorbing what humans have already written and more on systems that act, observe results and keep improving. That is the direction outlined by Richard S. Sutton and David Silver in their essay "Welcome to the Era of Experience," which builds on Sutton's earlier "Bitter Lesson."

The central idea is direct: the most durable advances in AI come from scalable learning and search, not from manually injecting human insight into machines. Sutton and Silver now extend that argument to AI agents that learn continuously from their own experience.

From human data to lived feedback

Today's generative AI, including large language models, is built largely on human-created material such as books, websites and forums. That approach has produced capable systems, but Sutton and Silver argue that it has a ceiling. High-quality human data is finite, and some discoveries may sit beyond what people have already documented.

In their view, imitation can make AI useful, but it is not enough for true creativity. A model trained mainly to reproduce patterns in existing text remains tied to the limits of that text. The proposed alternative is an agent that keeps learning after deployment by interacting with an environment and receiving feedback from what happens next.

This marks a shift away from fixed datasets and toward open-ended exploration. Instead of being trained once and then treated as largely static, an experience-driven AI would adapt over months or years. Each action becomes another chance to learn, and each result becomes a new training signal.
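
The act, observe, update cycle described above can be illustrated with a very small sketch. It is not taken from the essay: the class name ExperienceAgent, the run helper, the two-action setup and the reward probabilities are all hypothetical, and the learning rule is a standard epsilon-greedy bandit update chosen only because it is short.

```python
import random


class ExperienceAgent:
    """Hypothetical agent that keeps refining action-value estimates from feedback."""

    def __init__(self, n_actions, epsilon=0.1, step_size=0.1, seed=0):
        self.values = [0.0] * n_actions  # current estimate of each action's reward
        self.epsilon = epsilon           # probability of trying a random action
        self.step_size = step_size       # constant, so recent experience counts more
        self.rng = random.Random(seed)

    def act(self):
        # Explore occasionally; otherwise pick the action that looks best so far.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def learn(self, action, reward):
        # Move the estimate a small step toward the reward that was just observed.
        self.values[action] += self.step_size * (reward - self.values[action])


def run(agent, reward_probs, steps, seed=1):
    """Toy environment (an assumption): each action pays 1.0 with a fixed probability."""
    env_rng = random.Random(seed)
    total = 0.0
    for _ in range(steps):
        action = agent.act()
        reward = 1.0 if env_rng.random() < reward_probs[action] else 0.0
        agent.learn(action, reward)  # every result is used as a training signal
        total += reward
    return total


if __name__ == "__main__":
    agent = ExperienceAgent(n_actions=2)
    run(agent, reward_probs=[0.2, 0.8], steps=5000)
    print(agent.values)
```

There is no separate training phase in this sketch: the agent learns inside the same loop in which it acts, which is the property the paragraph above attributes to experience-driven systems.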

Why reinforcement learning returns to the center

Sutton's "Bitter Lesson," published in 2019, argued that AI breakthroughs have usually favored methods that scale with compute and learning over systems built around human intuition. That idea is already foundational in reinforcement learning, the technology behind AlphaGo and newer "reasoning" language models.

Sutton, a Turing Award winner and head of DeepMind's Alberta lab, and Silver, his former PhD student and DeepMind's reinforcement learning lead, describe the new direction as a return to the roots of reinforcement learning. They argue that long-term adaptation will require classic RL ideas such as temporal abstraction, exploratory behavior and dynamic value functions.

One example the authors cite is AlphaProof, a DeepMind system for formal mathematics. It combines a pre-trained language model with the AlphaZero reinforcement learning algorithm. After a short phase of learning from human proofs, AlphaProof generated over 100 million additional proof steps through autonomous exploration and outperformed systems trained only on curated human data.

For Sutton and Silver, that example points to a broader pattern: human knowledge can help start the process, but the larger gains may come when AI systems explore further on their own.

World models could change how agents plan

The essay also argues for a different view of machine "thinking." Current language models often try to imitate human reasoning with methods such as chain-of-thought prompts. Sutton and Silver warn that this imitation can also carry forward human errors and biases.

They propose that future agents should build their own internal "world models." These are simulations that help an agent predict the consequences of possible actions. In that framework, planning is not just producing plausible language; it is testing expected outcomes against a model of how the environment behaves.
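
A toy sketch can make the idea of planning against a world model more concrete. Everything in it is an assumption made for illustration: the five-cell corridor, the function names true_step, learn_model and plan, and the brute-force lookahead are hypothetical and far simpler than anything the essay has in mind. The agent first records what its actions did, then evaluates candidate action sequences inside that recorded model rather than in the real environment.

```python
import itertools
import random

ACTIONS = (-1, +1)  # step left or step right


def true_step(state, action):
    """Real environment, hidden from the planner: cells 0..4, reward on entering cell 4."""
    next_state = min(4, max(0, state + action))
    reward = 1.0 if next_state == 4 else 0.0
    return next_state, reward


def learn_model(n_samples=500, seed=0):
    """Build a tabular world model from observed transitions.

    Sampling states uniformly is a simplifying assumption; a real agent
    would only see the states it actually visits.
    """
    rng = random.Random(seed)
    model = {}
    for _ in range(n_samples):
        state = rng.randrange(5)
        action = rng.choice(ACTIONS)
        model[(state, action)] = true_step(state, action)
    return model


def plan(model, state, horizon=4):
    """Return the first action of the sequence with the best predicted total reward."""
    best_action, best_return = None, float("-inf")
    for sequence in itertools.product(ACTIONS, repeat=horizon):
        simulated, total = state, 0.0
        for action in sequence:
            # Unseen transitions are predicted as "nothing happens" (an assumption).
            simulated, reward = model.get((simulated, action), (simulated, 0.0))
            total += reward
        if total > best_return:
            best_action, best_return = sequence[0], total
    return best_action


if __name__ == "__main__":
    model = learn_model()
    print(plan(model, state=0))
```

The planner never calls true_step. It only consults what the agent has already experienced, so its choices are judged by predicted consequences instead of by how plausible a line of reasoning sounds.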

The essay gives several possible domains for this kind of learning:

  • Health assistants that study sleep patterns and adjust advice.
  • Educational agents that follow student progress over years.
  • Scientific AI systems that conduct their own experiments.

The feedback for these systems would not come only from human ratings. It could come from measurable signals in the environment, including resting heart rate, CO₂ levels and test results. Human input may still matter, but Sutton and Silver emphasize feedback connected to real consequences, such as how a cake tastes or how someone feels after a workout.
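
As a rough illustration of feedback grounded in measurement, the sketch below computes a reward from a change in resting heart rate and only optionally blends in a human rating. The Observation type, the grounded_reward function and the 0.2 weighting are hypothetical choices for this example, not anything specified by Sutton and Silver.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    resting_heart_rate: float            # beats per minute, measured by a device
    user_rating: Optional[float] = None  # optional human rating in [0, 1]


def grounded_reward(before, after, rating_weight=0.2):
    """Reward a drop in resting heart rate; mix in a human rating if one exists.

    The weighting and the mixing of units are arbitrary assumptions.
    """
    measured = before.resting_heart_rate - after.resting_heart_rate
    if after.user_rating is None:
        return measured  # the environment signal alone is enough
    return (1 - rating_weight) * measured + rating_weight * after.user_rating


if __name__ == "__main__":
    print(grounded_reward(Observation(62.0), Observation(60.0)))
```

The measured signal carries most of the weight here, and the rating is treated as a supplement. That ordering mirrors the emphasis on real consequences over human judgment alone.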

Autonomy raises both promise and control questions

More capable AI agents would bring new responsibility. Systems that plan over long periods and keep adapting could develop abilities usually associated with people. That also means they may become harder to control and tune than conventional software.

Sutton and Silver do not frame experience only as a risk. They suggest that agents embedded in real environments could learn from unintended consequences and adjust. Reward functions could also be refined through user feedback, while real-world constraints such as medical studies would slow reckless progress.

The authors argue that the technical ingredients already exist: compute, simulation environments and reinforcement learning algorithms. What they call "experiential intelligence" is still young, but their message is that the AI field should be willing to move toward a new paradigm.

The limits of language-only AI are becoming clearer

The view that pure language modeling is insufficient for superhuman AI has become more common in the industry. Even with large amounts of text, models still struggle with basic common sense and with generalizing across tasks.

Other leading figures are also looking beyond language models. Ilya Sutskever, OpenAI co-founder and former chief scientist, is working on alternative paths to superintelligence at his startup Safe Superintelligence (SSI). Yann LeCun at Meta is pushing for new architectures, and Sam Altman said in 2023 that language alone is not enough for AGI and beyond.

That broader movement makes Sutton and Silver's argument more than a technical preference. It suggests a different foundation for AI development: agents that learn by doing, build models of the world, and improve through feedback rather than only remixing what humans have already written.