Meta and Ohio State University have introduced Early Experience, a training approach for language agents that uses an agent's own actions as a source of learning. Instead of relying only on human demonstrations or waiting for external reward signals, the agent explores alternatives, observes outcomes, and turns those outcomes into additional training data.
The method is aimed at a central problem in agent training: examples from experts are useful, but they cover only a limited range of situations. When a task changes or a new scenario appears, a model trained mainly to imitate may struggle to adapt.
Why Early Experience matters
Traditional agent training often starts with human demonstrations. The model learns by watching examples of successful behavior, then tries to reproduce similar actions. That can work when the test situation resembles the training data, but the source article says these demonstrations often fail to generalize to new problems.
Reinforcement learning offers another path, but it depends on clear reward signals. The source notes that such signals are often missing in real-world environments. Early Experience is presented as a middle ground between imitation learning and reinforcement learning: it gives the agent more useful experience without requiring a separate outside evaluation for every action.
In this setup, the agent does not merely copy expert moves. It tries different actions, watches what happens, and uses those results as learning signals. That makes the training process more self-directed while still remaining connected to concrete task outcomes.
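The loop described above can be pictured with a small sketch. It is an illustration, not the researchers' code: the `Transition` record, the `toy_step` environment, and the `collect_rollouts` helper are hypothetical names, and alternatives are drawn uniformly at random here, whereas a real agent would propose them itself.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Transition:
    state: str
    action: str
    next_state: str
    is_expert: bool


def toy_step(state: str, action: str) -> str:
    """Hypothetical environment stand-in: returns the observation after an action."""
    return f"{state} | result of {action}"


def collect_rollouts(
    expert_pairs: Sequence[Tuple[str, str]],
    candidate_actions: Sequence[str],
    num_alternatives: int,
    step: Callable[[str, str], str] = toy_step,
    seed: int = 0,
) -> List[Transition]:
    """At each expert state, try the expert action plus a few alternatives."""
    rng = random.Random(seed)
    transitions: List[Transition] = []
    for state, expert_action in expert_pairs:
        # Keep the expert step so it can be compared with the alternatives later.
        transitions.append(
            Transition(state, expert_action, step(state, expert_action), True)
        )
        alternatives = [a for a in candidate_actions if a != expert_action]
        k = min(num_alternatives, len(alternatives))
        # Assumption: uniform sampling stands in for the agent's own proposals.
        for action in rng.sample(alternatives, k):
            transitions.append(Transition(state, action, step(state, action), False))
    return transitions


demo = [("search results for 'desk lamp'", "click[item_2]")]
actions = ["click[item_1]", "click[item_2]", "click[next_page]"]
rollouts = collect_rollouts(demo, actions, num_alternatives=2)
```

Each record pairs an action with what actually happened next, which is the raw material that both techniques described in the next section draw on.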
Two ways agents learn from themselves
The researchers developed two main techniques for Early Experience. The first is implicit world modeling. In this approach, the agent learns to predict what will happen after it takes an action. If it clicks a link on a website, for example, it learns to anticipate the page that loads next. The outcomes it actually observes then serve as the targets for those predictions during training.
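As a rough illustration of implicit world modeling, the sketch below turns observed (state, action, next state) triples into prompt and target pairs of the kind a language model could be fine-tuned on. The prompt wording and the `build_world_model_examples` helper are assumptions made for this example, not the format used in the study.

```python
from typing import Dict, List, Sequence, Tuple


def build_world_model_examples(
    transitions: Sequence[Tuple[str, str, str]],
) -> List[Dict[str, str]]:
    """Turn (state, action, next_state) triples into prompt/target pairs."""
    examples: List[Dict[str, str]] = []
    for state, action, next_state in transitions:
        # Hypothetical prompt template: the model sees the state and the action...
        prompt = (
            f"Observation:\n{state}\n\n"
            f"Action:\n{action}\n\n"
            "Predict the next observation:"
        )
        # ...and is trained to produce the outcome that was actually observed.
        examples.append({"prompt": prompt, "target": next_state})
    return examples


observed = [
    ("Home page with a search box", "click[search]", "Search results page"),
    ("Home page with a search box", "click[cart]", "Empty shopping cart page"),
]
world_model_data = build_world_model_examples(observed)
```

No reward appears anywhere in these examples. The next observation itself is the supervision signal.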
The second technique is self-reflection. Here, the agent compares its own actions with expert moves and generates natural language explanations for why the expert action was better. In an online shopping task, for instance, the agent might explain that a more expensive item exceeded the budget.
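The self-reflection step might be sketched as follows. This is one plausible reading rather than the published procedure: the prompt text, the `build_reflection_example` helper, and the `canned_explainer` function (a stand-in for a real model call) are all hypothetical.

```python
from typing import Callable, Dict, Tuple

ActionOutcome = Tuple[str, str]  # (action, observed outcome)


def build_reflection_prompt(
    state: str, expert: ActionOutcome, alternative: ActionOutcome
) -> str:
    """Ask for an explanation of why the expert action beat the alternative."""
    expert_action, expert_outcome = expert
    alt_action, alt_outcome = alternative
    return (
        f"Situation:\n{state}\n\n"
        f"Expert action: {expert_action}\nOutcome: {expert_outcome}\n\n"
        f"Alternative action: {alt_action}\nOutcome: {alt_outcome}\n\n"
        "Explain why the expert action was the better choice."
    )


def build_reflection_example(
    state: str,
    expert: ActionOutcome,
    alternative: ActionOutcome,
    explain: Callable[[str], str],
) -> Dict[str, str]:
    """Pair the situation with a rationale followed by the expert action."""
    rationale = explain(build_reflection_prompt(state, expert, alternative))
    return {
        "prompt": f"Situation:\n{state}\n\nReason, then choose an action:",
        # Assumption: the training target is the rationale plus the expert action.
        "target": f"{rationale}\nAction: {expert[0]}",
    }


def canned_explainer(prompt: str) -> str:
    """Placeholder for a model call; returns a fixed rationale for the demo."""
    return "The alternative item costs more than the stated budget."


example = build_reflection_example(
    state="Task: buy a desk lamp under $30.",
    expert=("click[lamp_a]", "Lamp A, $24.99"),
    alternative=("click[lamp_b]", "Lamp B, $41.00"),
    explain=canned_explainer,
)
```

In practice `explain` would be the agent itself generating the rationale; the fixed string only keeps the example self-contained.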
Both approaches use the agent's own behavior and the consequences of that behavior. The important point is that the agent is not waiting for a hand-designed reward at every step. It is extracting useful information from its own attempts.
Testing across different tasks
The team evaluated Early Experience in eight different environments. These included website navigation, simulated household chores, scientific experiments, multi-step tool use, and complex planning tasks like travel arrangements.
The experiments used three relatively small language models: Llama-3.1-8B, Llama-3.2-3B, and Qwen2.5-7B. Across all tasks, both Early Experience methods outperformed standard training approaches. On average, success rates rose by 9.6 percentage points, while performance in new scenarios improved by 9.4 percentage points.
The gains were strongest on harder problems. In travel planning, self-reflection improved results by up to 15 percentage points. In online shopping, implicit world modeling raised scores by as much as 18.4 percentage points.
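The figures above are percentage points, not relative percentages, and the two are easy to confuse. The short sketch below shows the difference. The 40 percent baseline is a made-up number chosen purely for illustration, since the article does not give baseline success rates.

```python
def percentage_point_gain(baseline_rate: float, new_rate: float) -> float:
    """Absolute difference between two rates, in percentage points."""
    return (new_rate - baseline_rate) * 100


def relative_gain_percent(baseline_rate: float, new_rate: float) -> float:
    """Improvement relative to the baseline, in percent."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (new_rate - baseline_rate) / baseline_rate * 100


# Hypothetical baseline of 40% success rising to 49.6%:
# a gain of about 9.6 percentage points, or about 24% relative to the baseline.
points = percentage_point_gain(0.40, 0.496)
relative = relative_gain_percent(0.40, 0.496)
```

The same 9.6-point gain would be a larger relative improvement from a lower baseline and a smaller one from a higher baseline.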
Those results matter because the evaluated tasks are not all the same kind of problem. Some involve websites, others involve planning, tools, experiments, or simulated physical work. The source describes improvements across all tasks, which suggests the method was not limited to one narrow environment in these tests.
How it connects to reinforcement learning
The researchers also tested whether Early Experience could help when reinforcement learning was available. Some environments do provide reward signals, so the team trained models using different methods and then put all of them through the same reinforcement learning process.
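The comparison described here is a controlled experiment: only the starting point changes, while the reinforcement learning stage stays fixed. A minimal harness for that idea could look like the sketch below. The function name and the two callables are hypothetical placeholders, not part of any published code.

```python
from typing import Callable, Dict, TypeVar

Model = TypeVar("Model")


def compare_after_rl(
    initial_models: Dict[str, Model],
    rl_stage: Callable[[Model], Model],
    evaluate: Callable[[Model], float],
) -> Dict[str, float]:
    """Run the same RL stage from each starting model and score the results.

    `rl_stage` and `evaluate` are supplied by the caller; keeping them
    identical across starting points is what makes the comparison fair.
    """
    return {
        name: evaluate(rl_stage(model)) for name, model in initial_models.items()
    }
```

A caller would pass one entry per initialization method, so any difference in the returned scores can be attributed to the starting model and not to the reinforcement learning setup.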
According to the source article, the result was clear: models that began with Early Experience training consistently performed better after reinforcement learning than the others. In some cases, the performance gap became larger as reinforcement learning continued.
That makes Early Experience useful in two ways. It can train stronger systems when rewards are not available, and it can prepare models to get more from reinforcement learning when rewards do exist. In practical terms, it functions as a bridge between imitation-based training and later reinforcement learning.
Scaling and data efficiency
The study also examined whether the method works beyond smaller models. Tests with models up to 70 billion parameters showed that Early Experience still produced improvements. The gains also held when using resource-efficient LoRA updates.
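To see why LoRA counts as resource-efficient, it helps to count trainable parameters. LoRA freezes a weight matrix of shape d_out by d_in and trains two small matrices of shapes d_out by r and r by d_in in its place. The layer size and rank below are illustrative assumptions, not values reported for the study.

```python
def full_update_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in


def lora_update_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a rank-`rank` LoRA update (B: d_out x r, A: r x d_in)."""
    if rank <= 0:
        raise ValueError("rank must be positive")
    return rank * (d_out + d_in)


# Assumed example: a 4096 x 4096 projection with rank 16.
full = full_update_params(4096, 4096)
lora = lora_update_params(4096, 4096, rank=16)
reduction_factor = full // lora
```

For this assumed layer, full fine-tuning updates 16,777,216 weights while the LoRA update trains 131,072, a 128-fold reduction for that matrix.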
The team also looked at how much expert demonstration data is required. Early Experience stayed ahead even with less data. In some tests, using just one-eighth of the original number of demonstrations was enough to outperform standard training with the full dataset.
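A data-efficiency test of this kind starts by drawing a reproducible subset of the demonstrations. The sketch below shows one simple way to do that. The helper name, the seed, and the placeholder dataset of 800 items are assumptions for illustration; the article does not describe how the subsets were chosen.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def subsample_demonstrations(
    demos: Sequence[T], fraction: float, seed: int = 0
) -> List[T]:
    """Return a random subset holding roughly `fraction` of the demonstrations."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not demos:
        return []
    # Keep at least one demonstration so the subset is never empty.
    n = max(1, int(len(demos) * fraction))
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return rng.sample(list(demos), n)


all_demos = [f"demo_{i}" for i in range(800)]  # placeholder dataset
one_eighth = subsample_demonstrations(all_demos, fraction=1 / 8)
```

Fixing the seed means every training method can be given exactly the same reduced set, which keeps the comparison between methods fair.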
That finding is important because expert demonstrations can be limited by the scenarios they cover. If an agent can learn more from fewer examples by testing actions and reasoning about outcomes, the training process may become more adaptable. The source also says this aligns with earlier studies showing that a small number of examples can be enough to reach competitive results.
Early Experience does not remove the need for careful evaluation, but it changes where some of the learning signal comes from. The agent's own actions become part of the training material. For language agents that need to operate across websites, tools, planning tasks, and changing environments, that is the core idea behind the method.