Three Ways AI Agents Get Smarter Without Getting Bigger

Researchers from the National University of Singapore, Princeton, and the University of Illinois Urbana-Champaign found that AI agents improve most through better data, better training algorithms, and better reasoning strategy. Their DemyAgent-4B model shows that a carefully trained 4-billion-parameter system can compete with models up to 32 billion parameters.

Bigger AI models are not always the smarter choice. The research team identified three factors that can make AI agents perform far better: data quality, algorithm design, and reasoning strategy.

Their central finding is direct: a carefully trained 4-billion-parameter model can match or even beat competitors with up to 32 billion parameters. That matters because it shifts attention away from raw model size and toward how an AI agent learns, reasons, and decides when to use tools.

Real Training Data Matters Most

The strongest signal in the research was the quality and type of training data. The team compared models trained on real, end-to-end learning trajectories with models trained on synthetic data, in which intermediate reasoning steps were replaced by tool outputs.

The difference was large. On AIME math benchmarks, a 4-billion-parameter model trained on real data reached 29.79 percent accuracy. The same model trained on synthetic data scored under 10 percent.

The reason is not simply that real data contains better answers. According to the research, real data preserves the whole workflow an agent uses while solving a task. That includes pre-tool analysis, guided execution, error correction, and self-reflection.

Those steps matter because AI agents do more than generate text. They decide when to call a tool, how to interpret the result, and whether their previous step was useful. Synthetic data can miss that chain of decisions, leaving the model with outputs but little sense of the path that produced them.
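As a rough illustration of that chain of decisions, a single training trajectory can be pictured as an ordered list of steps. The sketch below is hypothetical: the `Step` and `Trajectory` classes, the step names, and the `has_full_workflow` check are invented for this example and do not reflect the format of the researchers' released data.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical step kinds, named after the workflow stages described above.
STEP_KINDS = ("analysis", "tool_call", "tool_output",
              "correction", "reflection", "answer")


@dataclass
class Step:
    kind: str
    text: str

    def __post_init__(self) -> None:
        if self.kind not in STEP_KINDS:
            raise ValueError(f"unknown step kind: {self.kind}")


@dataclass
class Trajectory:
    question: str
    steps: List[Step] = field(default_factory=list)

    def has_full_workflow(self) -> bool:
        """True if the agent analyzes before its first tool call
        and corrects or reflects at some point after it."""
        kinds = [s.kind for s in self.steps]
        if "tool_call" not in kinds:
            return False
        first_call = kinds.index("tool_call")
        analyzed_first = "analysis" in kinds[:first_call]
        reviewed_after = any(
            k in ("correction", "reflection") for k in kinds[first_call + 1:]
        )
        return analyzed_first and reviewed_after


# A "real" trajectory keeps the reasoning around the tool call.
real = Trajectory(
    question="What is 17 * 23?",
    steps=[
        Step("analysis", "A calculator call avoids arithmetic slips."),
        Step("tool_call", "calc(17 * 23)"),
        Step("tool_output", "391"),
        Step("reflection", "Plausible: 17 * 20 = 340, plus 17 * 3 = 51."),
        Step("answer", "391"),
    ],
)

# A "synthetic" trajectory keeps only the tool interaction and the answer.
synthetic = Trajectory(
    question="What is 17 * 23?",
    steps=[
        Step("tool_call", "calc(17 * 23)"),
        Step("tool_output", "391"),
        Step("answer", "391"),
    ],
)

print(real.has_full_workflow())       # True
print(synthetic.has_full_workflow())  # False
```

Both trajectories end with the same answer, but only the first records why the tool was called and how its result was checked.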

Diversity also helped. A mixed dataset of 30,000 examples from math, science, and programming accelerated learning. The model reached 50 percent accuracy after just 150 training steps, while a model trained on a math-only dataset needed 220 steps to reach the same accuracy. That is 70 fewer steps, or roughly a third less training.

GRPO-TCR Improves the Learning Process

The second factor was algorithm design. The researchers tested three algorithm variants to see which learning structure produced the strongest results.

The best-performing approach was called GRPO-TCR. It combines three elements: token-level scoring, broader clipping for more exploration, and a reward setup that discourages overly long answers.

Token-level scoring means the system evaluates smaller chunks of the answer rather than relying only on broader units such as sentences. In this research, that mattered. Token-based scoring outperformed sentence-based methods by about 4 percent.
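The article does not give formulas for these three elements, so the following is only a minimal sketch of how they are commonly implemented in policy-gradient training. Every function name, the epsilon values, and the shape of the length penalty are assumptions, not the researchers' settings. The sketch also treats the broader-unit alternative as whole-response averaging, which is an assumption about what "sentence-based" refers to.

```python
from typing import List


def response_level_loss(per_token_losses: List[List[float]]) -> float:
    """Average within each response first, then across responses.
    Every response counts equally, whatever its length."""
    per_response = [sum(r) / len(r) for r in per_token_losses]
    return sum(per_response) / len(per_response)


def token_level_loss(per_token_losses: List[List[float]]) -> float:
    """Average over every token in the batch.
    Longer responses contribute proportionally more signal."""
    total = sum(sum(r) for r in per_token_losses)
    count = sum(len(r) for r in per_token_losses)
    return total / count


def clipped_objective(ratio: float, advantage: float,
                      eps_low: float = 0.2, eps_high: float = 0.3) -> float:
    """PPO-style clipped surrogate with a wider upper bound (eps_high > eps_low),
    which leaves more room to raise the probability of promising tokens.
    The epsilon values are illustrative assumptions."""
    clipped = max(1.0 - eps_low, min(ratio, 1.0 + eps_high))
    return min(ratio * advantage, clipped * advantage)


def length_penalty(length: int, soft_limit: int, hard_limit: int) -> float:
    """Zero up to soft_limit, then falling linearly to -1 at hard_limit.
    An assumed shape for a reward term that discourages overlong answers."""
    if length <= soft_limit:
        return 0.0
    if length >= hard_limit:
        return -1.0
    return -(length - soft_limit) / (hard_limit - soft_limit)


# One short response (2 tokens) and one long response (4 tokens).
losses = [[1.0, 1.0], [4.0, 4.0, 4.0, 4.0]]
print(response_level_loss(losses))  # 2.5
print(token_level_loss(losses))     # 3.0
print(length_penalty(150, soft_limit=100, hard_limit=200))  # -0.5
```

In the small batch above, the two averaging schemes disagree because the long response holds four of the six tokens. That weighting difference is the kind of effect a token-level loss introduces.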

The optimized approach achieved 70.93 percent accuracy on one math benchmark and 68.13 percent on another. The research also notes that, unlike in traditional reinforcement learning, agents can improve both exploration and precision at the same time through tool interactions.

That distinction is important for AI agents because tool use changes the learning problem. The agent is not only choosing words. It is also deciding when outside information or computation should enter the process, then folding that result back into its reasoning.

Better Agents Think More Before They Act

The third factor was reasoning strategy. The researchers identified two broad styles: reactive and deliberative.

Reactive agents think briefly and call tools often. Deliberative agents spend more time reasoning and use tools less frequently. In the study, the deliberative pattern performed better.

Models using the deliberative strategy consistently achieved tool-use success rates above 70 percent. Reactive models performed poorly because their frequent tool calls were often ineffective or wrong.

The point is not that tools are bad. The finding is that tool use has to be selective and purposeful. An agent that calls tools rapidly without enough analysis may create more chances for error, while an agent that thinks longer can make fewer but better tool decisions.
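One way to make the reactive versus deliberative distinction measurable is to log each tool call and summarize how much reasoning preceded it and whether it worked. The sketch below is hypothetical: the `ToolCall` record, the `tool_use_summary` helper, and the sample logs are invented for illustration and are not data or code from the study.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToolCall:
    thinking_tokens: int  # tokens of reasoning produced before the call
    succeeded: bool       # whether the call ran and returned a usable result


def tool_use_summary(calls: List[ToolCall]) -> Dict[str, float]:
    """Summarize call count, success rate, and average reasoning before each call."""
    if not calls:
        return {"calls": 0, "success_rate": 0.0, "avg_thinking_tokens": 0.0}
    n = len(calls)
    return {
        "calls": n,
        "success_rate": sum(c.succeeded for c in calls) / n,
        "avg_thinking_tokens": sum(c.thinking_tokens for c in calls) / n,
    }


# Made-up logs for illustration only; these are not results from the study.
reactive = [
    ToolCall(20, False),
    ToolCall(15, True),
    ToolCall(10, False),
    ToolCall(25, False),
]
deliberative = [
    ToolCall(400, True),
    ToolCall(350, True),
]

print(tool_use_summary(reactive))
# {'calls': 4, 'success_rate': 0.25, 'avg_thinking_tokens': 17.5}
print(tool_use_summary(deliberative))
# {'calls': 2, 'success_rate': 1.0, 'avg_thinking_tokens': 375.0}
```

Tracking both numbers together matters: a high call count with a low success rate is the reactive pattern described above, while fewer calls with more reasoning before each one is the deliberative pattern.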

The research also found a limitation in current long-chain-of-thought models. Even though they are optimized for extended thinking, they tend to avoid tool calls entirely and rely only on internal reasoning processes. That suggests long reasoning alone is not enough; the agent also needs to integrate tools at the right moments.

DemyAgent-4B Shows the Payoff

Applying the three findings, the researchers built DemyAgent-4B with just 4 billion parameters. Its benchmark results placed it among much larger competitors.

  • 72.6 percent on AIME2024
  • 70 percent on AIME2025
  • 58.5 percent on GPQA-Diamond science tests
  • 26.8 percent on LiveCodeBench-v6 programming benchmarks

Those results put DemyAgent-4B among competitors with 14 to 32 billion parameters. The takeaway is not that parameter count no longer matters. It is that model size is only one part of agent performance.

For AI agents, training design can carry much of the burden. Real learning trajectories teach the model how reasoning unfolds. GRPO-TCR gives the learning process more useful signals. A deliberative strategy helps the agent decide when a tool call is worth making.

The researchers have released their training data and model weights for others to use and build on. That could make the findings easier to test, compare, and extend in future AI agent work.

What This Means for AI Agent Development

The research points toward a practical direction for building smarter AI agents. Instead of only increasing parameter counts, developers can focus on the structure of the training process.

That means collecting data that shows complete reasoning workflows, not just final answers. It means choosing algorithms that reward useful intermediate decisions. It also means encouraging agents to think before acting, especially when tools are available.

The broader implication is simple: smarter AI agents may come from better learning discipline, not just larger models. DemyAgent-4B is presented as evidence that a smaller model, trained with the right data, algorithm, and reasoning strategy, can compete far above its size class.