Self-improving AI has a basic constraint: the system may change parts of itself, but the improvement mechanism is often fixed by humans. Researchers at Meta, the University of British Columbia, and other institutions are testing a way around that limit with hyperagents, AI systems designed to optimize both a task solver and the machinery that creates better versions of it.
The result is called DGM-Hyperagents (DGM-H). In the experiments described in the source, the approach improved across several task areas and showed signs that some improvement strategies can carry over into unfamiliar domains.
What makes a hyperagent different
A hyperagent combines two components inside one editable program. One component handles a task, such as evaluating a scientific paper or designing a reward function for a robot. The other component modifies the whole agent and creates new variants.
That shared code structure matters. Because both parts live in the same program, the improvement component can also rewrite itself. The system is not only trying to perform better; it is also trying to become better at finding ways to improve.
This addresses a ceiling in earlier self-improving systems. If the human-written improvement mechanism stays unchanged, the system can only optimize within boundaries that were set in advance. Hyperagents make that meta-level subject to optimization too.
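The shared-program idea can be illustrated with a toy sketch. It assumes the agent is a single source string containing both a task function and an edit function. The names `Hyperagent`, `solve_task`, `propose_edit`, and `BONUS` are hypothetical, and a fixed string substitution stands in for the language model that would write edits in the real system.

```python
from dataclasses import dataclass

# Hypothetical agent program: the task logic and the modification logic
# sit in the same source text, so one edit can reach either of them.
INITIAL_SOURCE = '''
BONUS = 1

def solve_task(x):
    # Task component: a deliberately trivial placeholder solver.
    return x + BONUS

def propose_edit(source):
    # Meta component: returns an edited copy of the whole program text.
    # A real system would ask a language model for the edit; this fixed
    # substitution is only a stand-in.
    return source.replace("BONUS = 1", "BONUS = 2", 1)
'''


@dataclass
class Hyperagent:
    source: str

    def _namespace(self) -> dict:
        # Execute the agent's own source to obtain its current functions.
        namespace: dict = {}
        exec(self.source, namespace)
        return namespace

    def solve(self, x):
        return self._namespace()["solve_task"](x)

    def self_modify(self) -> "Hyperagent":
        # The edit function receives the full program, including itself.
        new_source = self._namespace()["propose_edit"](self.source)
        return Hyperagent(new_source)


parent = Hyperagent(INITIAL_SOURCE)
child = parent.self_modify()
print(parent.solve(1), child.solve(1))  # prints "2 3"
```

Because `propose_edit` receives the whole program text, nothing in this layout prevents an edit from changing `propose_edit` too, which is the property the article describes.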
Why the Darwin Gödel Machine was not enough
DGM-H builds on the Darwin Gödel Machine (DGM), a method that had already shown self-improvement in a coding agent. In that setup, an agent generates variants of its own code, tests them, and stores successful versions in an archive. The archive then becomes a set of stepping stones for future refinement.
That works naturally in programming because a better coding agent can also write better self-modifications. But the same link does not automatically exist in other domains. An agent that becomes stronger at paper review, for example, does not necessarily become stronger at rewriting its own code.
According to the paper cited in the source article, the original DGM reaches nearly zero performance on tasks beyond programming unless it is manually adjusted. DGM-H keeps the evolutionary archive but removes the fixed meta-agent limitation. The improvement process itself can evolve.
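The archive loop that DGM-H inherits from the DGM can be sketched in a few lines. This is a toy under stated assumptions: the "agent" is a single number, `mutate` and `evaluate` are hypothetical stand-ins for model-written code changes and benchmark runs, and the keep-if-not-worse rule is a simplification, not the acceptance criterion used in the paper.

```python
import random


def evaluate(agent: float) -> float:
    # Hypothetical benchmark: the score peaks at 1.0 when agent == 3.0.
    return 1.0 / (1.0 + (agent - 3.0) ** 2)


def mutate(agent: float, rng: random.Random) -> float:
    # Stand-in for a self-modification step: a small random change.
    return agent + rng.uniform(-0.5, 0.5)


def run_archive_loop(initial: float, iterations: int, seed: int = 0):
    rng = random.Random(seed)
    # The archive stores (agent, score) pairs and starts with the initial agent.
    archive = [(initial, evaluate(initial))]
    for _ in range(iterations):
        # Any archived agent can serve as a stepping stone, not only the best.
        parent, parent_score = rng.choice(archive)
        child = mutate(parent, rng)
        child_score = evaluate(child)
        # Simplified rule (an assumption): keep children that are not worse.
        if child_score >= parent_score:
            archive.append((child, child_score))
    return archive


archive = run_archive_loop(initial=0.0, iterations=200)
best_agent, best_score = max(archive, key=lambda entry: entry[1])
print(f"archive size: {len(archive)}, best score: {best_score:.3f}")
```

In this toy the mutation step never changes. In DGM-H, by the article's description, that step is part of what evolves.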
Where DGM-H improved
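The team tested DGM-H across four areas, beginning with a simple hyperagent whose task agent used only a single language model call. Three of those areas showed direct gains in very different settings; the fourth, grading Olympiad math solutions, served as the transfer test covered further down.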
- Coding: On the Polyglot coding benchmark, DGM-H rose from 0.084 to 0.267, matching the original DGM even though it was not built specifically for coding.
- Paper review: In a task where the agent predicts whether an AI conference will accept or reject a paper, test performance increased from 0.0 to 0.710. Several top agents beat a static baseline of 0.630.
- Robotics reward design: In a physics simulation involving a four-legged robot, performance climbed from 0.060 to 0.372. According to the paper, DGM-H almost always produced reward functions that led the robot to jump instead of staying in the local optimum of standing upright.
Ablation studies pointed to the importance of both main pieces. When the meta-agent was fixed, there was little or no improvement. When the archive for open-ended exploration was removed, progress stalled in a similar way.
Transfer matters more than a single benchmark
The most notable finding in the source concerns transfer between domains. Hyperagents trained on paper review and robotics were later used in a new domain: evaluating Olympiad math solutions.
Using the imp@50 metric introduced by the team, which measures performance gain after 50 iterations, transfer hyperagents reached 0.630. Initial agents and transfer agents from the original DGM remained at 0.0.
That result suggests DGM-H is not only learning isolated task tricks. It appears to learn general self-improvement strategies that can work in unfamiliar territory. The source says this difference is statistically significant. Across multiple rounds, transfer agents also showed a higher median, though that gap was no longer statistically significant.
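The source describes imp@50 only as the performance gain after 50 iterations and does not give a formula. One plausible reading, offered here purely as an assumption, is the best score reached within the first k iterations minus the initial agent's score. The function name `improvement_at_k` and the sample scores are invented for illustration.

```python
from typing import Sequence


def improvement_at_k(initial_score: float, scores: Sequence[float], k: int) -> float:
    """Assumed definition: best score within the first k iterations minus the start score."""
    if k < 0:
        raise ValueError("k must be non-negative")
    # Including the initial score keeps the gain at zero when nothing improves.
    best = max([initial_score, *scores[:k]])
    return best - initial_score


# Made-up scores, one per iteration; these are not results from the paper.
made_up_scores = [0.25, 0.5, 0.75]
print(improvement_at_k(0.25, made_up_scores, k=2))  # prints 0.25
```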
The system built supporting tools on its own
A closer look at the code changes showed DGM-H creating infrastructure without explicit instructions. It built a performance tracker that stores data across generations and calculates improvement trends. It also developed persistent memory that does more than log results; it synthesizes insights from them.
One such memory entry reads: “Gen55 has best accuracy but is too harsh. Gen64 improved balance but lower accuracy. Need to combine gen55’s critical reasoning with gen64’s balance.”
In paper review, the system noticed when its predictions became skewed, such as when 94 percent of all ratings were returned as “Accept,” and corrected the issue. In robotics, it gradually assembled an internal knowledge base covering valid environment variables, constraints, and scaling heuristics, removing compilation errors along the way.
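A check of the kind described for paper review could look like the sketch below. The function names, the 0.9 threshold, and the sample data are assumptions for illustration; the source reports only that the system noticed a 94 percent “Accept” rate and corrected it.

```python
from collections import Counter


def dominant_label_share(predictions):
    """Return the most common label and the fraction of predictions it accounts for."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    label, count = Counter(predictions).most_common(1)[0]
    return label, count / len(predictions)


def is_skewed(predictions, threshold=0.9):
    # Hypothetical rule: flag the batch when one label exceeds the threshold.
    _, share = dominant_label_share(predictions)
    return share > threshold


# Synthetic batch mirroring the 94 percent figure mentioned in the article.
sample = ["Accept"] * 94 + ["Reject"] * 6
label, share = dominant_label_share(sample)
print(label, share, is_skewed(sample))  # prints "Accept 0.94 True"
```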
Early experiments also allowed the system to adjust its own selection logic. It independently found strategies that balanced proven solutions with attempts at new variants. These strategies outperformed random selection, but they still did not match carefully hand-designed mechanisms.
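The source does not describe the selection strategies the system found. As a generic illustration of balancing proven solutions against new variants, the sketch below uses epsilon-greedy parent selection, a standard explore/exploit rule swapped in for the purpose. `select_parent`, `epsilon`, and the archive contents are hypothetical.

```python
import random


def select_parent(archive, epsilon, rng):
    """Pick a parent from a list of (agent_id, score) pairs.

    With probability epsilon, explore by choosing uniformly at random;
    otherwise exploit by choosing the highest-scoring entry.
    """
    if not archive:
        raise ValueError("archive must not be empty")
    if rng.random() < epsilon:
        return rng.choice(archive)
    return max(archive, key=lambda entry: entry[1])


# Invented archive entries for illustration only.
toy_archive = [("gen1", 0.1), ("gen2", 0.4), ("gen3", 0.3)]
rng = random.Random(0)
picks = [select_parent(toy_archive, epsilon=0.2, rng=rng)[0] for _ in range(10)]
print(picks)
```

Setting `epsilon` to 0 always returns the current best entry, and setting it to 1 reduces to the random selection the article mentions as a baseline.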
Safety limits remain part of the story
The experiments ran in sandboxed environments with limited resources, restricted internet access, and human oversight. Even so, the researchers warned that those safeguards may face limits as self-improving systems become more capable.
The source highlights two concerns. Systems could evolve faster than humans can verify them. They could also exploit weaknesses in evaluations, appearing stronger on paper without actually improving at the real task.
There are technical limits too. The system works with a fixed task distribution and cannot modify the outer optimization loop. The code is available on GitHub, according to the source.
The broader context is that self-improvement is becoming a more visible AI research theme. The source notes that Chinese AI company MiniMax shipped M2.7, a model that reportedly improved its own training process across more than 100 autonomous rounds. It also says OpenAI reported that Codex 5.3 significantly sped up its own development.