The Decoder July 20, 2024 TERMINATOR

Weaker AI models may help stronger ones reason better

Researchers from Shanghai Jiao Tong University, Fudan University, the Shanghai AI Laboratory, and the Generative AI Research Lab developed a weak-to-strong learning approach for improving AI reasoning. The method lets stronger AI models refine training data with guidance from weaker models, improving results on math tasks without depending on human input or on models stronger than the one being trained.

WTF Index TERMINATOR

Terminator 2, Idiocracy 0

The story mildly leans Terminator because it describes AI systems improving stronger models' reasoning with less human supervision, in a superintelligence context.

Weaker AI models may help stronger ones reason better

A new training approach suggests that weaker AI models can help stronger AI models improve their reasoning. The idea comes from researchers at Shanghai Jiao Tong University, Fudan University, the Shanghai AI Laboratory, and the Generative AI Research Lab, who developed a method designed to push stronger systems closer to their full potential.

The work focuses on a central problem in advanced AI training: as models become more capable, it becomes harder for humans to provide the kind of feedback needed to keep improving them. The researchers frame this challenge in the context of superintelligence, the goal of building AI systems that surpass human cognitive abilities.

Why human feedback becomes a bottleneck

Current AI training still depends heavily on human involvement. Humans help shape feedback, guide data quality, and evaluate outputs, but that model has limits when the systems being trained begin to operate beyond ordinary human skill in a target area.

The researchers address this by introducing a "weak-to-strong" learning approach. Instead of asking humans, or an even stronger model, to guide a powerful model, the method uses weaker models as a source of supervision and contrast.

That may sound counterintuitive. A weaker model is less capable by definition, so it cannot simply teach the stronger model better answers in the usual sense. The point is different: the strong model can use the weak model's outputs, including its mistakes, as material for improving its own training data and preferences.

This is why the approach matters. It explores whether a capable model can become better by refining data on its own, using weaker models as a starting signal rather than as final authorities.

How the weak-to-strong method works

The researchers describe a two-step training process. Both steps are aimed at helping the strong model improve without direct reliance on human input or a still stronger model.

In the first step, the team combines data from weak models with data from strong models through "in-context learning", described in the source as generating with examples. The weak models named in the work are Llama2-7b, Gemma-2b, and Mistral-7b. The strong models are Llama2-70b and Llama3-70b.

This stage is used to carefully select data sets for later supervised fine-tuning. In plain terms, the strong model is not merely trained on weak data as-is. It helps process and refine the material so the later training data is more useful.
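The first step can be illustrated with a minimal Python sketch. It is not the researchers' code: the prompt template, the `final_answer` parser, and the rule of keeping only examples where the weak model and the in-context output agree on the final answer are all assumptions made for illustration, since the article does not spell out the selection criterion.

```python
from dataclasses import dataclass


@dataclass
class Example:
    question: str
    solution: str  # reasoning text that ends with the final answer


def build_few_shot_prompt(demos: list[Example], question: str) -> str:
    """Put worked examples before a new question (in-context learning)."""
    parts = [f"Question: {d.question}\nAnswer: {d.solution}" for d in demos]
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)


def final_answer(solution: str) -> str:
    """Hypothetical parser: treat the last token as the final answer."""
    tokens = solution.strip().split()
    return tokens[-1].rstrip(".") if tokens else ""


def select_consistent(weak: dict[str, str], icl: dict[str, str]) -> list[Example]:
    """Assumed selection rule: keep in-context solutions whose final
    answer matches the weak model's final answer for the same question."""
    kept = []
    for question, icl_solution in icl.items():
        weak_solution = weak.get(question)
        if weak_solution is None:
            continue
        answer = final_answer(icl_solution)
        if answer and answer == final_answer(weak_solution):
            kept.append(Example(question, icl_solution))
    return kept
```

In this sketch, `build_few_shot_prompt` would be fed weak-model solutions as demonstrations, and `select_consistent` would produce the filtered set passed on to supervised fine-tuning. Agreement between two models is only a proxy for correctness, which is why the selected data is treated as refined rather than verified.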

The second step uses preference optimization. Here, the strong model improves further by learning from the weak model's mistakes. Those errors become signals that help the stronger system distinguish better reasoning paths from weaker ones.
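A rough Python sketch shows how weak-model mistakes could be turned into a preference signal. Two things here are assumptions rather than details from the article: the pairing rule (a strong-model solution is "chosen" and a disagreeing weak-model solution is "rejected") and the use of the Direct Preference Optimization (DPO) loss, which is one common preference objective. The article does not say which objective the researchers used.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # solution treated as the better reasoning path
    rejected: str  # weak-model solution treated as a mistake


def final_answer(solution: str) -> str:
    """Hypothetical parser: treat the last token as the final answer."""
    tokens = solution.strip().split()
    return tokens[-1].rstrip(".") if tokens else ""


def build_pairs(strong: dict[str, str], weak: dict[str, str]) -> list[PreferencePair]:
    """Assumed rule: pair solutions only where the final answers disagree."""
    pairs = []
    for question, good in strong.items():
        bad = weak.get(question)
        if bad is not None and final_answer(bad) != final_answer(good):
            pairs.append(PreferencePair(question, good, bad))
    return pairs


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one pair, given sequence log-probabilities under the
    model being trained (policy) and a frozen reference model."""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    # -log(sigmoid(beta * margin)) written as log(1 + exp(-beta * margin))
    return math.log1p(math.exp(-beta * margin))
```

The loss shrinks as the trained model assigns relatively more probability to the chosen solution than the reference model does, which is how a rejected weak-model answer ends up steering the strong model away from that reasoning path.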

The structure can be summarized simply:

Weak and strong model data are combined through in-context learning.
The resulting data is selected for supervised fine-tuning.
The strong model then learns from weak-model mistakes through preference optimization.
The goal is stronger reasoning without depending on human input or a more powerful model.

What the early results showed

The researchers tested the approach on GSM8K, a benchmark of grade-school math word problems. The source article reports that initial experiments performed much better than simple fine-tuning on weak data.
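To make the benchmark scores concrete, here is a minimal sketch of exact-match scoring on GSM8K-style answers. GSM8K reference solutions end with a line of the form `#### <number>`. The sketch assumes that model outputs follow the same convention and that the reported points are percentage points of accuracy; the helper names are hypothetical and this is not the researchers' evaluation code.

```python
import re
from typing import Optional

# GSM8K reference solutions mark the final answer as "#### <number>".
_ANSWER = re.compile(r"####\s*(-?[\d,]+(?:\.\d+)?)")


def extract_answer(text: str) -> Optional[str]:
    """Return the number after '####' with thousands separators removed."""
    match = _ANSWER.search(text)
    return match.group(1).replace(",", "") if match else None


def accuracy(predictions: list[str], references: list[str]) -> float:
    """Exact-match accuracy in percent; unparseable predictions count as wrong."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = 0
    for prediction, reference in zip(predictions, references):
        predicted = extract_answer(prediction)
        if predicted is not None and predicted == extract_answer(reference):
            correct += 1
    return 100.0 * correct / len(references)
```

Under that assumption, a gain measured in points is the difference between two such accuracy values for the same test set.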

In one result, the performance of the strong model, supervised only by the weak Gemma-2b model, increased by up to 26.99 points. That gain came from the first stage of the process (Stage I), where the system used the weak model's guidance while refining training data.

The second stage, preference optimization (Stage II), added a further 8.49-point improvement.

According to the researchers, the combined method outperforms even fine-tuning on gold-standard solutions. That claim is important because it suggests the method is not just a workaround for missing human-labeled answers. In the reported experiments, the model's self-refined data pipeline produced stronger results than a conventional high-quality fine-tuning reference point.

Why self-refined training data matters

The broader implication is about how AI systems may continue improving when standard training methods begin to run out of room. The researchers argue that their method allows a strong model to continually improve its mathematical skills by refining its training data on its own.

This is especially relevant for tasks that do not yet have predefined human or AI solutions. In those settings, conventional fine-tuning can fail because there may be no clear set of approved examples to train on. Human oversight may also reach its limits when the task becomes too complex or too far beyond available expertise.

The weak-to-strong approach offers another path. A weaker model can generate imperfect material, and the stronger model can use that material to improve the training process rather than accepting it uncritically.

That does not mean weaker models suddenly become better teachers than humans in every setting. The source article does not make that claim. What it does show is narrower and still significant: for the reported math-task experiments, weak-model guidance plus strong-model self-refinement produced large improvements over simple fine-tuning on weak data.

A possible direction for AI progress

The source article also notes that former OpenAI researcher Andrej Karpathy sees AI models optimizing training data as a possible next driver of AI progress. The idea is that AI could help develop the "perfect data set" for AI.

That phrase captures the central shift. Instead of treating training data as something only people prepare, the weak-to-strong approach treats data improvement as part of the model's own learning loop.

If stronger AI models can refine weak guidance into better training material, then progress may depend less on direct human labeling and more on how well models can evaluate, filter, and learn from imperfect signals. Based on the reported experiments, weak models may have an unexpected role in that process: not as final experts, but as useful sources of contrast for stronger systems learning to reason better.