MIT Tech Review, AI, May 14, 2025

How AlphaEvolve turns LLM code into stronger algorithms

Google DeepMind's AlphaEvolve uses Gemini 2.0 models to generate, test, score, and refine code until it finds stronger algorithms. The tool has produced improvements in data center management, chip power use, Gemini training, and several areas of mathematics and computer science.

AlphaEvolve points mildly toward more powerful and autonomous AI-driven optimization, but the story is framed around productive algorithm discovery rather than harm or loss of control.

Google DeepMind has introduced AlphaEvolve, an AI tool that uses large language models to search for better algorithms. Its central idea is simple to state but powerful in practice: generate code, measure whether it works, keep the strongest versions, and repeat until no better result appears.

The company says the approach has already moved beyond theory. AlphaEvolve has produced improvements for data center management, chip design, Gemini training, and long-standing problems in math and computer science.

What AlphaEvolve does differently

Large language models can write code, but their output is not reliably correct or efficient on its own. AlphaEvolve is designed around that weakness. Instead of accepting a model's first answer, it turns coding into an iterative search process.

The system uses the Gemini 2.0 family of large language models to create candidate programs for a task. It then runs those programs and scores them against relevant measures, such as whether they return the correct result or whether they improve on earlier solutions.

Weak suggestions are discarded. Stronger ones are sent back for improvement. Over many rounds, the system searches for algorithms that are more accurate or more efficient than existing human-written approaches.
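The scoring step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sorting task, the `solve` entry point, the `TESTS` list, and the `score` and `keep_strongest` helpers are hypothetical stand-ins, and none of it is drawn from AlphaEvolve itself. A real system would also run untrusted code in a sandbox rather than with a bare `exec`.

```python
from typing import Dict, List, Tuple

# Hypothetical task: each candidate must define solve(xs) returning xs sorted.
TESTS: List[Tuple[List[int], List[int]]] = [
    ([3, 1, 2], [1, 2, 3]),
    ([], []),
    ([5, 5, 1], [1, 5, 5]),
]


def score(source: str) -> float:
    """Run candidate source code and return the fraction of tests it passes."""
    namespace: Dict[str, object] = {}
    try:
        exec(source, namespace)  # execute the generated program text (unsandboxed here)
        solve = namespace["solve"]
    except Exception:
        return 0.0  # code that does not load, or lacks solve(), scores zero
    passed = 0
    for xs, expected in TESTS:
        try:
            if solve(list(xs)) == expected:
                passed += 1
        except Exception:
            pass  # a crash on one input counts as a failed test
    return passed / len(TESTS)


def keep_strongest(candidates: List[str], keep: int = 2) -> List[str]:
    """Discard weak candidates and return the top `keep` by score."""
    return sorted(candidates, key=score, reverse=True)[:keep]


# Four made-up candidates, standing in for model output.
candidates = [
    "def solve(xs): return xs",          # wrong: passes only the empty case
    "def solve(xs): return sorted(xs)",  # correct on every test
    "def solve(xs): return xs.sort()",   # wrong: list.sort() returns None
    "def solve(xs) return xs",           # syntax error, never loads
]
survivors = keep_strongest(candidates)
```

Here the score is a simple pass rate. A measure of efficiency, such as running time or operation count, could be added as a second criterion once a candidate is fully correct.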

Pushmeet Kohli, a vice president at Google DeepMind who leads its AI for Science teams, described the tool as a kind of advanced coding system because it can produce results that may not have been known before. The point is not just to generate code, but to find a working algorithm that scores well on a defined task.

How the search process works

AlphaEvolve can be prompted with a problem description and supporting hints, including prior solutions. Gemini 2.0 Flash, the smallest and fastest version of Google DeepMind's flagship LLM, generates multiple blocks of code in response.

Those candidates are evaluated by computer. If a piece of code fails, runs slowly, or does not improve on earlier work, it can be removed from the search. If it performs well, AlphaEvolve asks Gemini to refine it further.

The system can also reintroduce earlier solutions to avoid getting trapped in an unproductive path. When needed, it can call on Gemini 2.0 Pro, the most powerful of Google DeepMind's LLMs, while continuing to use the faster Flash model to generate many possibilities.

The process continues through rounds of generation, scoring, and regeneration. It stops when Gemini no longer produces anything better than the best result already found.
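The loop of generation, scoring, and regeneration can be sketched as follows. This is a toy under loud assumptions: a random mutation function (`toy_propose`) stands in for Gemini, a short list of integers stands in for a program, and the `patience` and `revisit_prob` parameters are hypothetical knobs invented for this example. It shows the shape of the search, not Google DeepMind's implementation.

```python
import random
from typing import Callable, List, Tuple

Candidate = List[int]


def evolve(
    seed_candidate: Candidate,
    propose: Callable[[Candidate, random.Random], Candidate],
    score: Callable[[Candidate], float],
    children_per_round: int = 8,
    patience: int = 20,
    revisit_prob: float = 0.2,
    rng_seed: int = 0,
) -> Tuple[Candidate, float]:
    """Generate, score, and refine until `patience` rounds bring no improvement."""
    rng = random.Random(rng_seed)
    best, best_score = seed_candidate, score(seed_candidate)
    archive: List[Candidate] = [seed_candidate]  # earlier solutions kept for revisiting
    stale_rounds = 0
    while stale_rounds < patience:
        # Usually refine the current best; sometimes reintroduce an earlier solution.
        parent = rng.choice(archive) if rng.random() < revisit_prob else best
        improved = False
        for _ in range(children_per_round):
            child = propose(parent, rng)  # stand-in for asking the model for new code
            child_score = score(child)
            if child_score > best_score:  # keep only candidates that beat the best
                best, best_score = child, child_score
                archive.append(child)
                improved = True
        stale_rounds = 0 if improved else stale_rounds + 1
    return best, best_score


# Toy problem: get as close as possible to a hidden target vector.
TARGET = [4, -2, 7]


def toy_score(c: Candidate) -> float:
    return -float(sum(abs(a - b) for a, b in zip(c, TARGET)))


def toy_propose(parent: Candidate, rng: random.Random) -> Candidate:
    child = list(parent)
    i = rng.randrange(len(child))
    child[i] += rng.choice([-1, 1])  # small random edit to one position
    return child


best, best_score = evolve([0, 0, 0], toy_propose, toy_score)
```

The stopping rule mirrors the one in the text: the search ends once repeated rounds fail to beat the best result already found.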

This matters because AlphaEvolve is not searching directly for a final answer in every case. It is searching for algorithms that can produce useful answers. Jakob Moosbauer, a mathematician at the University of Warwick in the UK, said this makes the method applicable to a wide range of problems.

From matrix multiplication to data centers

AlphaEvolve follows earlier Google DeepMind systems focused on algorithm discovery. In 2022, AlphaTensor found a faster way to perform matrix multiplications, beating a record that had stood for more than 50 years. In 2023, AlphaDev discovered faster ways to carry out basic calculations that computers perform trillions of times a day.

FunSearch, introduced in late 2023, replaced game-playing AI with LLMs that generate code. Because large language models can work across a wider range of tasks, FunSearch could address more kinds of problems than systems trained around one type of game. It was used to crack a famous unsolved problem in pure mathematics.

AlphaEvolve is the next generation of that idea. Where FunSearch produced short snippets of code for specific problems, AlphaEvolve can produce programs that are hundreds of lines long. That expands the range of problems it can address.

One major test involved matrix multiplication, a basic computation used in applications from AI to computer graphics. The team gave AlphaEvolve a description of the problem and an example of a standard algorithm. The tool then produced new algorithms that multiply matrices of 14 different sizes faster than any existing approach.

It also improved on AlphaTensor's result for multiplying two four-by-four matrices. AlphaEvolve scored 16,000 candidates suggested by Gemini to find the winning solution. According to Matej Balog, a researcher at Google DeepMind who leads the algorithm discovery team, that was still more efficient than AlphaTensor.

The difference was not only speed. AlphaTensor's solution worked only when a matrix was filled with 0s and 1s. AlphaEvolve's approach also works with other numbers.
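To make "faster" concrete: matrix multiplication algorithms are usually compared by how many scalar multiplications they need. The sketch below is not AlphaEvolve's or AlphaTensor's algorithm. It is the classic textbook case, Strassen's 1969 scheme, which multiplies two 2x2 matrices with 7 multiplications instead of the schoolbook 8 and works for arbitrary numbers, including complex ones. The function names and the counting helper are assumptions made for this example.

```python
from typing import List, Tuple

Matrix = List[List[complex]]


def naive_2x2(a: Matrix, b: Matrix) -> Tuple[Matrix, int]:
    """Schoolbook product; returns the result and the number of multiplications."""
    count = 0
    c: Matrix = [[0, 0], [0, 0]]
    for i in range(2):
        for j in range(2):
            for k in range(2):
                c[i][j] += a[i][k] * b[k][j]
                count += 1
    return c, count


def strassen_2x2(a: Matrix, b: Matrix) -> Tuple[Matrix, int]:
    """Strassen's scheme; returns the result and the number of multiplications."""
    count = 0

    def mul(x: complex, y: complex) -> complex:
        nonlocal count
        count += 1  # count every scalar multiplication
        return x * y

    m1 = mul(a[0][0] + a[1][1], b[0][0] + b[1][1])
    m2 = mul(a[1][0] + a[1][1], b[0][0])
    m3 = mul(a[0][0], b[0][1] - b[1][1])
    m4 = mul(a[1][1], b[1][0] - b[0][0])
    m5 = mul(a[0][0] + a[0][1], b[1][1])
    m6 = mul(a[1][0] - a[0][0], b[0][0] + b[0][1])
    m7 = mul(a[0][1] - a[1][1], b[1][0] + b[1][1])
    c = [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]
    return c, count


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
naive_result, naive_mults = naive_2x2(A, B)
strassen_result, strassen_mults = strassen_2x2(A, B)

# The same scheme with complex entries, not just 0s and 1s.
AC = [[1 + 2j, 0], [3, -1j]]
BC = [[2, 1j], [-1, 4 - 1j]]
complex_match = naive_2x2(AC, BC)[0] == strassen_2x2(AC, BC)[0]
```

Applied recursively to 2x2 blocks, this scheme multiplies two four-by-four matrices with 7 x 7 = 49 scalar multiplications rather than the schoolbook 64, which is the kind of count that algorithm-discovery systems try to reduce further.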

Manuel Kauers, a mathematician at Johannes Kepler University in Linz, Austria, said the matrix improvement is likely to have practical relevance. Moosbauer also said the new algorithm could speed up computations in practice.

Where AlphaEvolve has already been applied

Google DeepMind tested AlphaEvolve on more than 50 different types of well-known math puzzles. These included problems in Fourier analysis, the minimum overlap problem proposed by mathematician Paul Erdős in 1955, and kissing numbers, a problem introduced by Isaac Newton with applications in materials science, chemistry, and cryptography.

Across those tests, AlphaEvolve matched the best existing solutions in 75% of cases and found better solutions in 20% of cases.

The company also used AlphaEvolve on real-world systems. One result improved the software Google uses to allocate jobs across its many millions of servers around the world. Google DeepMind says the software has been running across all of Google's data centers for more than a year, freeing up 0.7% of Google's total computing resources.

AlphaEvolve also found a way to reduce the power consumption of Google's specialized tensor processing unit chips. It even produced a more efficient algorithm for managing a certain type of computation used in Gemini training, helping speed up the training of Gemini itself.

What the limits reveal

AlphaEvolve is useful when a problem can be described in code and a solution can be evaluated by computer. That is a broad category, but it is not universal.

One limitation is that it cannot be used for problems where solutions need to be judged by a person. Lab experiments that depend on interpretation are one example given by Google DeepMind.

There is also a deeper issue for research. Moosbauer noted that AlphaEvolve may produce strong results without explaining much about how it found them. That can limit the theoretical insight researchers gain, even when the final algorithm is useful.

Still, AlphaEvolve shows a practical direction for AI in science and computing. Instead of treating an LLM as a one-shot answer machine, it wraps the model in a system that tests, scores, and improves its work. For problems where success can be measured by software, that loop can turn code generation into algorithm discovery.