MIT Tech Review AI July 25, 2024 TERMINATOR

Google DeepMind AI Reaches Silver-Level Math Reasoning

Google DeepMind says AlphaProof and AlphaGeometry 2 solved four out of six problems from this year’s International Mathematical Olympiad. The result reached the equivalent of a silver medal and marks a notable step for AI systems built to handle advanced mathematical reasoning.

WTF Index TERMINATOR

◄ Terminator 1 Idiocracy 0 ►

This is mainly a capability advance in structured AI reasoning, with little direct evidence of harm or social degradation.

Google DeepMind AI Reaches Silver-Level Math Reasoning

Google DeepMind has reported a milestone for artificial intelligence in advanced mathematics: two specialized systems worked together to solve four out of six problems from this year’s International Mathematical Olympiad (IMO). The result was strong enough to match the equivalent of a silver medal.

The systems, AlphaProof and AlphaGeometry 2, point to a narrower but important direction for AI. Instead of producing fluent text, they are designed to work through problems that demand structured reasoning, proof, planning, and mathematical discipline.

Why Math Reasoning Is Hard For AI

Many AI models can generate essays and other free-form text with ease. Math problems at the level of the IMO are different. They often require a system to build abstractions, set intermediate goals, test possible paths, backtrack, and then justify the answer.

That kind of work is difficult because it is not enough to produce a plausible response. A mathematical answer must be correct, and for difficult problems the reasoning has to survive close inspection.

Katie Collins, a researcher at the University of Cambridge who specializes in math and AI but was not involved in the project, explained one core challenge: formal math can be checked more reliably, but there is less formal mathematics data online than informal natural language. Google DeepMind’s work is partly an attempt to close that gap.

How AlphaProof Uses Formal Math

AlphaProof is a reinforcement-learning-based system built to prove mathematical statements in Lean, a formal programming language. The system relies on a fine-tuned version of Gemini to translate informal math problems into formal statements that are easier for the AI to process.

That translation step matters because it turns natural-language math into a form where correctness can be evaluated more directly. It also helped create a large library of formal math problems with different levels of difficulty.

AlphaProof then works with AlphaZero, the reinforcement-learning model Google DeepMind trained to master games such as Go and chess. Together, the systems prove or disprove millions of mathematical problems. As AlphaProof solves more problems, it improves at handling harder ones.

Wenda Li, a lecturer in hybrid AI at the University of Edinburgh who peer-reviewed the research but was not involved in the project, described the automated translation of data into formal language as an important advance for the math community. The implication is not just better AI performance, but a more reliable way to check and share mathematical results.

What AlphaGeometry 2 Adds

AlphaGeometry 2 plays a different role. It is an improved version of a system Google DeepMind announced in January, and it is optimized for geometry problems involving movements of objects and equations with angles, ratios, and distances.

The system was trained on significantly more synthetic data than its predecessor. According to the source article, that extra training allowed it to take on more challenging geometry questions.

In the IMO test, the division of labor was clear. AlphaProof solved two algebra problems and one number theory problem, including one described as the competition’s hardest. AlphaGeometry 2 solved a geometry question. The two combinatorics problems were not solved.

Alex Davies, a research engineer on the AlphaProof team, said AlphaProof generally performs much better on algebra and number theory than combinatorics. Google DeepMind is still trying to understand why, with the goal of improving the system.

The IMO Result And What It Means

To evaluate the systems, Google DeepMind researchers gave them the same six problems used in this year’s International Mathematical Olympiad, a prestigious competition for high school students. The systems also had to prove that their answers were correct.

Two mathematicians, Tim Gowers and Joseph Myers, checked the submissions. Each of the four correct answers received full marks, seven out of seven. That produced a total score of 28 points out of a maximum of 42.

For a human competitor, 28 points would earn a silver medal and fall just short of gold, which starts at 29 points. The source article describes this as the first time an AI system has reached medal-level performance on IMO questions.

Gowers called the result impressive and a significant jump from what had previously been possible. Myers also described the answers as a substantial advance, while pointing to open questions about scaling, speed, and whether the approach can extend to other kinds of mathematics.

Why This Could Matter Beyond The Test

The immediate achievement is narrow: two AI systems solved four difficult competition problems. But the broader significance is that the systems handled tasks that require advanced reasoning rather than only fluent language generation.

If systems like AlphaProof and AlphaGeometry 2 continue to improve, they could support new forms of human-AI collaboration in mathematics. Collins suggested that stronger math-solving AI could help mathematicians solve and invent new kinds of problems, and could also teach us more about how humans approach complex mathematics.

The result does not mean AI has mastered all advanced math. The unsolved combinatorics problems show that there are still clear limits. But the silver-level IMO performance gives researchers a concrete sign that specialized AI systems can make progress on reasoning-heavy tasks that have been difficult for current models.