AI models are no longer just assisting with routine math explanations. In recent work on high-level math problems, they have begun producing results that mathematicians and researchers are taking seriously.
The clearest signal comes from activity around the Erdős problems, a collection of over 1,000 conjectures by the Hungarian mathematician Paul Erdős. Since Christmas, 15 problems have moved from “open” to “solved” on the Erdős website, and 11 of those solutions have specifically credited AI models as part of the process.
A surprising result from ChatGPT
Neel Somani, a software engineer, former quant researcher, and startup founder, was testing OpenAI’s new model when he found something unexpected. He pasted a problem into ChatGPT, let it think for 15 minutes, and returned to a full solution.
Somani then evaluated the proof and formalized it using Aristotle, a tool from Harmonic. According to the source article, the solution checked out.
His goal was not simply to get an answer. As he put it, “I was curious to establish a baseline for when LLMs are effectively able to solve open math problems compared to where they struggle.” The surprise was that the latest model seemed to move that baseline forward.
The model’s reasoning drew on advanced mathematical material, including Legendre’s formula, Bertrand’s postulate, and the Star of David theorem. It also found a MathOverflow post from 2013 in which Harvard mathematician Noam Elkies had offered an elegant solution to a similar problem.
That did not make the result a simple copy. The final proof differed from Elkies’ work in important ways and gave a more complete solution to a version of a problem posed by Paul Erdős.
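The article names these results without showing how the model used them. As a rough illustration of what two of them state, and not a reconstruction of the model’s proof, the Python sketch below computes the exponent of a prime p in n! with Legendre’s formula, v_p(n!) = ⌊n/p⌋ + ⌊n/p²⌋ + ⌊n/p³⌋ + …, and finds a prime p with n < p ≤ 2n, which Bertrand’s postulate guarantees for every n ≥ 1. All function names are hypothetical helpers chosen for this example.

```python
import math


def legendre_valuation(n: int, p: int) -> int:
    """Exponent of the prime p in n!, via Legendre's formula:
    v_p(n!) = floor(n/p) + floor(n/p^2) + floor(n/p^3) + ..."""
    total = 0
    power = p
    while power <= n:
        total += n // power  # multiples of p, p^2, p^3, ... up to n
        power *= p
    return total


def valuation_brute_force(m: int, p: int) -> int:
    """Exponent of p in m, found by repeated division (used only to cross-check)."""
    count = 0
    while m % p == 0:
        m //= p
        count += 1
    return count


def is_prime(k: int) -> bool:
    """Trial division; fine for the small numbers in this illustration."""
    if k < 2:
        return False
    return all(k % d for d in range(2, math.isqrt(k) + 1))


def bertrand_prime(n: int) -> int:
    """Smallest prime p with n < p <= 2n. Bertrand's postulate says one exists for n >= 1."""
    for candidate in range(n + 1, 2 * n + 1):
        if is_prime(candidate):
            return candidate
    raise ValueError(f"no prime in ({n}, {2 * n}]")


if __name__ == "__main__":
    # 10! = 3628800 = 2^8 * 3^4 * 5^2 * 7
    for p in (2, 3, 5, 7):
        print(p, legendre_valuation(10, p))
    # Cross-check the formula against direct factorization of 30!
    assert legendre_valuation(30, 3) == valuation_brute_force(math.factorial(30), 3)
    print(bertrand_prime(10))  # 11
```

The formula avoids computing n! at all, which is why it shows up so often in arguments about divisibility of factorials and binomial coefficients; the brute-force check is there only to confirm the sketch on small inputs.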
Why Erdős problems matter for AI
The Erdős problems have become a natural testing ground for AI-driven mathematics. They vary widely in topic and difficulty, which makes them useful for seeing where models can reason, search, connect prior work, and produce proofs that hold up under review.
The first batch of autonomous solutions came in November from AlphaEvolve, a Gemini-powered model. More recently, Somani and others have found GPT-5.2 to be notably strong at high-level math.
Somani described GPT-5.2 as “anecdotally more skilled at mathematical reasoning than previous iterations.” That kind of wording is careful, but the pattern is becoming harder to dismiss: more problems are being solved, and AI tools are increasingly visible in the work.
Terence Tao has offered a more nuanced view on his GitHub page. He counted eight Erdős problems on which AI models made meaningful autonomous progress, along with six other cases where the progress came from locating and building on previous research.
That distinction matters. A model that independently advances a proof is doing one kind of work. A model that finds relevant prior research and helps extend it is doing another. Both can be valuable, but neither means AI systems are doing mathematics entirely on their own.
Formalization is becoming central
One reason these developments are being taken seriously is the growing role of formalization. In mathematics, a proof must be checked carefully, and formalization makes a proof easier to verify and to build on.
Formalization is labor-intensive. It does not require AI, and it does not even require computers in principle, but automated tools are making it more practical.
Lean, an open-source proof assistant developed at Microsoft Research in 2013, has become widely used in the field for formalizing proofs. Tools such as Harmonic’s Aristotle aim to automate much of that work.
This matters because AI-generated mathematical reasoning can look convincing while still needing rigorous review. A formal proof process gives researchers a way to test whether a proposed solution actually holds together.
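For readers who have not seen Lean, here is a toy sketch of what a machine-checked statement looks like, written against core Lean 4 without Mathlib. It is purely illustrative and unrelated to Somani’s proof or to Aristotle’s output; the theorem names are hypothetical. Lean accepts each theorem only if the proof after `:=` actually establishes the stated claim.

```lean
-- Hypothetical example names; uses only core Lean 4.

-- A statement and a term-mode proof that cites an existing core lemma.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- A statement with a hypothesis, closed by the `omega` tactic,
-- which decides linear arithmetic over natural numbers and integers.
theorem succ_le_of_lt_example (a b : Nat) (h : a < b) : a + 1 ≤ b := by
  omega
```

If a proof were wrong, for instance if the hypothesis `h` were removed from the second theorem, Lean would reject the file rather than accept a plausible-looking argument. That is the property that makes formalization useful for checking AI-generated proofs.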
For Harmonic founder Tudor Achim, the recent jump in solved Erdős problems is not the only important point. The bigger signal is that respected mathematicians and computer science professors are using these tools. As Achim said, “I care more about the fact that math and computer science professors are using [AI tools].” He added, “These people have reputations to protect, so when they’re saying they use Aristotle or they use ChatGPT, that’s real evidence.”
What this changes, and what it does not
The source article is careful about the limits of the moment. This is not evidence that AI systems can handle mathematics without human intervention. The examples described still involve human testing, evaluation, formalization, and judgment.
At the same time, the role of large language models is becoming harder to ignore. They can attempt problems at scale, search through prior work, suggest proof paths, and help with formal reasoning workflows.
On Mastodon, Tao suggested that the scalable nature of AI systems makes them “better suited for being systematically applied to the 'long tail' of obscure Erdős problems, many of which actually have straightforward solutions.” He continued: “As such, many of these easier Erdős problems are now more likely to be solved by purely AI-based methods than by human or hybrid means.”
That framing helps explain why the Erdős problems are such a revealing case. Some may be obscure not because they are impossibly hard, but because human attention is limited. AI systems can be applied broadly across that long tail, surfacing opportunities where a solution may be within reach.
The result is not a clean replacement of mathematicians by machines. It is a shift in the workflow of discovery: models propose, search, connect, and formalize; humans evaluate, interpret, and decide what counts as meaningful progress.
For now, the most important development may be practical rather than philosophical. AI tools are beginning to help with real high-level math problems, and the people best placed to judge that work are starting to use them.