Can reasoning models push LLM scaling toward new science?

OpenAI CEO Sam Altman says the old approach of making pre-trained language models larger and training them on more data no longer yields the gains it once did. He argues that combining much bigger pre-trained models with large reasoning models could help AI systems move toward new scientific knowledge.

OpenAI CEO Sam Altman is framing the next phase of AI development around a shift in how large language models improve. The old path of training larger LLMs on more data is, in his view, running into limits. The newer bet is to combine broad pre-trained models with reasoning capabilities that perform strongly on tasks with clear right or wrong answers.

Why pre-training alone is losing momentum

Pre-trained language models have been the core engine of recent AI progress, but the source article says they no longer scale as effectively as they once did. Altman now describes pre-training as the "old world," reflecting a broader industry view that simply adding more data and size is not delivering the same gains.

That does not mean pre-training is being discarded. Instead, OpenAI appears to be looking for a way to keep the broad general knowledge of LLMs while adding a more targeted form of competence. The question is whether a larger base model can be paired with reasoning methods that make the system better at solving difficult problems.

This matters because general-purpose LLMs and specialized reasoning systems offer different strengths. LLMs are valued for their breadth. Reasoning models, as described by OpenAI, are being optimized for areas where answers can be evaluated more directly, such as programming and mathematics.

What large reasoning models are meant to solve

OpenAI calls these systems "large reasoning models," or LRMs. Altman described them as the most significant development in the field in the past year. They are trained or optimized with reinforcement learning, especially for tasks where the system can be rewarded for reaching a correct answer.
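
The idea of rewarding a system for reaching a correct answer can be made concrete with a small Python sketch. It is an illustration only, not OpenAI's training code: the function names `verifiable_reward` and `mean_reward` are hypothetical, and exact string matching stands in for whatever answer checking a real system would use.

```python
def verifiable_reward(candidate_answer: str, reference_answer: str) -> float:
    """Return 1.0 if the candidate matches the known answer, else 0.0.

    Assumption: the task has a single checkable answer, as in many
    math or programming problems. Surrounding whitespace is ignored.
    """
    return 1.0 if candidate_answer.strip() == reference_answer.strip() else 0.0


def mean_reward(candidates: list[str], reference_answer: str) -> float:
    """Average the reward over several sampled answers to one problem."""
    if not candidates:
        return 0.0
    total = sum(verifiable_reward(c, reference_answer) for c in candidates)
    return total / len(candidates)


if __name__ == "__main__":
    # Four hypothetical sampled answers to a problem whose answer is "42".
    samples = ["42", "41", " 42 ", "7"]
    print(mean_reward(samples, "42"))
```

A reward like this only works where correctness can be checked automatically, which is one way to read why the article singles out programming and mathematics.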

Altman says reasoning models provide "an incredible new compute efficiency gain." He also says OpenAI can "get performance on a lot of benchmarks that in the old world we would have predicted wouldn't have come until GPT 6" with "models that are much smaller." In plain terms, the new method may deliver certain benchmark gains without requiring the same kind of brute-force scale.

But the improvement is not universal. Altman noted that "when we do it this new way, it doesn't get better at everything. We can get it better in certain dimensions." That is the central tradeoff: the method can make models stronger in particular areas, but it does not automatically improve every capability.

The open question: combining breadth and precision

The main technical question is whether OpenAI can merge the wide capabilities of LLMs with the specialized accuracy of LRMs. Altman suggests that pre-training "a much bigger model" and adding reasoning capabilities could produce "the first bits or sort of signs of life on genuine new scientific knowledge." That is a larger ambition than better benchmark scores or improved coding assistance.

Altman draws a distinction between today’s strengths and the harder frontier ahead. He says the latest model "can program unbelievably well" but is "not so good at going to invent totally new algorithms… or new physics or new biology - and that's the thing I think you'll get with the next two orders of magnitude." The implication is that programming performance is an important signal, but not the final goal.

The programming results cited in the source show why OpenAI sees reinforcement learning as promising. The first reasoning model, o1, ranked as "a top 1 millionth competitive programmer in the world." By December, o3 had become "the 175th best competitive programmer in the world." Internal testing now shows a ranking of "around 50," and Altman says "maybe we'll hit number one by the end of this year."

Those examples are limited to competitive programming, but they explain the logic behind the broader strategy. If reinforcement learning can rapidly improve performance in a domain with clear answers, OpenAI wants to know whether similar gains can eventually support broader reasoning and scientific discovery.

Open source returns to the discussion

The source article also notes that Altman reiterated OpenAI’s intention to return to open-source practices. He gave few specifics, but said, "We're going to do it," and added that society seems "willing to take the tradeoffs, at least for now."

Altman also says OpenAI has made significant progress on safe and robust models suitable for open-source applications. According to him, these models are not always used as intended, but they work as designed most of the time. The source does not specify which models might be released or when.

This question has become more visible since Chinese company DeepSeek released its R1 reasoning model as open source. The source says R1 achieved similar performance to OpenAI's o1 model. That release has increased scrutiny of OpenAI's more restrictive approach, which the company has justified as necessary to prevent misuse.

What to watch next

The immediate issue is not whether LLM scaling is over. It is whether the next gains come from a different mixture: larger pre-trained models for breadth, plus reasoning systems for targeted accuracy. Altman’s comments suggest OpenAI sees that combination as the path beyond the current limits of pre-training.

The claim to watch is the strongest one: that this combination could show early signs of genuine new scientific knowledge. The source does not present evidence that this has already happened. It presents Altman’s view that a larger model combined with stronger reasoning may be the route toward it.

For now, OpenAI’s clearest example is programming. The reported movement from o1 to o3, and then to an internal ranking of "around 50," shows why the company is emphasizing reinforcement learning and LRMs. Whether that kind of progress can extend from competitive programming to new algorithms, new physics or new biology remains the unresolved question.