The Decoder, January 23, 2025

More reasoning time may help AI resist attacks, but opens new risks

An OpenAI study found that giving AI models more time to process information can make them harder to manipulate. But the same reasoning behavior also creates new attack surfaces, including "think less" and "nerd sniping" tactics.

Giving an AI model more time to work through a request can make it more resistant to manipulation, according to a recent OpenAI study. The finding points to a practical benefit of reasoning-focused systems: extra processing time can improve robustness even without special training.

But the study also shows that longer reasoning is not a simple security fix. While testing o1-preview and o1-mini, researchers found new vulnerabilities that specifically target the way these models use, shorten, or waste their thinking time.

Why reasoning time matters for AI security

The study tested how AI models respond to manipulation attempts when they are given more time to process information. Across several attack methods, the general pattern was encouraging: models tended to become more resistant when they had extra processing time.

That matters because many AI safety failures are not just about what a model knows. They also involve how a model handles pressure, misleading instructions, adversarial examples, or prompts designed to push it away from intended behavior.

The attack methods OpenAI researchers tested included many-shot attacks, soft token attacks, and human red-teaming. The gain in robustness from extra thinking time held across all of these approaches, not just one of them.
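A comparison like this comes down to measuring how often an attack succeeds at each level of reasoning effort. The Python sketch below shows that bookkeeping under stated assumptions: the function name `success_rate_by_budget` and the trial format (a reasoning budget paired with a success flag) are illustrative and are not taken from the OpenAI study.

```python
from collections import defaultdict
from typing import Iterable


def success_rate_by_budget(trials: Iterable[tuple[int, bool]]) -> dict[int, float]:
    """Return the attack success rate for each reasoning budget.

    Each trial is a hypothetical (budget, attack_succeeded) pair, where
    budget is whatever unit of reasoning effort the evaluator records.
    """
    totals: dict[int, int] = defaultdict(int)
    successes: dict[int, int] = defaultdict(int)
    for budget, attack_succeeded in trials:
        totals[budget] += 1
        if attack_succeeded:
            successes[budget] += 1
    # Sort by budget so the trend across reasoning effort is easy to read.
    return {budget: successes[budget] / totals[budget] for budget in sorted(totals)}
```

If the success rate falls as the budget grows, the model is becoming more robust with more thinking time. A rate that rises with the budget would point to the opposite case.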

The important detail is that this improvement appeared without special training. In plain terms, the models were not necessarily being retrained to defend against every tactic. Instead, their additional processing time itself appeared to help them resist some attempts to steer them incorrectly.

The tradeoff: more thinking can also create openings

The findings were not entirely positive. In some cases, more processing time made models more vulnerable, particularly when the task the attacker wanted completed was one the model could only solve with a certain minimum amount of computing time.

This creates a more complicated security picture. A model that spends longer reasoning may be better at rejecting some manipulation attempts, but that same ability can be exploited if an attacker can design a task that benefits from the model’s extra compute.

For developers and AI operators, the lesson is that reasoning time is not just a performance setting. It can become part of the security boundary. How long a model thinks, when it stops, and what it spends effort on may all affect whether an attack succeeds.
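One way to picture reasoning time as part of the security boundary is a simple window check on how much effort a response consumed. This is a minimal sketch, assuming the operator can see a count of reasoning tokens per response. The names `ReasoningBudget` and `classify_reasoning_usage` are hypothetical, and the study does not prescribe any thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReasoningBudget:
    """Hypothetical lower and upper bounds on reasoning tokens for a task class."""
    min_tokens: int
    max_tokens: int


def classify_reasoning_usage(tokens_used: int, budget: ReasoningBudget) -> str:
    """Label a response by where its reasoning effort falls relative to the budget."""
    if tokens_used < budget.min_tokens:
        # Suspiciously little reasoning: the pattern a "think less" attack aims for.
        return "too_short"
    if tokens_used > budget.max_tokens:
        # Suspiciously much reasoning: possible wasted compute.
        return "too_long"
    return "ok"
```

A check like this only flags outliers for review. It cannot tell whether a long run of reasoning was useful or wasted.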

Two new attacks target how models think

The researchers uncovered two attack types aimed directly at reasoning behavior. The first is called "think less." As the name suggests, it attempts to cut short the model’s processing time.

This kind of attack is concerning because it targets the very mechanism that appears to help models resist manipulation. If more thinking can improve robustness, then forcing a model to think less may weaken that defense.

The second attack comes from a different weakness: models can become stuck in what researchers describe as "unproductive thinking loops." Instead of using its extra processing time to solve the task effectively, a model caught in such a loop spends effort on pointless calculations.

This creates the basis for "nerd sniping." Unlike a "think less" attack, which tries to rush the model, "nerd sniping" pushes the model in the opposite direction. It tricks the system into wasting time and resources on computations that do not help.

"Think less" attacks try to reduce the model’s processing time.
"Nerd sniping" attacks try to trap the model in useless computation.
Unproductive thinking loops can pass for careful analysis, because both show up as long processing times.

Why wasted reasoning is hard to detect

One of the most important concerns in the study is visibility. It is relatively easy to notice when a model does not spend enough time on a difficult task. A rushed answer can look incomplete, shallow, or inconsistent.

Excessive processing is harder to judge. A model that takes longer may appear to be analyzing the problem carefully, even if it has been drawn into a resource-draining loop. That makes "nerd sniping" especially difficult to spot from the outside.

This distinction matters for AI safety teams because the same signal can mean different things. Longer processing may indicate stronger reasoning, but it may also indicate that the model is stuck. Without a way to distinguish useful reasoning from wasted effort, monitoring becomes more difficult.
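As a rough illustration of what telling useful reasoning from wasted effort could involve, the sketch below scores how repetitive a reasoning trace is. It assumes the operator has access to the reasoning text, which may not be true for hosted models. The function `repetition_ratio` and its word n-gram heuristic are a stand-in chosen for illustration, not a method described in the study.

```python
from collections import Counter


def repetition_ratio(trace: str, n: int = 4) -> float:
    """Return the fraction of word n-grams in the trace that repeat an earlier one.

    A value near 0.0 means little repeated phrasing; a value near 1.0 means
    the trace keeps restating the same word sequences.
    """
    words = trace.lower().split()
    if len(words) < n:
        return 0.0
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(ngrams)
    # Every occurrence after the first counts as a repeat.
    repeated = sum(count - 1 for count in counts.values() if count > 1)
    return repeated / len(ngrams)
```

A high score suggests the model may be going in circles, but a low score proves little. A model can waste compute on varied, irrelevant calculations without repeating its wording, so this heuristic would miss that kind of loop.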

The study therefore reframes reasoning time as a double-edged feature. Extra time can improve resilience against manipulation, but it can also give attackers more room to influence how the model spends its compute.

What the study suggests about future AI defenses

The OpenAI study does not present longer reasoning as a complete solution. Instead, it shows that reasoning models bring both defensive benefits and new attack surfaces.

For future AI systems, the key challenge is not simply to give models more time. It is to make sure that the time is used productively, that attempts to shorten useful reasoning are resisted, and that attempts to waste compute can be detected.

The broader implication is clear: as AI models become better at thinking through complex tasks, security research has to follow the reasoning process itself. Attackers are not only trying to change final answers. They are also learning to manipulate how models arrive there.