MIT Tech Review, AI, June 10, 2025

Pentagon cuts put AI weapons testing under pressure

Secretary of Defense Pete Hegseth has cut the Office of the Director of Operational Test and Evaluation in half. The move could make it easier to field AI and weapons systems faster, while reducing the independent testing capacity meant to catch safety and performance problems.

The Pentagon is sharply reducing the office responsible for independent testing of weapons and AI systems, a move that could reshape how quickly new military technology moves from experimentation to deployment.

On May 28, Secretary of Defense Pete Hegseth announced cuts to the Office of the Director of Operational Test and Evaluation. The office is being reduced to about 45 staff members, down from 94, and its director is being fired and replaced. The office was given seven days to carry out the changes.

What Is Being Cut

The Office of the Director of Operational Test and Evaluation was created in the 1980s after Congress responded to criticism that the Pentagon was fielding weapons and systems that did not perform as safely or effectively as advertised.

Its role is not to develop weapons or AI systems. Its role is to test them independently before they are fielded at scale. That makes the office a critical checkpoint between a company’s claims and military use.

Missy Cummings, a former fighter pilot for the US Navy and now a professor of engineering and computer science at George Mason University, described the office as “the last gate before a technology gets to the field.” The military can run small experiments without going through the office, but systems that are fielded at scale must pass through its testing first.

Cummings said the office has historically served a bipartisan purpose by helping reduce “waste, fraud, and abuse.” In practical terms, that means checking whether contractors’ technology works as promised and whether systems can survive more rigorous safety testing.

Why AI Makes The Decision More Consequential

The timing matters because the Pentagon is experimenting with putting AI into everything. Mainstream companies such as OpenAI are now more comfortable working with the military, while defense companies are winning major contracts tied to AI systems.

Anduril, for example, announced a $2.5 billion funding round on June 5, doubling its valuation to over $30 billion. Anduril and Anthropic have launched AI applications for military use. Neither company responded to questions from the source article's author about whether it pushed for or approves of the cuts. A representative for OpenAI said the company was not involved in lobbying for the restructuring.

The issue is not simply that artificial intelligence is being used by the military. The military was experimenting with artificial intelligence long before the current AI boom, especially with computer vision for drone feeds. Defense technology companies have also been winning major contracts across multiple presidential administrations.

What is different now is the Pentagon’s interest in ambitious pilots for large language models. These systems are relatively nascent, and the source article notes that they produce hallucinations and errors by their nature. That makes independent evaluation of accuracy and reliability especially important before broad use.

The Argument For Speed

Hegseth framed the restructuring as part of a push toward “reducing bloated bureaucracy and wasteful spending in favor of increased lethality.” He said the cuts would “make testing and fielding weapons more efficient,” saving $300 million.

That argument has obvious appeal inside a system where new technologies can take time to reach the field. Mark Cancian, a senior advisor at the Center for Strategic and International Studies who previously worked at the Pentagon in collaboration with the testing office, acknowledged that those trying to get new technologies onto the battlefield sometimes complain that the office slows adoption.

For defense technology companies, a smaller testing office may mean a faster path from contract to real-world military use. The source article describes the move as beneficial to “AI for defense” companies seeking quicker adoption.

But faster fielding can come about in two very different ways. A system can move faster because unnecessary steps were removed, or it can move faster because checks that found real problems were weakened. The concern raised by experts in the source article is that the second scenario is possible here.

The Safety Risk

Cummings warned that the restructuring could clear the way for faster adoption while increasing the chance that new systems will not be as safe or effective as promised.

“The firings in DOTE send a clear message that all perceived obstacles for companies favored by Trump are going to be removed,” she said.

Cancian expressed a more measured but still serious concern. “The cuts make me nervous,” he said. “It’s not that we’ll go from effective to ineffective, but you might not catch some of the problems that would surface in combat without this testing step.”

That is the central issue. Testing does not guarantee that every defect will be caught. But according to Cancian, the office frequently uncovers errors that were missed earlier. That matters most when the military is adopting a new type of technology, including generative AI.

Systems that work well in a lab can behave differently in realistic conditions. The Operational Test and Evaluation office is where those gaps are supposed to become visible before systems are used at scale.

What Comes Next

The full impact of the cuts is not yet clear from the source article. A smaller staff does not automatically mean every test will fail or every system will pass too easily. But the change reduces the capacity of an office whose core purpose is independent scrutiny.

That reduction comes just as AI systems are becoming more prominent in military planning, contracting, and experimentation. The Pentagon is interested in large language models and other AI tools, while companies that build AI applications for defense are trying to move quickly.

The question is whether the Pentagon can accelerate adoption without weakening the tests that reveal whether systems are reliable, accurate, and safe enough for real military use. Based on the concerns raised by Cummings and Cancian, the risk is that faster fielding may come with fewer chances to catch problems before they matter most.