MIT Tech Review AI April 11, 2025

How generative AI is changing US military intelligence work

US Marines tested generative AI during a Pacific deployment to help process open-source intelligence faster than manual methods. The experiment showed clear efficiency gains, but experts warn that errors, subjective sentiment analysis, and manipulated public data remain serious risks.

Generative AI is moving from office productivity into military intelligence work. During a Pacific deployment, US Marines used the technology to help translate, summarize, sort, and interpret large volumes of open-source information.

The test points to a broader shift inside the Pentagon: AI is no longer only being explored for drones, autonomous vehicles, or computer vision. It is also being used to help analysts make sense of the flood of data that can shape military decisions.

A Pacific deployment became an AI test bed

For much of last year, about 2,500 US service members from the 15th Marine Expeditionary Unit sailed aboard three ships throughout the Pacific. The unit conducted training exercises in waters off South Korea, the Philippines, India, and Indonesia.

At the same time, an intelligence experiment was taking place onboard. Marines responsible for reviewing foreign intelligence and alerting commanders to possible local threats used generative AI for the first time in that work.

Two officers described using the system to process thousands of pieces of open-source intelligence. That material included nonclassified articles, reports, images, and videos collected in the countries where the unit operated.

The value was speed. Captain Kristin Enzenauer used large language models to translate and summarize foreign news sources. Captain Will Lowdon used AI to help write the daily and weekly intelligence reports sent to commanders.

Lowdon said the work still required human review. “We still need to validate the sources,” he said. But he also said commanders encouraged the use of large language models “because they provide a lot more efficiency during a dynamic situation.”

What Vannevar Labs is building for the Pentagon

The tools used by the Marines were built by Vannevar Labs, a defense-tech company founded in 2019 by veterans of the CIA and US intelligence community. In November, the Pentagon’s startup-oriented Defense Innovation Unit granted the company a production contract worth up to $99 million.

The goal of that contract is to bring Vannevar Labs’ intelligence technology to more military units. The company is one of several defense AI firms, alongside Palantir, Anduril, and Scale AI, that are benefiting from the US military’s embrace of artificial intelligence.

Vannevar Labs applies existing large language models, including some from OpenAI and Microsoft, along with bespoke models of its own, to large collections of open-source intelligence. The company has been collecting that data since 2021.

The scale is substantial: the company says it collects terabytes of data in 80 different languages every day across 180 countries. Vannevar Labs also says it can analyze social media profiles and breach firewalls in countries like China to access hard-to-reach information.

The company also uses nonclassified data that is difficult to get online, gathered by human operatives on the ground, and reports from physical sensors that covertly monitor radio waves to detect illegal shipping activities.

Its models are built to translate information, detect threats, and analyze political sentiment. The results are delivered through a chatbot interface similar to ChatGPT, allowing users to ask questions about complex intelligence and receive plain-language responses.

Why the military sees an efficiency opportunity

The appeal is easy to understand. The US intelligence apparatus has faced a long-running problem: there is more data available than human analysts can reasonably process by hand.

Vannevar Labs’ chief technology officer, Scott Philips, described the company’s focus as collecting data, making sense of it, and helping the US make good decisions. The company’s tools are intended to provide information on topics ranging from international fentanyl supply chains to China’s efforts to secure rare earth minerals in the Philippines.

For the Marines in the Pacific, the system helped with practical intelligence tasks. Enzenauer used it to track foreign news reports mentioning the unit’s exercises and to perform sentiment analysis, identifying emotions and opinions expressed in text.

That work previously required more manual effort. Enzenauer said it had involved “researching, translating, coding, and analyzing the data.” She said the AI-assisted process was less time-consuming.

The test was not frictionless. The ships often had spotty internet connections, which limited how quickly the model could synthesize foreign intelligence, especially when photos or video were involved.

Even so, the military’s direction appears clear. Colonel Sean Dynan, the unit’s commanding officer, said on a call with reporters in February that heavier use of generative AI was coming and called the experiment “the tip of the iceberg.”

The risks analysts are watching

The Pentagon has said it will spend $100 million over the next two years on pilots for generative AI applications. In addition to Vannevar Labs, it is turning to Microsoft and Palantir, which are working together on AI models that would use classified data.

Critics warn that speed does not remove the risks. Heidy Khlaaf, chief AI scientist at the AI Now Institute, said the rush to use generative AI in military decision-making overlooks basic weaknesses in the technology. “We’re already aware of how LLMs are highly inaccurate, especially in the context of safety-critical applications that require precision,” she said.

Khlaaf also questioned whether human review can solve the problem. “‘Human-in-the-loop’ is not always a meaningful mitigation,” she said. If a model draws on thousands of data points, she argued, a person may not be able to determine whether the output is wrong.

Sentiment analysis is one area of concern. Khlaaf called it “a highly subjective metric that even humans would struggle to appropriately assess based on media alone.” If AI misreads hostility toward US forces, or misses hostility that exists, the military could make a misinformed decision or escalate a situation unnecessarily.

Philips said Vannevar Labs has built models specifically to judge whether an article is pro-US or not, but MIT Technology Review was not able to evaluate them. Chris Mouton, a senior engineer at RAND, tested leading models, including OpenAI’s GPT-4 and an older version of GPT fine-tuned for intelligence work, on how accurately they flagged foreign content as propaganda compared with human experts.

Mouton said the task is difficult and that AI struggled with more subtle propaganda. He also said the models could still be useful for many other analysis tasks.

The central debate is trust

Open-source intelligence can be valuable, but its reliability is contested. Mouton said open-source data can be “pretty extraordinary.” Khlaaf, however, argued that because it comes from the open internet, it is more vulnerable to misinformation campaigns, bot networks, and deliberate manipulation.

That leaves a central question for the military. Will generative AI remain one investigatory tool among many, or will its subjective analysis become something commanders rely on in decision-making?

Mouton called that “the central debate.” The technology is accessible, fast, and easy to query in plain language. What remains unsettled is how much imperfection the military is willing to accept in exchange for efficiency.