Gemini 3.5 Flash Now Controls Screens Inside the Model

Google has integrated "Computer Use" directly into Gemini 3.5 Flash, so the model can see and operate computers, browsers, and mobile devices. The change moves a capability that was previously separate in Gemini 2.5 into the model itself, with enterprise safeguards aimed at prompt injection attacks.

WTF Index TERMINATOR
◄ Terminator 3 Idiocracy 1 ►

Putting computer-control directly inside a model increases agentic autonomy and potential misuse risk, though safeguards keep the lean moderate.

Gemini 3.5 Flash Now Controls Screens Inside the Model

Google has moved computer-control capabilities directly into Gemini 3.5 Flash, giving developers a model that can see, understand, and interact with computers, browsers, and mobile devices on its own.

The feature, called "Computer Use," was previously available only as a separate Gemini 2.5 model. With the new integration, Google is positioning Gemini 3.5 Flash as a tool for agents that can work across browser, mobile, and desktop environments.

What Changed In Gemini 3.5 Flash

The main change is where the capability lives. Instead of relying on a separate model for computer interaction, Google has built "Computer Use" directly into Gemini 3.5 Flash.

That matters because computer control is not just another text feature. A model with this capability can interpret what is visible on a screen and take actions in software environments. According to the source article, that includes computers, browsers, and mobile devices.

Google is also combining this with existing tools such as function calls, Search, and Maps. Taken together, those tools allow developers to build agents that can move through multiple kinds of digital environments while using external capabilities when needed.

The article points to two practical use cases: software testing and office automation. In both cases, the important shift is from answering questions to carrying out work inside actual interfaces.

Why Computer Use Matters For Agents

Many AI systems are strongest when information is already structured for them. Computer use pushes the model into a less tidy environment: screens, browser pages, app interfaces, and workflows that may require multiple steps.

For developers, this can make agent design more direct. If a model can see and operate an interface, it can potentially handle tasks that would otherwise need custom integration work for every tool involved.

The source does not claim that this removes the need for developer control. In fact, the opposite is clear from the safeguards Google describes. The value is not that the model can act without boundaries, but that it can operate across environments where careful boundaries are especially important.

That is why the combination of "Computer Use," function calls, Search, and Maps is notable. Each tool supports a different part of an agent workflow. Computer control handles the interface, function calls support structured actions, Search can bring in web information, and Maps can support location-related tasks.

How It Performs On OSWorld

The source article cites the OSWorld benchmark as one comparison point for computer-use performance. Gemini 3.5 Flash scores 78.4 on that benchmark.

That score puts it ahead of Gemini 3 Flash at 65.1 and GPT-5.4 mini at 72.1. GPT-5.5 is slightly higher at 78.7, while Anthropic's Opus 4.8 leads the listed group with 83.4.

The same comparison also lists Sonnet 4.6 at 78.4 and Gemini 3.1 Pro at 76.2. In that context, Gemini 3.5 Flash sits near the top of the group named in the article, matching Sonnet 4.6 and coming close to GPT-5.5.

The benchmark numbers are useful because they give a concrete way to compare models on this class of task. They do not, by themselves, describe every production concern. Computer-use agents also depend on controls, permissions, task design, and the environment in which the model is allowed to act.

The Security Problem Google Is Addressing

Giving an AI model the ability to operate a computer raises a direct security concern: prompt injection attacks. The source article says Google uses adversarial training to help guard against that risk.

Google also offers two optional enterprise safeguards. One requires user confirmation before sensitive or irreversible actions. The other automatically stops tasks when it detects indirect prompt injections.

Those safeguards point to a practical principle for computer-use agents: the model should not be treated as an unrestricted operator. Even when the model is capable of taking actions, some steps still need human approval or automatic interruption.

The article also says Google recommends sandboxing, human oversight, and strict access controls. Those recommendations are especially relevant when an agent can interact with browsers, desktop software, or mobile interfaces where mistakes can have consequences outside the model itself.

Where Developers Can Access It

The feature is available through the Gemini API and the Gemini Enterprise Agent Platform. The source article also says a Browserbase demo and a GitHub reference implementation are available.

That mix suggests Google is addressing both experimentation and enterprise deployment. The Gemini API gives developers a way to build with the capability, while the Gemini Enterprise Agent Platform points to organizational use cases.

The Browserbase demo and GitHub reference implementation give developers additional ways to examine how the feature can be used. The source does not describe their contents in detail, so the key point is simply that they are available alongside the API and enterprise platform.

For now, the core story is straightforward: Gemini 3.5 Flash can now handle computer interaction directly, and Google is pairing that capability with benchmark performance claims and safeguards for risky actions. The result is a more capable foundation for agents that need to work inside real software environments.