How Gemini Is Pushing SIMA 2 Toward Broader Game Agents

Google DeepMind has built SIMA 2, a Gemini-based agent that can follow instructions and solve tasks across 3D virtual worlds. The work points toward more general-purpose agents and possible robot applications, but SIMA 2 still has clear limits in memory, long tasks, and control.

Google DeepMind’s latest game-playing agent is not being trained simply to win. SIMA 2 is designed to follow human instructions, navigate 3D spaces, and work through open-ended tasks inside virtual worlds.

The company presents the system as a step toward agents that can operate beyond a web browser and, eventually, help power real-world robots. For now, the work remains experimental, but it shows how large language models are being used to make game agents more flexible.

What SIMA 2 Is Built To Do

SIMA stands for “scalable instructable multiworld agent.” Google DeepMind first demoed SIMA in 2024, and SIMA 2 builds on that earlier system by connecting it to Gemini, the company’s flagship large language model.

That connection matters because the researchers say Gemini gives SIMA 2 a major boost in capability. The agent can handle more complex tasks inside virtual worlds, communicate with users, ask questions, and provide updates while it works.

Unlike agents built around narrow competitive goals, SIMA 2 is trained for instruction following. A person can tell it what to do through text chat, speak to it out loud, or draw on the game screen. The agent then reads the game visually, frame by frame, and decides which keyboard and mouse actions are needed.

That makes the system different from earlier game AI milestones such as AlphaGo, which beat Go champion Lee Sedol in 2016, and AlphaStar, which was rated above 99.8% of ranked human players at StarCraft II in 2019. SIMA 2 is not centered on mastering one rule-bound challenge. It is meant to operate across open-ended environments where the task comes from a human instruction.
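The instruction-following loop described in this section (read a frame, decide on keyboard and mouse inputs, repeat) can be sketched roughly as follows. This is a minimal illustration, not Google DeepMind's code: the names Action, run_episode, and toy_policy are hypothetical, and the toy rule stands in for the learned model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Action:
    """One low-level input event (hypothetical representation)."""
    device: str   # "keyboard" or "mouse"
    command: str  # e.g. "press W" or "click"


# A frame is a stand-in for screen pixels: rows of brightness values.
Frame = Sequence[Sequence[int]]
# A policy maps (instruction, current frame) to zero or more input actions.
Policy = Callable[[str, Frame], List[Action]]


def run_episode(instruction: str, frames: Sequence[Frame], policy: Policy) -> List[Action]:
    """Show the policy one frame at a time and collect the actions it emits."""
    log: List[Action] = []
    for frame in frames:
        log.extend(policy(instruction, frame))
    return log


def toy_policy(instruction: str, frame: Frame) -> List[Action]:
    """Toy rule (an assumption, not SIMA 2's model): walk forward while the
    top-left pixel is dark, then click once it turns bright."""
    if frame[0][0] < 128:
        return [Action("keyboard", "press W")]
    return [Action("mouse", "click")]


if __name__ == "__main__":
    frames = [[[0]], [[50]], [[200]]]  # three one-pixel frames
    for action in run_episode("light the lantern", frames, toy_policy):
        print(action.device, action.command)
```

The point of the sketch is the interface: the agent sees only pixels and an instruction, and it can act only through the same input events a human player would use.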

Why Games Are Useful Training Grounds

Video games give AI researchers complex spaces where agents can practice navigation, tool use, and problem solving. Even a simple instruction may require several linked actions. Lighting a lantern, for example, can involve finding the object, moving toward it, selecting the right interaction, and completing the action in the game’s interface.

Google DeepMind research scientist Joe Marino said games have long helped drive agent research. His point is that games can compress many useful challenges into controllable virtual settings. An agent must interpret visual input, understand what the user wants, and act through a limited set of controls.

SIMA 2 was trained on footage of humans playing eight commercial video games, including No Man’s Sky and Goat Simulator 3, along with three virtual worlds created by Google DeepMind. From that training data, the agent learned how keyboard and mouse inputs correspond to in-game actions.

This approach gives SIMA 2 a broad base of examples rather than a single game-specific script. The goal is not just to repeat familiar moves, but to respond to new instructions in environments with different rules and layouts.

Gemini Adds Conversation And Trial And Error

The Gemini connection gives SIMA 2 more than a language interface. According to the researchers, it helps the agent reason through certain harder tasks and communicate while it is doing them.

Google DeepMind also tested SIMA 2 in environments it had not seen before. In one set of experiments, researchers used Genie 3, the latest version of the company’s world model, to generate environments from scratch. SIMA 2 was then placed inside those generated worlds and asked to navigate and follow instructions.

The researchers also used Gemini to create new tasks for SIMA 2. When the agent failed, Gemini generated tips that SIMA 2 used on another attempt. Repeating the same task multiple times often helped the agent improve through trial and error until it succeeded, Marino said.

This creates a possible training loop: Genie 3 can produce worlds, Gemini can generate tasks and feedback, and SIMA 2 can practice. Marino and his colleagues hope this could become a kind of endless virtual training dojo where the agent keeps improving through guided repetition.
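The retry-with-tips cycle described in this section can be sketched as a short loop. This is only an illustration under stated assumptions: practice, toy_attempt, and toy_tip are hypothetical names, and the toy functions stand in for SIMA 2 attempting a task and Gemini writing feedback. The article does not describe how the real system implements either step.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-ins for the two roles in the loop.
Attempt = Callable[[str, List[str]], bool]  # (task, tips so far) -> success?
TipGiver = Callable[[str, int], str]        # (task, attempt number) -> new tip


def practice(task: str, attempt: Attempt, give_tip: TipGiver,
             max_attempts: int = 5) -> Tuple[bool, List[str]]:
    """Retry a task, adding one tip after each failure.

    Returns (succeeded, tips generated along the way).
    """
    tips: List[str] = []
    for n in range(1, max_attempts + 1):
        if attempt(task, tips):
            return True, tips
        # The attempt failed, so ask the tip giver for feedback and try again.
        tips.append(give_tip(task, n))
    return False, tips


def toy_attempt(task: str, tips: List[str]) -> bool:
    """Toy agent (an assumption): succeeds once it has received two tips."""
    return len(tips) >= 2


def toy_tip(task: str, n: int) -> str:
    """Toy feedback generator (an assumption): returns a numbered tip."""
    return f"tip {n} for {task!r}"


if __name__ == "__main__":
    succeeded, tips = practice("light the lantern", toy_attempt, toy_tip)
    print(succeeded, len(tips))
```

In the fuller loop the researchers describe, the task itself would also be generated rather than fixed, and the world would come from a model such as Genie 3. The sketch covers only the repeat-and-improve step.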

What Still Holds SIMA 2 Back

SIMA 2 is still an experiment, and its limits are important. The agent struggles with long, complex tasks that require many steps. It also remembers only its most recent interactions because the team cut back its long-term memory to make it more responsive.

It is also not yet close to human skill with a mouse and keyboard. That matters because the system acts inside games through the same kinds of inputs a player uses. If the agent cannot control the interface smoothly, its higher-level reasoning may not translate into successful action.

Outside researchers see both promise and reasons for caution. Julian Togelius, an AI researcher at New York University who works on creativity and video games, described the result as interesting. He noted that training one system to play multiple games from the screen alone has historically been difficult, calling real-time play from visual input only “hard mode.”

Togelius also pointed to Gato, an earlier Google DeepMind system that could not transfer skills across a significant number of virtual environments. He remains open-minded about whether SIMA 2 could eventually help robotics, but he emphasized that the real world is both harder and easier than video games.

The Robot Question

Google DeepMind’s long-term ambition is to use agents like SIMA 2 as building blocks for real-world robots. Marino said skills such as navigation, tool use, and collaboration with humans are essential for future robot companions.

That claim is still contested. Matthew Guzdial, an AI researcher at the University of Alberta, said he was not too surprised that SIMA 2 could play many games, because many games share similar keyboard and mouse controls. In his view, a game with an unusual input scheme could expose weaknesses.

Guzdial also questioned how well SIMA 2’s learning would transfer to robots. Real-world camera visuals are harder to understand than video game visuals, he said, because games are designed to be readable for human players.

The result is a system that is meaningful but not conclusive. SIMA 2 shows how Gemini can make a multiworld game agent more interactive and adaptive. Whether that progress becomes useful for general-purpose agents or robots will depend on how well those virtual skills survive outside the game worlds where they were learned.