TechCrunch AI December 4, 2024

Playable 3D worlds move closer with DeepMind's Genie 2

DeepMind has introduced Genie 2, a world model that can turn a single image and text description into an interactive 3D scene. The system is being framed less as a finished game engine and more as a tool for prototyping interactive experiences and testing AI agents.

DeepMind has shown a new AI model that points toward a more interactive form of generative media: playable 3D worlds created from a single image and a short text description.

The model is called Genie 2. It is the successor to DeepMind's first Genie system, released earlier this year, and is designed to generate real-time scenes that users can control with a mouse or keyboard. The result can look less like a static AI video and more like a rough, temporary video game world.

What Genie 2 Generates

According to DeepMind, Genie 2 can create a "vast diversity of rich 3D worlds." The model can start from an image and prompt such as "A cute humanoid robot in the woods," then produce an interactive environment where actions such as jumping and swimming are possible.

The system was trained on videos. DeepMind says that training allows it to simulate object interactions, animations, lighting, physics, reflections and the behavior of "NPCs." In practice, many of the examples resemble AAA video games, though DeepMind has not disclosed many details about how its training data was sourced.

Genie 2 is part of a wider push around world models: AI systems that try to simulate environments rather than only generate text, images or video. The source article compares the direction of Genie 2 with work from Fei-Fei Li's company, World Labs, and Israeli startup Decart.

Why The Controls Matter

The important shift is not only that Genie 2 can make a scene. It is that the scene responds to user input in real time. DeepMind says the model can identify which part of the generated world should move when a person presses keys on a keyboard.

That means the model is not simply animating everything on screen at once. It has to infer which object is the playable character and which objects are background elements. DeepMind gave the example of arrow keys moving a robot rather than trees or clouds.

The model can also produce different perspectives, including first-person and isometric views. DeepMind says Genie 2 can maintain consistent worlds for up to a minute, with the majority lasting 10 to 20 seconds.

That time limit is central to understanding what Genie 2 is and is not. A one-minute world may be technically impressive, but it would not support the kind of persistent progress that makes most games satisfying. A player would quickly run into the limits of an experience that resets or loses continuity after a short span.

The Consistency Problem

World models often struggle with consistency. They can produce artifacts, lose track of spatial layouts or hallucinate details as the generated environment changes. The source article points to Decart's Minecraft simulator, Oasis, as an example: it runs at low resolution and quickly "forgets" the layout of levels.

DeepMind says Genie 2 improves on that problem by remembering parts of a simulated scene even when they are no longer visible. When those areas come back into view, the model can render them accurately again. The source article notes that World Labs' models can do this as well.

That kind of memory matters because interactive worlds depend on continuity. If a user turns away from an object and then turns back, the environment needs to remain recognizable. Without that, the experience becomes less like navigating a world and more like watching a sequence of loosely connected generations.

Research Tool, Not A Finished Game

DeepMind is positioning Genie 2 as a research and creative tool rather than a replacement for complete games. The company describes possible uses in prototyping "interactive experiences" and evaluating AI agents.

For creative work, the appeal is speed. DeepMind says concept art and drawings can become fully interactive environments because of Genie 2's out-of-distribution generalization capabilities, meaning its ability to handle inputs unlike those it saw in training. That could make early exploration of worlds and mechanics faster, at least at the prototype stage.

For AI research, the value is different. DeepMind says Genie 2 can quickly create varied environments for agents, giving researchers evaluation tasks that agents have not seen during training. In that use case, the generated world is less about entertainment and more about testing how an AI system behaves in unfamiliar settings.

Open Questions For Games And IP

Genie 2 also raises questions that the demonstration itself does not settle. The source article notes that the simulations can look like AAA video games and suggests one possible reason: the training data may include playthroughs of popular titles. DeepMind has not revealed much about its data sourcing methods.

That creates an unresolved intellectual property issue. DeepMind is a Google subsidiary, and the source article notes that Google has previously implied that its terms of service give it permission to use YouTube videos for model training. Whether Genie 2 is creating unauthorized copies of games it may have "watched" is described in the source as a question for the courts.

The response from game creatives may also be mixed. A recent Wired investigation found that major players like Activision Blizzard, which has laid off scores of workers, are using AI to cut corners, ramp up productivity and compensate for attrition.

Google's investment in this area appears to be growing. In October, DeepMind hired Tim Brooks, who was heading development on OpenAI's Sora video generator, to work on video generation technologies and world simulators. Two years ago, the lab also brought in Tim Rocktäschel from Meta. He is known for "open-endedness" experiments with video games like NetHack.

Genie 2 is not a practical game platform yet. Its current limits make that clear. But it shows where world model research is heading: toward AI systems that do not merely generate scenes, but create environments that can be entered, controlled and used to test other forms of intelligence.