The Decoder, December 12, 2025

Runway pushes world models into interactive video with GWM-1

Runway has upgraded Gen-4.5 with native audio generation, audio editing and multi-shot editing. It also introduced GWM-1, its first "General World Model," with versions for explorable worlds, avatars and robotics data.

Runway's interactive world model and robotics-oriented simulation push AI toward more powerful autonomous systems, though the story is mostly a product update.

Runway is expanding beyond conventional video generation with two linked moves: a broader Gen-4.5 update and the introduction of GWM-1, the company’s first "General World Model." The announcement places Runway in a fast-moving field where AI labs are trying to build systems that understand and simulate environments, not just produce media from prompts.

What Runway Added To Gen-4.5

The recently introduced Gen-4.5 now includes native audio generation and audio editing. That matters because video tools increasingly need sound to be part of the creation process, rather than a separate layer handled after the image sequence is complete.

Runway also added multi-shot editing. In practical terms, that feature lets a user change one scene and have the edit carry across the full video. For creators, the value is consistency: a single adjustment can affect more than one shot instead of forcing the same change to be repeated manually.

These updates arrive alongside GWM-1, which suggests Runway is treating Gen-4.5 as both a production model and a foundation for more interactive systems. GWM-1 is built on the Gen-4.5 architecture, so the video model and the simulation model are connected rather than separate efforts.

How GWM-1 Works

GWM-1 is designed to build an internal representation of an environment and use that representation to simulate future events in real time. Instead of generating an entire video as a fixed output, it generates video frame by frame.

The important shift is control. GWM-1 can respond to inputs such as camera movements, robot commands, or audio. That makes the model more interactive than a standard video generator, because the output can be guided as the simulated scene continues to unfold.
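The difference between a fixed clip and a controllable, frame-by-frame rollout can be illustrated with a toy loop. This is only a sketch under loud assumptions: `WorldState`, `Control`, `advance`, `render` and `rollout` are hypothetical names invented for this example, the "world" is a single landmark on a one-dimensional strip, and nothing here reflects Runway's actual architecture or API.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class WorldState:
    """Toy stand-in for a model's internal representation of a scene."""
    camera_x: int = 0
    step: int = 0


@dataclass(frozen=True)
class Control:
    """Hypothetical per-frame input: a camera pan measured in grid cells."""
    pan: int = 0


def advance(state: WorldState, control: Control) -> WorldState:
    """Update the internal state from the latest control input."""
    return WorldState(camera_x=state.camera_x + control.pan, step=state.step + 1)


def render(state: WorldState, width: int = 9) -> str:
    """Draw a 1-D 'frame': a landmark fixed at world position 4, as seen by the camera."""
    landmark = 4 - state.camera_x
    return "".join("#" if i == landmark else "." for i in range(width))


def rollout(state: WorldState, controls: Iterable[Control]) -> Iterator[str]:
    """Yield one frame per control input instead of returning a finished clip."""
    for control in controls:
        state = advance(state, control)
        yield render(state)


# Pan right twice, then pan back to the starting position.
frames = list(rollout(WorldState(), [Control(0), Control(1), Control(1), Control(-2)]))
for frame in frames:
    print(frame)
```

Because each frame is rendered from a persistent state rather than from the previous image alone, panning away and back returns the same view in this toy. A learned model has no such guarantee, which is why keeping a scene stable under interaction is hard.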

Runway is shipping the world model in three versions:

GWM Worlds for creating explorable environments.
GWM Avatars for generating speaking characters with realistic facial expressions and lip sync.
GWM Robotics for producing synthetic training data for robots.

Runway plans to merge these capabilities into a single unified model eventually. For now, the split shows how broad the world model concept has become: the same general direction can support video environments, digital characters and robotics workflows.

Why World Models Are A Major AI Target

Runway is not alone in pursuing world models. The field is becoming crowded, with Google DeepMind and a new startup from AI researcher Yann LeCun also developing this kind of technology.

The broader goal is to move beyond conventional language models. According to The Decoder, the industry views world models as an important evolution because current language models still lack a fundamental understanding of the physical world.

Google DeepMind CEO Demis Hassabis confirmed that building these models is central to the company’s strategy for reaching Artificial General Intelligence (AGI). That frames world models not only as a media technology, but also as part of a larger AI research direction.

World models are attractive because they aim to represent how environments behave. A model that can maintain a useful internal picture of a scene can potentially support interaction, prediction and control. Those are different capabilities from simply generating a plausible image or sentence.

The Competitive Field Is Expanding

Several other companies are working on related systems. World Labs, founded by Fei-Fei Li, raised $230 million to develop "Large World Models" (LWMs) with spatial intelligence. The company recently unveiled "Marble," a prototype that can render persistent 3D environments from multimodal prompts.

Munich-based startup Spaitial is developing Spatial Foundation Models designed to generate and interpret 3D worlds with consistent physical dynamics. That emphasis on consistency is important because world models need more than visual quality; they also need environments that remain coherent as users interact with them.

Startups Etched and Decart recently introduced the "Oasis" project, which generates playable, Minecraft-style 3D worlds in real time at 20 frames per second. The system allows basic interactions such as jumping and picking up objects, but it still has consistency problems: The Decoder notes that players can sometimes end up in a different environment simply by turning around.

In August, Tencent released its Hunyuan World Model 1.0, an open-source generative AI model that creates 3D virtual scenes from text or image prompts. That adds another major technology company to the field and shows that world model work is not limited to one market or one type of AI lab.

What This Means For Runway

Runway’s update connects immediate creator tools with a longer-term push toward interactive simulation. Gen-4.5 gains audio features and multi-shot editing, while GWM-1 points toward generated worlds that can respond to control inputs as they are being created.

The three GWM-1 versions also clarify where Runway sees early uses: explorable environments, speaking avatars and synthetic training data for robots. Those are distinct applications, but each depends on a model’s ability to keep track of a scene over time.

The challenge for the whole field is consistency. The examples above show that real-time 3D generation and persistent worlds are already possible in prototype form, but maintaining stable environments remains difficult. Runway’s GWM-1 enters that race with a frame-by-frame approach and interactive inputs, backed by the Gen-4.5 architecture.