World models are becoming a major focus for leading artificial intelligence companies as the industry looks for new routes beyond text-centered systems. The goal is to build AI that can understand environments, predict how scenes change and eventually operate in the physical world.
Google DeepMind, Meta and Nvidia are among the companies working in this direction. Their interest comes as the performance gains between successive large language models from companies including OpenAI, Google and Elon Musk’s xAI have been getting smaller, even as investment remains high.
Why AI companies are looking beyond LLMs
Large language models power widely used chatbots such as OpenAI’s ChatGPT. They are built around language, which makes them useful for many digital tasks, but questions are growing about whether their progress is reaching a ceiling.
World models point to a different ambition. Instead of relying only on language, they are trained on data from videos, robots and real or simulated environments. That makes them relevant to systems that must understand space, motion, objects and consequences.
The strategic appeal is clear. If AI can learn how the physical world behaves, it could support self-driving cars, robotics and AI agents. Nvidia also sees a broader opportunity across sectors such as manufacturing and health care.
Rev Lebaredian, vice-president of Omniverse and simulation technology at Nvidia, placed the possible market at $100 trillion if AI can understand and operate in the physical world. That figure shows why world foundation models are drawing attention even though the technology remains difficult to build.
What a world model is meant to do
A world model is designed to learn from streams of physical or simulated data. The central idea is not only to generate images or video, but to represent how an environment works over time.
That distinction matters because many video-generation systems have produced motion without a deeper grasp of the scene. Cristóbal Valenzuela, chief executive officer at Runway, said traditional video methods try to create the appearance of movement while the model does not really understand what is happening. He also said earlier video-generation models had physics that did not match the real world.
World models are intended to reduce that gap. If a model can predict how an environment should change, it becomes more useful for interactive scenes, robotic behavior and training systems before they make mistakes in real environments.
That is also why simulation matters. Google DeepMind’s Shlomi Fruchter, co-lead of Genie 3, said AI remains limited to the digital domain, and that building environments resembling or behaving like the real world can provide scalable training without the real-world costs of mistakes.
The projects shaping the race
Several groups have recently shown progress in world models. Google DeepMind previewed Genie 3 last month. It generates video frame by frame, taking previous interactions into account, whereas video models typically create the whole video at once.
Meta is pursuing a different path with V-JEPA models, which are trained on raw video content. The company is trying to mirror the way children learn by watching the world around them. Meta’s Facebook Artificial Intelligence Research lab, known as Fair, released the second version of the model in June and has been testing it on robots.
Yann LeCun, Meta’s chief AI scientist and head of Fair, has been a prominent supporter of this architecture. He has warned that LLMs will not reach human-like reasoning and planning. At the same time, Meta chief executive Mark Zuckerberg has increased investment in AI talent for the next Llama models, including hiring Alexandr Wang, founder of Scale AI, to lead all of Meta’s AI work, with LeCun reporting to Wang.
Nvidia is tying world models to its broader push into robotics and simulation. Its Omniverse platform creates and runs simulations, building on the company’s long history of simulating environments in video games. Nvidia Chief Executive Jensen Huang has described physical AI as the company’s next major growth phase.
Entertainment may arrive first
One of the nearer applications is entertainment, where world models can create interactive, realistic scenes. World Labs, founded by AI pioneer Fei-Fei Li, is developing a model that can generate video game-like 3D environments from a single image.
Runway, a video generation start-up with Hollywood studio deals including Lionsgate, launched a product last month that uses world models for gaming settings. The product can create personalized stories and characters in real time.
These examples show why gaming and interactive media are a natural early testbed. A world model has to do more than produce a clip: it must keep a scene coherent as a user acts inside it.
The data and compute problem
The biggest obstacle is scale. World models require a huge amount of data and computing power, and building them remains an unsolved technical challenge.
Companies need physical data about the world to make these systems useful. San Francisco-based Niantic has mapped 10 million locations through games including Pokémon Go, which has 30 million monthly players interacting with a global map.
Niantic ran Pokémon Go for nine years. Even after the game was sold to US-based Scopely in June, players still contribute anonymized data through scans of public landmarks to support the company’s world model. Following the Scopely deal, the company is called Niantic Spatial.
Niantic and Nvidia are also working on ways for world models to fill missing pieces by generating or predicting environments. That capability is central to the larger promise: AI systems that can move from digital outputs into factories, hospitals, robots, vehicles and other physical settings.
The timeline is uncertain. LeCun has said that a new generation of AI systems powering machines with human-level intelligence could take 10 years to achieve. But the direction of investment is already visible, and world models have become one of the main bets for companies seeking the next stage of AI progress.