Nvidia is pushing deeper into physical AI with Cosmos, a family of foundational AI models designed to help machines understand and operate in the real world. The announcement places robots, warehouses, and self-driving cars at the center of the company’s next AI pitch.
Instead of focusing on text generation, Cosmos is built around images, simulated video, and 3D representations of physical spaces. Nvidia says that difference matters because robots need to learn from movement, objects, accidents, and human activity, not just written language.
What Nvidia Cosmos is built to do
Nvidia announced Cosmos during a keynote presentation at the annual CES conference in Las Vegas. Nvidia CEO Jensen Huang presented the system as a tool for training humanoids, industrial robots, and self-driving cars.
The core idea is straightforward: language models are trained on large collections of written material, while Cosmos is intended to generate visual and spatial material, namely images and 3D models of the physical world.
That makes the system relevant to machines that must interpret physical conditions. A robot in a factory, a humanoid robot in a home, or a self-driving car all need to make sense of objects, motion, and surroundings. Cosmos is Nvidia’s attempt to provide model-generated training material for that kind of understanding.
Huang said Cosmos was trained on 20 million hours of real footage of “humans walking, hands moving, manipulating things.” He also stated the goal plainly: “It’s not about generating creative content, but teaching the AI to understand the physical world.”
Why simulation matters for robots
Robots need practice before they can reliably handle messy, varied environments. Warehouses are one example: Cosmos can generate realistic video footage of boxes falling from shelves, and that kind of simulated accident can be used to train a robot to recognize what has happened.
This is useful because physical-world tasks are rarely clean or predictable. Objects fall, hands move, people walk through workspaces, and machines must react to the scene in front of them. Cosmos is meant to help create more of those training situations without having to capture every example in the real world.
Users can also fine-tune the models with their own data. That detail is important because a warehouse, factory, vehicle system, or robot builder may have a specific environment or task that general training data does not fully cover.
In practical terms, the system connects three ideas that are central to robot training:
- Physical-world generation: Cosmos can create images and 3D models rather than text.
- Scenario training: generated footage can show events such as boxes falling from shelves.
- Custom adaptation: users can fine-tune the models using their own data.
The companies already using Cosmos
Nvidia says a number of companies are already using Cosmos. The list includes humanoid robot startups Agility and Figure AI, along with self-driving car companies such as Uber, Waabi, and Wayve.
That mix shows the breadth of Nvidia’s target market. Humanoid robot companies are interested in machines that can handle human-shaped environments and tasks. Self-driving car companies need systems that can understand roads and physical surroundings. Industrial robot developers need more capable machines for workspaces such as warehouses and factories.
Nvidia is also pairing Cosmos with its existing Isaac robot simulation platform. The company announced software intended to help different kinds of robots learn new tasks more efficiently. The new Isaac feature will let robot builders start with a small number of examples of a desired task, such as grasping a particular object, and then generate large amounts of synthetic training data.
That approach matters because robot training often depends on examples. If a builder can begin with only a small number of demonstrations and expand them into a larger training set, it could reduce the gap between showing a robot what to do and preparing it to perform the task in varied conditions.
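The general idea of expanding a few demonstrations into a larger training set can be sketched in a few lines of Python. This is an illustration only, not Nvidia's Isaac API: the function name `expand_demonstrations`, its parameters, and the waypoint format are all hypothetical, and simple random shifts and noise stand in for whatever generation method the real tooling uses.

```python
import random


def expand_demonstrations(seeds, variants_per_seed, max_shift=0.05,
                          noise=0.005, rng_seed=0):
    """Return synthetic trajectories derived from a few seed demonstrations.

    Hypothetical helper for illustration; each demonstration is a list of
    (x, y, z) gripper waypoints in meters.
    """
    rng = random.Random(rng_seed)  # fixed seed keeps the output repeatable
    synthetic = []
    for demo in seeds:
        for _ in range(variants_per_seed):
            # One shared shift per variant moves the whole trajectory,
            # as if the target object sat in a slightly different place.
            shift = [rng.uniform(-max_shift, max_shift) for _ in range(3)]
            # Small per-waypoint noise adds variation along the path.
            variant = [
                tuple(c + s + rng.gauss(0.0, noise)
                      for c, s in zip(point, shift))
                for point in demo
            ]
            synthetic.append(variant)
    return synthetic


# Two hand-made grasp demonstrations (assumed data, not from Nvidia).
seed_demos = [
    [(0.0, 0.0, 0.30), (0.10, 0.0, 0.15), (0.20, 0.0, 0.05)],
    [(0.0, 0.1, 0.30), (0.10, 0.1, 0.15), (0.20, 0.1, 0.05)],
]
dataset = expand_demonstrations(seed_demos, variants_per_seed=500)
print(len(dataset))  # 1000
```

Here two demonstrations become 1,000 varied trajectories. A production system would generate variation through physics simulation or a generative model rather than random jitter, but the ratio of a few real examples to many synthetic ones is the point.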
Nvidia’s wider robotics push at CES
Cosmos was not presented in isolation. Nvidia used the CES stage to show that it wants to be a major supplier for companies building and deploying humanoid robots. Jensen Huang was joined on stage by life-sized images of 14 different humanoid robots developed by companies including Tesla, Boston Dynamics, Agility, and Figure.
The company’s message is that foundational models and simulation tools can become part of the basic infrastructure for robot development. Cosmos provides generated visual and 3D world data. Isaac provides simulation and synthetic training data workflows. Together, they are aimed at companies trying to make robots learn faster and perform tasks more effectively.
The same round of announcements also included Project Digits, a $3,000 “personal AI supercomputer” that can run a large language model of up to 200 billion parameters locally, without relying on cloud services from providers such as AWS or Microsoft. Nvidia also announced its next-generation RTX Blackwell GPUs and software tools for building AI agents.
Those announcements point to a broader strategy: Nvidia is building around AI models, AI hardware, robot simulation, and tools for agents. Cosmos stands out within that package because it focuses on the physical world, where machines need more than language skills. For humanoid robots, industrial robots, and self-driving cars, the next challenge is not just producing an answer. It is understanding a scene well enough to act.