Google DeepMind is moving part of its robotics work off the cloud and directly onto machines. Its new Gemini Robotics On-Device model is designed to let robots interpret what they see, understand instructions, and generate physical actions without relying on a remote server.
The shift matters because robots do not just answer prompts. They move through physical spaces, handle objects, and change the environment around them. That makes speed, reliability, privacy, and safety central questions for any AI system that controls a robot arm or mobile platform.
Why Local AI Matters For Robots
The earlier Gemini Robotics release used a hybrid setup: a smaller model ran locally on the robot, while a larger model in the cloud handled more complex reasoning. Google DeepMind still describes that earlier release as the “best” version of its robotics technology, but the new on-device version changes the operating model.
With Gemini Robotics On-Device, the local vision-language-action (VLA) model can operate as a standalone system. That means a robot can respond without waiting for a cloud model to generate each step.
For robotics, that timing is not a small detail. A chatbot can pause while it produces an answer, but a robot asked to pick up or move an object needs to adapt quickly. Delays can make even simple tasks feel awkward or unreliable.
Carolina Parada, head of robotics at Google DeepMind, says this approach could make robots more dependable in difficult situations. The team sees the on-device model as especially useful where connectivity is weak or unavailable.
What Gemini Robotics On-Device Can Do
The new model draws on Gemini’s multimodal understanding and applies it to robot actions. Parada explains the idea this way: “It’s drawing from Gemini’s multimodal world understanding in order to do a completely new task.”
That capability is important because robots cannot be trained for every possible arrangement of objects, rooms, or human requests. Traditional reinforcement learning approaches have been slow to teach new skills, while generative AI offers a broader ability to generalize from what the system has already learned.
Gemini Robotics On-Device is reportedly only a little less accurate than the hybrid version. Parada also says many tasks should work without additional customization: “When we play with the robots, we see that they’re surprisingly capable of understanding a new situation.”
The examples show both promise and limits. The model should handle straightforward physical actions such as tying a shoe or folding a shirt. A more involved request, such as making a sandwich, would likely require a more powerful model because of the multi-step reasoning involved.
The SDK Opens The Door To Custom Tasks
This release is also the first Google robotics model that developers can tune for their own uses. Google DeepMind is releasing the model with a full SDK, giving robotics researchers a way to adapt it to new environments and tasks.
Google DeepMind says the VLA can be tuned with as few as 50 to 100 demonstrations. In this context, a demonstration usually means tele-operating the robot: a person manually controls the machine to complete the action, and the recorded run helps the model learn to do the task autonomously.
That developer access could expose where the stock model performs well and where it fails. As teams try new tasks and settings, they may find actions that require additional tuning or different safety controls.
Google’s training also uses synthetic data, but Parada says real-world data remains important for harder physical behavior. “We still find that in the most complex, dexterous behaviors, we need real data,” she says. “But there is quite a lot that you can do with simulation.”
Privacy And Connectivity Are Big Advantages
A cloud-free robotics model has practical implications beyond speed. If the robot can process visual information locally, less visual data needs to leave the device. Health care is cited as one setting where local processing may be better for privacy.
The same local design helps in places where network access is unreliable. A robot that depends on a cloud model may lose key capabilities when the connection drops. An on-device model can continue operating with full autonomy, at least for the kinds of tasks it can handle locally.
That does not make the cloud irrelevant. The earlier hybrid system still has advantages for complex reasoning. But Gemini Robotics On-Device gives developers another option when responsiveness, autonomy, or data locality matters more than maximum reasoning power.
Safety Becomes A Developer Responsibility
Physical robots raise different safety concerns than chatbots or image generators. A generative AI system can produce an incorrect answer, but a robot can also move hardware in the real world. That makes safeguards more than a software quality issue.
The full Gemini Robotics system uses multiple safety layers. Parada says the larger system includes a model that reasons about whether an action is safe, a VLA that produces action options, and a low-level controller that handles safety-critical limits such as force and speed.
The on-device release is different because it is just a VLA. That means developers using it will need to build their own safety framework around it. Google suggests that early testers connect it to the standard Gemini Live API, which includes a safety layer, and also implement a low-level controller for critical safety checks.
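To make the idea of a low-level controller concrete, here is a minimal Python sketch of one way a developer might cap speed and force before a command reaches the hardware. It is an illustration only, not Google's controller or any part of the Gemini Robotics SDK: the names SafetyLimits, Command, and enforce_limits are hypothetical, and the limit values are placeholders rather than recommended settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SafetyLimits:
    """Hard limits enforced below the model (placeholder values, not recommendations)."""
    max_speed_m_s: float = 0.25
    max_force_n: float = 20.0


@dataclass(frozen=True)
class Command:
    """A hypothetical motion command proposed by the VLA."""
    speed_m_s: float
    force_n: float


def clamp(value: float, limit: float) -> float:
    """Restrict value to the range [-limit, limit]."""
    return max(-limit, min(limit, value))


def enforce_limits(cmd: Command, limits: SafetyLimits) -> Command:
    """Return a copy of the command that never exceeds the configured limits."""
    return Command(
        speed_m_s=clamp(cmd.speed_m_s, limits.max_speed_m_s),
        force_n=clamp(cmd.force_n, limits.max_force_n),
    )


# The model asks for more speed than allowed; the controller caps it.
proposed = Command(speed_m_s=0.9, force_n=5.0)
safe = enforce_limits(proposed, SafetyLimits())
print(safe)
```

The point of the design is that the check sits outside the model: whatever the VLA proposes, the final command passes through a simple, deterministic limit that can be tested on its own. A real controller would also need to handle joint limits, emergency stops, and sensor faults, which this sketch leaves out.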
Anyone who wants to try Gemini Robotics On-Device needs to apply for access to Google’s trusted tester program.
Parada says there have been many robotics breakthroughs in the past three years, and she notes that the current Gemini Robotics release is still based on Gemini 2.0. The team typically trails Gemini development by one version, while Gemini 2.5 has been cited as a major step forward for chatbot functionality.
For now, the bigger point is clear: Google DeepMind is making robotics AI more local, more tunable, and more practical for environments where a cloud connection cannot be assumed. The next challenge is making sure those machines remain controlled, useful, and safe when they act on their own.