TechCrunch AI, July 15, 2024

How Gemini helped a Google robot find its way around an office

Google demonstrated Gemini through a robot that could respond to spoken instructions and navigate the DeepMind office space. The demo used vision language models trained on images, videos and text to connect perception with action.

Google has used a robot to show a practical side of its Gemini AI model: taking instructions, interpreting visual information and moving through an office environment to complete a task.

The demonstration featured a robot from Google’s Everyday Robots division, which was shut down last year. The robot is still around, and Google used it to show how Gemini can help a machine connect language, perception and navigation in the DeepMind office space.

A robot demo built around Gemini

The robot in the demonstration wore a yellow bowtie, but the accessory was not the point. The core of the demo was how Google used Gemini to teach the robot to respond to commands and move through an office.

In one example, a Google employee asked the robot to take him somewhere to draw things. The robot replied that it needed a minute to think, then led the employee to a whiteboard.

That sequence is simple on the surface, but it shows several steps happening together. The robot had to understand the request, connect “draw things” with a likely destination, and then navigate to that place inside the office.

Why vision language models matter here

Google is using vision language models, or VLMs, for this kind of task. The source describes VLMs as models trained on images and videos along with text.

That matters because a robot operating in a physical space cannot rely only on written or spoken language. It also needs perception. It has to interpret what is around it, recognize relevant visual cues and use that information to decide what to do next.

In plain terms, Gemini is being shown as more than a chatbot-style system. In this demonstration, it is part of a setup where AI helps a robot answer questions and perform tasks that require perception.

The whiteboard map test

A second example focused on written directions. The robot was told to follow the directions on a whiteboard. On that whiteboard, a map showed directions to a location called the Blue Area.

The robot followed those directions to a robotics testing area. After arriving, it announced, “I’ve successfully followed the directions on the whiteboard.”

This part of the demonstration is important because it combines multiple abilities in one task:

Understanding an instruction from a person
Using a visual map on a whiteboard
Moving through the DeepMind office space
Identifying the robotics testing area as the destination
Reporting that the task was completed

The demo does not present the robot as a general-purpose office assistant. It shows a narrower point: Gemini and VLMs can help a robot make sense of both language and visual information in order to navigate a real workspace.

What the demonstration suggests

The most notable part of the demo is the link between AI reasoning and physical action. A person can give an open-ended command, such as asking to be taken somewhere to draw, and the robot can infer that a whiteboard is the right destination.

The whiteboard example adds another layer. Instead of simply recognizing a named place, the robot uses visual directions shown in the environment. That makes the demonstration less about memorizing one route and more about responding to information presented in the moment.

The source does not claim that this system is ready for broad deployment, and it does not describe the full technical setup behind the scenes. What it does show is Google using a remaining robot from the Everyday Robots division to make Gemini’s capabilities easier to see.

For AI, robotics and Google watchers, the key takeaway is straightforward. Gemini is being demonstrated as a model that can support tasks combining language, perception learned from images and video, and physical navigation, at least inside the controlled setting of the DeepMind office space.