MIT Tech Review | AI | December 11, 2024

Why Project Astra makes Gemini 2.0 feel more useful

Google DeepMind has introduced Gemini 2.0 and a new version of Project Astra, an experimental universal assistant that can use text, speech, image, and video. A live demo showed a powerful but imperfect system, with privacy, transparency, and release timing still unresolved.

Google DeepMind is pushing generative AI toward something more practical than a chatbot window. With Gemini 2.0 and a new version of Project Astra, the company is showing how an AI system could see, hear, speak, remember recent context, and use other Google apps to help with real-world tasks.

The result is not a finished consumer product. It is a glimpse of where Google wants AI agents to go next: away from single-purpose prompts and toward assistants that can follow along with what a person is doing.

Gemini 2.0 puts agents at the center

Gemini 2.0 is the latest version of Google DeepMind's family of multimodal large language models. The key change is not only that the model is newer, but that it has been redesigned around the ability to control agents.

Google DeepMind says Gemini 2.0 is twice as fast as Gemini 1.5. It also says the model performs better on several standard benchmarks, including MMLU-Pro, a set of multiple-choice questions used to test large language models across subjects such as math, physics, health, psychology, and philosophy.

But raw model performance is no longer the whole story. MIT Technology Review notes that leading systems from Google DeepMind, OpenAI, and Anthropic now perform so similarly that the bigger question is what people can actually do with them.

That is where Project Astra matters. Astra uses Gemini 2.0's agent framework to respond through text, speech, image, and video. It can also call on Google apps including Search, Maps, and Lens when it needs outside help.

What Project Astra can do in a live demo

Project Astra was first teased at Google I/O in May 2024. In a closed-door live demo in London's King's Cross district, MIT Technology Review saw a newer version in use.

The team described Astra as a universal assistant, though they are still working out what that term should mean in practice. Greg Wayne, co-lead of the Astra team, framed the goal in simple terms: an AI with eyes, ears, and a voice that can stay with a person and help with whatever they are doing. He also made clear that the system is not there yet.

In one demonstration, Bibo Xu, product manager for Astra, pointed a phone at a cookbook recipe for a chicken curry and asked Astra to identify the spices. The assistant first listed black peppercorns, hot chili powder, and a cinnamon stick. After Xu told it to look again, Astra added ground turmeric and curry leaves.

That moment captured both sides of the technology. Astra made a mistake, but the correction happened through normal speech. The user did not need to restart a session or rewrite a prompt. The conversation simply continued.

Xu also pointed the phone at wine bottles and asked Astra to choose one for the chicken curry. Astra picked a rioja and explained the choice. When asked about price, it said it needed Search, then returned with an answer after looking online.

The appeal is the mix of senses, memory, and apps

Astra's strongest idea is combination. It is not just a voice assistant, image recognizer, search box, or chatbot. It brings several modes together and can use other tools when its own model is not enough.

In the demo, Astra could respond to what the phone camera saw, listen to spoken instructions, answer aloud, and recover from corrections. The system also remembers previous conversations, according to Xu, and keeps track of the previous 10 minutes of video.
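One way to picture a 10-minute video memory is as a rolling window that drops anything older than the cutoff. The sketch below is purely illustrative and is not how Astra is built, since Google DeepMind has not published those details. The names `Frame` and `RollingVideoMemory` are hypothetical, and text descriptions stand in for real video frames.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Frame:
    timestamp: float   # seconds since the session started
    description: str   # stand-in for real image data


class RollingVideoMemory:
    """Hypothetical rolling window: keeps only recent frames."""

    def __init__(self, window_seconds=600.0):
        # 600 seconds = the 10 minutes mentioned in the demo (assumed unit)
        self.window_seconds = window_seconds
        self._frames = deque()

    def add(self, frame):
        """Store a new frame, then drop frames older than the window."""
        self._frames.append(frame)
        cutoff = frame.timestamp - self.window_seconds
        while self._frames and self._frames[0].timestamp < cutoff:
            self._frames.popleft()

    def recall(self):
        """Return descriptions of everything still inside the window."""
        return [f.description for f in self._frames]


memory = RollingVideoMemory(window_seconds=600.0)
memory.add(Frame(0.0, "cookbook recipe"))
memory.add(Frame(300.0, "wine bottles"))
memory.add(Frame(700.0, "bus stop"))
print(memory.recall())  # the frame from t=0 has aged out of the window
```

A double-ended queue suits this pattern because new frames arrive at one end and expired frames leave from the other, so each update stays cheap.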

Google DeepMind has shown other possible uses in video form. These include reading an email on a phone screen to find a door code, remembering that code later, looking at a passing bus and answering where it goes, and discussing public artwork while a person walks past.

The company has also shown Astra working on a pair of smart glasses, though MIT Technology Review reports that this technology is even further down the company's wish list. The article gives no release date for Astra.

Google is building a wider agent lineup

Astra is only one part of the broader push. Alongside Gemini 2.0, Google DeepMind also introduced Mariner, an agent built on Gemini that can browse the web for a user; Jules, a Gemini-powered coding assistant; and Gemini for Games, an experimental assistant for chatting and asking for tips while playing video games.

In the same busy stretch, Google DeepMind also announced Veo, a video generation model; Imagen 3, a new version of its image generation model; and Willow, a new kind of chip for quantum computers. CEO Demis Hassabis, meanwhile, was in Sweden on December 10 receiving his Nobel Prize.

Taken together, the announcements show how Google DeepMind is trying to move generative AI into many settings at once: phones, browsers, coding, games, images, video, and hardware research. Astra stands out because it ties many of those ambitions to an everyday device and an everyday interaction pattern.

Privacy and transparency are still unresolved

Researchers outside Google DeepMind are watching Astra closely. Maria Liakata, who works on large language models at Queen Mary University of London and the Alan Turing Institute, said the way the system combines capabilities is impressive. She also said that multimodal reasoning is cutting-edge, while noting that it is hard to know exactly where Google DeepMind stands because the company has not said much about the technology itself.

Bodhisattwa Majumder, who works on multimodal models and agents at the Allen Institute for AI, raised a related concern. He said more openness would help consumers understand the limits of systems they may soon use. In his view, users should be able to see what a system has learned about them, correct mistakes, or remove private information.

Liakata also pointed to privacy concerns, including the possibility that people could be monitored without consent. A phone that becomes a person's eyes may be useful, but it also changes what the device can observe.

Google DeepMind says privacy, security, and safety are part of its process for new products. Dawn Bloxwich, director of responsible development and innovation at the company, said trusted users will test the technology for months before public release. She also said products need to be designed so they can be changed quickly or pulled back if necessary.

That caution matters because no test group can predict every use or misuse. Project Astra may show one of the clearest paths yet for generative AI agents, but its real impact will depend on how well Google DeepMind handles reliability, consent, control, and the gap between a compelling demo and daily use.