Why OpenAI Advanced Voice Mode makes phones feel conversational

OpenAI Advanced Voice Mode is in a limited alpha test and changes how ChatGPT feels on a phone. It does not make ChatGPT smarter, but its faster, more emotional voice interaction makes the assistant feel more natural, more useful, and more unsettling.

The story mostly highlights a more human, emotionally engaging interface that may increase dependence and blur the feel of ordinary phone use, without adding major autonomy or danger.

OpenAI Advanced Voice Mode shows how quickly the phone can shift from a tool you operate to a device you talk with. The feature is still in a limited alpha test, and it has rough edges, but its biggest change is not raw intelligence. It is the way ChatGPT responds in a voice that can laugh, change tone, and keep a conversation moving.

A more human interface for ChatGPT

Advanced Voice Mode, or AVM, does not make ChatGPT more capable than it already was. The change is in the interface. Instead of typing a prompt or issuing a rigid command, a user can speak to ChatGPT and hear a reply that sounds closer to a natural conversation.

That matters because voice changes the way the same AI model feels. The underlying model is GPT-4o, but AVM presents it through a conversational layer that can sound playful, gentle, serious, or explanatory depending on the prompt. The result can feel less like using software and more like speaking with something inside the phone.

That feeling connects to a broader idea described by OpenAI CEO Sam Altman at OpenAI's Dev Day in November 2023. “Eventually, you’ll just ask the computer for what you need and it’ll do all of these tasks for you,” Altman said. “These capabilities are often talked about in the AI field as ‘agents.’ The upside of this is going to be tremendous.”

AVM does not deliver that full agent future yet. It cannot perform many ordinary assistant tasks. But it does make the act of asking a computer for help feel more immediate and more personal.

What Advanced Voice Mode can do well

The most striking uses are not necessarily the most practical ones. In one test, ChatGPT was asked to order Taco Bell the way Obama would. It responded with an impression that kept the selected ChatGPT voice, Juniper, rather than becoming something that could be mistaken for Obama’s real voice.

That example shows the feature’s strength: it can understand a social and comedic prompt, respond with timing, and even laugh. The interaction can feel like a friend attempting an impression rather than a standard voice assistant reading a scripted answer.

AVM also appears useful for more serious conversations. When asked for advice about asking a significant other to move in, ChatGPT gave detailed guidance after hearing the relationship and career context. The voice shifted into a more careful tone, which made the answer feel different from the joking Taco Bell exchange.

It can also explain complicated subjects in simpler language. When asked to explain items on an earnings report, including free cash flow, for a 10-year-old, ChatGPT used a lemonade stand example. The user can also ask AVM to slow down, which makes the explanation easier to follow at a chosen pace.

Where it still falls short

Advanced Voice Mode compares favorably with older assistants such as Siri and Alexa in several ways. It can respond quickly, produce more varied answers, and handle complex questions that earlier virtual assistants were not built to manage.

But it is not a full replacement for those assistants. In the source’s testing, ChatGPT’s voice feature could not set timers or reminders, browse the web in real time, check the weather, or interact with APIs on the phone. For everyday device control, that leaves a major gap.

Compared with Gemini Live, Google’s competing feature, AVM was described as slightly ahead in conversational feel. Gemini Live could not do impressions, did not express emotion, could not speed up or slow down, and took longer to respond. Gemini Live did offer more voices, with ten compared to OpenAI’s four, and appeared more up to date in one test involving Google’s antitrust ruling.

Neither AVM nor Gemini Live would sing. The likely reason given in the source is an effort to avoid copyright lawsuits from the record industry.

There are also technical problems. AVM can cut itself off in the middle of a sentence and restart. At times, the voice can become grainy in a way that sounds unpleasant. Those glitches may come from the model, the internet connection, or another cause, but they are consistent with the fact that the product is still in alpha testing.

The larger concern is companionship

The most important question raised by AVM is not whether it can answer faster than Siri. It is whether a highly responsive AI voice turns the phone into a source of artificial connection.

The source frames this through the experience of Gen Z growing up with social media, where platforms promised connection while also playing on insecurity. A voice-based AI companion could push that pattern further by offering a “friend in your phone” feeling without another human involved.

Generative AI is already being used for companionship. People are using AI chatbots as friends, mentors, therapists, and teachers. When OpenAI launched its GPT store, it was quickly flooded with “AI girlfriends,” meaning chatbots designed to act like significant others.

Two researchers from MIT Media Lab also warned this month about “addictive intelligence,” referring to AI companions with dark patterns that could keep humans hooked. That warning fits the unease around AVM: the feature is enjoyable because it feels socially fluent, but that same quality could make it harder to put down.

The concern extends beyond phones. Earlier this month, a Harvard dropout teased an AI necklace called Friend, a wearable device that is described as always listening and able to text with the user about their life. AVM makes that kind of product feel less abstract, because it shows how persuasive voice-based AI interaction can become.

Why this alpha test matters

AVM is not a complete assistant, and it is not yet the computer that can simply complete every task after being asked. Nor is every OpenAI launch a lasting one: the GPT store is described in the source as an overhyped product that no longer seems to be a major focus for the company.

Still, Advanced Voice Mode handles one important part of the future interface: talking to computers in a way that feels natural. The possible next steps are easy to imagine from the source’s examples: asking a smart TV for a highly specific movie recommendation, telling Alexa about cold symptoms and having it order tissues and cough medicine on Amazon, or asking a computer to draft a weekend trip for a family.

Those scenarios would require stronger AI agents and clearer boundaries. AVM does not solve those problems. But it makes the conversational layer feel much closer than before, and that is why the feature is both exciting and uncomfortable.