MIT Tech Review AI, July 30, 2024

OpenAI brings real-time voice to ChatGPT Plus users

OpenAI is rolling out a new ChatGPT voice bot powered by GPT-4o, starting with a small group of ChatGPT Plus users. The feature is designed for more natural conversation, and OpenAI says it has added safeguards around voices, copyrighted audio, and harmful content.

OpenAI is moving ChatGPT closer to a hands-free assistant with a new voice-enabled chatbot that can hold spoken conversations in real time. The rollout begins with a limited group of paying ChatGPT Plus users, with broader access for all ChatGPT Plus subscribers planned for the fall.

The feature is powered by GPT-4o, OpenAI’s model that combines voice, text, and vision capabilities. For now, the most important change is voice: users can speak to ChatGPT, interrupt it, and receive spoken answers that are meant to feel more fluid than earlier chatbot interactions.

What the new ChatGPT voice bot can do

The new voice mode is part of OpenAI’s push toward a more capable AI assistant. The goal is not only to answer questions, but to make the exchange feel closer to a natural conversation.

According to MIT Technology Review, the ChatGPT voice bot can detect what different tones of voice convey, respond when a user interrupts, and answer queries in real time. It has also been trained to sound more natural and to use its voice to express a wide range of emotions.

That matters because voice assistants have usually been limited by rigid turn-taking. A user speaks, waits, and then receives an answer. OpenAI’s new system is positioned as a step beyond that pattern, with a model that can react during the flow of conversation rather than only after a neatly completed prompt.

The comparison point is the familiar category of voice assistants such as Siri and Alexa. OpenAI is pitching a system with far more capabilities and more fluent exchanges, and the company presents it as part of a broader march toward more fully capable AI agents.

Who gets access first

The rollout is limited at the start. OpenAI is making the voice chatbot available first to a “small group of users” who pay for ChatGPT Plus. The company says it will notify people in that first wave inside the ChatGPT app and provide instructions for using the new model.

A ChatGPT Plus subscription costs $20 a month. OpenAI says the voice bot will become available to all ChatGPT Plus subscribers in the fall.

The staged launch suggests OpenAI wants feedback before opening access more widely. The company is introducing a feature that depends on fast responses, voice generation, and safety controls, so the early rollout gives it a smaller environment in which to watch how the system performs.

The launch also comes later than first planned. The voice feature was announced in May, but OpenAI delayed its release by a month. The company said it needed more time to improve safety features, including the model’s ability to detect and refuse unwanted content. It also said it was preparing its infrastructure for real-time responses at a scale of millions of users.

Why safety is central to the release

Voice changes the risk profile of a chatbot. Text can be harmful or misleading, but generated audio can also sound like a person, imitate styles of speech, or be used in ways that raise concerns about deepfakes and copyrighted material.

OpenAI says it tested GPT-4o’s voice capabilities with more than 100 external red-teamers, who were asked to look for flaws in the model. According to OpenAI, the testers collectively spoke 45 languages and represented 29 countries.

The company says it has added several safety mechanisms. One major safeguard is the use of four preset voices created in collaboration with voice actors. OpenAI says GPT-4o will not impersonate or generate other people’s voices.

That detail follows a public controversy around a voice called “Sky.” When OpenAI first introduced GPT-4o, it faced backlash because the voice sounded a lot like Scarlett Johansson. Johansson released a statement saying OpenAI had contacted her for permission to use her voice for the model and that she declined. She said she was shocked to hear a voice “eerily similar” to hers in the model’s demo.

OpenAI denied that the voice was Johansson’s, but it paused the use of Sky. The episode shows why voice choice is not a minor design detail. A chatbot voice can quickly become an identity issue when it resembles a real person, especially one who says she did not consent.

Copyright filters and content limits

OpenAI is also facing several lawsuits over alleged copyright infringement. For the new voice feature, the company says it has adopted filters that recognize and block requests to generate music or other copyrighted audio.

The company also says it has applied the same safety mechanisms used in its text-based model to GPT-4o. Those systems are meant to prevent the model from breaking laws and generating harmful content.

These controls are important because a voice model can be asked to produce more than ordinary spoken answers. Users may try to generate songs, recognizable audio, or other material that raises legal and ethical problems. OpenAI’s answer is to limit what the model can produce and to restrict voice generation to preset options.

For users, that means the first version of the voice bot is not an unrestricted audio generator. It is a conversational assistant with defined boundaries around voice identity, copyrighted audio, and unwanted content.

What is still coming later

OpenAI has signaled that more advanced features are planned, but they are not part of the current release. The company plans to add video and screen sharing at an unspecified later date.

Those features could make the assistant more useful in practical tasks. In the May demo, OpenAI employees pointed phone cameras at a piece of paper and asked the model to help solve math equations. They also shared computer screens and asked the model for help with coding problems.

For now, those capabilities remain outside the rollout. The current release focuses on voice conversation, real-time replies, and the controlled introduction of GPT-4o’s audio capabilities to ChatGPT Plus users.

The larger direction is clear from the features OpenAI is emphasizing. ChatGPT is becoming less like a box for typed prompts and more like an assistant that can listen, respond, and eventually see what a user sees. The first test is whether a limited voice rollout can make that interaction feel useful, natural, and safe enough to expand.