TechCrunch AI October 1, 2024 TERMINATOR

OpenAI Pitches Realtime API as DevDay Turns Back to Developers

OpenAI used its 2024 DevDay to announce a public beta of the Realtime API, plus vision fine-tuning, prompt caching and model distillation. The event also served as a platform reset after executive departures and fundraising developments, with OpenAI trying to keep developers building on its AI models.

WTF Index TERMINATOR

◄ Terminator 1 Idiocracy 0 ►

A mostly routine developer-platform launch, with a mild Terminator lean from realistic voice agents and unresolved disclosure concerns.

OpenAI Pitches Realtime API as DevDay Turns Back to Developers

OpenAI’s 2024 DevDay arrived after a turbulent stretch for the company, but the message to developers was straightforward: keep building on its platform. The company introduced several API features aimed at making AI apps faster, more capable and cheaper to run, while also trying to show that recent leadership changes would not slow product work.

The biggest announcement was the public beta of the Realtime API, a tool for developers who want to build low-latency speech-to-speech experiences into their own apps. It is not ChatGPT’s Advanced Voice Mode, but the source describes it as close enough to make voice a central theme of this year’s developer push.

Realtime API moves voice into developer apps

The Realtime API gives developers a way to create apps that can listen and respond with AI-generated voice in nearly real time. OpenAI is offering six voices for developers to use, but those voices are separate from the ones available in ChatGPT.

The company is also limiting voice choice in an important way. Developers cannot use third party voices through the feature, a decision OpenAI framed around avoiding copyright problems. The voice ambiguously based on Scarlett Johansson’s is not available anywhere.

During a briefing before the event, OpenAI’s head of developer experience, Romain Huet, demonstrated a trip planning app built with the Realtime API. In the demo, a user spoke with an AI assistant about an upcoming trip to London and received low-latency replies. Because the Realtime API can access tools, the app also marked restaurant locations on a map as it answered.

Huet also showed the API being used in a phone scenario, where an AI voice spoke with a human about ordering food for an event. The API itself cannot call restaurants or shops directly, unlike Google’s infamous Duo, but it can connect with calling APIs like Twilio.

Realistic AI calls raise disclosure questions

The phone demo also surfaced a policy gap. OpenAI is not adding automatic disclosures that identify the speaker as an AI model in calls like the one shown. That matters because the AI-generated voices sound quite realistic.

For now, the responsibility appears to sit with developers who build on the API. They would need to add disclosure themselves if they want users or call recipients to know that an AI system is speaking. The source also notes that such disclosure could be required by a new California law.

That choice puts developers in a more visible role. OpenAI is providing the technical layer, but app makers may need to decide how to present AI voice interactions clearly inside their own products and workflows.

Vision fine-tuning, caching and distillation broaden the platform

DevDay was not only about voice. OpenAI also announced vision fine-tuning in its API, letting developers use images as well as text to fine-tune applications of GPT-4o. The intended result is better performance for tasks that involve visual understanding.

OpenAI’s head of product API, Olivier Godement, told TechCrunch that developers will not be allowed to upload copyrighted imagery, images depicting violence, or other material that violates OpenAI’s safety policies. The source gives Donald Duck as an example of copyrighted imagery that would not be allowed.

Another addition was prompt caching. The feature lets developers cache frequently used context between API calls, which OpenAI says can reduce costs and improve latency. OpenAI says developers can save 50% with prompt caching, while Anthropic promises a 90% discount for a similar feature it launched several months ago.

OpenAI also introduced model distillation. The feature lets developers use larger models, including o1-preview and GPT-4o, to fine-tune smaller models such as GPT-4o mini. Smaller models generally cost less to run than larger ones, and distillation is meant to help improve their performance.

As part of that model distillation work, OpenAI is launching a beta evaluation tool. Developers can use it to measure a fine-tune’s performance within OpenAI’s API.

OpenAI’s developer case is also a competitive case

The announcements landed as OpenAI tries to reassure developers that its platform remains the best place to build AI apps. Company leaders say more than 3 million developers are building with OpenAI’s AI models, but the source also makes clear that the market is increasingly competitive.

OpenAI said it has cut developer API access costs by 99% in the last two years. The source notes that the company was likely pushed in that direction by competitors such as Meta and Google, which have continued to undercut prices.

The company also had to address its own internal changes. OpenAI chief product officer Kevin Weil said the departures of chief technology officer Mira Murati and chief research officer Bob McGrew would not affect OpenAI’s progress. He described both as major contributors to the company’s current position and said OpenAI would not slow down.

That message matters because last year’s DevDay was followed by turmoil, and this year’s event came after executive departures and major fundraising developments. DevDay therefore had two jobs: ship new developer tools and project continuity.

What OpenAI did not announce

Some of the biggest signals from DevDay came from what was missing. OpenAI did not share news about the GPT Store, which was announced during last year’s DevDay. The last update in the source is that OpenAI had been piloting a revenue share program with some of the most popular creators of GPTs, but had not shared much since.

OpenAI also said it was not releasing any new AI models during this year’s DevDay. Developers waiting for OpenAI o1, not the preview or mini version, will have to wait longer. The same is true for Sora, the company’s video generation model.

That made this DevDay less about a new flagship model and more about the developer layer around existing models. Realtime API, vision fine-tuning, prompt caching and model distillation all point toward the same goal: making OpenAI’s tools more practical for app builders already trying to manage speed, cost and performance.