Ars Technica, AI, January 14, 2025

Why AI-enabled Alexa is still waiting on trust

Amazon wants to turn Alexa into an AI agent that can handle practical tasks, but hallucinations, latency, reliability, and cost remain major barriers. The challenge is not just adding a large language model; it is rebuilding a live assistant used across 500 million consumer devices.

Amazon’s plan for an AI-enabled Alexa is bigger than a more conversational voice assistant. The company wants Alexa to become an agent that can take action across services and devices, but the rollout has been slowed by the technical demands of making that system reliable at enormous scale.

Amazon wants Alexa to move beyond simple commands

Alexa is already embedded within 500 million consumer devices worldwide, but Amazon’s leaders want to push it beyond the familiar uses of playing music, setting alarms, and answering basic requests. The target is a more proactive assistant that can work like a personalized concierge.

In Amazon’s vision, that could mean suggesting restaurants or configuring bedroom lights based on a person’s sleep cycles. Those examples show why the redesign is difficult: the assistant would no longer be limited to retrieving a known answer or triggering a simple command. It would need to understand context, choose the right service, and perform an action correctly.

Rohit Prasad, who leads the artificial general intelligence (AGI) team at Amazon, told the Financial Times that the assistant still has several hurdles to clear before launch. The main issues are hallucinations, latency, and reliability.

“Hallucinations have to be close to zero,” said Prasad. “It’s still an open problem in the industry, but we are working extremely hard on it.”

Hallucinations are a product problem, not just a model problem

For a chatbot, a fabricated answer can be embarrassing. For an agentic voice assistant that may call other services, control devices, or personalize actions, the stakes are different. The assistant has to be fast, accurate, and predictable in a setting where users expect voice interactions to feel immediate.

That expectation collides with the nature of today’s generative AI, which is statistical software that predicts words based on speech and language patterns. That makes free-flowing dialogue possible, but it also makes fabricated answers a risk.

One former senior member of the Alexa team said LLMs can produce answers that are “completely invented some of the time.” At Amazon’s scale, the former team member said, that could happen a large number of times per day and harm the company’s brand and reputation.

The challenge is also about preserving what Alexa already does well enough for users to trust it. Former staff pointed to the difficulty of keeping Alexa’s consistency and functionality while adding creativity and more natural conversation. Amazon also plans to hire experts to shape the AI’s personality, voice, and diction so it remains familiar to Alexa users, according to one person familiar with the matter.

The old Alexa and the new Alexa have to work together

Several former workers said a major reason for the delay is the difficulty of switching and combining Alexa’s older predefined algorithms with large language models. The original Alexa software was built on technology acquired from British start-up Evi in 2012. It worked as a question-answering machine that searched within a defined universe of facts, such as the day’s weather or a specific song in a music library.

The new Alexa is different. It uses a bouquet of AI models to recognize and translate voice queries, generate responses, and identify policy violations, including inappropriate responses and hallucinations. Creating software that translates between the legacy systems and the new AI models has been a major obstacle in the Alexa-LLM integration.
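The split described above, a legacy layer that answers from a defined universe of facts and a generative layer whose output is screened before it reaches the user, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `LEGACY_FACTS`, `legacy_answer`, `passes_policy`, and `route` are hypothetical, the blocked-term check is a toy stand-in for a real policy model, and none of it reflects Amazon's actual design.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical legacy layer: a fixed lookup over a defined universe of facts.
LEGACY_FACTS = {
    "what is the weather today": "Sunny, 21 degrees.",
    "set an alarm for 7 am": "Alarm set for 7 am.",
}

# Hypothetical phrases that the toy policy check refuses to pass through.
BLOCKED_TERMS = ("guaranteed", "definitely cures")


@dataclass
class Reply:
    text: str
    source: str  # "legacy", "llm", or "fallback"


def normalize(utterance: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    kept = "".join(ch for ch in utterance.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def legacy_answer(utterance: str) -> Optional[str]:
    """Return a predefined answer, or None if the request is outside the known set."""
    return LEGACY_FACTS.get(normalize(utterance))


def passes_policy(text: str) -> bool:
    """Toy stand-in for a policy model: reject drafts containing blocked terms."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)


def route(utterance: str, generate: Callable[[str], str]) -> Reply:
    """Prefer the deterministic path; otherwise generate a draft and screen it."""
    known = legacy_answer(utterance)
    if known is not None:
        return Reply(known, "legacy")
    draft = generate(utterance)
    if passes_policy(draft):
        return Reply(draft, "llm")
    return Reply("Sorry, I can't help with that yet.", "fallback")


if __name__ == "__main__":
    # The lambda stands in for a call to a generative model.
    reply = route("Suggest a restaurant", lambda u: "Try the noodle bar nearby.")
    print(reply.source, "->", reply.text)
```

The design choice shown is that the generative model is consulted only when the predefined layer has no answer, and its draft never reaches the user without passing the policy check.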

Amazon said it was “working hard to enable even more proactive and capable assistance” for Alexa. It also said a technical implementation of this scale, across a live service and a suite of devices used by customers around the world, was unprecedented and not as simple as overlaying an LLM onto the Alexa service.

That distinction matters. A voice assistant is not only a chat interface. Prasad said that to operate as an agent, Alexa’s “brain” has to call hundreds of third-party software applications and services. Those applications receive billions of requests a week, which means Amazon has to make actions reliable, fast, and cost-effective at the same time.

“Sometimes we underestimate how many services are integrated into Alexa, and it’s a massive number. These applications get billions of requests a week, so when you’re trying to make reliable actions happen at speed… you have to be able to do it in a very cost-effective way,” he added.
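A rough back-of-the-envelope calculation shows why small failure rates matter at this volume. The figures below are assumptions chosen only for illustration: one billion requests a week is a conservative reading of "billions", and the failure rates are hypothetical, not numbers reported by Amazon.

```python
def expected_daily_failures(requests_per_week: int, failure_rate: float) -> float:
    """Average number of failed requests per day at a given per-request failure rate."""
    return requests_per_week / 7 * failure_rate


if __name__ == "__main__":
    weekly_requests = 1_000_000_000  # assumption: lower bound for "billions a week"
    for rate in (0.01, 0.001, 0.0001):  # hypothetical per-request failure rates
        failures = expected_daily_failures(weekly_requests, rate)
        print(f"failure rate {rate:.2%}: about {failures:,.0f} failed requests per day")
```

Under these assumptions, even a 0.1 percent failure rate works out to roughly 143,000 failed requests a day.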

Amazon is leaning on multiple models and still testing safeguards

Amazon’s model strategy includes its own in-house software, including the latest Nova models, as well as Claude, the AI model from start-up Anthropic. Amazon has invested $8 billion in Anthropic over the course of the past 18 months.

Prasad, the former chief architect of Alexa, said the release of Amazon Nova models was partly driven by the need for speed, cost, and reliability so AI applications such as Alexa can “get to that last mile, which is really hard.”

Anthropic’s chief executive, Dario Amodei, framed the broader agent problem in similar terms when speaking to the FT last year.

“[T]he most challenging thing about AI agents is making sure they’re safe, reliable, and predictable,” he said.

He added that agent-like AI software needs to reach the point “where… people can actually have trust in the system,” and said, “Once we get to that point, then we’ll release these systems.”

One current employee said more steps are still needed, including child safety filters and testing custom Alexa integrations such as smart lights and the Ring doorbell. The same employee said, “The reliability is the issue—getting it to be working close to 100 percent of the time,” adding that this is why Amazon, Apple, and Google are shipping slowly and incrementally.

Developers are waiting for a clearer path

The uncertainty is not limited to Amazon’s internal teams. Numerous third parties that build Alexa “skills” or features said they did not know when the new generative AI-enabled device would arrive or how they should build new functions for it.

Thomas Lindgren, co-founder of Swedish content developer Wanderword, said, “We’re waiting for the details and understanding.” He added, “When we started working with them they were a lot more open… then with time, they’ve changed.”

Another partner said Amazon had initially put “pressure” on developers to prepare for the next generation of Alexa, but that activity had gone quiet.

The commercial question remains unresolved too. The Alexa team, which was hit by major lay-offs in 2023, still faces the challenge of making money from the assistant. Jared Roesch, co-founder of generative AI group OctoAI, said making the assistants “cheap enough to run at scale” will be a major task.

The core issue is trust. Amazon is trying to make Alexa more capable without losing the speed and reliability that voice assistant users expect. Until hallucinations, latency, integration, safety, and cost are under control, the AI-enabled Alexa remains a promise waiting on execution.