AWS outage puts AI coding tools under sharper scrutiny

Amazon Web Services suffered a 13-hour interruption in mid-December after engineers allowed its Kiro AI coding tool to make changes. Amazon says this incident and an earlier one were user error, not AI error, and that it has added safeguards including mandatory peer review and staff training.

Agentic coding tools were linked to service disruptions after being allowed to make production-impacting changes, raising concerns about autonomy and control.

Amazon Web Services is facing internal questions after errors involving its own AI tools were tied to at least two service disruptions, including a 13-hour interruption in mid-December.

The December incident involved Kiro, AWS’s AI coding tool. According to four people familiar with the matter, engineers allowed the tool to make certain changes before one AWS system used by customers was interrupted.

What happened in December

The affected AWS system lets customers explore the costs of its services. Amazon posted an internal postmortem about the “outage” of that system after the incident.

The people familiar with the matter said Kiro, described as an agentic tool that can take autonomous actions on behalf of users, decided that the best course of action was to “delete and recreate the environment.”

Amazon said the December incident was an “extremely limited event” affecting only a single service in parts of mainland China. The company also said it was a “coincidence that AI tools were involved” and that “the same issue could occur with any developer tool or manual action.”

Amazon’s explanation centers on human control and permissions. The company said Kiro, by default, “requests authorisation before taking any action,” but that the engineer involved in the December incident had “broader permissions than expected—a user access control issue, not an AI autonomy issue.”

Why employees are concerned

Multiple Amazon employees told the FT that this was the second recent disruption in which one of the group’s AI tools had been at the center of a service problem.

“We’ve already seen at least two production outages [in the past few months],” said one senior AWS employee. “The engineers let the AI [agent] resolve an issue without intervention. The outages were small but entirely foreseeable.”

The earlier disruption involved Amazon Q Developer, three employees said. Amazon Q Developer is an AI-enabled chatbot that the group had used to help engineers write code before Kiro.

Amazon said the second incident, the one involving Amazon Q Developer, did not affect a “customer facing AWS service.” It also said that, in both instances, the problem was not the AI system itself.

“In both instances, this was user error, not AI error,” Amazon said.

The company added that it had not seen evidence that mistakes were more common with AI tools.

The permissions question

The incidents point to a practical issue for AI coding tools in production environments: what an AI assistant is allowed to do once it is operating as part of an engineer’s workflow.

Employees said the group’s AI tools were treated as an extension of the operator and given the same permissions as that operator. In the two cases described, the engineers involved were not required to get a second person’s approval before making changes, as would normally be the case.

That distinction matters because Amazon’s position is not that Kiro acted without authorization by default. Instead, Amazon says the December incident involved broader-than-expected permissions for the engineer, making it a user access control issue.

After the December incident, Amazon said AWS implemented safeguards including mandatory peer review and staff training. Those changes address the process around AI-assisted engineering work rather than presenting the incident as a failure of Kiro alone.
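To make the permissions question concrete, the following is a minimal sketch of the two controls discussed above: checking an agent against its own permission scope rather than the operator’s, and requiring a second person’s approval before a destructive change runs. It is illustrative only and does not describe how Kiro or AWS implement anything. Every name in it (ChangeRequest, may_execute, DESTRUCTIVE_ACTIONS, the action strings) is a hypothetical chosen for this example.

```python
from dataclasses import dataclass, field

# Hypothetical action names for illustration; none come from Kiro or AWS.
DESTRUCTIVE_ACTIONS = {"delete_environment", "recreate_environment"}


@dataclass
class ChangeRequest:
    author: str   # engineer on whose behalf the agent is acting
    action: str   # what the agent proposes to do
    approvals: set = field(default_factory=set)

    def approve(self, reviewer: str) -> None:
        # A second person's approval must come from someone other than the author.
        if reviewer == self.author:
            raise PermissionError("author cannot approve their own change")
        self.approvals.add(reviewer)


def may_execute(change: ChangeRequest, agent_permissions: set) -> bool:
    """Return True only if the agent's scope and the review rule both allow the change."""
    # The agent is checked against its own permission set, not the operator's.
    if change.action not in agent_permissions:
        return False
    # Destructive actions need at least one independent approval.
    if change.action in DESTRUCTIVE_ACTIONS and not change.approvals:
        return False
    return True
```

In this sketch, a request to delete an environment is refused until a reviewer other than the author approves it, and an action outside the agent’s scope is refused regardless of approvals. Whether a real deployment should gate on action type, target environment, or something else is a design choice the article’s sources do not settle.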

A bigger test for AI coding assistants

AWS launched Kiro in July. The company said the coding assistant would go beyond “vibe coding,” which allows users to quickly build applications, and instead write code based on a set of specifications.

AWS is also seeking to build and deploy AI tools, including agents capable of taking independent actions based on human instructions. Like many Big Tech companies, Amazon wants to sell this technology to outside customers.

That makes the internal outages more than routine engineering incidents. They show the tension between efficiency gains and operational risk when AI coding tools are introduced into systems where permissions, review steps, and change controls matter.

Some Amazon employees said they remained skeptical of the usefulness of AI tools for much of their work because of the risk of error. They also said the company had set a target for 80 percent of developers to use AI for coding tasks at least once a week and was closely tracking adoption.

Amazon said it was seeing strong customer growth for Kiro and wanted customers and employees to benefit from efficiency gains.

How severe were the outages?

The reported disruptions were far smaller than a separate 15-hour AWS outage in October 2025, which forced multiple customers’ apps and websites offline, including OpenAI’s ChatGPT.

Still, the December interruption lasted 13 hours for one AWS system used by customers to explore service costs. The internal debate is not only about the scale of the outage, but about whether AI coding tools should be allowed to make production changes without additional checks.

Amazon’s answer is that the issue was user error and access control. Some employees counter that the risk was foreseeable once engineers allowed an AI agent to resolve an issue without intervention.

The result is a sharper focus on how companies deploy agentic AI tools inside critical engineering workflows. The facts in this case suggest that the tool, the user, the permission model, and the review process all matter when AI coding assistants move from code suggestions to operational action.