The Decoder, July 13, 2024

Safety timeline for GPT-4 Omni puts OpenAI under scrutiny

OpenAI reportedly completed GPT-4 Omni safety testing in just one week before a May launch date. The company says it did not cut corners, while some employees and safety staff described pressure, stress and a process now being reconsidered.

The story centers on rushed safety testing for a powerful AI model, raising concerns about control and catastrophic risk oversight.

OpenAI is facing renewed scrutiny over how quickly it moved GPT-4 Omni through safety checks before launch. According to reporting cited by The Decoder, the company compressed safety testing for its latest AI model into just one week, while some employees questioned whether speed had taken priority over depth.

The account centers on a tension now familiar in the AI industry: how quickly a major model can be shipped, and how much confidence a company can have in the safety work behind that decision. In this case, OpenAI says the work was extensive. Some people involved in or familiar with the process describe a compressed timeline that left teams under pressure.

What was reported about GPT-4 Omni testing

The Washington Post reports that some OpenAI employees criticized the company for emphasizing launch speed over a more thorough process. According to three sources familiar with the matter, members of the safety team felt pressured to move faster on the new catastrophic risk testing protocol so the company could meet a May launch date set by leadership.

One anonymous source summed up that frustration bluntly: "We basically failed at the process." Another insider pointed to the company’s launch planning as a sign that the schedule was already moving ahead before the safety results were known, saying, "They planned the launch after-party prior to knowing if it was safe to launch."

Those claims do not mean that no testing happened. The source article also says an unnamed representative of the preparedness team acknowledged that the testing was completed, but on a compressed timeline. That distinction matters: the criticism is not simply that OpenAI skipped the process, but that the process was squeezed into a window some insiders viewed as too tight.

OpenAI’s response

OpenAI spokeswoman Lindsey Held rejected the idea that the company weakened its safety work. She said OpenAI "didn’t cut corners on our safety process, though we recognize the launch was stressful for our teams." She also said the company conducted "extensive internal and external" testing to meet regulatory obligations.

The company’s position, as presented in the source, is that the launch pressure did not erase the required safety steps. At the same time, the preparedness team representative quoted in the report indicated that OpenAI is now reconsidering the method used for Omni. The representative said OpenAI is "rethinking our whole way of doing it" and called the Omni approach "just not the best way to do it."

That leaves a narrow but important gap between two claims. OpenAI says the process was not bypassed. Critics inside or close to the process say the timing made the process weaker than it should have been. For users, developers and policymakers watching the company, the central question is whether a completed test is enough if the schedule leaves little room for review, debate or unexpected findings.

The delayed voice feature adds context

The Decoder also points to the delayed release of GPT-4 Omni’s voice functionality as another sign of pressure around the launch. Voice capability is now slated for fall because safety tests are still ongoing.

That delay followed confusing communications that led many users to expect voice capabilities immediately when Omni launched. The result was a mismatch between what some users believed was coming and what OpenAI ultimately made available at launch.

In practical terms, the delayed voice feature shows how model launches are no longer only about releasing a single text system. A model can arrive with some capabilities available and others held back while additional safety work continues. That can be a responsible choice, but it also makes launch messaging harder. If expectations are unclear, users may see a delay as evidence that the original rollout was rushed.

Departures sharpen the safety debate

The report comes amid the recent departures of several high-ranking safety researchers from OpenAI. Some of those researchers have openly criticized the company’s safety practices.

William Saunders, who left in February 2024, said on a podcast that OpenAI had become more of a product company. He added that he "didn’t want to end up working on the Titanic of AI."

Those departures do not prove that GPT-4 Omni was unsafe. They do, however, intensify the debate over OpenAI’s internal priorities. When safety researchers leave and publicly criticize the company, questions about launch timelines gain more weight because they become part of a broader pattern of concern.

Two readings of OpenAI’s strategy

The Decoder frames the larger issue in two possible ways. One reading is that OpenAI is acting recklessly and negligently, accepting social risks in pursuit of commercial success. Under this interpretation, the compressed GPT-4 Omni testing timeline is a warning sign about product pressure overpowering safety discipline.

The other reading is that OpenAI management may believe current safety concerns around generative AI are exaggerated. In that view, whether and when AGI will emerge remains completely unclear, and AI safety may function partly as a marketing issue rather than a near-term technical emergency.

The source points to GPT-2 as an earlier example of how safety framing can draw attention. In 2019, OpenAI described GPT-2 as too dangerous for public release, which brought massive attention to the company. A few months later, two students replicated GPT-2’s level of performance. Compared with today’s freely available models, GPT-2’s performance is described in the source as very low.

That history complicates the current discussion. If safety warnings are overstated, companies risk turning caution into theater. If safety concerns are understated, fast launches can expose the public to risks that were not adequately examined. GPT-4 Omni’s reported one-week testing window has become a flashpoint because it sits directly between those two fears.

For now, the clearest fact is that OpenAI says it completed extensive testing and did not cut corners, while some insiders describe a stressful, compressed process that the company is now rethinking. The unresolved question is whether that process was merely difficult, or whether it revealed a deeper conflict between rapid AI deployment and the safety systems meant to govern it.