Ars Technica AI, July 12, 2024

Why OpenAI’s AGI levels put reasoning AI in the spotlight

OpenAI has reportedly shown employees a five-level framework for tracking progress toward AGI, with today's systems placed at Level 1. Executives reportedly said the company is close to Level 2, called "Reasoners," but the framework remains a work in progress and is not a measure backed by scientific consensus.

OpenAI is trying to give shape to one of the most contested ideas in artificial intelligence: progress toward AGI. According to Bloomberg, the company recently shared a five-tier framework with employees that starts with today’s conversational AI and ends with systems that could do the work of an organization.

The framework puts reasoning AI at the center of the next step. OpenAI reportedly believes its current technology, including GPT-4o powering ChatGPT, sits at Level 1, while executives told staff the company is nearing Level 2, called “Reasoners.”

What OpenAI’s five levels describe

The classification system was shared on Tuesday during an all-hands meeting, according to an OpenAI spokesperson who spoke with Bloomberg. OpenAI plans to share the levels with investors, and the company also plans to collect feedback from employees, investors, and board members as it refines the framework.

Bloomberg lists OpenAI’s “Stages of Artificial Intelligence” this way:

Level 1: Chatbots, AI with conversational language
Level 2: Reasoners, human-level problem solving
Level 3: Agents, systems that can take actions
Level 4: Innovators, AI that can aid in invention
Level 5: Organizations, AI that can do the work of an organization

In this framing, OpenAI's current products are still in the chatbot stage. The next level would mark a shift from fluent conversation toward problem-solving ability. A Level 2 system would reportedly be capable of basic problem-solving on par with a human who holds a doctorate but has no access to external tools.

During the same meeting, OpenAI leadership reportedly demonstrated a research project using GPT-4. Researchers involved believed the project showed signs of approaching human-like reasoning ability, according to someone familiar with the discussion who spoke with Bloomberg.

Why reasoning AI matters to the AGI story

OpenAI has described AGI as its primary goal. As Ars Technica frames it, AGI is a hypothetical concept: an AI system able to perform novel tasks like a human without specialized training. That definition is broad, and the term itself remains nebulous.

Reasoning AI matters because it sits between the conversational systems people use today and the more ambitious systems OpenAI’s framework imagines. A chatbot can respond in natural language. A “Reasoner,” as OpenAI’s Level 2 label suggests, would be judged by a different standard: whether it can solve problems in a more human-like way.

That distinction is important for how the company communicates its progress. If OpenAI can point to a level-based framework, it can describe the path to AGI as a sequence of stages instead of a single distant claim. But the same structure also makes the framework vulnerable to overinterpretation, because several of the levels describe technologies that do not yet exist.

The upper stages are especially ambitious. Level 3 "Agents" could work autonomously on tasks for days. Level 4 "Innovators" would help generate new inventions. Level 5 "Organizations" would represent AI capable of doing the work of an entire organization.

The framework is useful, but limited

The five-level model may help OpenAI explain its internal expectations and long-term goals. It gives employees, investors, and board members a shared vocabulary for discussing capability. It also creates a simple way to place products like ChatGPT and GPT-4o inside a broader roadmap.

But a clear ladder does not automatically make AI progress easy to measure. As Ars Technica stresses, there is no consensus in the AI research community on how to measure progress toward AGI, or even on whether AGI is a well-defined or achievable goal.

That makes the framework best understood as a communications tool rather than a settled technical standard. It can show what OpenAI wants to build and how the company wants to discuss advancement. It cannot, by itself, prove that a system has crossed a meaningful scientific threshold.

Ars Technica asked OpenAI about the ranking system and the accuracy of the Bloomberg report. A company spokesperson said they had “nothing to add.”

Other AI labs are also trying to rank progress

OpenAI is not the only company trying to classify AI capability. Bloomberg notes that OpenAI’s model resembles the levels of autonomous driving used by automakers. The comparison is useful because both approaches turn a complex technical landscape into a sequence of stages.

Other AI labs have been working on similar questions. In November 2023, researchers at Google DeepMind proposed their own five-level framework for assessing AI advancement. Anthropic has also published “AI Safety Levels,” or ASLs, for the Claude AI assistant.

Those systems do not measure exactly the same thing. Anthropic’s ASLs focus more explicitly on safety and catastrophic risks, including ASL-2, which refers to “systems that show early signs of dangerous capabilities.” OpenAI’s reported levels focus on general capabilities, from conversational language to organization-level work.

Still, all of these systems face a shared problem. They try to categorize emerging and hypothetical capabilities before there is broad agreement on what should count as meaningful progress. A level system can make discussion easier, but it can also make uncertain progress look more linear than it really is.

What to watch next

The key question is not only whether OpenAI can build stronger reasoning AI. It is also whether the company can define its milestones in a way that withstands scrutiny beyond internal meetings and investor conversations.

The reported Level 2 claim is the immediate focus because it is the next step in OpenAI’s own framework. If the company presents more evidence around “Reasoners,” the important details will be how the capability is measured, what the system can do without external tools, and whether outside researchers can interpret the results in a consistent way.

For now, OpenAI’s AGI levels offer a revealing look at how the company wants to frame its future. They make the roadmap easier to understand, but they do not resolve the deeper debate over AGI, reasoning AI, or how to measure progress toward systems that remain largely hypothetical.