MIT Tech Review AI January 23, 2025

OpenAI Operator brings AI agents to everyday browser tasks

OpenAI has released Operator, a web app that can carry out simple online tasks in a browser, to ChatGPT Pro users in the US. Powered by a model called Computer-Using Agent, or CUA, the tool shows how AI agents are shifting from generating answers to taking actions on websites.

OpenAI has introduced Operator, its first AI agent, as a web app that can perform simple online tasks in a browser. The launch puts OpenAI into a fast-forming race with Anthropic and Google DeepMind over AI systems that can use ordinary computer interfaces on a user's behalf.

What Operator Is Designed To Do

Operator is built to handle practical web tasks that usually require clicking through pages, filling fields, choosing options, and waiting for confirmations. The examples shown include booking concert tickets, placing an online grocery order, and making a restaurant reservation.

The app is powered by a new model called Computer-Using Agent, shortened to CUA and pronounced “coo-ah.” CUA is built on top of OpenAI’s multimodal large language model GPT-4o.

At launch, Operator is available at operator.chatgpt.com to people in the US who subscribe to ChatGPT Pro, OpenAI’s premium $200-a-month service. OpenAI says it plans to make the tool available to other users in the future.

The broader idea is straightforward: instead of only producing text or images, an AI system can now attempt to complete a task inside a browser. That is a meaningful shift because many online services are still designed around interfaces that people operate directly, not around specialized software connections.

How CUA Uses The Screen

Operator works in a way that resembles how a person handles a website. It takes screenshots of a computer screen, reads the pixels, identifies interface elements such as buttons, text boxes, menus, and dropdowns, then decides what action to take next.

After one action, it scans the screen again and continues. This loop lets the model move through multi-step tasks on websites that use familiar graphical interfaces.
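
The look-then-act cycle described above can be illustrated with a toy sketch. Nothing here is OpenAI's code: `FakeBrowser`, `decide`, and `run_task` are hypothetical names, the "screenshot" is a plain dictionary rather than pixels, and the policy is a hand-written rule rather than a model. The sketch only shows the shape of the loop, in which the agent observes, picks one action, applies it, and observes again.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBrowser:
    """Hypothetical stand-in for a browser: a form with two fields and a button."""
    fields: dict = field(default_factory=lambda: {"party_size": "", "time": ""})
    submitted: bool = False

    def screenshot(self) -> dict:
        # A real agent would receive pixels; this toy returns the visible state.
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def apply(self, action: tuple) -> None:
        kind, target, value = action
        if kind == "type":
            self.fields[target] = value
        elif kind == "click" and target == "submit":
            self.submitted = True


def decide(screen: dict, goal: dict):
    """Toy policy: fill the first field that differs from the goal, then submit."""
    for name, wanted in goal.items():
        if screen["fields"].get(name) != wanted:
            return ("type", name, wanted)
    if not screen["submitted"]:
        return ("click", "submit", None)
    return None  # nothing left to do


def run_task(browser: FakeBrowser, goal: dict, max_steps: int = 10) -> list:
    history = []
    for _ in range(max_steps):
        screen = browser.screenshot()  # look at the screen
        action = decide(screen, goal)  # choose one next action
        if action is None:
            break
        browser.apply(action)          # act, then loop to look again
        history.append(action)
    return history


browser = FakeBrowser()
steps = run_task(browser, {"party_size": "2", "time": "6:30 p.m."})
```

The `max_steps` cap is a design choice of the sketch, not a reported feature: it keeps a confused policy from looping forever.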

Reiichiro Nakano, a scientist at OpenAI, contrasted this with older ways of connecting AI models to software. “Traditionally the way models have used software is through specialized APIs,” he says. He adds that relying on APIs leaves many apps and most websites outside the model's reach: “But if you create a model that can use the same interface that humans use on a daily basis, it opens up a whole new range of software that was previously inaccessible.”

CUA also breaks larger jobs into smaller steps and can backtrack when it gets stuck. OpenAI says the model was trained with techniques similar to those used for its reasoning models, o1 and o3.
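
OpenAI has not described how CUA backtracks, so the following is only a generic illustration of the idea, using textbook depth-first search rather than anything specific to the model. The step names, site names, and the `is_ok` check are all invented for the example: the agent tries an option for each step and, when every continuation fails, returns to the previous step and tries a different option.

```python
def solve(steps, options, is_ok, chosen=()):
    """Depth-first search: pick an option per step, back up on dead ends."""
    if len(chosen) == len(steps):
        return list(chosen)
    step = steps[len(chosen)]
    for option in options[step]:
        candidate = chosen + (option,)
        if is_ok(candidate):
            result = solve(steps, options, is_ok, candidate)
            if result is not None:
                return result
        # This option led nowhere: fall through and try the next one.
    return None


# Hypothetical task: SiteA turns out to have no free slots at any time.
steps = ["pick_site", "pick_time"]
options = {"pick_site": ["SiteA", "SiteB"], "pick_time": ["6:30", "7:00"]}


def is_ok(candidate):
    # A partial plan fails only once a time is tried on SiteA.
    return not (len(candidate) == 2 and candidate[0] == "SiteA")


plan = solve(steps, options, is_ok)
```

In this run the search tries both times on SiteA, finds neither works, backs up, and settles on SiteB.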

Why This Launch Matters

OpenAI is not alone in pursuing this direction. Anthropic has Computer Use, a version of Claude 3.5 Sonnet that can carry out simple tasks on a computer. Google DeepMind has Mariner, a web-browsing agent built on Gemini 2.0.

OpenAI claims CUA outperforms those rival tools on industry benchmarks. On OSWorld, which tests tasks such as merging PDF files or manipulating an image, CUA scores 38.1% compared with Computer Use’s 22.0%. Humans score 72.4% on the same benchmark.

On WebVoyager, which tests browser task performance, CUA scores 87%, Mariner scores 83.5%, and Computer Use scores 56%. Mariner can only carry out tasks in a browser, so it does not score on OSWorld.
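
The size of the gaps is easier to see as percentage-point differences. The short calculation below uses only the scores quoted above; the helper name `lead_over` is made up for the example.

```python
# Scores (percent) as reported by OpenAI and quoted in this article.
osworld = {"CUA": 38.1, "Computer Use": 22.0, "Humans": 72.4}
webvoyager = {"CUA": 87.0, "Mariner": 83.5, "Computer Use": 56.0}


def lead_over(scores: dict, leader: str, other: str) -> float:
    """Percentage-point gap between two entries, rounded to one decimal."""
    return round(scores[leader] - scores[other], 1)


cua_vs_computer_use_osworld = lead_over(osworld, "CUA", "Computer Use")
humans_vs_cua_osworld = lead_over(osworld, "Humans", "CUA")
cua_vs_mariner_webvoyager = lead_over(webvoyager, "CUA", "Mariner")
cua_vs_computer_use_webvoyager = lead_over(webvoyager, "CUA", "Computer Use")
```

On these figures, CUA's lead over Mariner on WebVoyager is a few points, while the gap between CUA and humans on OSWorld is roughly twice as large as CUA's lead over Computer Use there.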

For now, Operator itself is limited to browser tasks. OpenAI plans to make CUA’s wider abilities available later through an API that developers can use to build their own apps, similar to how Anthropic released Computer Use in December.

Ali Farhadi, CEO of the Allen Institute for AI, sees computer use as a natural first step for agents. “Moving from generating text and images to doing things is the right direction,” he says. “It unlocks business, solves new problems.” He also says the computer screen is “constrained enough that the current state of the technology can actually work,” while still being useful enough that people might adopt it.

Still Experimental, Not Superintelligence

The launch followed online rumors about what OpenAI might reveal. One rumor pointed to an agent-based app after details about Operator appeared on social media before release. Another suggested OpenAI was about to reveal a new superintelligence and brief officials for newly inaugurated President Trump.

The Operator announcement confirms the first rumor, not the second. OpenAI gave MIT Technology Review a preview of Operator, and the takeaway is more practical than sensational: the tool offers an early look at AI models that can take action, but it is not presented as a finished system.

Yash Kumar, a researcher at OpenAI, put the limits plainly: “It’s still early, it still makes mistakes.” OpenAI CEO Sam Altman also pushed back on the superintelligence speculation in a January 20 post: “twitter hype is out of control again,” and “pls chill and cut your expectations 100x!”

OpenAI says it has tested CUA's safety with red teams. Those tests explored cases where users request unacceptable tasks, where websites include hidden instructions meant to derail the model, and where the model itself breaks down. Casey Chu, another researcher on the team, says, “We’ve trained the model to stop and ask the user for information before doing anything with external side effects.”
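
Chu describes this behavior as something the model was trained to do, not as a rule written in code. As a rough analogy only, the same stop-and-ask pattern can be sketched as an explicit gate. The action names and the `SIDE_EFFECT_ACTIONS` set are hypothetical.

```python
# Hypothetical labels for actions that change something outside the browser session.
SIDE_EFFECT_ACTIONS = {"submit_order", "confirm_booking", "send_message"}


def execute_plan(plan, confirm):
    """Run actions in order; pause before any side-effecting action the user declines.

    Returns (actions_completed, action_awaiting_user_or_None).
    """
    done = []
    for action in plan:
        if action in SIDE_EFFECT_ACTIONS and not confirm(action):
            return done, action  # stop here and hand control back to the user
        done.append(action)
    return done, None


plan = ["open_site", "fill_form", "confirm_booking"]
paused = execute_plan(plan, confirm=lambda action: False)
approved = execute_plan(plan, confirm=lambda action: True)
```

With a user who declines, the run stops just before `confirm_booking`; with one who approves, all three actions complete.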

What The Demo Shows

To use Operator, a person types instructions into a text box. The task then runs in a remote browser on an OpenAI server, rather than in the browser on the user's own computer. OpenAI says this makes the system more efficient.

That cloud-based approach also allows Operator to run multiple tasks at once, according to Kumar. In a live demo, he asked it to book a table for two at 6:30 p.m. at Octavia in San Francisco using OpenTable. As Operator began working through the website, Kumar said, “As you can see, my hands are off the keyboard.”

During the same demo, Operator was also asked to find four tickets for a Kendrick Lamar show on StubHub and to use a photo of a handwritten shopping list to add items to Instacart. Kumar moved between Operator’s tabs while the tasks continued.

OpenAI is collaborating with businesses including OpenTable, StubHub, Instacart, DoorDash, and Uber. The exact nature of those collaborations is not clear, but Operator appears to suggest preset websites for certain tasks.

The most important signal is not that Operator can do every task perfectly. It is that major AI companies are now building systems around the same target: software that can see a screen, make choices, and ask for help when a task needs confirmation. That makes the browser a central testing ground for the next stage of AI agents.