OpenAI Pushes ChatGPT Agent From Answers Into Action

OpenAI is rolling out ChatGPT agent, a general-purpose AI agent that can use its own computer to complete tasks such as navigating calendars, creating editable presentations, and running code. The system combines earlier agentic capabilities from Operator and Deep Research and adds safeguards, because OpenAI says the product may carry new risks.


A general-purpose agent that can navigate websites, use apps, and run code increases autonomy and control risks, even with safeguards.


OpenAI is moving ChatGPT further beyond conversation. The company is launching ChatGPT agent, a general-purpose AI agent inside ChatGPT that is designed to carry out computer-based work for users, not only respond to prompts.

The new tool can navigate a user's calendar, create editable presentations and slideshows, and run code. OpenAI says users can control it through ordinary natural-language prompts, with access beginning Thursday for subscribers to OpenAI's Pro, Plus, and Team plans.

A More Active Version Of ChatGPT

ChatGPT agent brings together capabilities OpenAI had previously separated across different agentic products. It includes Operator's ability to click through websites and Deep Research's ability to pull together information from dozens of websites into a compact research report.

The result is meant to be a unified agentic system. Instead of asking ChatGPT a question and receiving only an answer, users can ask it to take steps across tools, websites, and files. To use it, subscribers select "agent mode" from ChatGPT's tools dropdown.

OpenAI described the system as ChatGPT using its own computer to do work. That framing matters because the product is aimed at tasks that require a sequence of actions: searching, deciding what to do next, using tools, and producing a finished output.

What The Agent Can Do

The examples OpenAI gives show why this launch is broader than a chatbot feature. The company says users could ask ChatGPT agent to "plan and buy ingredients to make Japanese breakfast for four," or to "analyze three competitors and create a slide deck."

Those tasks require more than generating text. The agent has to interpret the request, move through websites, gather information, plan an order of operations, and use available tools. In the slide deck example, it also needs to turn the research into an editable presentation.

ChatGPT agent can also access ChatGPT connectors. That means users can connect apps such as Gmail and GitHub so the agent can find information relevant to a prompt. OpenAI also says the agent has access to a terminal and can use APIs to reach certain apps.

In practical terms, the product is designed for workflows where the user wants an outcome rather than a paragraph. Research, scheduling context, code execution, presentation creation, and app-connected retrieval all sit inside that broader idea.

Why OpenAI Says This Version Is More Capable

OpenAI says ChatGPT agent is more capable than its earlier agent offerings. The company points to benchmark results for the model behind the tool, including scores on Humanity's Last Exam and FrontierMath.

On Humanity's Last Exam, which OpenAI describes as a difficult test with thousands of questions across more than one hundred subjects, the ChatGPT agent model scores 41.6% on pass@1. OpenAI says that is roughly double the performance of o3 and o4-mini on the same test.
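OpenAI does not spell out its evaluation protocol here, but pass@1 is commonly read as the share of problems solved on the first attempt. The minimal Python sketch below assumes the widely used unbiased pass@k estimator (n samples per problem, c of them judged correct). The function names are illustrative and are not OpenAI's evaluation code.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: samples generated, c: samples judged correct, k: attempts allowed.
    Computes 1 - C(n - c, k) / C(n, k), the chance that at least one of
    k samples drawn without replacement is correct.
    """
    if n - c < k:
        # Every size-k draw must include at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average the per-problem estimate over (n, c) pairs for a benchmark."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# With one attempt per question (n = 1), pass@1 is simply the fraction correct.
print(benchmark_pass_at_k([(1, 1), (1, 0), (1, 1), (1, 0), (1, 0)]))  # 0.4
```

With k = 1 the formula reduces to c / n, so under this reading a 41.6% pass@1 score means the model's first answer was judged correct on 41.6% of questions, averaged across the benchmark.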

On FrontierMath, OpenAI says ChatGPT agent scores 27.4% when it can use tools such as a terminal for code execution. By OpenAI's account, the previous state-of-the-art score came from o4-mini, at 6.3%.

These benchmark claims support OpenAI's argument that the system is not just a repackaging of earlier agent tools. The company is presenting ChatGPT agent as a more capable model paired with a broader toolset, intended to handle complicated computer-based tasks more reliably than prior agentic products.

The Safety Tradeoff

The same abilities that make ChatGPT agent useful also make it more sensitive from a safety standpoint. OpenAI says it developed the product with safety in mind because the tool has new capabilities that could be more dangerous if misused.

In a safety report for ChatGPT agent, OpenAI says it has designated the model as "high capability" in biological and chemical weapon domains. In OpenAI's Preparedness Framework, that means a model has the ability to "amplify existing pathways to severe harm."

OpenAI says it does not have direct evidence of that harm, but it is taking a precautionary approach. The company says it has activated safeguards intended to reduce those risks while users interact with the product.

One safeguard is a real-time monitor. OpenAI says it runs a classifier across every prompt entered into ChatGPT agent to determine whether the request is related to biology. If it is, a second monitor checks the agent's response to decide whether the content could be used to help create a biological threat.
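The two-stage monitor OpenAI describes can be sketched as follows. This is a hypothetical illustration only: `monitored_respond`, the classifier callables, and the refusal text are assumptions, and OpenAI has not published how its monitors are built. Simple keyword checks stand in for real classifiers in the demo lines.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interfaces: each classifier takes text and returns True or False.
Classifier = Callable[[str], bool]
Generator = Callable[[str], str]


@dataclass
class MonitorDecision:
    bio_related: bool  # stage 1 verdict on the prompt
    blocked: bool      # stage 2 verdict on the response


def monitored_respond(
    prompt: str,
    generate: Generator,
    is_bio_related: Classifier,
    is_hazardous: Classifier,
    refusal: str = "I can't help with that.",
) -> tuple[str, MonitorDecision]:
    # Stage 1: every prompt passes through the topic classifier.
    bio_related = is_bio_related(prompt)
    response = generate(prompt)
    # Stage 2: `and` short-circuits, so only responses to flagged prompts
    # get the second check.
    if bio_related and is_hazardous(response):
        return refusal, MonitorDecision(bio_related=True, blocked=True)
    return response, MonitorDecision(bio_related=bio_related, blocked=False)


if __name__ == "__main__":
    # Toy keyword checks standing in for real classifiers (hypothetical).
    fake_model = lambda p: "Step 1: culture the sample." if "pathogen" in p else "Buy rice and miso."
    bio = lambda text: "pathogen" in text.lower()
    hazard = lambda text: "culture" in text.lower()
    print(monitored_respond("Plan a Japanese breakfast for four", fake_model, bio, hazard))
    print(monitored_respond("How do I grow this pathogen?", fake_model, bio, hazard))
```

The layering implied by OpenAI's description is that a broad topic screen runs on all traffic, while the narrower response check runs only on conversations the first stage flags.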

OpenAI has also disabled ChatGPT's memory feature for this agent. In other parts of ChatGPT, memory allows the chatbot to use information from previous chats. For ChatGPT agent, OpenAI says bad actors could exploit memory through prompt injection attacks to exfiltrate sensitive data. The company says it may revisit adding memory later.

The Real-World Test Comes Next

ChatGPT agent arrives at a moment when AI agents are a major focus across Silicon Valley. Companies including OpenAI, Google, and Perplexity have introduced agent products that promise to offload more work from users.

So far, however, early AI agents have struggled with complex tasks and have often looked less compelling than the broad vision described by tech executives. OpenAI's launch is therefore both a product release and a test of whether a more capable agent can perform well outside benchmarks.

The open question is how durable ChatGPT agent will be in real use. Agent technology has so far been relatively brittle when interacting with the real world. OpenAI's claim is that this version is stronger, more unified, and better equipped to deliver on the promise of AI agents.

For users, the immediate change is simple: ChatGPT is becoming a place where work can be delegated, not just discussed. Whether that shift becomes routine will depend on how well ChatGPT agent handles the messy, multi-step tasks people actually need done.