TechCrunch AI March 11, 2025 TERMINATOR

New OpenAI tools push AI agents closer to real work

OpenAI has released the Responses API and Agents SDK to help developers build AI agents for search, file retrieval, and computer-use tasks. The launch also shows how much work remains before AI agents become reliable, widely used business tools.

WTF Index TERMINATOR

◄ Terminator 2 Idiocracy 0 ►

The story mildly leans toward more autonomous AI agents that can search, use files, and operate computers, though it is mostly a developer-tool launch.

OpenAI is trying to move AI agents from impressive demonstrations into software that businesses can actually build on. Its new Responses API gives developers access to tools for web search, file search, and computer use, while a separate open-source Agents SDK is meant to help teams connect models to internal systems and monitor what agents do.

What OpenAI Released

On Tuesday, OpenAI released new tools aimed at developers and enterprises building AI agents. In the source article, AI agents are described as automated systems that can independently complete tasks.

The main release is the Responses API. It lets businesses create custom AI agents that can search the web, scan company files, and navigate websites in a way that resembles OpenAI's Operator product.

The Responses API effectively replaces the Assistants API. OpenAI plans to sunset the Assistants API in the first half of 2026, which makes this release more than a small developer update. It signals where OpenAI wants businesses to build next.

The company is also releasing the Agents SDK, an open-source toolkit. It gives developers free tools to integrate models with internal systems, add safeguards, and monitor agent activity for debugging and optimization. The Agents SDK follows OpenAI's Swarm, a framework for multi-agent orchestration that the company released late last year.
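The article says the Agents SDK helps developers add safeguards and monitor agent activity, but it does not show the SDK's interfaces. The following is a hypothetical plain-Python sketch of those two ideas: a safeguard that screens a task before the agent acts, and a trace log developers can inspect afterward. It is not the Agents SDK's API. Every name here (`Trace`, `passes_guardrail`, `run_agent`, `blocked_terms`) is illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TraceEvent:
    """One recorded step of an agent run."""
    step: str
    detail: str


@dataclass
class Trace:
    """Append-only log a developer can inspect after a run (hypothetical)."""
    events: list[TraceEvent] = field(default_factory=list)

    def log(self, step: str, detail: str) -> None:
        self.events.append(TraceEvent(step, detail))


def passes_guardrail(task: str, blocked_terms: set[str]) -> bool:
    """Safeguard: reject any task that mentions a blocked term (case-insensitive)."""
    lowered = task.lower()
    return not any(term.lower() in lowered for term in blocked_terms)


def run_agent(task: str, tool: Callable[[str], str],
              blocked_terms: set[str], trace: Trace) -> str:
    """Check the task against the guardrail, call the tool, and trace each step."""
    trace.log("input", task)
    if not passes_guardrail(task, blocked_terms):
        trace.log("guardrail", "blocked")
        return "Request blocked by guardrail."
    trace.log("guardrail", "passed")
    result = tool(task)
    trace.log("tool", result)
    return result


# Usage with a stand-in tool that just echoes the task.
trace = Trace()
print(run_agent("summarize Q3 sales", lambda t: f"handled: {t}", {"password"}, trace))
# handled: summarize Q3 sales
print([(e.step, e.detail) for e in trace.events])
```

The trace records every step, including blocked requests. That record is what lets a developer reconstruct what happened when a run goes wrong.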

Why AI Agents Are Under Pressure

The timing matters because expectations around AI agents have risen faster than clear evidence of everyday usefulness. The tech industry has struggled to show people what AI agents are, and even the definition of an agent remains unsettled.

That gap between promise and practical value has created pressure on major AI companies. The source article points to Butterfly Effect, a Chinese startup that went viral earlier this week for an AI agent platform called Manus. Users quickly found that Manus did not deliver on many of the company's promises.

For OpenAI, that makes reliability and repeated use central questions. Olivier Godement, OpenAI's API product head, told TechCrunch, "It's pretty easy to demo your agent," and added, "To scale an agent is pretty hard, and to get people to use it often is very hard."

OpenAI has already shown consumer-facing agent ideas inside ChatGPT. Earlier this year, it introduced Operator, which navigates websites on a user's behalf, and deep research, which compiles research reports. Both showed what agentic technology can do, but the source article says both still fell noticeably short on autonomy.

The Building Blocks Inside The Responses API

The Responses API gives developers access to components that power agent-like applications. OpenAI wants developers to build Operator- and deep research-style products that feel more autonomous than what is available today.

One major component is web search. Developers can use, in preview, the same AI models that power OpenAI's ChatGPT Search tool: GPT-4o search and GPT-4o mini search. These models can browse the web for answers and cite sources as they generate responses.

OpenAI claims the search models are highly accurate on factual questions. On the SimpleQA benchmark, which measures whether models can answer short, fact-seeking questions, GPT-4o search scores 90% and GPT-4o mini search scores 88%. GPT-4.5, described in the source as a much larger, recently released model, scores 63%.
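The gap is easier to compare when each score is restated as the share of questions a model does not answer correctly. The short calculation below uses only the three SimpleQA percentages reported above.

```python
# SimpleQA accuracy as reported in the article (percent answered correctly).
scores = {"GPT-4o search": 90, "GPT-4o mini search": 88, "GPT-4.5": 63}

# Share of questions not answered correctly.
error_rates = {model: 100 - score for model, score in scores.items()}
print(error_rates)  # {'GPT-4o search': 10, 'GPT-4o mini search': 12, 'GPT-4.5': 37}

# How many times more often GPT-4.5 misses than GPT-4o search on this benchmark.
ratio = error_rates["GPT-4.5"] / error_rates["GPT-4o search"]
print(f"{ratio:.1f}x")  # 3.7x
```

On this benchmark, then, GPT-4o search misses roughly one question in ten, while GPT-4.5 misses more than one in three.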

The Responses API also includes file search. That utility can scan across files in a company's databases to retrieve information. OpenAI says it will not train models on those files.
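The article does not explain how file search decides which files are relevant. As a stand-in, this toy sketch ranks each file by how many query words it contains. This is a plain keyword-overlap technique, not OpenAI's retrieval method. The function name and the file names and contents are made up for illustration.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def file_search(query: str, files: dict[str, str], top_k: int = 2) -> list[str]:
    """Return up to top_k file names ranked by keyword overlap with the query."""
    query_terms = tokenize(query)
    scored = []
    for name, content in files.items():
        overlap = len(query_terms & tokenize(content))
        if overlap > 0:  # skip files that share no words with the query
            scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically by file name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


# Hypothetical company files.
files = {
    "refund_policy.txt": "Refund requests are issued within 30 days of purchase.",
    "onboarding.txt": "New hires receive laptops during their first week.",
    "travel_policy.txt": "Travel refunds require receipts within 30 days.",
}
print(file_search("refund within 30 days", files))
# ['refund_policy.txt', 'travel_policy.txt']
```

Exact word matching is deliberately crude. It misses "refunds" when the query says "refund," which is one reason production retrieval systems use more robust matching.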

Another component is the Computer-Using Agent model, or CUA. This is the model that powers Operator. It generates mouse and keyboard actions, which lets developers automate computer-use tasks such as data entry and app workflows.
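The article says CUA generates mouse and keyboard actions but does not describe the action format. The hypothetical sketch below shows the general loop such a system implies. A model proposes an action, the developer's own code carries it out, and the loop continues until the model signals it is done or a step limit is reached. The `Action` type, the scripted `propose_action` stand-in for the model, and the fake form are all assumptions, not OpenAI's interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Action:
    kind: str         # "click", "type", or "done" (hypothetical action set)
    target: str = ""  # UI element the action applies to
    text: str = ""    # text to enter, for "type" actions


def propose_action(step: int) -> Action:
    """Stand-in for the model: a fixed script for a simple data-entry task."""
    script = [
        Action("click", target="name_field"),
        Action("type", target="name_field", text="Ada Lovelace"),
        Action("click", target="submit_button"),
        Action("done"),
    ]
    return script[step]


def run_computer_use_loop(max_steps: int = 10) -> dict[str, str]:
    """Execute proposed actions on a fake form until 'done' or the step limit."""
    form = {"name_field": "", "submitted": "no"}
    focused: Optional[str] = None
    for step in range(max_steps):  # hard cap as a basic safety limit
        action = propose_action(step)
        if action.kind == "done":
            break
        if action.kind == "click":
            focused = action.target
            if action.target == "submit_button":
                form["submitted"] = "yes"
        elif action.kind == "type" and focused == action.target:
            form[action.target] += action.text
    return form


print(run_computer_use_loop())  # {'name_field': 'Ada Lovelace', 'submitted': 'yes'}
```

Because the developer's code executes each action, it is also the natural place to log, validate, or block actions before they touch a real system.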

The CUA model is being released in research preview, and enterprises can optionally run it locally on their own systems. By contrast, the consumer version available through Operator can only take actions on the web.
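To show how a developer might select these components, the sketch below only assembles request bodies for the three tools and sends nothing over the network. The tool type strings (`web_search_preview`, `file_search`, `computer_use_preview`), their fields, and the `computer-use-preview` model name follow OpenAI's launch-era documentation as best understood here. They may have changed since, and current docs may require additional fields. The `build_request` helper and the vector store ID are placeholders. With the official `openai` Python package, a body like this would be passed as keyword arguments to `client.responses.create(...)`.

```python
import json
from typing import Any


def build_request(model: str, prompt: str, tools: list[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical helper: assemble a Responses API request body (not sent)."""
    return {"model": model, "input": prompt, "tools": tools}


# Web search: the model can look things up on the web and cite sources.
web_search_request = build_request(
    "gpt-4o",
    "What did OpenAI announce for developers this week?",
    [{"type": "web_search_preview"}],
)

# File search: retrieves from files the developer has uploaded to a vector store.
# "vs_PLACEHOLDER" is a made-up vector store ID.
file_search_request = build_request(
    "gpt-4o",
    "What is our refund window?",
    [{"type": "file_search", "vector_store_ids": ["vs_PLACEHOLDER"]}],
)

# Computer use: the CUA model proposes mouse/keyboard actions for a screen of
# the stated size; the developer's own code performs them.
computer_use_request = build_request(
    "computer-use-preview",
    "Open the expense form and fill in today's date.",
    [{
        "type": "computer_use_preview",
        "display_width": 1024,
        "display_height": 768,
        "environment": "browser",
    }],
)

print(json.dumps(web_search_request, indent=2))
```

All three requests share one endpoint and differ only in the model and the `tools` list. That shared shape is what lets the Responses API serve as a single entry point for search, retrieval, and computer use.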

The Limits Are Still Clear

OpenAI's release does not make the hardest agent problems disappear. The source article is explicit that the Responses API will not solve all of the technical problems affecting AI agents today.

Search-backed AI tools can be more accurate than traditional AI models because they can look up information. But web search does not make AI hallucinations a solved problem: GPT-4o search still misses 10% of the questions on SimpleQA.

The source article also notes other weaknesses. AI search tools tend to struggle with short, navigational queries such as "Lakers score today," and recent reports suggest ChatGPT's citations are not always reliable.

Computer-use automation has its own risks. In a blog post provided to TechCrunch, OpenAI said the CUA model is "not yet highly reliable for automating tasks on operating systems" and can make "inadvertent" mistakes.

That matters for enterprise AI agents because mistakes in a business workflow can have consequences. A tool that searches files, enters data, or navigates applications must do more than work in a controlled demonstration. It has to behave predictably enough for developers and users to trust it repeatedly.

From Agent Demos To Agent Products

The broader message of the release is that OpenAI wants to package agent capabilities as developer infrastructure. Instead of only showing agents inside ChatGPT, the company is giving businesses APIs and tooling to build their own versions.

That shift could make AI agents more specific to real workflows. A company may not need a general agent that can do everything. It may need a system that can search the right files, use approved models, follow safeguards, and let developers inspect what happened when something goes wrong.

Godement said he hopes OpenAI can bridge the gap between AI agent demos and products this year. He also said that, in his opinion, "agents are the most impactful application of AI that will happen."

That view echoes OpenAI CEO Sam Altman's January statement that 2025 is the year AI agents enter the workforce. Whether that proves true or not, the Responses API and Agents SDK show that OpenAI is trying to move the discussion from hype toward usable tools for developers and enterprises.