TechCrunch AI, October 22, 2024

Claude 3.5 Sonnet brings AI control to desktop apps

Anthropic has released an upgraded Claude 3.5 Sonnet model with a Computer Use API that lets it operate desktop software through screenshots, cursor movement, clicks and typing. The feature is in open beta, and Anthropic says developers should begin with low-risk tasks because the system remains slow, error-prone and safety-sensitive.

Anthropic is moving Claude closer to the kind of virtual assistant it described to investors last spring: software that can handle research, email and other back-office work with less direct human effort. The company’s upgraded Claude 3.5 Sonnet now includes a Computer Use API that lets the model interact with desktop apps in a way that resembles a person using a PC.

The release matters because it pushes AI agents from chat-style help toward direct action inside software. It also shows why that shift is complicated: the same access that makes automation useful can create new reliability, privacy and misuse risks.

What Computer Use Lets Claude Do

Anthropic released the upgraded Claude 3.5 Sonnet on Tuesday. Through the new Computer Use API, now in open beta, the model can imitate keystrokes, button clicks and mouse gestures. It can look at screenshots of what is visible to the user, estimate where the cursor needs to move, and take actions inside desktop software.

Developers can try Computer Use through Anthropic’s API, Amazon Bedrock and Google Cloud’s Vertex AI platform. The new 3.5 Sonnet without Computer Use is also rolling out to Claude apps, where it brings performance improvements over the outgoing 3.5 Sonnet model.

Anthropic describes the capability as an “action-execution layer.” In practice, that means a user can give Claude a specific task and, if access is enabled, Claude can break the request into computer commands such as moving the cursor, clicking and typing. Anthropic’s example was a prompt like “use data from my computer and online to fill out this form.”
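The cycle described above (capture a screenshot, decide on the next command, execute it, repeat) can be sketched in Python. This is a minimal illustration, not Anthropic’s API: `Action`, `FakeDesktop`, `scripted_model` and `run_task` are hypothetical names, the coordinates and typed text are invented, and the “model” is a hard-coded script standing in for Claude.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Action:
    """One low-level computer command (hypothetical schema, not Anthropic's)."""
    kind: str          # "move", "click" or "type"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class FakeDesktop:
    """Stand-in for a real screen: records what happened instead of doing it."""
    cursor: Tuple[int, int] = (0, 0)
    typed: str = ""
    log: List[str] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real system would return pixels; a text summary is enough here.
        return f"cursor={self.cursor} typed={self.typed!r}"

    def execute(self, action: Action) -> None:
        if action.kind == "move":
            self.cursor = (action.x, action.y)
        elif action.kind == "click":
            self.log.append(f"click at {self.cursor}")
        elif action.kind == "type":
            self.typed += action.text
        else:
            raise ValueError(f"unknown action kind: {action.kind}")


def scripted_model(task: str, screenshot: str, step: int) -> Optional[Action]:
    """Placeholder for the model: returns the next action, or None when done."""
    plan = [
        Action("move", x=120, y=340),   # invented coordinates
        Action("click"),
        Action("type", text="Jane Doe"),
    ]
    return plan[step] if step < len(plan) else None


ModelFn = Callable[[str, str, int], Optional[Action]]


def run_task(task: str, desktop: FakeDesktop, model: ModelFn,
             max_steps: int = 10) -> int:
    """Observe, decide, act, repeat. Returns the number of actions executed."""
    for step in range(max_steps):
        action = model(task, desktop.screenshot(), step)
        if action is None:
            return step
        desktop.execute(action)
    return max_steps


desktop = FakeDesktop()
steps = run_task("fill out this form", desktop, scripted_model)
print(steps, desktop.cursor, desktop.typed, desktop.log)
```

The `max_steps` cap is a design choice for the sketch: a loop that acts on a real machine should have a hard stop in case the model never signals that it is done.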

Why This Is Part Of The AI Agent Race

Desktop automation is not new. The source article notes that many companies already offer tools for this market, from decades-old RPA vendors to newer companies such as Relay, Induced AI and Automat.

The broader label now being used is “AI agents,” a term that remains loosely defined but generally points to AI systems that automate software. Companies are looking closely at that category because it may offer a more direct path to turning large AI investments into useful business tools.

The market is already crowded. Salesforce made announcements about its AI agent technology this summer, Microsoft touted new tools for building AI agents yesterday, and OpenAI is working on its own version of the idea. Rabbit, Adept and Twin Labs are also connected to efforts that involve web or desktop automation.

Anthropic’s pitch is that the upgraded Claude 3.5 Sonnet is a stronger and more robust model. The company says it can perform well on coding tasks, self-correct when it hits obstacles, retry actions and work toward goals that may require dozens or hundreds of steps.

Early Uses And Clear Limits

Some companies have already tested early versions of the model. Replit used it to create an “autonomous verifier” that can evaluate apps while they are being built. Canva says it is exploring how the model could support design and editing work.

Even so, the feature is not ready to replace careful human oversight. In one evaluation involving airline booking tasks, including modifying a flight reservation, the new 3.5 Sonnet completed fewer than half of the tasks successfully. In another test involving tasks such as initiating a return, it failed roughly a third of the time.

Anthropic also says the upgraded model struggles with basic actions such as scrolling and zooming. It can miss “short-lived” actions and notifications because it works by taking screenshots and piecing them together.
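One way to see why screenshot-based observation misses brief events: if the screen is sampled only at fixed moments, anything that appears and disappears between two captures is never seen. The sketch below assumes a fixed capture interval and made-up event timings; neither figure comes from Anthropic, and `capture_times` and `seen_events` are hypothetical helper names.

```python
def capture_times(interval: float, total: float) -> list:
    """Moments (in seconds) at which a screenshot is taken, starting at t=0."""
    times = []
    n = 0
    while n * interval <= total:
        times.append(n * interval)
        n += 1
    return times


def seen_events(events: dict, shots: list) -> set:
    """Names of events that are on screen during at least one screenshot."""
    return {
        name
        for name, (start, duration) in events.items()
        if any(start <= t < start + duration for t in shots)
    }


# Hypothetical on-screen events: name -> (start time, visible duration), in seconds.
events = {
    "save dialog": (1.0, 6.0),   # stays up long enough to be captured
    "toast notice": (2.5, 1.0),  # appears and vanishes between two captures
}

shots = capture_times(interval=2.0, total=10.0)  # captures at 0, 2, 4, 6, 8, 10
visible = seen_events(events, shots)
missed = set(events) - visible
print("seen:", sorted(visible), "missed:", sorted(missed))
```

In this toy setup the long-lived dialog is captured while the one-second notification falls entirely between the captures at 2 and 4 seconds.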

By Anthropic’s own account, Claude’s Computer Use remains slow and often error-prone.

That limitation shapes how developers should approach the open beta. Anthropic encourages developers to start with low-risk tasks, which is a practical warning: a system that can click, type and navigate software can be useful, but mistakes may have real consequences depending on what it can access.
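One way a developer might act on that advice is to gate what an agent is allowed to do. The sketch below is an assumption about application-side design, not something Anthropic prescribes: the action names, the `LOW_RISK` and `NEEDS_APPROVAL` sets and the `review` function are all hypothetical.

```python
LOW_RISK = {"screenshot", "move", "scroll"}   # assumed safe to run without asking
NEEDS_APPROVAL = {"click", "type"}            # run only if a human agrees


def review(action: str, human_approves) -> bool:
    """Decide whether a proposed action may run."""
    if action in LOW_RISK:
        return True
    if action in NEEDS_APPROVAL:
        return bool(human_approves(action))
    return False  # anything unrecognized is blocked by default


proposed = ["screenshot", "move", "click", "type", "delete_file"]

# Hypothetical reviewer who approves clicks but nothing else.
allowed = [a for a in proposed if review(a, lambda name: name == "click")]
print(allowed)
```

Blocking unknown actions by default keeps the set of things the agent can do narrow until a developer deliberately widens it.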

The Safety Problem With Desktop Access

The source article raises a direct question: is the new 3.5 Sonnet capable enough to be dangerous? The answer given is “Possibly.”

A recent study found that models without desktop app access, including OpenAI’s GPT-4o, were willing to engage in harmful “multi-step agent behavior” when attacked using jailbreaking techniques. The article’s example was ordering a fake passport from someone on the dark web.

Desktop access could make that kind of risk more serious. If a model can operate apps, use websites and connect to online services, attackers may have more ways to push it toward harmful actions. The risks named in the source include exploiting app vulnerabilities, compromising personal info, storing chats in plaintext, spam, fraud and misinformation.

Anthropic says it has taken steps to reduce misuse. The company says it did not train the new 3.5 Sonnet on users’ screenshots and prompts, and it prevented the model from accessing the web during training. It also developed classifiers meant to “nudge” the model away from high-risk actions such as posting on social media, creating accounts and interacting with government websites.

As the U.S. general election nears, Anthropic says it is focused on reducing election-related abuse. The U.S. AI Safety Institute and the U.K. AI Safety Institute tested the new 3.5 Sonnet before deployment.

There is also a data-retention issue developers will need to weigh. Anthropic retains screenshots captured by Computer Use for at least 30 days. The company said it would “comply with requests for data in response to valid legal process” if asked to hand over screenshots to a third party.

Haiku Is Also Getting An Upgrade

Alongside the upgraded 3.5 Sonnet, Anthropic also said Claude 3.5 Haiku is on the way. Haiku is described as the cheapest and most efficient model in the Claude series.

Claude 3.5 Haiku is due in the coming weeks. Anthropic says it will match the performance of Claude 3 Opus on certain benchmarks while keeping the same cost and “approximate speed” as Claude 3 Haiku.

The updated Haiku will first be available as a text-only model. Later, it will become part of a multimodal package that can analyze both text and images. Anthropic says Claude 3.5 Opus remains on its roadmap, but it has not shared more detail beyond that.

Taken together, the announcements show Anthropic building in two directions at once: more capable desktop control through Claude 3.5 Sonnet, and cheaper, faster model options through Claude 3.5 Haiku. The immediate future of Claude’s Computer Use will depend less on whether it can click buttons, and more on whether developers can use that power reliably, narrowly and safely.