Computer Use Agents

Computer use agents see a screen and control a computer by clicking, typing, scrolling, and navigating applications, with no API required. This unlocks automation for legacy systems, web UIs, and other software that lacks a programmatic interface. It also introduces security and reliability risks that call for isolation, explicit limits, and human approval for sensitive actions.

This section covers platform-agnostic computer use concepts and implementations. For Claude-specific computer use — browser automation, desktop pipelines, and Claude Code as a local agent — see Master Claude → Computer Use.

In This Section

What is Computer Use

How screenshot-based AI agents perceive and control UIs — the perception loop, action types, and what makes computer use fundamentally different from tool calling.

Major Implementations

Claude Computer Use, OpenAI Operator, Amazon Nova Act, Open Interpreter, Browser Use — how they compare on reliability, cost, and integration.

Sandboxing & Security

Why computer use agents must run in isolated environments — VM/container sandboxing, allow-lists, budget limits, and approval gates for irreversible actions.

Use Cases & Limitations

Where computer use agents genuinely win (legacy system automation, browser tasks), where they fail, and the performance/cost reality check.
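
As a rough orientation before the pages above, the perception loop and the guardrails they describe (allow-lists, budget limits, approval gates) can be sketched in a few lines. This is a minimal illustration, not any vendor's API: `Action`, `run_agent`, and the `policy`, `capture`, `execute`, and `approve` callables are hypothetical names, and the "screenshot" is a plain string standing in for real image data.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass(frozen=True)
class Action:
    """Hypothetical action record; real implementations define their own schema."""
    kind: str                   # e.g. "click", "type", "scroll", or "done"
    target: str = ""            # assumed: the app or host the action touches
    irreversible: bool = False  # assumed flag for actions that need sign-off


def run_agent(
    policy: Callable[[str], Action],   # maps the latest screenshot to the next action
    capture: Callable[[], str],        # takes a screenshot (a string in this sketch)
    execute: Callable[[Action], None], # performs the action inside the sandbox
    allowed_targets: Set[str],         # allow-list of permitted targets
    max_steps: int,                    # budget limit on loop iterations
    approve: Callable[[Action], bool], # approval gate for irreversible actions
) -> List[str]:
    """Run the perceive-decide-act loop and return a log of what happened."""
    log: List[str] = []
    for _ in range(max_steps):
        screenshot = capture()          # perceive
        action = policy(screenshot)     # decide
        if action.kind == "done":
            log.append("done")
            return log
        if action.target not in allowed_targets:
            # Allow-list check: skip anything outside the permitted targets.
            log.append(f"blocked:{action.kind}:{action.target}")
            continue
        if action.irreversible and not approve(action):
            # Approval gate: irreversible actions run only with explicit consent.
            log.append(f"denied:{action.kind}:{action.target}")
            continue
        execute(action)                 # act
        log.append(f"ran:{action.kind}:{action.target}")
    log.append("budget_exhausted")      # step budget ran out before "done"
    return log
```

In a real deployment, `capture` and `execute` would talk to a VM or container rather than the host machine, and `policy` would call a model. Note that blocked and denied actions still consume a step here, so a misbehaving policy cannot loop forever.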