Computer Use Agents
Computer use agents see a screen and control a computer by clicking, typing, scrolling, and navigating applications through the same graphical interface a person uses, with no API required. This unlocks automation for legacy systems, web UIs, and other software that lacks a programmatic interface, but it also introduces security and reliability requirements that call for careful architecture.
In This Section
What is Computer Use
How screenshot-based AI agents perceive and control UIs — the perception loop, action types, and what makes computer use fundamentally different from tool calling.
Major Implementations
Claude Computer Use, OpenAI Operator, Amazon Nova Act, Open Interpreter, Browser Use — how they compare on reliability, cost, and integration.
Sandboxing & Security
Why computer use agents must run in isolated environments — VM/container sandboxing, allow-lists, budget limits, and approval gates for irreversible actions.
Use Cases & Limitations
Where computer use agents genuinely win (legacy system automation, browser tasks), where they fail, and the performance/cost reality check.
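As a rough orientation before the individual pages, the perception loop and the guardrails listed above (allow-list, budget limit, approval gate) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's API: `Action`, `Guard`, `run_agent`, and the `observe` / `decide` / `execute` callables are hypothetical names, and a real agent would plug a screenshot capture, a model call, and an input driver into those three slots.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Action:
    """One UI action proposed by the model (hypothetical shape)."""
    kind: str                   # e.g. "click", "type", "scroll", or "done"
    target: str = ""            # app or domain the action touches (assumed field)
    irreversible: bool = False  # e.g. submitting a payment or deleting data


@dataclass
class Guard:
    """Guardrails applied to every proposed action."""
    allowed_targets: Set[str]          # allow-list of apps or domains
    max_steps: int                     # budget limit on loop iterations
    approve: Callable[[Action], bool]  # approval gate for irreversible actions


def run_agent(
    observe: Callable[[], bytes],
    decide: Callable[[bytes], Action],
    execute: Callable[[Action], None],
    guard: Guard,
) -> List[str]:
    """Run the observe -> decide -> act loop until done, blocked, or out of budget."""
    log: List[str] = []
    for _ in range(guard.max_steps):
        screenshot = observe()       # perceive: capture the current screen
        action = decide(screenshot)  # reason: the model proposes the next action
        if action.kind == "done":
            log.append("done")
            return log
        if action.target not in guard.allowed_targets:
            log.append(f"blocked:{action.target}")  # outside the allow-list
            return log
        if action.irreversible and not guard.approve(action):
            log.append(f"denied:{action.kind}")     # approval was refused
            return log
        execute(action)              # act: click, type, scroll, ...
        log.append(f"ran:{action.kind}")
    log.append("budget_exhausted")
    return log


if __name__ == "__main__":
    # Scripted stand-in for a model: two safe actions, then finish.
    script = iter([
        Action("click", "intranet.example"),
        Action("type", "intranet.example"),
        Action("done"),
    ])
    guard = Guard({"intranet.example"}, max_steps=10, approve=lambda a: False)
    print(run_agent(lambda: b"", lambda s: next(script), lambda a: None, guard))
```

The loop stops at the first violated guardrail rather than skipping the action and continuing, a conservative choice made for this sketch; the Sandboxing & Security page covers the isolation layer (VM or container) that would sit underneath a loop like this.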