Computer Use Safety & Sandboxing
Computer use agents operate with significant autonomy over your system. A mistake can delete files, submit forms with wrong data, or make unintended purchases. This page covers Claude's built-in safety behaviours, why sandboxing is essential, and how to implement approval gates for high-risk actions.
Claude's Built-In Safety Behaviours
Claude applies its standard safety principles to computer use. Actions it will typically refuse:
- Deleting system files or mass-deleting user data without explicit confirmation
- Sending emails, messages, or posts without human approval (when instructed to require approval)
- Making financial transactions without confirmation
- Installing software with elevated permissions without consent
- Bypassing authentication or accessing accounts the task did not explicitly authorise
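These refusals are model-level: they come from Claude's training, not from a configuration setting, so there is no option that switches them off. They are tendencies rather than guarantees, which is why the structural safeguards described below still matter. For borderline cases, Claude will typically pause and ask for confirmation rather than proceeding.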
Why Sandboxed VMs Are Strongly Recommended
Despite Claude's built-in safety behaviours, sandboxing is the critical structural safeguard:
- Mistakes happen: Claude may click the wrong button, misread a dialog, or misinterpret the task scope, especially in complex UIs
- Prompt injection risk: Web pages and documents Claude reads may contain instructions attempting to hijack its behaviour; running inside a sandbox limits the blast radius
- Irreversible actions: File deletion, form submissions, and data mutations cannot always be undone; sandbox isolation prevents these from affecting real systems
- Credential isolation: A sandboxed environment prevents Claude from accessing credentials stored in your real browser or OS keychain
Setting Up a Sandboxed Environment
Three viable sandboxing options:
Docker container (recommended for most use cases)
- Run a Docker image with Xvfb + browser + your automation script
- Container is destroyed after each task, giving a clean state every run
- Mount only the specific directories Claude needs access to
- No access to host file system beyond mounted volumes
- Anthropic provides a reference Docker image for computer use
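As a rough illustration of the container options above, the sketch below assembles a `docker run` command in Python. The helper name `build_sandbox_command`, the image name `computer-use-sandbox:latest`, and the mount paths are hypothetical placeholders, not Anthropic's reference image or tooling. The flags used (`--rm`, `--network`, `-v`) are standard Docker CLI options. The command is only built here, not executed.

```python
from pathlib import Path


def build_sandbox_command(
    image: str,
    mounts: dict[str, str],
    network: str = "none",
    read_only_mounts: bool = True,
) -> list[str]:
    """Assemble a `docker run` invocation for a throwaway sandbox container.

    `mounts` maps host directories to container paths. Only these
    directories are visible inside the container.
    """
    cmd = [
        "docker", "run",
        "--rm",                # remove the container when it exits
        "--network", network,  # "none" disables networking entirely
    ]
    mode = "ro" if read_only_mounts else "rw"
    for host_dir, container_dir in mounts.items():
        host_path = Path(host_dir).resolve()  # Docker expects absolute host paths
        cmd += ["-v", f"{host_path}:{container_dir}:{mode}"]
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    command = build_sandbox_command(
        image="computer-use-sandbox:latest",  # hypothetical image name
        mounts={"./task_input": "/workspace/input"},
    )
    print(" ".join(command))
    # To actually launch the container: subprocess.run(command, check=True)
```

Browser tasks usually need some network access, so in practice `network` would be set to a restricted Docker network rather than `"none"`; the default here simply errs on the side of isolation.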
Cloud VM (for long-running or parallel tasks)
- Provision a cloud VM (EC2, GCE, Azure VM) per task
- Start from a clean image; terminate after task completion
- No credentials or sensitive data on the VM at start
- Network access can be restricted via VPC security groups
Virtual machine (for persistent environments)
- Snapshot the VM before each task; restore if something goes wrong
- Isolates from host OS; can be reset to a known-good state
- Higher overhead than containers but provides stronger OS-level isolation
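The snapshot-then-restore pattern above can be sketched independently of any particular hypervisor. In the example below, `run_with_snapshot`, `take_snapshot`, and `restore_snapshot` are hypothetical names: the two callables stand in for whatever snapshot commands your virtualisation tool provides, and an in-memory dictionary plays the role of the VM so the sketch runs on its own.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_snapshot(
    take_snapshot: Callable[[], str],
    restore_snapshot: Callable[[str], None],
    task: Callable[[], T],
) -> T:
    """Take a snapshot, run the task, and roll back if the task raises."""
    snapshot_id = take_snapshot()
    try:
        return task()
    except Exception:
        # Something went wrong: return the environment to its pre-task state.
        restore_snapshot(snapshot_id)
        raise


if __name__ == "__main__":
    # Stand-in "VM": a dict of files, with snapshots kept in memory.
    vm_state = {"files": ["report.txt"]}
    snapshots: dict[str, list[str]] = {}

    def take() -> str:
        snapshots["pre-task"] = list(vm_state["files"])
        return "pre-task"

    def restore(snapshot_id: str) -> None:
        vm_state["files"] = list(snapshots[snapshot_id])

    def faulty_task() -> None:
        vm_state["files"].clear()  # simulated destructive mistake
        raise RuntimeError("agent step failed")

    try:
        run_with_snapshot(take, restore, faulty_task)
    except RuntimeError:
        pass
    print(vm_state["files"])
```

This sketch only rolls back on an exception. A task that finishes without error but with the wrong result still needs a human or automated check before deciding whether to keep the new state.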
Human Approval Gates
For high-stakes actions, pause the automation loop and require human confirmation before proceeding. Implement at the action execution layer:
HIGH_RISK_PATTERNS = [
    "submit",    # Form submission
    "send",      # Email/message sending
    "delete",    # File/data deletion
    "confirm",   # Confirmation dialogs
    "purchase",  # Transaction completion
    "pay",
]

def requires_approval(action: dict, context: str) -> bool:
    """Check if an action requires human approval."""
    action_type = action.get("action", "")
    # Check if this is a click on a high-risk element
    if action_type in ("left_click", "double_click"):
        context_lower = context.lower()
        return any(pattern in context_lower for pattern in HIGH_RISK_PATTERNS)
    return False

def request_human_approval(action: dict, context: str) -> bool:
    """Pause and ask human to approve/deny."""
    print("\n⚠️ HIGH-RISK ACTION DETECTED")
    print(f"Action: {action}")
    print(f"Context: {context}")
    approval = input("Approve? (yes/no): ").strip().lower()
    return approval == "yes"

Scope Restriction: Limiting Access
Apply least-privilege principles to computer use deployments:
- Domain allowlist: For browser automation, restrict Claude to specific domains. Intercept navigation actions and block requests to non-allowlisted domains.
- File system scope: Mount only the directories Claude needs in the sandbox container; do not expose the full file system
- Network isolation: Block outbound network access except to required endpoints via container/VM networking rules
- Credential hygiene: Use dedicated task-specific accounts with minimal permissions; never run computer use agents under admin accounts or with access to production credentials
- Time limits: Set a wall-clock timeout on the entire automation run; a hung or stuck agent should not run indefinitely
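Two of the restrictions above, the domain allowlist and the wall-clock limit, are easy to sketch at the harness level. The names `ALLOWED_DOMAINS`, `MAX_RUN_SECONDS`, `is_allowed_url`, and `Deadline` are hypothetical, and the domains and time budget are placeholder values. Treating subdomains of an allowlisted domain as allowed is a design assumption you may want to tighten.

```python
import time
from urllib.parse import urlsplit

ALLOWED_DOMAINS = {"example.com", "docs.example.com"}  # hypothetical allowlist
MAX_RUN_SECONDS = 300  # hypothetical wall-clock budget for one run


def is_allowed_url(url: str, allowed: set[str] = ALLOWED_DOMAINS) -> bool:
    """Return True only for http(s) URLs whose host is allowlisted.

    A host matches if it equals an allowlisted domain or is a subdomain of one.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed)


class Deadline:
    """Wall-clock budget for an entire automation run."""

    def __init__(self, seconds: float) -> None:
        # monotonic() is unaffected by system clock adjustments.
        self._expires_at = time.monotonic() + seconds

    def expired(self) -> bool:
        return time.monotonic() >= self._expires_at


if __name__ == "__main__":
    deadline = Deadline(MAX_RUN_SECONDS)
    for url in ("https://example.com/login", "https://example.com.evil.net/"):
        if deadline.expired():
            print("Time budget exhausted; stopping run")
            break
        verdict = "allow" if is_allowed_url(url) else "block"
        print(f"{verdict}: {url}")
```

A check like this only covers navigations the harness can see. Redirects and background requests still need the container or VM networking rules mentioned above.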
Checklist: Do You Understand This?
- Claude typically refuses high-risk actions by default (delete, send, purchase without confirmation); this is model-level behaviour, not a configuration option
- Sandboxing is mandatory for production: Docker container (recommended), cloud VM, or VM with snapshot/restore
- Approval gates: intercept high-risk action types at the execution layer before sending the action to the OS
- Scope restriction: domain allowlist, minimal file system mount, network isolation, task-specific low-privilege accounts
- Never run computer use agents with admin credentials or access to production systems without explicit sandboxing