Advanced

Computer Use Safety & Sandboxing

Computer use agents operate with significant autonomy over your system. A mistake can delete files, submit forms with wrong data, or make unintended purchases. This page covers Claude's built-in safety behaviours, why sandboxing is essential, and how to implement approval gates for high-risk actions.

Claude's Built-In Safety Behaviours

Claude applies its standard safety principles to computer use. Actions it will typically refuse:

Deleting system files or mass-deleting user data without explicit confirmation
Sending emails, messages, or posts without human approval (when instructed to require approval)
Making financial transactions without confirmation
Installing software with elevated permissions without consent
Bypassing authentication or accessing accounts the task did not explicitly authorise

These behaviours are model-level: they come from Claude's training, not from a configuration setting, so there is no switch that turns them off. They are a first line of defence rather than a guarantee. For borderline cases, Claude will typically pause and ask for confirmation rather than proceeding.

Why Sandboxed VMs Are Strongly Recommended

Despite Claude's built-in safety behaviours, sandboxing is the critical structural safeguard:

Mistakes happen: Claude may click the wrong button, misread a dialog, or misinterpret the task scope — especially in complex UIs
Prompt injection risk: Web pages and documents Claude reads may contain instructions attempting to hijack its behaviour — running inside a sandbox limits the blast radius
Irreversible actions: File deletion, form submissions, and data mutations cannot always be undone — sandbox isolation prevents these from affecting real systems
Credential isolation: A sandboxed environment prevents Claude from accessing credentials stored in your real browser or OS keychain

Setting Up a Sandboxed Environment

Three viable sandboxing options. Containers are the lightest to run; the two VM options add stronger OS-level isolation at higher overhead:

Docker container (recommended for most use cases)

Run a Docker image with Xvfb + browser + your automation script
Container is destroyed after each task — clean state every run
Mount only the specific directories Claude needs access to
No access to host file system beyond mounted volumes
Anthropic provides a reference Docker image for computer use
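
The container properties above can be sketched as a small helper that assembles a `docker run` command. This is only an illustration under stated assumptions: the image name `computer-use-sandbox:latest`, the helper `build_docker_command`, and the mount paths are hypothetical, and the resource limits are placeholder values. The flags themselves (`--rm`, `--network`, `--volume`, `--memory`, `--pids-limit`, `--cap-drop`, `--security-opt`) are standard Docker CLI options.

```python
import shlex
from pathlib import PurePosixPath


def build_docker_command(image, mounts, network="none", memory="2g"):
    """Return an argv list for a throwaway, locked-down container.

    `mounts` maps an absolute host directory to a
    (container_path, read_only) pair. Nothing else from the host is exposed.
    """
    cmd = [
        "docker", "run",
        "--rm",                     # remove the container when it exits
        "--network", network,       # "none" disables networking entirely
        "--memory", memory,         # placeholder memory cap
        "--pids-limit", "256",      # placeholder process-count cap
        "--cap-drop", "ALL",        # drop all Linux capabilities
        "--security-opt", "no-new-privileges",
    ]
    for host_dir, (container_dir, read_only) in mounts.items():
        if not PurePosixPath(host_dir).is_absolute():
            raise ValueError(f"mount source must be an absolute path: {host_dir}")
        spec = f"{host_dir}:{container_dir}" + (":ro" if read_only else "")
        cmd += ["--volume", spec]
    cmd.append(image)
    return cmd


# Hypothetical image name and paths, for illustration only.
command = build_docker_command(
    "computer-use-sandbox:latest",
    {"/srv/task/input": ("/work/input", True)},
)
print(shlex.join(command))
```

The helper only builds the command; passing the list to `subprocess.run` would launch the container. With `network="none"` nothing inside the container can reach the network, so a task that needs web access would instead pass the name of a restricted Docker network.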

Cloud VM (for long-running or parallel tasks)

Provision a cloud VM (EC2, GCE, Azure VM) per task
Start from a clean image; terminate after task completion
No credentials or sensitive data on the VM at start
Network access can be restricted with the provider's firewall rules (for example, VPC security groups on AWS)

Virtual machine (for persistent environments)

Snapshot the VM before each task; restore if something goes wrong
Isolates from host OS; can be reset to a known-good state
Higher overhead than containers but provides stronger OS-level isolation
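
The snapshot-then-restore pattern can be sketched as a context manager. This is a minimal sketch, not a hypervisor integration: `take_snapshot` and `restore_snapshot` are hypothetical callables that you would implement with your own virtualisation tooling, and the name `snapshot_guard` is introduced here purely for illustration.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def snapshot_guard(
    take_snapshot: Callable[[], str],
    restore_snapshot: Callable[[str], None],
) -> Iterator[str]:
    """Snapshot the VM before a task and roll back if the task raises."""
    snapshot_id = take_snapshot()
    try:
        yield snapshot_id
    except Exception:
        # Something went wrong: return the VM to the known-good state,
        # then let the caller see the original error.
        restore_snapshot(snapshot_id)
        raise
```

A task would then run inside `with snapshot_guard(take, restore): ...`. This version restores only on failure; restoring unconditionally after every task is a stricter variant that gives each run a clean state.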

Human Approval Gates

For high-stakes actions, pause the automation loop and require human confirmation before proceeding. Implement the gate at the action execution layer, so the check runs before the action reaches the OS:

HIGH_RISK_PATTERNS = [
    "submit",      # Form submission
    "send",        # Email/message sending
    "delete",      # File/data deletion
    "confirm",     # Confirmation dialogs
    "purchase",    # Transaction completion
    "pay",         # Payment buttons
]

def requires_approval(action: dict, context: str) -> bool:
    """Return True if a click targets something that looks high-risk.

    `context` is whatever text the caller supplies to describe the
    target, for example a button label.
    """
    action_type = action.get("action", "")
    # Only clicks are gated here. Substring matching is deliberately
    # broad, so "pay" also matches "payment".
    if action_type in ("left_click", "double_click"):
        context_lower = context.lower()
        return any(pattern in context_lower for pattern in HIGH_RISK_PATTERNS)
    return False

def request_human_approval(action: dict, context: str) -> bool:
    """Pause and ask a human to approve or deny the action."""
    print("\n⚠️  HIGH-RISK ACTION DETECTED")
    print(f"Action: {action}")
    print(f"Context: {context}")
    approval = input("Approve? (yes/no): ").strip().lower()
    # Anything other than an explicit "yes" counts as a denial.
    return approval == "yes"
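
One way to wire a gate like this into the execution layer is sketched below. To keep the block self-contained, it restates the risk check in compact form. The names `execute_with_gate`, `execute`, and `approve`, and the shape of the returned dict, are assumptions made for illustration and not part of any library.

```python
from typing import Callable

HIGH_RISK_PATTERNS = ("submit", "send", "delete", "confirm", "purchase", "pay")
GATED_ACTIONS = ("left_click", "double_click")


def requires_approval(action: dict, context: str) -> bool:
    """Compact restatement of the click-based risk check."""
    if action.get("action", "") not in GATED_ACTIONS:
        return False
    context_lower = context.lower()
    return any(pattern in context_lower for pattern in HIGH_RISK_PATTERNS)


def execute_with_gate(
    action: dict,
    context: str,
    execute: Callable[[dict], None],
    approve: Callable[[dict, str], bool],
) -> dict:
    """Run `execute(action)` unless a human denies a high-risk action.

    `execute` performs the action; `approve` asks a human and returns
    True or False. Both are supplied by the caller (hypothetical hooks).
    """
    if requires_approval(action, context) and not approve(action, context):
        # Do not touch the OS; report the denial back to the agent loop.
        return {"status": "denied", "reason": "human reviewer declined"}
    execute(action)
    return {"status": "executed"}
```

Passing the approver in as a callable keeps the gate testable without a terminal: an interactive prompt can be used in production and a stub in tests. Returning a result instead of raising lets the loop tell the model that the action was declined.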

Scope Restriction: Limiting Access

Apply least-privilege principles to computer use deployments:

Domain allowlist: For browser automation, restrict Claude to specific domains. Intercept navigation actions and block requests to non-allowlisted domains.
File system scope: Mount only the directories Claude needs in the sandbox container — do not expose the full file system
Network isolation: Block outbound network access except to required endpoints via container/VM networking rules
Credential hygiene: Use dedicated task-specific accounts with minimal permissions — never run computer use agents under admin accounts or with access to production credentials
Time limits: Set a wall-clock timeout on the entire automation run — a hung or stuck agent should not run indefinitely
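
Two of the restrictions above, the domain allowlist and the wall-clock time limit, can be sketched in a few lines. This is a minimal sketch under stated assumptions: the domains in `ALLOWED_DOMAINS` are placeholders, `is_allowed_url` and `Deadline` are hypothetical helper names, and treating subdomains of an allowlisted domain as allowed is a design choice you may want to tighten.

```python
import time
from urllib.parse import urlsplit

# Placeholder allowlist; replace with the domains your task needs.
ALLOWED_DOMAINS = frozenset({"example.com", "docs.example.org"})


def is_allowed_url(url: str, allowed=ALLOWED_DOMAINS) -> bool:
    """Allow only http(s) URLs whose host is an allowlisted domain or a subdomain of one."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in allowed)


class Deadline:
    """Wall-clock budget for a whole automation run."""

    def __init__(self, seconds: float, clock=time.monotonic):
        self._clock = clock
        self._expires_at = clock() + seconds

    def expired(self) -> bool:
        return self._clock() >= self._expires_at
```

The agent loop would call `is_allowed_url` before executing any navigation and check `deadline.expired()` at the top of each iteration. Matching on the parsed hostname rather than on the raw string avoids accepting URLs that merely contain an allowed domain in their path or query.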

Checklist: Do You Understand This?

Claude typically refuses high-risk actions by default (delete, send, purchase without confirmation); this comes from the model's training, not a configuration setting
Sandboxing is strongly recommended and should be treated as a requirement in production: Docker container (recommended for most use cases), cloud VM, or VM with snapshot/restore
Approval gates: intercept high-risk action types at the execution layer before sending the action to the OS
Scope restriction: domain allowlist, minimal file system mount, network isolation, task-specific low-privilege accounts
Never run computer use agents under admin accounts or with production credentials, and never give them access to production systems without explicit sandboxing