
Anthropic Claude API

Anthropic's Claude API is the production choice for long-context accuracy, reliable tool use, agentic tasks (computer use, coding agents), and safety-critical workloads. This page covers the API surface, model tiers, and key capabilities.

Claude Model Family (2025–2026)

Model               Context   Input ($/1M)   Best for
Claude Haiku 4.5    200K      $0.80          Fast routing, classification, high-volume simple tasks
Claude Sonnet 4.5   200K      $3.00          Production default: coding, analysis, writing, agentic workflows
Claude Opus 4.6     200K      $15.00         Hardest tasks; extended thinking; research-grade quality

Messages API

Claude uses a Messages API (distinct from OpenAI's Chat Completions, though similar in structure):

POST https://api.anthropic.com/v1/messages

{
  "model": "claude-sonnet-4-5",
  "max_tokens": 1024,
  "system": "You are an expert software engineer.",
  "messages": [
    {"role": "user", "content": "Review this Python function for bugs."}
  ]
}

Key differences from OpenAI: the system prompt is a top-level field (not a message with role: system); max_tokens is required; and authentication uses the x-api-key header (alongside a required anthropic-version header) rather than an Authorization: Bearer token.
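Those differences are easiest to see in a hand-built request. This sketch constructs the headers and payload without any SDK; the endpoint and header names match Anthropic's documented HTTP interface, and the API key value is a placeholder.

```python
import json

API_URL = "https://api.anthropic.com/v1/messages"

def build_request(api_key: str, user_text: str) -> tuple:
    headers = {
        "x-api-key": api_key,               # not an Authorization: Bearer header
        "anthropic-version": "2023-06-01",  # required version header
        "content-type": "application/json",
    }
    payload = {
        "model": "claude-sonnet-4-5",
        "max_tokens": 1024,  # required, unlike OpenAI
        "system": "You are an expert software engineer.",  # top-level field
        "messages": [{"role": "user", "content": user_text}],
    }
    return headers, json.dumps(payload).encode()

headers, body = build_request("sk-ant-placeholder",
                              "Review this Python function for bugs.")
```

Sending `body` to API_URL with those headers (via any HTTP client) is all the SDKs do under the hood for a basic call.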

Tool Use

Claude's tool use is consistently rated among the most reliable in production. Tools are defined with a JSON schema and Claude decides when to call them:

{
  "tools": [{
    "name": "get_customer_data",
    "description": "Retrieve customer record by ID",
    "input_schema": {
      "type": "object",
      "properties": {
        "customer_id": {"type": "string"}
      },
      "required": ["customer_id"]
    }
  }]
}

Claude returns a tool_use block when it decides to call a tool. Your code executes it and returns a tool_result message; Claude then generates a final response incorporating the result.
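That round trip can be sketched with plain dicts. The tool_use id and the in-memory "database" below are illustrative stand-ins, but the shape of the tool_result message (a user-role turn whose content echoes the tool_use_id) follows the documented format.

```python
import json

def make_tool_result(tool_use_block: dict, result: str) -> dict:
    """Wrap a tool's output as the user-role message sent back to Claude."""
    return {
        "role": "user",
        "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use_block["id"],  # must echo the block's id
            "content": result,
        }],
    }

# A tool_use block as it appears in the assistant response's content list:
tool_use = {
    "type": "tool_use",
    "id": "toolu_01A",  # example id
    "name": "get_customer_data",
    "input": {"customer_id": "C-42"},
}

# Your code dispatches on the tool name, executes it, and replies:
fake_db = {"C-42": {"name": "Ada Lovelace", "plan": "pro"}}
record = fake_db[tool_use["input"]["customer_id"]]
result_msg = make_tool_result(tool_use, json.dumps(record))
```

Appending `result_msg` to the conversation and calling the API again lets Claude fold the record into its final answer.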

Extended Thinking

Claude Opus 4.6 and Claude Sonnet 4.5 support extended thinking — allocating a "thinking budget" of tokens for the model to reason before answering:

{
  "model": "claude-opus-4-6",
  "max_tokens": 16000,
  "thinking": {
    "type": "enabled",
    "budget_tokens": 10000
  },
  "messages": [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
}

The response includes a thinking block (collapsible in Claude.ai) before the final answer. Thinking tokens are billed at the standard output rate, and budget_tokens must be less than max_tokens, since reasoning and the visible answer draw from the same output cap. Best used for maths, formal reasoning, and complex code-architecture decisions.
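A small payload builder makes the budget constraint explicit: the thinking budget has to fit inside max_tokens, which caps reasoning plus the visible answer combined. The 1024-token floor enforced here is an assumption based on current documentation; check the docs for your model.

```python
def thinking_request(prompt: str, max_tokens: int = 16_000,
                     budget_tokens: int = 10_000) -> dict:
    # Validate the budget before sending: it must fit inside max_tokens.
    if not 1024 <= budget_tokens < max_tokens:
        raise ValueError("budget_tokens must be >= 1024 and < max_tokens")
    return {
        "model": "claude-opus-4-6",
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
        "messages": [{"role": "user", "content": prompt}],
    }

req = thinking_request("Prove that sqrt(2) is irrational.")
```

Validating client-side avoids burning a round trip on a request the API would reject.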

Computer Use API

Claude's computer use capability allows it to control a desktop or browser environment by taking screenshots, clicking, typing, and navigating. Enabled via a built-in tool set:

  • computer — take screenshots, click, type, scroll
  • bash — run shell commands
  • text_editor — view and edit files

Computer use requires running a sandboxed environment (VM or container) and implementing a control loop that feeds screenshots back to Claude. See the Computer Use section under Build for detailed patterns.
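The control loop can be sketched as below, with the model client, screenshot capture, and action executor all stubbed. These stand-ins are not real APIs; in a real implementation the client would call the Messages API with the computer tool enabled, and the executor would drive the sandboxed VM or container.

```python
def run_computer_use_loop(client, take_screenshot, execute_action,
                          task: str, max_turns: int = 20) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        reply = client.send(messages, screenshot=take_screenshot())
        if reply["type"] == "done":
            return reply["text"]
        # Claude requested an action (click, type, scroll, ...): run it
        # in the sandbox, then loop with a fresh screenshot.
        execute_action(reply["action"])
        messages.append({"role": "assistant", "content": str(reply["action"])})
    return "turn limit reached"

class StubClient:
    """Stand-in model: asks for one click, then declares the task done."""
    def __init__(self):
        self.turns = 0
    def send(self, messages, screenshot):
        self.turns += 1
        if self.turns == 1:
            return {"type": "action",
                    "action": {"kind": "click", "x": 100, "y": 40}}
        return {"type": "done", "text": "finished"}

executed = []
outcome = run_computer_use_loop(StubClient(), lambda: b"<png bytes>",
                                executed.append, "Open the settings page")
```

The essential pattern is the same at any fidelity: screenshot in, action out, repeat until the model signals completion or a turn budget runs out.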

Long Context: 200K Tokens Effectively

Claude's 200K-token context window is reliable across its full length: retrieval quality does not degrade significantly for information placed deep in the context (unlike some competitors). Effective usage patterns:

  • Document analysis — Upload full documents (PDFs, text files) directly to the context; ask questions about them
  • Codebase review — Pass multiple source files; Claude can reason across them simultaneously
  • Conversation memory — Include compressed prior conversation summaries to maintain long-running context
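For the codebase-review pattern, one convenient approach is to pack each source file into its own text block within a single user turn. The "// FILE:" delimiter below is just a convention for Claude's benefit, not an API requirement.

```python
def codebase_review_message(files: dict, question: str) -> dict:
    """Build one user turn containing every file plus the question."""
    blocks = [
        {"type": "text", "text": f"// FILE: {path}\n{source}"}
        for path, source in files.items()
    ]
    blocks.append({"type": "text", "text": question})
    return {"role": "user", "content": blocks}

msg = codebase_review_message(
    {"app.py": "def main(): ...", "util.py": "def helper(): ..."},
    "Where is error handling missing?",
)
```

Keeping files in separate blocks (rather than one concatenated string) makes the prompt easier to cache and to trim when the window fills up.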

Prompt Caching

Anthropic's prompt caching saves repeated long context (system prompts, documents, conversation history):

{
  "system": [
    {
      "type": "text",
      "text": "You are an expert...",
      "cache_control": {"type": "ephemeral"}
    }
  ]
}

Cache hits cost 10% of the normal input price; cache creation costs 125%. A single hit already recoups the write overhead (1.25x + 0.10x = 1.35x base input cost, versus 2.00x for two uncached calls), and savings compound from there. Most valuable when a large system prompt or document is reused across many calls. Note that ephemeral cache entries expire after a short idle period (five minutes by default), so reuse must happen promptly.
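The break-even arithmetic can be checked directly from the two multipliers (1.25x for a write, 0.10x for a hit):

```python
def cached_cost(calls: int) -> float:
    """Relative input cost of `calls` requests sharing one cached prefix."""
    return 1.25 + 0.10 * (calls - 1)  # one write, then hits

def uncached_cost(calls: int) -> float:
    return 1.0 * calls

# Two calls: 1.35x cached vs 2.00x uncached; ten calls: ~2.15x vs 10x.
savings_at_ten = uncached_cost(10) - cached_cost(10)
```

These figures cover the cached prefix only; new user turns after the prefix are billed at the normal input rate either way.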

Message Batches API

Like OpenAI's Batch API, Anthropic's Message Batches API offers a 50% cost reduction for asynchronous workloads, with results returned within 24 hours. Good for:

  • Large-scale evaluation runs
  • Bulk data labelling and classification
  • Overnight analysis pipelines
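A batch payload pairs each request with a custom_id used to match results when the batch completes; each entry's params field is an ordinary Messages request body. The shape below follows the documented Batches format, and the Haiku model id follows the naming pattern used elsewhere on this page.

```python
def build_batch(prompts: list) -> dict:
    """Wrap a list of prompts as a Message Batches payload."""
    return {
        "requests": [
            {
                "custom_id": f"item-{i}",  # your key for matching results
                "params": {
                    "model": "claude-haiku-4-5",
                    "max_tokens": 256,
                    "messages": [{"role": "user", "content": p}],
                },
            }
            for i, p in enumerate(prompts)
        ]
    }

batch = build_batch([
    "Classify sentiment: 'great product'",
    "Classify sentiment: 'slow shipping'",
])
```

Because results can arrive in any order, make custom_id values meaningful (e.g. a database row key) rather than bare indices in production.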

Claude via AWS Bedrock

All Claude models are available via AWS Bedrock, which adds:

  • AWS IAM authentication (no separate Anthropic API key needed)
  • VPC private endpoints for data isolation
  • HIPAA and SOC 2 compliance under AWS's certifications
  • Pay via AWS consolidated billing
  • Model invocation in the same region as your other AWS infrastructure

The request format differs from the direct Anthropic API: use the Bedrock SDK (@aws-sdk/client-bedrock-runtime or boto3), not the Anthropic SDK directly.
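A sketch of the Bedrock invocation body shows the main differences: the model is named in the invoke call itself rather than the payload, and an "anthropic_version" field of "bedrock-2023-05-31" replaces the version header. The modelId in the commented boto3 call is illustrative; check your region's Bedrock model catalog for the exact identifier.

```python
import json

def bedrock_body(prompt: str) -> str:
    """Build the JSON body for a Bedrock invoke_model call."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",  # replaces the HTTP header
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    })

# With boto3 (not executed here):
# client = boto3.client("bedrock-runtime")
# resp = client.invoke_model(
#     modelId="anthropic.claude-sonnet-4-5",  # illustrative id
#     body=bedrock_body("Summarise this contract."),
# )
```

Note there is no "model" key in the body and no x-api-key header anywhere: model selection moves to modelId and authentication to AWS IAM.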

Checklist: Do You Understand This?

  • What is the key structural difference between the Claude Messages API and OpenAI Chat Completions?
  • How do you enable extended thinking, and what token budget is appropriate?
  • What three tools does Claude use for computer use?
  • When does prompt caching reach break-even, and what is the per-token discount on cache hits?
  • Name three enterprise reasons to access Claude via AWS Bedrock rather than directly.
  • What is the Message Batches API and when should you use it?