Intermediate

Claude API

The Anthropic API is how you integrate Claude into your own applications. It exposes Claude's capabilities via HTTP endpoints, with official SDKs for Python and TypeScript. This page covers authentication, core endpoints, rate limits, and streaming.

Authentication

All API requests require an API key passed in the x-api-key header. To get an API key:

  1. Sign in to the Anthropic Console at console.anthropic.com
  2. Navigate to Settings → API Keys
  3. Create a new key — give it a descriptive name (e.g. "production-app", "dev-local")
  4. Copy the key immediately — it is only shown once

Best practices for API key management:

  • Never hardcode API keys in source code — use environment variables (ANTHROPIC_API_KEY)
  • Create separate keys for each environment (dev, staging, production)
  • Rotate keys if they are ever exposed or committed to a repository
  • Set spending limits per key in the Console to contain runaway costs

Core Endpoints

EndpointMethodPurpose
/v1/messagesPOSTCreate a message — the main endpoint for all completions
/v1/modelsGETList available models and their metadata
/v1/messages/batchesPOSTMessage Batches API — async bulk processing at 50% lower cost

The Messages Request

A basic /v1/messages request requires:

  • model: The model ID — use a dated snapshot for production (e.g. claude-sonnet-4-6-20251022)
  • max_tokens: The maximum number of tokens in the response. Claude will stop before this limit if it finishes naturally.
  • messages: An array of message objects, each with a role (user or assistant) and content.

Optional but frequently used:

  • system: A system prompt — instructions that apply to the entire conversation
  • temperature: Controls randomness (0.0–1.0; 0 = deterministic, 1 = creative)
  • tools: Define functions Claude can call (tool use / function calling)
  • stream: Set to true to receive a streaming response

Streaming vs Non-Streaming

Non-streaming (default)

The API waits until Claude finishes generating, then returns the full response in one HTTP response.

  • Simpler to implement
  • Better for batch processing, background jobs
  • Higher perceived latency for users waiting on-screen

Streaming

Returns server-sent events (SSE) as tokens are generated — each event is a delta containing new text.

  • Required for real-time chat interfaces
  • Much lower perceived latency (user sees output start immediately)
  • More complex to implement — need to handle SSE and stream errors

The Anthropic SDKs

Anthropic provides official SDKs that wrap the HTTP API:

  • Python: anthropic package (install: pip install anthropic). Handles authentication, request building, streaming, retries, and error handling. Supports both sync and async (AsyncAnthropic).
  • TypeScript/Node.js: @anthropic-ai/sdk package. Full type safety, streaming support, tool use helpers.

Both SDKs read the ANTHROPIC_API_KEY environment variable automatically — you don't need to pass the key manually when it's set.

Rate Limits and 429 Errors

The Anthropic API enforces rate limits at two levels:

  • Requests per minute (RPM): Maximum number of API calls per minute. Depends on your usage tier.
  • Tokens per minute (TPM): Maximum input + output tokens processed per minute. For large-context applications, TPM is often the binding constraint.

When you exceed a limit, the API returns a 429 Too Many Requests response. Handle these correctly:

  • Read the retry-after response header — it tells you how many seconds to wait
  • Implement exponential backoff with jitter for retries — don't retry immediately in a tight loop
  • The Anthropic SDK has built-in retry logic with sensible defaults — let it handle 429s in most cases
  • For sustained high-volume workloads, request a rate limit increase via the Anthropic Console

Prompt Caching

The Anthropic API supports prompt caching — if a large portion of your prompt (system prompt, reference documents) is repeated across many calls, you can mark it as cacheable. Cached tokens are served at significantly lower cost (typically 90% discount on input tokens) and improve latency.

To use caching: mark the cacheable content with a cache_control: {"type": "ephemeral"} parameter in your message content. The first call computes and caches the content; subsequent calls with the same prefix hit the cache. Cache entries expire after 5 minutes of inactivity.

Checklist: Do You Understand This?

  • API keys are created in the Anthropic Console — use environment variables, never hardcode
  • The core endpoint is POST /v1/messages — requires model, max_tokens, and messages array
  • Streaming uses server-sent events — needed for real-time UI; simpler apps can use non-streaming
  • Official SDKs exist for Python and TypeScript — handle auth, retries, and streaming
  • 429 errors mean you've hit rate limits — use exponential backoff and read retry-after headers
  • Prompt caching can dramatically reduce costs for applications with repeated large contexts

Page built: 01 Jun 2026