Intermediate

Claude API

The Anthropic API is how you integrate Claude into your own applications. It exposes Claude's capabilities via HTTP endpoints, with official SDKs for Python and TypeScript. This page covers authentication, core endpoints, rate limits, and streaming.

Authentication

All API requests require an API key passed in the x-api-key header. To get an API key:

Sign in to the Anthropic Console at console.anthropic.com
Navigate to Settings → API Keys
Create a new key — give it a descriptive name (e.g. "production-app", "dev-local")
Copy the key immediately — it is only shown once

Best practices for API key management:

Never hardcode API keys in source code — use environment variables (ANTHROPIC_API_KEY)
Create separate keys for each environment (dev, staging, production)
Rotate keys if they are ever exposed or committed to a repository
Set spending limits per key in the Console to contain runaway costs

Core Endpoints

Endpoint	Method	Purpose
`/v1/messages`	POST	Create a message — the main endpoint for all completions
`/v1/models`	GET	List available models and their metadata
`/v1/messages/batches`	POST	Message Batches API — async bulk processing at 50% lower cost

The Messages Request

A basic /v1/messages request requires:

model: The model ID — use a dated snapshot for production (e.g. claude-sonnet-4-6-20251022)
max_tokens: The maximum number of tokens in the response. Claude will stop before this limit if it finishes naturally.
messages: An array of message objects, each with a role (user or assistant) and content.

Optional but frequently used:

system: A system prompt — instructions that apply to the entire conversation
temperature: Controls randomness (0.0–1.0; 0 = deterministic, 1 = creative)
tools: Define functions Claude can call (tool use / function calling)
stream: Set to true to receive a streaming response

Streaming vs Non-Streaming

Non-streaming (default)

The API waits until Claude finishes generating, then returns the full response in one HTTP response.

Simpler to implement
Better for batch processing, background jobs
Higher perceived latency for users waiting on-screen

Streaming

Returns server-sent events (SSE) as tokens are generated — each event is a delta containing new text.

Required for real-time chat interfaces
Much lower perceived latency (user sees output start immediately)
More complex to implement — need to handle SSE and stream errors

The Anthropic SDKs

Anthropic provides official SDKs that wrap the HTTP API:

Python: anthropic package (install: pip install anthropic). Handles authentication, request building, streaming, retries, and error handling. Supports both sync and async (AsyncAnthropic).
TypeScript/Node.js: @anthropic-ai/sdk package. Full type safety, streaming support, tool use helpers.

Both SDKs read the ANTHROPIC_API_KEY environment variable automatically — you don't need to pass the key manually when it's set.

Rate Limits and 429 Errors

The Anthropic API enforces rate limits at two levels:

Requests per minute (RPM): Maximum number of API calls per minute. Depends on your usage tier.
Tokens per minute (TPM): Maximum input + output tokens processed per minute. For large-context applications, TPM is often the binding constraint.

When you exceed a limit, the API returns a 429 Too Many Requests response. Handle these correctly:

Read the retry-after response header — it tells you how many seconds to wait
Implement exponential backoff with jitter for retries — don't retry immediately in a tight loop
The Anthropic SDK has built-in retry logic with sensible defaults — let it handle 429s in most cases
For sustained high-volume workloads, request a rate limit increase via the Anthropic Console

Prompt Caching

The Anthropic API supports prompt caching — if a large portion of your prompt (system prompt, reference documents) is repeated across many calls, you can mark it as cacheable. Cached tokens are served at significantly lower cost (typically 90% discount on input tokens) and improve latency.

To use caching: mark the cacheable content with a cache_control: {"type": "ephemeral"} parameter in your message content. The first call computes and caches the content; subsequent calls with the same prefix hit the cache. Cache entries expire after 5 minutes of inactivity.

Checklist: Do You Understand This?

API keys are created in the Anthropic Console — use environment variables, never hardcode
The core endpoint is POST /v1/messages — requires model, max_tokens, and messages array
Streaming uses server-sent events — needed for real-time UI; simpler apps can use non-streaming
Official SDKs exist for Python and TypeScript — handle auth, retries, and streaming
429 errors mean you've hit rate limits — use exponential backoff and read retry-after headers
Prompt caching can dramatically reduce costs for applications with repeated large contexts