Advanced

Multi-Agent Systems

Multi-agent systems use multiple Claude instances working together — one orchestrator directing others, or multiple specialised subagents running in parallel. This architecture enables tasks too large for a single context window, parallel execution, and specialised expertise per agent.

[Diagram with four groups: User (Task Request); Orchestrator (Claude, Planner); Subagents (Research Agent, Writer Agent, Reviewer Agent); Tools (Web Search, File Write, Diff Check). Caption: Orchestrator delegates; subagents execute, and each has its own tool access.]

Orchestrator Pattern

The orchestrator is a Claude instance whose job is to plan and delegate — it does not execute tasks itself; it breaks the work into subtasks and dispatches them to subagents:

Orchestrator receives the high-level task
It plans: "To accomplish X, I need to do A, B, and C in order"
It calls a tool (e.g., invoke_agent) with the subtask definition
It receives the subagent's result and decides the next step
It synthesises the final result from all subagent outputs
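
The steps above can be sketched as a small loop. This is a minimal sketch under stated assumptions, not a definitive implementation: `call_model` is a hypothetical callable that stands in for the real API call and returns a plain dict shaped like a Messages API response (`stop_reason` plus a list of `content` blocks), `subagents` is a hypothetical mapping from agent name to a function that runs that agent, and the `agent` and `task` input fields of `invoke_agent` are illustrative choices.

```python
from typing import Callable

# Tool definition in the name/description/input_schema shape used for tools.
# The tool name comes from the text; the "agent" and "task" fields are assumptions.
INVOKE_AGENT_TOOL = {
    "name": "invoke_agent",
    "description": "Delegate one subtask to a named subagent and return its result.",
    "input_schema": {
        "type": "object",
        "properties": {
            "agent": {"type": "string"},
            "task": {"type": "string"},
        },
        "required": ["agent", "task"],
    },
}


def orchestrate(
    task: str,
    call_model: Callable[[list[dict]], dict],      # hypothetical stand-in for the API call
    subagents: dict[str, Callable[[str], str]],    # hypothetical name -> runner mapping
    max_steps: int = 10,
) -> str:
    """Loop: ask the planner, run any delegated subtasks, feed results back."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages)
        messages.append({"role": "assistant", "content": reply["content"]})

        if reply["stop_reason"] != "tool_use":
            # No further delegation: the text blocks are the synthesised answer.
            return "".join(b["text"] for b in reply["content"] if b["type"] == "text")

        results = []
        for block in reply["content"]:
            if block["type"] != "tool_use":
                continue
            args = block["input"]
            handler = subagents.get(args["agent"])
            output = handler(args["task"]) if handler else f"Unknown agent: {args['agent']}"
            results.append(
                {"type": "tool_result", "tool_use_id": block["id"], "content": output}
            )
        # Tool results go back to the planner as the next user turn.
        messages.append({"role": "user", "content": results})

    raise RuntimeError("orchestrator did not finish within max_steps")
```

In a real system `call_model` would wrap a Messages API request that passes `INVOKE_AGENT_TOOL` in its tool list and converts the response into this dict shape. The `max_steps` cap is a design choice that keeps a confused planner from looping indefinitely.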

The orchestrator can run on any Claude model, but because it spends most of its tokens on planning and synthesis, a more capable model (Sonnet or Opus) is appropriate. Subagents that only execute well-defined tasks can use a faster, cheaper model.

Subagent Pattern: Specialised Agents

Subagents are specialised for a specific domain or task type. Each subagent has:

A focused system prompt for its domain (e.g., "You are a code reviewer. Your only job is to review the provided code for security issues.")
Only the tools relevant to its task (no extraneous tool access)
A narrow scope — it should not need to make judgment calls outside its domain
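
One way to keep these three properties together is a small specification object per subagent. The sketch below is illustrative only: `SubagentSpec`, `TOOL_REGISTRY` and `tools_for` are hypothetical names, the tool definitions are empty placeholders, and which agent gets which tool is an assumption based on the diagram labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubagentSpec:
    """Hypothetical container for one subagent's configuration."""
    name: str
    system: str                    # focused system prompt for the domain
    tools: tuple[str, ...] = ()    # names of the only tools this agent may use


# Placeholder tool definitions; real ones would carry full input schemas.
TOOL_REGISTRY = {
    name: {"name": name, "description": "placeholder", "input_schema": {"type": "object"}}
    for name in ("web_search", "file_write", "diff_check")
}

REVIEWER = SubagentSpec(
    name="reviewer",
    system=(
        "You are a code reviewer. Your only job is to review the "
        "provided code for security issues."
    ),
    tools=("diff_check",),         # assumed assignment
)

RESEARCHER = SubagentSpec(
    name="researcher",
    system="You are a research agent. Your only job is to gather sources for the given question.",
    tools=("web_search",),         # assumed assignment
)


def tools_for(spec: SubagentSpec, registry: dict[str, dict]) -> list[dict]:
    """Return only the tool definitions this subagent is allowed to use."""
    missing = [t for t in spec.tools if t not in registry]
    if missing:
        raise KeyError(f"{spec.name} references unknown tools: {missing}")
    return [registry[t] for t in spec.tools]
```

Passing `tools_for(REVIEWER, TOOL_REGISTRY)` as the tool list for the reviewer's call means the reviewer is never offered `file_write` or `web_search`, which enforces the narrow scope in code instead of relying on the prompt alone.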

Specialisation tends to improve quality: a subagent focused on one task with a tightly scoped system prompt generally performs better than a general agent asked to switch between tasks mid-conversation.

Handoffs: Passing Context Between Agents

When one agent hands off to another, it must pass the relevant context. Patterns:

Summary handoff: The sending agent summarises the relevant state before passing to the next agent. Compact but lossy — good for sequential pipelines.
Full context handoff: Pass the full conversation history or relevant documents. More reliable but consumes more context window in the receiving agent.
Structured state object: Define a shared state schema. Each agent reads from and writes to the same state object. Most maintainable for complex workflows.

Always be explicit about what the receiving agent needs to know. "Here is the full context for your task" is not sufficient — state what the previous step concluded and what the next step must accomplish.
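
An explicit handoff can be made hard to get wrong by giving it a fixed shape. The following is a minimal sketch, assuming a hypothetical `Handoff` class whose field names and section headings are illustrative; it forces the sender to state what was concluded and what must happen next, and attaches supporting material separately.

```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    """Hypothetical handoff message passed from one agent to the next."""
    concluded: str                      # what the previous step established
    next_goal: str                      # what the receiving agent must accomplish
    artifacts: dict[str, str] = field(default_factory=dict)  # named supporting material

    def to_prompt(self) -> str:
        """Render the handoff as the user message for the receiving agent."""
        parts = [
            "What the previous step concluded:\n" + self.concluded,
            "What you must accomplish:\n" + self.next_goal,
        ]
        for name, body in self.artifacts.items():
            parts.append(f"Reference material ({name}):\n{body}")
        return "\n\n".join(parts)


# Illustrative values only.
handoff = Handoff(
    concluded="The research agent found three relevant sources and summarised each.",
    next_goal="Write a 300-word draft that cites all three sources.",
    artifacts={"source summaries": "1. ...\n2. ...\n3. ..."},
)
prompt = handoff.to_prompt()
```

A summary handoff would fill `concluded` and leave `artifacts` empty; a full context handoff would put the documents or history into `artifacts`. Either way, the two required fields stay explicit.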

Parallel Subagents

For independent subtasks, run subagents in parallel rather than sequentially. In Python:

import asyncio
import anthropic

client = anthropic.AsyncAnthropic()

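async def run_subagent(task: str, system: str) -> str:
    """Run one subagent call and return the text of its reply."""
    response = await client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        system=system,
        messages=[{"role": "user", "content": task}]
    )
    # Assumes the first content block is text (no tools are passed here).
    return response.content[0].text

async def run_parallel_agents(tasks: list[dict]) -> list[str]:
    """Run every task concurrently; results come back in input order."""
    coroutines = [
        run_subagent(task["task"], task["system"])
        for task in tasks
    ]
    # gather() propagates the first exception unless return_exceptions=True is passed.
    return await asyncio.gather(*coroutines)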

Parallel agents draw on your API rate limits at the same time, so check that those limits can absorb the burst of concurrent requests. Anthropic's API supports concurrent requests; how many you can sustain depends on your usage tier.
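
If the number of independent subtasks can exceed what your tier allows, one common approach is to cap concurrency with a semaphore. The sketch below assumes a hypothetical helper, `gather_bounded`, and an arbitrary default cap of 4; it takes zero-argument callables so that each coroutine is only created once a slot is free.

```python
import asyncio
from typing import Awaitable, Callable


async def gather_bounded(
    jobs: list[Callable[[], Awaitable[str]]],
    max_concurrent: int = 4,          # assumed cap; tune to your own rate limits
) -> list[str]:
    """Run jobs concurrently, never more than max_concurrent at once.

    Results are returned in the same order as the input jobs.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(job: Callable[[], Awaitable[str]]) -> str:
        # Wait for a free slot before starting the underlying call.
        async with semaphore:
            return await job()

    return await asyncio.gather(*(run_one(job) for job in jobs))
```

With the `run_subagent` function shown earlier, a call might look like `await gather_bounded([lambda t=t: run_subagent(t["task"], t["system"]) for t in tasks], max_concurrent=3)`. The `t=t` default binds each task at definition time, which avoids the usual late-binding problem with lambdas in a loop.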

State Management: Memory vs Explicit Passing

Multi-agent systems must decide how to share state:

Pass explicitly: Each agent call includes all needed context in the prompt. Simple, predictable, but verbose. Best for shallow pipelines (2–3 agents).
Shared external store: Agents read/write to a database or key-value store via tools. Enables complex state without bloating prompts. Best for long pipelines or many agents.
In-memory state object (Python dict/class): Your orchestration code maintains state; agents get only the fields they need. Good middle ground for most systems.

Avoid relying on Claude to remember state between separate API calls — each call is stateless. State lives in your application code, not in the model.
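
As an illustration of the in-memory option, the orchestration code can hold one state object and hand each agent only a slice of it. This is a sketch under assumptions: `PipelineState`, its fields, and the `AGENT_FIELDS` mapping are hypothetical names chosen for a research, write, review pipeline, and the values are made up.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class PipelineState:
    """Hypothetical state owned by the orchestration code, not by the model."""
    task: str
    research_notes: str = ""
    draft: str = ""
    review_comments: list[str] = field(default_factory=list)

    def view(self, *names: str) -> dict:
        """Return only the named fields, for inclusion in one agent's prompt."""
        data = asdict(self)
        unknown = [name for name in names if name not in data]
        if unknown:
            raise KeyError(f"unknown state fields: {unknown}")
        return {name: data[name] for name in names}


# Which fields each agent is shown (assumed mapping).
AGENT_FIELDS = {
    "writer": ("task", "research_notes"),
    "reviewer": ("task", "draft"),
}

# Illustrative values only.
state = PipelineState(task="Write a short report on topic X")
state.research_notes = "Three key findings about X."
writer_input = state.view(*AGENT_FIELDS["writer"])   # excludes draft and review_comments
```

Each agent's output is written back to `state` by your code after the call returns, so the next call can be built from the updated fields. The same `view` idea carries over to an external store: replace the dataclass with reads and writes against a database or key-value store.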

Checklist: Do You Understand This?

Orchestrator: plans and delegates — does not execute tasks itself; calls subagents via tools
Subagents: specialised, narrow scope, only tools relevant to their domain
Handoffs: always explicit — summarise what was concluded and what must be done next
Parallel subagents: use async/asyncio.gather for independent tasks — consumes parallel API quota
State lives in application code — do not rely on cross-call memory; pass state explicitly or use an external store