
Multi-Agent Systems

A single agent with many tools hits a practical ceiling: too many tools degrade decision quality, a single context window limits parallelism, and a monolithic agent is hard to test and maintain. Multi-agent systems address this by distributing work across specialised agents that each do one thing well. This page covers when multiple agents are justified, the five fundamental coordination patterns, how agents communicate, and the failure modes that multi-agent architectures introduce.

When Multiple Agents Are Justified

Use multiple agents when:
  • The task has distinct, separable subtasks, each requiring different tools, different context, or different prompting strategies
  • Subtasks can run in parallel, or one subtask's output cleanly feeds another as input
  • A single agent would need 10+ tools (splitting reduces tool-selection errors)
  • You need independent verification (one agent produces, another critiques)
Do NOT use multiple agents when:
  • A single well-prompted agent with 4–6 tools will handle the task
  • The "parallelism" is illusory: the subtasks are actually sequential and interdependent
  • The overhead of inter-agent communication and coordination exceeds the benefit
  • You are prototyping: single agent first, always
Rule of thumb: start single-agent; add agents only when a specific bottleneck demands it. Roughly 72% of enterprise AI projects in 2025 involved multi-agent architectures, yet the majority of those that failed did so because of coordination overhead rather than capability limits.

Five Coordination Patterns

1. Orchestrator-Worker (Hierarchical)

A central orchestrator agent receives the user's goal, breaks it into subtasks, delegates each to a specialist worker agent, collects results, and synthesises the final output. Workers do not communicate with each other – all coordination flows through the orchestrator.

Best for: Complex tasks with well-defined subtasks that can be parallelised (research + write + review)
Example: User asks "research competitor X and write a briefing". Orchestrator delegates to: SearchAgent (find articles), DataAgent (pull financials), WriterAgent (draft briefing), ReviewAgent (fact-check draft)
Failure mode: Orchestrator becomes a bottleneck; poor task decomposition propagates to all workers
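As an illustration, a minimal orchestrator-worker skeleton in Python. The worker functions here are stand-in stubs (the names `search_agent`, `data_agent`, `writer_agent` are invented for this sketch); in a real system each would wrap its own LLM call and tools, and the orchestrator would use an LLM to decompose the goal:

```python
# Minimal orchestrator-worker sketch. Workers never talk to each other;
# all results flow back through the orchestrator for synthesis.

def search_agent(task: str) -> str:
    return f"articles about {task}"          # stand-in for a real search tool

def data_agent(task: str) -> str:
    return f"financials for {task}"          # stand-in for a data API

def writer_agent(inputs: list[str]) -> str:
    return "briefing based on: " + "; ".join(inputs)

def orchestrator(goal: str) -> str:
    # 1. Decompose the goal (a real orchestrator would use an LLM here).
    workers = [search_agent, data_agent]
    # 2. Delegate each subtask to a specialist worker.
    results = [worker(goal) for worker in workers]
    # 3. Synthesise worker outputs into the final answer.
    return writer_agent(results)

print(orchestrator("competitor X"))
```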

2. Sequential Pipeline

Agents are arranged in a fixed chain: Agent A processes input, passes output to Agent B, which passes to Agent C. Each agent has a specialised prompt and tool set for its stage. The pipeline is deterministic – the sequence does not change based on intermediate results.

Best for: Document processing workflows with fixed stages (ingest → extract → classify → enrich → store)
Example: Contract processing: ExtractAgent (parse PDF) → ClassifyAgent (type + jurisdiction) → RiskAgent (flag clauses) → SummaryAgent (executive summary)
Failure mode: An error in any stage corrupts all downstream stages; no branching for exceptions
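A fixed pipeline can be sketched as an ordered list of stage functions, each consuming the previous stage's output. The stage bodies below are placeholders (real stages would be agents with their own prompts and tools); the point is the deterministic, non-branching control flow:

```python
# Sequential pipeline sketch: stages run in a fixed order; each stage's
# output is the next stage's input. No routing, no branching.

def extract(doc: str) -> str:
    return doc.upper()                       # stand-in for PDF parsing

def classify(text: str) -> str:
    return f"[contract] {text}"              # stand-in for classification

def summarise(text: str) -> str:
    return text[:40]                         # stand-in for summarisation

PIPELINE = [extract, classify, summarise]

def run_pipeline(doc: str) -> str:
    result = doc
    for stage in PIPELINE:                   # deterministic order
        result = stage(result)               # an error here corrupts all later stages
    return result
```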

3. Parallel Fan-Out (Scatter-Gather)

A coordinator dispatches the same task (or related tasks) to multiple agents simultaneously. All agents run concurrently. A gather step collects and consolidates all results before producing a final output.

Best for: Tasks where the same question should be answered from multiple perspectives simultaneously (multi-source research, ensemble scoring, A/B draft generation)
Example: Market research: three SearchAgents run concurrently searching different sources (news / academic / social). GatherAgent synthesises all three streams.
Failure mode: Gather step must handle partial failures (some agents succeed, some fail); results may contradict each other
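A scatter-gather run maps naturally onto `asyncio`. This sketch (with a simulated failing source) shows the key gather-step decision: `return_exceptions=True` keeps one failed agent from cancelling the rest, and the gather step then filters to the successful results:

```python
import asyncio

# Scatter-gather sketch: searchers run concurrently; the gather step
# tolerates partial failure by keeping successful results only.

async def search(source: str) -> str:
    if source == "social":
        raise RuntimeError("rate limited")   # simulated failure
    return f"findings from {source}"

async def scatter_gather(sources: list[str]) -> list[str]:
    results = await asyncio.gather(
        *(search(s) for s in sources),
        return_exceptions=True,              # one failure must not cancel the rest
    )
    return [r for r in results if not isinstance(r, Exception)]

print(asyncio.run(scatter_gather(["news", "academic", "social"])))
```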

4. Handoff (Dynamic Routing)

An agent decides at runtime to transfer control to a different specialist agent based on what it has learned so far. Unlike a sequential pipeline (fixed order), handoffs are dynamic – the routing decision is made by the LLM, not pre-coded. The original agent passes its accumulated context to the receiving agent.

Best for: Customer service scenarios where the type of issue determines which specialist handles it
Example: TriageAgent classifies the user's request → hands off to RefundAgent, TechnicalAgent, or BillingAgent based on classification, passing the full conversation context
Key implementation detail: The receiving agent must be initialised with the handing-off agent's context – not just the latest message. The OpenAI Agents SDK has first-class handoff support with automatic context propagation.
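A framework-free sketch of the routing step. Classification here is keyword-based for illustration (a real TriageAgent would make this decision with an LLM); note that the specialist receives the entire conversation, not just the latest message:

```python
# Handoff sketch: pick a specialist at runtime and pass the FULL
# conversation context so it can see prior turns.

SPECIALISTS = {
    "refund": lambda ctx: f"RefundAgent handling: {ctx[-1]}",
    "billing": lambda ctx: f"BillingAgent handling: {ctx[-1]}",
}

def triage(conversation: list[str]) -> str:
    last = conversation[-1].lower()
    # Stand-in for an LLM routing decision.
    target = "refund" if "refund" in last else "billing"
    # Hand off the entire context, not just the last message.
    return SPECIALISTS[target](conversation)
```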

5. Critic-Revise (Evaluator Loop)

A generator agent produces output; a critic agent evaluates it against defined criteria; if the output fails, it is returned to the generator with the critique for revision. The loop continues until quality criteria are met or a max iteration limit is reached.

Best for: High-quality output tasks where first-pass quality is insufficient (code generation + testing, legal document drafting + review)
Example: CodingAgent writes a function → TestAgent runs tests and returns failures → CodingAgent revises → repeat up to 3 cycles
Critical constraint: A max iteration limit is mandatory – without it the loop can run indefinitely on hard problems. 3–5 cycles is typically sufficient; beyond that, escalate to human review.

Inter-Agent Communication

How agents pass information to each other is a design decision with significant cost and reliability implications.

| Communication method | How it works | Best for | Downside |
| --- | --- | --- | --- |
| Direct message passing | Agent A's output is passed directly as Agent B's input message | Sequential pipelines; simple orchestrator-worker | Tightly coupled: changing Agent A's output format breaks Agent B |
| Shared state object | All agents read/write a centralised state object (LangGraph StateGraph) | Complex DAG workflows; agents that need access to multiple prior results | State schema must be defined upfront; concurrent writes need locking |
| Message queue / async | Agents publish/subscribe to a message bus (Redis, SQS); decoupled async execution | High-volume, long-running workflows; agents with variable latency | More infrastructure; harder to trace and debug |
| External memory / DB | Agents write results to a shared database; downstream agents query what they need | Parallel fan-out where downstream agents need selective access to upstream results | Requires schema discipline; agents must know which keys to read |
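To make the shared-state approach concrete, here is a framework-free sketch: each step reads prior results by key from one state dict and writes its own, rather than receiving direct input from a single predecessor. LangGraph's StateGraph follows a similar model, with a declared schema and checkpointing on top:

```python
# Shared-state sketch: agents communicate through named keys on a
# single state object instead of direct message passing.

def extract_step(state: dict) -> dict:
    state["extracted"] = f"text of {state['doc']}"
    return state

def risk_step(state: dict) -> dict:
    # Reads an earlier result by key; later steps could read both.
    state["risks"] = f"risks in {state['extracted']}"
    return state

def run_graph(doc: str) -> dict:
    state = {"doc": doc}
    for step in (extract_step, risk_step):
        state = step(state)
    return state
```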

Context Propagation

The most common multi-agent mistake is losing context at agent boundaries. When Agent A hands off to Agent B, Agent B needs to know:

  • The original user goal (not just the immediate task)
  • What Agent A already tried and what the results were
  • Any constraints or preferences the user stated
  • The reason for the handoff (why is this coming to me now?)

Design pattern: Define a standardised handoff object – a structured summary that every agent populates before passing control. Fields: original_goal, steps_completed, key_findings, reason_for_handoff, constraints. The receiving agent's system prompt is initialised from this object.
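The handoff object maps directly onto a dataclass. The field names follow the pattern described above; the prompt template in `to_system_prompt` is an illustrative assumption, not a fixed format:

```python
from dataclasses import dataclass, field

# Standardised handoff object: every agent populates this before
# passing control; the receiver's system prompt is built from it.

@dataclass
class Handoff:
    original_goal: str
    steps_completed: list[str]
    key_findings: list[str]
    reason_for_handoff: str
    constraints: list[str] = field(default_factory=list)

    def to_system_prompt(self) -> str:
        # Illustrative template; real systems would tune this wording.
        return (
            f"Goal: {self.original_goal}\n"
            f"Done so far: {'; '.join(self.steps_completed)}\n"
            f"Findings: {'; '.join(self.key_findings)}\n"
            f"You are receiving this because: {self.reason_for_handoff}\n"
            f"Constraints: {'; '.join(self.constraints)}"
        )
```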

Framework Implementation

LangGraph – graph-based state machine

Define each agent as a node. Edges between nodes are either fixed (sequential pipeline) or conditional (routing based on state values). A centralised StateGraph object holds all shared state and is checkpointed to storage between steps, enabling resumability and human-in-the-loop pauses.

v1.0 stable Oct 2025. Best for: production agents requiring explicit control flow, parallel subgraph execution, and LangSmith tracing.

OpenAI Agents SDK – agents + handoffs

Defines agents as objects with a system prompt, tool list, and handoff list. Handoffs are first-class: specifying a handoff target causes the SDK to automatically transfer context and switch the active agent. Built-in tracing. Released March 2025 as the production replacement for the experimental Swarm.

Best for: teams on OpenAI stack who want handoff patterns without building custom routing logic.

AutoGen / Microsoft Agent Framework – conversational multi-agent

Agents communicate by sending messages to each other in a conversation. Structured conversation patterns (two-agent, group chat, nested chat) handle common coordination scenarios. Microsoft merged AutoGen with Semantic Kernel in October 2025 for enterprise deployments.

Best for: research, iterative problem-solving, enterprise environments using Azure AI.

Failure Modes

Context loss at agent boundaries

The receiving agent does not have enough context to continue the work sensibly: it re-does steps already completed, contradicts prior decisions, or asks the user to re-explain. Fix: standardised handoff objects; test each agent boundary as an independent unit with realistic context inputs.

Coordination overhead exceeds benefit

The latency of orchestrating multiple agents (extra LLM calls for planning, waiting for parallel results, synthesising outputs) makes the multi-agent system slower and more expensive than a single agent would have been. Measure wall-clock latency and total cost before and after splitting into multiple agents.

Infinite critic-revise loops

A Critic-Revise pattern without a hard iteration limit will loop indefinitely if the critic never approves (e.g. because the criteria are impossible to satisfy). Always implement a maximum loop count and a fallback: after N iterations, either accept the best attempt or escalate to human review.

Partial failure in scatter-gather

In parallel fan-out, some agents succeed and some fail. If the gather step requires all results, one slow or failed agent blocks the whole system. Implement timeouts per parallel agent and a partial-results policy: decide in advance whether the system can produce output with 3/4 results or requires all 4.
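A timeout plus partial-results policy can be sketched with `asyncio.wait`. The delays and the `MIN_RESULTS` threshold are illustrative; the "slow" agent simulates a straggler that gets cut off at the deadline:

```python
import asyncio

# Per-run timeout with a partial-results policy: proceed if at least
# MIN_RESULTS agents answered in time, otherwise escalate.

MIN_RESULTS = 2

async def agent(name: str, delay: float) -> str:
    await asyncio.sleep(delay)
    return f"result from {name}"

async def gather_with_policy() -> list[str]:
    tasks = [
        asyncio.create_task(agent("fast-1", 0.01)),
        asyncio.create_task(agent("fast-2", 0.01)),
        asyncio.create_task(agent("slow", 5.0)),     # simulated straggler
    ]
    done, pending = await asyncio.wait(tasks, timeout=0.5)
    for t in pending:
        t.cancel()                           # stragglers must not block the run
    results = [t.result() for t in done]
    if len(results) < MIN_RESULTS:
        raise RuntimeError("too few results: escalate or retry")
    return sorted(results)
```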

Observability gap

Multi-agent systems are substantially harder to debug than single agents because failures can be in any agent or at any transition. Without distributed tracing that spans all agents in a run, failures are nearly impossible to diagnose from final output alone. Instrument every agent with a shared trace ID from the start.
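One lightweight way to get a run-level trace ID into every agent's logs is a `contextvars.ContextVar` set once at the entry point. This is a sketch of the idea (production systems would use a tracing library such as OpenTelemetry rather than a hand-rolled log list):

```python
import contextvars
import uuid

# Shared trace ID sketch: one ID is set at the entry point, and every
# agent logs under it, so a whole run can be stitched back together.

trace_id = contextvars.ContextVar("trace_id", default="unset")
LOG: list[str] = []

def log(agent: str, event: str) -> None:
    LOG.append(f"[{trace_id.get()}] {agent}: {event}")

def run(goal: str) -> None:
    trace_id.set(uuid.uuid4().hex[:8])       # one ID for the whole run
    log("orchestrator", f"received goal: {goal}")
    log("search_agent", "searching")
    log("writer_agent", "drafting")
```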

2025–2026 Developments

OpenAI Agents SDK with first-class handoffs – March 2025

OpenAI released the production Agents SDK in March 2025 as a direct replacement for Swarm. It introduced handoffs as a first-class primitive, defining which agents can receive control transfers, with automatic context propagation and built-in tracing through the OpenAI dashboard.

Google's Agent-to-Agent (A2A) protocol – 2025

Google published the Agent-to-Agent (A2A) protocol, an open standard for inter-agent communication that enables agents built on different frameworks or by different vendors to interoperate. This addresses the "walled garden" problem where agents built on LangGraph cannot directly communicate with agents built on AutoGen. Adoption was still nascent by the end of 2025 but accelerating.

Microsoft merges AutoGen + Semantic Kernel – October 2025

Microsoft merged its AutoGen framework with Semantic Kernel into a unified Microsoft Agent Framework, targeting enterprise deployments on Azure AI. This created a single, supported path for enterprise teams wanting multi-agent capabilities on the Microsoft stack, replacing the fragmented AutoGen vs Semantic Kernel choice.

Checklist: Do You Understand This?

  • Can you name the five multi-agent coordination patterns and give an example of each?
  • When should you use multiple agents, and when should you stick with a single agent?
  • What four pieces of context must a receiving agent have when control is handed off to it?
  • Can you describe the standardised handoff object pattern and what fields it contains?
  • What is the difference between shared state (LangGraph) and direct message passing for inter-agent communication?
  • Why must a Critic-Revise loop always have a maximum iteration limit?
  • Can you name five failure modes specific to multi-agent architectures?
  • What is Google's A2A protocol and what problem does it solve?