Tool Calling Patterns
Tool calling (also called function calling) is the mechanism by which an LLM requests the execution of a function and incorporates the result into its reasoning. It is what makes an agent capable of taking actions beyond text generation: querying databases, calling APIs, reading files, running code. This page covers how tool calling works at the protocol level, how to design tools the LLM uses reliably, how to handle parallel and sequential calls, and how to manage errors.
How Tool Calling Works
The flow is a structured round-trip between your application and the LLM:
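The round-trip can be sketched as a loop. This is a minimal sketch, not a real SDK: the llm callable stands in for a provider client (OpenAI and Anthropic use different field names but the same shape), and get_weather is a toy tool.

```python
import json

def get_weather(city: str) -> str:
    # Toy tool implementation; a real tool would call an API.
    return json.dumps({"city": city, "temp_c": 18})

TOOLS = {"get_weather": get_weather}

def run_turn(llm, messages):
    """Drive one user turn: (1) send messages plus tool schemas,
    (2) the LLM either answers or requests a tool call,
    (3) the application executes the call, (4) the result is
    injected back into the conversation and the loop repeats."""
    while True:
        response = llm(messages)                    # steps 1-2
        if response["type"] != "tool_call":
            return response["content"]              # plain text: turn is done
        result = TOOLS[response["name"]](**response["arguments"])  # step 3
        messages.append({                           # step 4: inject result
            "role": "tool",
            "tool_call_id": response["id"],
            "content": result,
        })
```

The loop terminates when the LLM responds with text instead of another tool call; a production loop would also cap the number of iterations.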
Tool Schema Design
The quality of the tool description is the single biggest factor in whether the LLM calls the tool correctly. The LLM cannot read your code: it relies entirely on the schema and description to understand what the tool does, when to call it, and what arguments to provide.
Anatomy of a Good Tool Definition
| Field | Purpose | Common mistake |
|---|---|---|
| name | Identifies the tool. Used by the LLM to select it. | Vague names like tool1 or helper. Use verbs: search_web, get_order_status, send_email |
| description | Tells the LLM when to use this tool vs others, and what it returns. | Too short. Include: what it does, when to use it, what it returns, and its limitations |
| parameters | JSON Schema defining each argument: type, description, constraints, required vs optional. | Missing description on individual parameters, so the LLM guesses what to put in each field |
| required array | Marks which parameters the LLM must always provide. | Marking everything required when some params have sensible defaults, which creates unnecessary friction |
Example: Good vs Poor Schema

Poor: vague name, vague description, undocumented parameter.

```json
{
  "name": "search",
  "description": "Search for things",
  "parameters": {
    "type": "object",
    "properties": {
      "q": { "type": "string" }
    },
    "required": ["q"]
  }
}
```

The LLM has no way to know when this tool applies or what q should contain.

Good: specific name, scoped description, documented parameters.

```json
{
  "name": "search_product_catalog",
  "description": "Search the product catalog by keyword. Returns up to 10 matching products with name, SKU, price, and stock status. Use this when the user asks about product availability or pricing. Do NOT use for order history.",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "Search keywords, e.g. 'blue running shoes size 10'"
      },
      "max_results": {
        "type": "integer",
        "description": "Max products to return (1-10). Default: 5",
        "default": 5
      }
    },
    "required": ["query"]
  }
}
```

Schema Design Rules

If two tools overlap (e.g. get_order and get_invoice), the description of each must explain when to use it vs the other.

Parallel vs Sequential Tool Calls
Modern LLMs (GPT-4o, Claude 3.x, Gemini 1.5) can request multiple tool calls in a single response, before seeing any of the results. This is called parallel tool calling.
| Mode | When LLM uses it | Your application must | Best for |
|---|---|---|---|
| Parallel | Multiple independent tool calls can be issued before seeing results | Execute all calls concurrently; collect all results; inject all results before next LLM call | Independent lookups (search + get user + get order simultaneously) |
| Sequential | Each call depends on the result of the previous | Execute one tool, inject result, LLM decides next call | Dependent steps (search → read top result → summarise that result) |
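The parallel branch of the table can be implemented with asyncio, assuming each tool call is independent I/O. The dispatch function here is a hypothetical executor; a real one would await an HTTP request or database query.

```python
import asyncio

async def dispatch(call: dict) -> dict:
    # Placeholder for real I/O (HTTP call, DB query, file read).
    await asyncio.sleep(0)
    return {"tool_call_id": call["id"], "content": f"result of {call['name']}"}

async def execute_parallel(calls: list[dict]) -> list[dict]:
    # gather() runs all dispatches concurrently and preserves order,
    # so results line up with the requested calls when injected back.
    return await asyncio.gather(*(dispatch(c) for c in calls))

calls = [
    {"id": "1", "name": "search_web"},
    {"id": "2", "name": "get_user"},
]
results = asyncio.run(execute_parallel(calls))
```

All results are collected before the next LLM call, matching the "execute all calls concurrently; inject all results" requirement in the table.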
OpenAI enables parallel tool calls by default; set parallel_tool_calls: false to force sequential calls. Anthropic supports parallel tool calls natively. If your tools have side effects that conflict (e.g. two writes to the same resource), either disable parallelism or implement locking at the tool layer.

Error Handling
Tool calls will fail. The network will time out. The API will return a 404. The input will fail validation. How you surface errors to the LLM determines whether it can recover gracefully or gets stuck.
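One way to surface failures gracefully is to wrap every tool execution so that exceptions become tool results the LLM can reason about. This sketch uses Anthropic's is_error flag; for OpenAI, the error text would go in the content of the role: "tool" message instead.

```python
def safe_execute(tool_fn, args: dict) -> dict:
    """Run a tool and always return a tool-result payload,
    never raise into the agent loop."""
    try:
        return {"type": "tool_result",
                "content": tool_fn(**args),
                "is_error": False}
    except Exception as exc:
        # Tell the model what failed and suggest a recovery path,
        # so it can retry with corrected arguments or ask the user.
        return {"type": "tool_result",
                "content": f"Error: {exc}. Check the arguments and retry, "
                           "or ask the user for the missing information.",
                "is_error": True}
```

The key design choice is that the loop keeps running: the model sees the error as data and can decide the next step, rather than the whole turn aborting.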
Return errors to the LLM as tool results rather than raising exceptions: set is_error: true (Anthropic) or put the error text in the role: "tool" message (OpenAI). Always include what went wrong and, where possible, what the LLM should do next, so it can recover instead of getting stuck.

Result Injection and Context Management
Tool results accumulate in the context window as the agent runs. For long-running agents with many tool calls, this becomes a context management problem.
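One mitigation, sketched here under illustrative thresholds of my own choosing (the page does not prescribe a specific strategy), is to truncate older tool results while keeping the most recent ones intact:

```python
def compact_tool_results(messages: list[dict],
                         keep_last: int = 2,
                         max_chars: int = 200) -> list[dict]:
    """Truncate all but the last keep_last tool results in place,
    freeing context-window space for long-running agents."""
    tool_idxs = [i for i, m in enumerate(messages) if m["role"] == "tool"]
    for i in (tool_idxs[:-keep_last] if keep_last else tool_idxs):
        content = messages[i]["content"]
        if len(content) > max_chars:
            messages[i]["content"] = content[:max_chars] + " ...[truncated]"
    return messages
```

Summarising old results with a cheaper model is a common alternative to plain truncation when the discarded detail might still matter.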
Provider Differences
| Provider | Tool format | Parallel calls | Notes |
|---|---|---|---|
| OpenAI | tools array with function type + JSON Schema | Yes (default on; disable with parallel_tool_calls: false) | Most mature implementation; tool_choice forces specific tool or disables tool calling |
| Anthropic | tools array with input_schema (JSON Schema) | Yes (native support) | Error results use is_error: true; tool_choice supports auto, any, tool; programmatic tool calling (2025) |
| Gemini | function_declarations inside tools | Yes | Different field naming convention; LiteLLM or LangChain normalises across providers |
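If you are not using a normalising library, converting between the two most common formats is mechanical. This sketch maps an OpenAI-style definition to the Anthropic field names shown in the table:

```python
def openai_to_anthropic(tool: dict) -> dict:
    """Convert an OpenAI tool definition (type: function + parameters)
    to Anthropic's shape (top-level name + input_schema)."""
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        "input_schema": fn["parameters"],
    }
```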
Controlling Tool Selection
- auto (default): The LLM decides whether to call a tool or respond in text. Use for general agents where the LLM should decide based on context.
- required / any: Force the LLM to call at least one tool. Use when you need structured output and want to prevent the LLM from responding in free text.
- none: Disable tool calling entirely for this call. Use when you want a text response summarising the tool results collected so far, without risking another tool call.

Advanced Patterns (2025)
Dynamic tool loading / Tool search
When you have hundreds of tools, providing them all in every call is expensive (tokens) and degrades selection quality. Instead, maintain a tool registry with descriptions, embed them, and retrieve the most relevant tools per query. Inject only the top-5 relevant tools for each call. Anthropic introduced a native Tool Search Tool in 2025 that implements this pattern: the LLM can search for tools by description before calling them.
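A registry-based retrieval step might look like the sketch below. Keyword overlap stands in for the embedding similarity a production registry would use:

```python
def retrieve_tools(query: str, registry: list[dict], top_k: int = 5) -> list[dict]:
    """Return the top_k registered tools most relevant to the query.
    Scoring here is word overlap with the tool description; a real
    implementation would compare embeddings instead."""
    q_words = set(query.lower().split())
    scored = [
        (len(q_words & set(tool["description"].lower().split())), tool)
        for tool in registry
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for score, tool in scored[:top_k] if score > 0]
```

Only the retrieved subset is passed in the tools array of the next LLM call, keeping the prompt small and the selection problem easy.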
Structured output via forced tool call
LLMs produce unreliable JSON in free-text mode. To get reliable structured output, define a "tool" whose schema matches your desired output format (e.g. extract_customer_info with fields for name, email, intent), force the LLM to call it with tool_choice: required, and treat the arguments as your output. This is more reliable than asking the LLM to "respond in JSON."
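An OpenAI-style request body for this pattern might look like the following; the extract_customer_info schema and the user message are illustrative:

```python
extract_tool = {
    "type": "function",
    "function": {
        "name": "extract_customer_info",
        "description": "Record the customer's details from their message.",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "email": {"type": "string"},
                "intent": {"type": "string",
                           "enum": ["purchase", "support", "refund"]},
            },
            "required": ["name", "intent"],
        },
    },
}

request_body = {
    "model": "gpt-4o",
    "messages": [{"role": "user",
                  "content": "Hi, I'm Ada (ada@example.com), my order arrived broken"}],
    "tools": [extract_tool],
    # Force a tool call: the model cannot reply in free text, so the
    # tool-call arguments become your validated structured output.
    "tool_choice": "required",
}
```

Because the arguments must conform to the JSON Schema, you get typed fields and an enum-constrained intent instead of parsing prose.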
Programmatic tool calling (Anthropic, 2025)
Anthropic introduced programmatic tool calling: Claude can invoke tools in a code execution environment rather than individual API round-trips. This enables parallel tool execution from within a single reasoning step, reducing latency for agents with many concurrent tool calls.
Failure Modes
Wrong tool selected
The LLM calls a tool that is not appropriate for the situation, usually because descriptions are too similar or too vague. Fix: add explicit disambiguation language to each description; reduce the number of tools visible per call.
Hallucinated arguments
The LLM invents argument values it was not given (e.g. fabricates an order ID). Fix: validate arguments before executing; add parameter descriptions that clarify where the values should come from ("Use the order ID from the user's message, not from your knowledge").
Ignoring tool results
The LLM calls a tool, receives the result, then answers from its training knowledge anyway, ignoring what the tool returned. Fix: strengthen the system prompt instruction to use tool results; inspect traces to confirm results are being injected correctly.
Executing without validation
Blindly executing the LLM's tool call arguments without checking them first. For write tools (delete, send, update), always validate arguments in your application layer before execution β treat LLM-provided arguments as untrusted user input.
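A validation layer for a destructive tool might look like this sketch; the tool name, ID format, and rules are illustrative, and the known_order_ids check also catches hallucinated arguments:

```python
import re

def validate_delete_order_args(args: dict, known_order_ids: set[str]):
    """Check LLM-supplied arguments before a destructive tool runs.
    Returns (ok, reason); treat the arguments as untrusted input."""
    order_id = args.get("order_id", "")
    if not re.fullmatch(r"ORD-\d{6}", order_id):
        return False, "order_id must match the pattern ORD-123456"
    if order_id not in known_order_ids:
        # The model may have fabricated an ID it was never given.
        return False, f"unknown order_id {order_id}"
    return True, ""
```

On failure, return the reason to the LLM as an error tool result instead of executing, so it can correct itself or ask the user.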
Checklist: Do You Understand This?
- Can you describe the four-step tool calling round-trip (define → LLM calls → execute → inject result)?
- What are the four fields of a tool definition, and what is the most common mistake in each?
- Can you explain the difference between parallel and sequential tool calls, and when each is appropriate?
- How should you handle a tool error, and what must you always include in the response?
- Can you name the three tool_choice modes and when to use each?
- What is the "structured output via forced tool call" pattern and why is it more reliable than asking for JSON?
- Can you name four failure modes in tool calling and their fixes?
- How do Anthropic and OpenAI differ in their tool format and error signalling?