🧠 All Things AI
Beginner

Prompting for Structured Output

Structured output means getting AI to respond in a specific, machine-readable format — JSON, tables, checklists, CSV, XML, or YAML — instead of free-form prose. This is the bridge between AI reasoning and real-world automation: software can parse structured data, but it cannot reliably parse a paragraph.

Why Structured Output Matters

Free-form text is great for reading. It is terrible for software. If you ask an AI to extract product details and it responds with a paragraph, your code has to figure out where the product name ends and the price begins — every single time, with every variation in wording.

Structured output solves this by giving you:

  • Software integration — parse the response directly in code without guesswork
  • Consistency — same field names, same types, same ordering across requests
  • Automation — feed outputs directly into databases, spreadsheets, APIs, and workflows
  • Error reduction — API-level schema enforcement can reduce parsing errors by up to 90%

Common Formats

Each format has a sweet spot. Choosing the right one depends on who (or what) will consume the output.

| Format | Best For | Consumed By |
| --- | --- | --- |
| JSON | API responses, data extraction, database input | Code / software |
| Markdown table | Comparisons, feature matrices, summaries | Humans (docs, GitHub, Notion) |
| Checklist | Task lists, acceptance criteria, quality checks | Humans (project management) |
| CSV | Bulk data, spreadsheet import, datasets | Excel, Sheets, pandas |
| XML | Document interchange, enterprise systems, Claude prompting | Code / enterprise tools |
| YAML | Config files, DevOps artifacts, human-editable data | Docker, K8s, CI/CD |

Universal Rules for Any Format

These five rules apply regardless of which format you are requesting. Follow all five and your success rate will jump dramatically.

1. Name the format explicitly

Say "Return a JSON object" or "Return a Markdown table" — do not leave the format ambiguous. The model will guess, and it will guess differently every time.

2. Define the schema

List every field (or column), its type, and what it means. Do not make the model guess your schema — it will invent one, and it will not match yours.

3. Provide an example

A single complete example of the desired output teaches the model more about your expectations than paragraphs of instructions.

4. Forbid extras

Tell the model not to add explanatory text, code fences, or additional fields. Say: "Return ONLY the JSON. No introduction, no explanation, no markdown wrappers."

5. Specify null handling

Tell the model what to do when data is missing: "If a value is not mentioned, use null." Without this, the model will sometimes omit fields, sometimes use empty strings, and sometimes write "N/A".

Prompting for JSON

JSON is the most important structured output format. It is how software talks to software, and it is how your AI responses become usable in code. Master this one first.

Good prompt:
Extract the following fields from the text below and return
ONLY a JSON object. Do not include any explanation or
markdown code fences.
Schema:
{
  "company_name": string,
  "founded_year": integer or null,
  "headquarters": string or null,
  "employee_count": integer or null
}
If a field is not mentioned in the text, use null.
Text: "Anthropic, based in San Francisco, was founded in 2021."
Expected output:
{
  "company_name": "Anthropic",
  "founded_year": 2021,
  "headquarters": "San Francisco",
  "employee_count": null
}

JSON-specific tips:

  • Show the full nested example if you need nested structures — do not just describe them
  • Keep schemas flat when possible; deep nesting increases failure rates
  • Explicitly forbid markdown wrappers: "Do not wrap in ```json code blocks"
  • For arrays, always show the array form even with a single element to prevent the model from returning a bare object instead
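Once the reply comes back, parse and check it in code rather than trusting it. Here is a minimal Python sketch that validates the company-extraction example above; the raw reply is hard-coded for illustration, and the validator is ours, not a library function:

```python
import json

# The raw model reply, assuming the prompt above worked as intended.
raw_reply = ('{"company_name": "Anthropic", "founded_year": 2021, '
             '"headquarters": "San Francisco", "employee_count": null}')

# Expected schema: field name -> allowed Python types (NoneType models "or null").
SCHEMA = {
    "company_name": (str,),
    "founded_year": (int, type(None)),
    "headquarters": (str, type(None)),
    "employee_count": (int, type(None)),
}

def validate(raw: str) -> dict:
    """Parse the reply and check every field exists with an allowed type."""
    data = json.loads(raw)  # raises ValueError on invalid JSON
    for field, types in SCHEMA.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], types):
            raise ValueError(f"wrong type for {field}")
    extra = set(data) - set(SCHEMA)
    if extra:
        raise ValueError(f"unexpected fields: {extra}")
    return data

record = validate(raw_reply)
print(record["company_name"])   # Anthropic
print(record["employee_count"]) # None
```

A check like this catches schema drift, hallucinated fields, and type coercion errors the moment they happen, instead of deep inside your pipeline.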

Prompting for Tables

Markdown tables are the best format when humans will read the output directly. They render in GitHub, Notion, Obsidian, and most documentation tools.

Good prompt:
Compare the following AI models and return a Markdown table.
Include exactly these columns: Model | Provider | Best For | Context Window
One row per model. Sort by context window descending.
Return ONLY the table. No introduction, no explanation.

Table-specific tips:

  • Specify column names exactly — models will invent their own if not told
  • Specify what each row represents ("one row per product")
  • Specify sort order if it matters
  • Research shows Markdown tables achieve ~16% higher accuracy than CSV when the AI also needs to interpret the data
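If you later need the table in code after all, it parses easily. A small sketch that splits a Markdown table reply into row dicts; the table text is an assumed example reply, and the parser is ours:

```python
# Assumed reply to the model-comparison prompt above.
table = """\
| Model | Provider | Best For | Context Window |
|---|---|---|---|
| Gemini 1.5 Pro | Google | Long documents | 1M tokens |
| Claude 3.5 Sonnet | Anthropic | Coding | 200K tokens |"""

def parse_markdown_table(text: str) -> list[dict]:
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    cells = lambda line: [c.strip() for c in line.strip("|").split("|")]
    header = cells(lines[0])
    rows = lines[2:]  # skip the |---| separator line
    return [dict(zip(header, cells(r))) for r in rows]

rows = parse_markdown_table(table)
print(rows[0]["Model"])  # Gemini 1.5 Pro
```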

Prompting for Checklists

Good prompt:
Create a deployment checklist for a Node.js application.
Format as a Markdown checklist with unchecked boxes.
Group items under: Pre-deployment, Deployment, Post-deployment.
Maximum 5 items per group. Return only the checklist.

Checklist tips:

  • Specify grouping — without it, you get an endless flat list
  • Set item count limits — unconstrained, models produce very long lists
  • Specify checked vs unchecked state if completion status matters
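Checklist replies are also easy to inspect programmatically. A quick sketch that tallies checked versus unchecked items with a regular expression; the checklist text is an assumed reply:

```python
import re

# Assumed reply to a deployment-checklist prompt like the one above.
checklist = """\
### Pre-deployment
- [x] Run the test suite
- [ ] Bump the version number
### Deployment
- [ ] Deploy to staging"""

items = re.findall(r"- \[([ xX])\] (.+)", checklist)
done = [text for mark, text in items if mark.lower() == "x"]
todo = [text for mark, text in items if mark == " "]
print(len(done), len(todo))  # 1 2
```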

Prompting for CSV

Good prompt:
Generate a CSV of 10 fictional employees.
Columns: first_name,last_name,department,salary,start_date
Format dates as YYYY-MM-DD. Salary is an integer in USD.
Include the header row. Return ONLY the CSV.

CSV tips:

  • Warn the model if field values might contain commas — instruct it to quote those fields
  • Specify whether to include a header row
  • For real data extraction, prefer JSON then convert — JSON handles edge cases better
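The quoting advice matters as soon as you parse the reply. A sketch using Python's stdlib csv module, which handles quoted fields containing commas; the CSV text is an assumed example reply:

```python
import csv
import io

# Assumed reply: note the quoted department containing a comma.
reply = '''first_name,last_name,department,salary,start_date
Ada,Lovelace,"Research, Analytics",90000,2024-03-01
Alan,Turing,Engineering,95000,2023-11-15'''

# DictReader consumes the header row and handles quoted fields correctly.
rows = list(csv.DictReader(io.StringIO(reply)))
print(rows[0]["department"])  # Research, Analytics
```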

Prompting for XML and YAML

XML is especially effective with Claude, which was trained to understand and produce XML-tagged content. Providing a skeleton structure (with empty tags) and asking the model to fill it in is highly reliable.

XML skeleton prompt:
Extract meeting details and return as XML using this structure:
<meeting>
  <title></title>
  <date></date>
  <attendees>
    <person></person>
  </attendees>
  <action_items>
    <item></item>
  </action_items>
</meeting>

YAML is ideal for generating configuration files (Docker, Kubernetes, CI/CD). Always ask for 2-space indentation explicitly, and warn the model to quote values that contain special YAML characters (colons, brackets).

Common Pitfalls

These are the failure modes you will encounter most often. Knowing them in advance saves hours of debugging.

Markdown contamination

The model wraps JSON in ```json ... ``` code blocks or adds "Here is the JSON you requested:" before it. This breaks JSON.parse() immediately.

Fix: "Return only the raw JSON. No markdown code fences, no text before or after."
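You can also defend against contamination in code instead of relying on the prompt alone. A sketch of a tolerant parser that strips fences and preamble text before parsing; the helper name is ours, not a library function:

```python
import json
import re

def parse_json_reply(reply: str) -> dict:
    """Defensively strip Markdown fences and chatter, then parse.

    Cheap insurance even when the prompt forbids fences."""
    # Unwrap ```json ... ``` blocks if present.
    fenced = re.search(r"```(?:json)?\s*(.*?)```", reply, re.DOTALL)
    if fenced:
        reply = fenced.group(1)
    # Fall back to the outermost {...} span, skipping any preamble.
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in reply")
    return json.loads(reply[start:end + 1])

contaminated = 'Here is the JSON you requested:\n```json\n{"name": "XPS 15"}\n```'
print(parse_json_reply(contaminated))  # {'name': 'XPS 15'}
```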

Schema drift

The model returns customer_name in one response and customerName in the next, or reorganizes nested structures unpredictably.

Fix: Define exact field names in the prompt and provide a complete example.

Hallucinated fields

The model adds fields you never asked for — confidence_score, source, notes — that do not exist in your schema.

Fix: "Include ONLY the fields listed in the schema. Do not add additional fields."

Type coercion errors

The model returns "30" (string) instead of 30 (integer), or "true" instead of true (boolean).

Fix: Specify types explicitly in both the schema description and the example.

Missing required fields

The model silently drops fields when it has no information, instead of returning null.

Fix: "Never omit a field. If a value is unknown, return null for that field."
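On the consuming side, you can normalize dropped fields yourself. A sketch that fills every missing schema field with None; the field names follow the company example used earlier:

```python
# The schema's required fields, per the earlier company-extraction example.
REQUIRED = ["company_name", "founded_year", "headquarters", "employee_count"]

def fill_missing(record: dict) -> dict:
    # dict.get returns None for absent keys; this also drops any
    # hallucinated extra fields as a side effect.
    return {field: record.get(field) for field in REQUIRED}

partial = {"company_name": "Anthropic", "founded_year": 2021}
completed = fill_missing(partial)
print(completed)
```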

Truncated output

Long JSON structures get cut off mid-generation due to token limits, resulting in unclosed brackets and invalid JSON.

Fix: Keep schemas lean. For large datasets, batch into multiple smaller requests.

API-Level Schema Enforcement (2024-2025)

The biggest development in structured output is that major AI providers now offer constrained decoding — the model is mathematically prevented from generating tokens that would violate your schema. This is a fundamental shift from "hope the model returns valid JSON" to "the API guarantees it."

| Provider | Feature | Guarantee |
| --- | --- | --- |
| OpenAI | Structured Outputs (GPT-4o+) | Full schema compliance via JSON Schema |
| OpenAI | JSON Mode (older, weaker) | Valid JSON syntax only — no schema guarantee |
| Anthropic | Structured Outputs (Claude Sonnet 4.5, Opus 4.1) | Full schema compliance via constrained decoding |
| Anthropic | Tool use (all Claude models) | Schema compliance via tool definitions |
| Google | response_schema (Gemini 2.0+) | Full schema compliance via JSON Schema |

How constrained decoding works: The system builds a grammar from your schema. At each token generation step, it computes which tokens are valid given the partial output so far, and masks all others. The model can only pick from valid tokens, making schema violations impossible.
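A toy sketch of the idea, not a real decoder: here the "schema" is just a set of allowed complete outputs, and at each step only the vocabulary tokens that can still extend toward one of them survive the mask:

```python
# Toy illustration of constrained decoding. Real systems derive the mask
# from a grammar built out of your JSON Schema, not an explicit set.
ALLOWED = {'{"in_stock": true}', '{"in_stock": false}'}
VOCAB = ['{"in_stock": ', "true", "false", "}", "maybe"]

def valid_tokens(prefix: str) -> list[str]:
    # Keep only tokens t where prefix + t is still a prefix of some
    # allowed output; everything else is masked out.
    return [t for t in VOCAB
            if any(s.startswith(prefix + t) for s in ALLOWED)]

prefix = ""
print(valid_tokens(prefix))   # only the opening fragment is legal
prefix += '{"in_stock": '
print(valid_tokens(prefix))   # "maybe" is masked; true/false remain
```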

If you are just chatting (not using APIs), you cannot use constrained decoding — you rely on careful prompting. But if you are building software that calls AI APIs, always use the schema enforcement features. They eliminate entire categories of bugs.

The Tool Use Trick

Before native structured output features existed, developers discovered a reliable workaround: define a "fake" tool/function with the desired schema, and ask the model to "call" it. Since function call arguments must match the tool's JSON Schema, you get schema-compliant output — without actually calling any external tool.

This trick still works across all major providers and all models, including older ones that lack native structured output support. It remains the most universally supported pattern for reliable structured output.
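The trick needs nothing more than a tool definition whose parameters are your schema. A sketch in the OpenAI-style function format; other providers use a similar shape, and the tool name and fields here are illustrative:

```python
# A "tool" that nothing ever executes: it exists only so the model must
# emit arguments matching this JSON Schema.
extraction_tool = {
    "type": "function",
    "function": {
        "name": "record_product",
        "description": "Record one extracted product.",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "price_usd": {"type": ["number", "null"]},
                "in_stock": {"type": "boolean"},
            },
            "required": ["name", "price_usd", "in_stock"],
        },
    },
}
# Pass this in the API call's tools list and force the model to call it;
# the tool-call arguments are your schema-compliant JSON.
```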

Tips for Maximum Reliability

Use API enforcement when available
Native Structured Outputs > Tool/function calling > JSON Mode > Prompt-only. Each step down requires more engineering effort for the same reliability.
Always validate in code
Even with API enforcement, validate semantically. Constrained decoding guarantees structure, not meaning — the model can return a syntactically valid but semantically wrong value.
Keep schemas simple
Flat schemas are more reliable than deeply nested ones. If you need complex structures, consider breaking extraction into multiple simpler requests.
Use few-shot examples
One or two complete input → output examples teach the model more than paragraphs of instructions.
Reason first, format second
A 2025 research finding showed that requiring JSON output can reduce reasoning accuracy by up to 27% compared to natural language — because the model devotes attention to syntax instead of thinking. For complex reasoning tasks, let the model think in prose first, then ask it to format the answer.
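The "validate in code" tip deserves emphasis: constrained decoding guarantees shape, not sense. A sketch of semantic checks layered on top of a schema-valid reply; the business rules here are illustrative:

```python
# A record can be perfectly schema-valid and still be wrong. These checks
# encode meaning the schema cannot: assumed rules for the product example.
def check_semantics(item: dict) -> list[str]:
    problems = []
    if item["price_usd"] is not None and item["price_usd"] <= 0:
        problems.append("price must be positive")
    if item["category"] not in {"electronics", "clothing", "food", "other"}:
        problems.append("unknown category")
    return problems

valid_shape_wrong_meaning = {"price_usd": -5, "category": "gadgets"}
print(check_semantics(valid_shape_wrong_meaning))
```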

Real-World Use Cases

Data extraction from documents

Invoices → {vendor, amount, due_date, line_items[]}. Resumes → {name, skills[], experience[]}. Clinical notes → structured diagnoses and medications.

Meeting note structuring

Transform a transcript into {attendees[], decisions[], action_items[{owner, task, due_date}]} — no more manually scanning notes for who committed to what.

Product catalog generation

Given unstructured product descriptions, extract name, price, specifications, and tags for database ingestion in e-commerce systems.

Report generation

Define report sections as a schema: executive summary, key findings, recommendations, data tables. Every report follows the same structure regardless of content.

Form population

Extract structured fields from unstructured documents to auto-fill loan applications, insurance claims, or HR onboarding forms.

Putting It All Together

Here is a complete, well-structured prompt that applies all the rules from this page:

Complete prompt:
You are a data extraction assistant. Extract product information
from the text below and return ONLY a JSON array.
Rules:
- Return raw JSON. No markdown, no explanation, no code fences.
- Include ONLY the fields listed below. No extra fields.
- If a value is not mentioned, use null.
- Always return an array, even if there is only one product.
Schema per item:
{
  "name": string,
  "price_usd": number or null,
  "category": "electronics" | "clothing" | "food" | "other",
  "in_stock": boolean
}
Example:
Input: "The XPS 15 laptop costs $1,299 and is available now."
Output: [{"name": "XPS 15", "price_usd": 1299, "category": "electronics", "in_stock": true}]
Now extract from:
[your text here]

This prompt hits all five universal rules: names the format (JSON array), defines the schema (four typed fields), provides an example, forbids extras, and specifies null handling. Use it as a template and adapt the schema to your needs.

Checklist: Do You Understand This?

  • Can you explain why structured output is more useful than free-form text for automation?
  • Can you name the five universal rules for prompting any structured format?
  • Can you write a prompt that reliably produces JSON with a defined schema?
  • Can you list three common pitfalls and their fixes?
  • Can you explain the difference between prompt-based and API-level schema enforcement?
  • Can you describe the "tool use trick" for getting structured output?
  • Given a real task, can you choose the right output format from the table above?