Prompting for Structured Output
Structured output means getting AI to respond in a specific, machine-readable format — JSON, tables, checklists, CSV, XML, or YAML — instead of free-form prose. This is the bridge between AI reasoning and real-world automation: software can parse structured data, but it cannot reliably parse a paragraph.
Why Structured Output Matters
Free-form text is great for reading. It is terrible for software. If you ask an AI to extract product details and it responds with a paragraph, your code has to figure out where the product name ends and the price begins — every single time, with every variation in wording.
Structured output solves this by giving you:
- Software integration — parse the response directly in code without guesswork
- Consistency — same field names, same types, same ordering across requests
- Automation — feed outputs directly into databases, spreadsheets, APIs, and workflows
- Error reduction — API-level schema enforcement can reduce parsing errors by up to 90%
Common Formats
Each format has a sweet spot. Choosing the right one depends on who (or what) will consume the output.
| Format | Best For | Consumed By |
|---|---|---|
| JSON | API responses, data extraction, database input | Code / software |
| Markdown Table | Comparisons, feature matrices, summaries | Humans (docs, GitHub, Notion) |
| Checklist | Task lists, acceptance criteria, quality checks | Humans (project management) |
| CSV | Bulk data, spreadsheet import, datasets | Excel, Sheets, pandas |
| XML | Document interchange, enterprise systems, Claude prompting | Code / enterprise tools |
| YAML | Config files, DevOps artifacts, human-editable data | Docker, K8s, CI/CD |
Universal Rules for Any Format
These five rules apply regardless of which format you are requesting. Follow all five and your success rate will jump dramatically.
1. Name the format explicitly
Say "Return a JSON object" or "Return a Markdown table" — do not leave the format ambiguous. The model will guess, and it will guess differently every time.
2. Define the schema
List every field (or column), its type, and what it means. Do not make the model guess your schema — it will invent one, and it will not match yours.
3. Provide an example
A single complete example of the desired output teaches the model more about your expectations than paragraphs of instructions.
4. Forbid extras
Tell the model not to add explanatory text, code fences, or additional fields. Say: "Return ONLY the JSON. No introduction, no explanation, no markdown wrappers."
5. Specify null handling
Tell the model what to do when data is missing: "If a value is not mentioned, use null." Without this, the model will sometimes omit fields, sometimes use empty strings, and sometimes write "N/A".
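The five rules above can be baked into a small prompt-assembly helper. A sketch in Python; the schema and field names here are illustrative, not a fixed API:

```python
import json

def build_extraction_prompt(schema: dict, example: dict, source_text: str) -> str:
    """Assemble a prompt that applies the five universal rules."""
    lines = [
        # Rule 1: name the format explicitly
        "Return a JSON object.",
        # Rule 2: define the schema (field name, type, meaning)
        "Schema:",
    ]
    for field, (ftype, meaning) in schema.items():
        lines.append(f'- "{field}" ({ftype}): {meaning}')
    # Rule 3: provide a complete example
    lines.append("Example output:")
    lines.append(json.dumps(example))
    # Rule 4: forbid extras
    lines.append("Return ONLY the JSON. No introduction, no explanation, "
                 "no markdown wrappers, no extra fields.")
    # Rule 5: specify null handling
    lines.append("If a value is not mentioned, use null. Never omit a field.")
    lines.append("Text to extract from:")
    lines.append(source_text)
    return "\n".join(lines)

# Illustrative schema: field -> (type, meaning)
schema = {
    "name": ("string", "product name"),
    "price": ("number", "price in USD"),
}
prompt = build_extraction_prompt(schema, {"name": "Widget", "price": 9.99},
                                 "The Widget costs $9.99.")
print(prompt)
```

Keeping the assembly in one function also guarantees the same wording across requests, which helps with the consistency goal above.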
Prompting for JSON
JSON is the most important structured output format. It is how software talks to software, and it is how your AI responses become usable in code. Master this one first.
JSON-specific tips:
- Show the full nested example if you need nested structures — do not just describe them
- Keep schemas flat when possible; deep nesting increases failure rates
- Explicitly forbid markdown wrappers: "Do not wrap in ```json code blocks"
- For arrays, always show the array form even with a single element to prevent the model from returning a bare object instead
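Even with the array instruction, a cheap normalization guard on the consuming side keeps downstream code safe. A sketch (function name illustrative):

```python
import json

def parse_items(raw: str) -> list:
    """Parse a JSON response that should be an array of objects.

    If the model ignored the instruction and returned a bare object,
    wrap it in a list so downstream code always sees an array.
    """
    data = json.loads(raw)
    if isinstance(data, dict):  # model returned a bare object
        data = [data]
    return data

print(parse_items('[{"name": "Widget"}]'))  # already an array
print(parse_items('{"name": "Widget"}'))    # bare object, normalized
```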
Prompting for Tables
Markdown tables are the best format when humans will read the output directly. They render in GitHub, Notion, Obsidian, and most documentation tools.
Table-specific tips:
- Specify column names exactly — models will invent their own if not told
- Specify what each row represents ("one row per product")
- Specify sort order if it matters
- Research shows Markdown tables achieve ~16% higher accuracy than CSV when the AI also needs to interpret the data
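If code does end up consuming a Markdown table, parsing is straightforward as long as the column names were pinned down in the prompt. A minimal sketch that handles simple pipe-delimited tables:

```python
def parse_markdown_table(text: str) -> list[dict]:
    """Parse a simple pipe-delimited Markdown table into row dicts."""
    rows = [line.strip().strip("|").split("|")
            for line in text.strip().splitlines()]
    cells = [[c.strip() for c in row] for row in rows]
    header, body = cells[0], cells[2:]  # cells[1] is the |---| separator row
    return [dict(zip(header, row)) for row in body]

table = """
| Product | Price |
|---|---|
| Widget | 9.99 |
"""
print(parse_markdown_table(table))  # -> [{'Product': 'Widget', 'Price': '9.99'}]
```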
Prompting for Checklists
Checklist tips:
- Specify grouping — without it, you get an endless flat list
- Set item count limits — unconstrained, models produce very long lists
- Specify checked vs unchecked state if completion status matters
Prompting for CSV
CSV tips:
- Warn the model if field values might contain commas — instruct it to quote those fields
- Specify whether to include a header row
- For real data extraction, prefer JSON then convert — JSON handles edge cases better
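The "JSON then convert" tip is a few lines with the standard library, and `csv.DictWriter` quotes comma-containing values automatically. A sketch (field names illustrative):

```python
import csv
import io
import json

# Imagine this arrived from the model as a JSON array:
raw = '[{"name": "Widget, Large", "price": 9.99}, {"name": "Gadget", "price": 4.5}]'
records = json.loads(raw)

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "price"])
writer.writeheader()
writer.writerows(records)  # commas inside values are quoted automatically
print(buf.getvalue())
```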
Prompting for XML and YAML
XML is especially effective with Claude, which was trained to understand and produce XML-tagged content. Providing a skeleton structure (with empty tags) and asking the model to fill it in is highly reliable.
YAML is ideal for generating configuration files (Docker, Kubernetes, CI/CD). Always ask for 2-space indentation explicitly, and warn the model to quote values that contain special YAML characters (colons, brackets).
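The skeleton-fill pattern pairs well with a structural check on the way back in. A sketch using the standard library's ElementTree; the tag names are illustrative:

```python
import xml.etree.ElementTree as ET

# Skeleton sent in the prompt: <product><name/><price/></product>
response = "<product><name>Widget</name><price>9.99</price></product>"

root = ET.fromstring(response)  # raises ParseError on malformed XML
assert root.tag == "product"
name = root.findtext("name")
price = float(root.findtext("price"))
print(name, price)
```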
Common Pitfalls
These are the failure modes you will encounter most often. Knowing them in advance saves hours of debugging.
Markdown contamination
The model wraps JSON in ```json ... ``` code blocks or adds "Here is the JSON you requested:" before it. This breaks JSON.parse() immediately.
Fix: "Return only the raw JSON. No markdown code fences, no text before or after."
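Even with that instruction in place, defensively stripping fences before parsing is cheap insurance. A sketch:

```python
import json
import re

def parse_json_response(raw: str):
    """Strip optional ```json fences, then parse the remaining text."""
    text = raw.strip()
    # Remove a leading ```json (or bare ```) fence and a trailing ``` fence.
    text = re.sub(r"^```(?:json)?\s*", "", text)
    text = re.sub(r"\s*```$", "", text)
    return json.loads(text)

print(parse_json_response('```json\n{"name": "Widget"}\n```'))
print(parse_json_response('{"name": "Widget"}'))  # unfenced input also works
```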
Schema drift
The model returns customer_name in one response and customerName in the next, or reorganizes nested structures unpredictably.
Fix: Define exact field names in the prompt and provide a complete example.
Hallucinated fields
The model adds fields you never asked for — confidence_score, source, notes — that do not exist in your schema.
Fix: "Include ONLY the fields listed in the schema. Do not add additional fields."
Type coercion errors
The model returns "30" (string) instead of 30 (integer), or "true" instead of true (boolean).
Fix: Specify types explicitly in both the schema description and the example.
Missing required fields
The model silently drops fields when it has no information, instead of returning null.
Fix: "Never omit a field. If a value is unknown, return null for that field."
Truncated output
Long JSON structures get cut off mid-generation due to token limits, resulting in unclosed brackets and invalid JSON.
Fix: Keep schemas lean. For large datasets, batch into multiple smaller requests.
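Several of these pitfalls (schema drift, hallucinated fields, type coercion, missing fields) can be caught in one post-parse validation pass. A sketch; the schema here is illustrative:

```python
SCHEMA = {"name": str, "age": int, "active": bool}  # illustrative schema

def validate(record: dict, schema: dict) -> list[str]:
    """Return a list of schema violations (an empty list means valid)."""
    errors = []
    for field in record:
        if field not in schema:
            errors.append(f"hallucinated field: {field}")
    for field, ftype in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif record[field] is not None and not isinstance(record[field], ftype):
            errors.append(f"wrong type for {field}: {type(record[field]).__name__}")
    return errors

# "30" is a string, "notes" is not in the schema, "active" is missing:
print(validate({"name": "Ada", "age": "30", "notes": "x"}, SCHEMA))
```

For production code, a schema-validation library does this more thoroughly, but even this small check turns silent data corruption into a visible error list.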
API-Level Schema Enforcement (2024-2025)
The biggest development in structured output is that major AI providers now offer constrained decoding — the model is mathematically prevented from generating tokens that would violate your schema. This is a fundamental shift from "hope the model returns valid JSON" to "the API guarantees it."
| Provider | Feature | Guarantee |
|---|---|---|
| OpenAI | Structured Outputs (GPT-4o+) | Full schema compliance via JSON Schema |
| OpenAI | JSON Mode (older, weaker) | Valid JSON syntax only — no schema guarantee |
| Anthropic | Structured Outputs (Claude Sonnet 4.5, Opus 4.1) | Full schema compliance via constrained decoding |
| Anthropic | Tool use (all Claude models) | Schema compliance via tool definitions |
| Google | response_schema (Gemini 2.0+) | Full schema compliance via JSON Schema |
How constrained decoding works: The system builds a grammar from your schema. At each token generation step, it computes which tokens are valid given the partial output so far, and masks all others. The model can only pick from valid tokens, making schema violations impossible.
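The masking step can be illustrated with a toy grammar: suppose the output must be a JSON array of single digits, like [1,2,3]. At each step, only tokens that keep the partial output a valid prefix survive. A didactic sketch, not how production decoders are implemented:

```python
def valid_prefix(s: str) -> bool:
    """Prefix acceptor for a toy grammar: a JSON array of single digits,
    e.g. [1] or [1,2,3]. Returns True if s could still be completed."""
    state = "start"
    for ch in s:
        if state == "start" and ch == "[":
            state = "item"       # a digit must come next
        elif state == "item" and ch.isdigit():
            state = "sep"        # then "," or "]"
        elif state == "sep" and ch == ",":
            state = "item"
        elif state == "sep" and ch == "]":
            state = "done"       # array closed; nothing may follow
        else:
            return False         # transition not allowed by the grammar
    return True

VOCAB = list("[]0123456789,x")   # toy token vocabulary

def allowed_tokens(partial: str) -> list[str]:
    """The masking step: keep only tokens that keep the output valid."""
    return [t for t in VOCAB if valid_prefix(partial + t)]

print(allowed_tokens(""))     # generation must start the array
print(allowed_tokens("[1"))   # close the array or continue it
```

Real systems do the same thing with a grammar compiled from your JSON Schema and the model's actual token vocabulary, but the principle — filter, then sample — is identical.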
If you are just chatting (not using APIs), you cannot use constrained decoding — you rely on careful prompting. But if you are building software that calls AI APIs, always use the schema enforcement features. They eliminate entire categories of bugs.
The Tool Use Trick
Before native structured output features existed, developers discovered a reliable workaround: define a "fake" tool/function with the desired schema, and ask the model to "call" it. Since function call arguments must match the tool's JSON Schema, you get schema-compliant output — without actually calling any external tool.
This trick still works across all major providers and all models, including older ones that lack native structured output support. It remains the most universally supported pattern for reliable structured output.
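In OpenAI-style function calling, the "fake" tool is just a JSON Schema attached to a function definition. A sketch — the record_product name and its fields are illustrative, and nothing here actually calls an API:

```python
import json

# A "tool" that exists only to force schema-shaped output.
fake_tool = {
    "type": "function",
    "function": {
        "name": "record_product",          # never actually executed
        "description": "Record the extracted product details.",
        "parameters": {                    # standard JSON Schema
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "price": {"type": "number"},
            },
            "required": ["name", "price"],
            "additionalProperties": False,
        },
    },
}

# The model's "call" arrives as a JSON string of arguments matching the schema:
arguments = '{"name": "Widget", "price": 9.99}'
print(json.loads(arguments))
```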
Tips for Maximum Reliability
Native Structured Outputs > Tool/function calling > JSON Mode > Prompt-only. Each step down requires more engineering effort for the same reliability.
Even with API enforcement, validate semantically. Constrained decoding guarantees structure, not meaning — the model can return a syntactically valid but semantically wrong value.
Flat schemas are more reliable than deeply nested ones. If you need complex structures, consider breaking extraction into multiple simpler requests.
One or two complete input → output examples teach the model more than paragraphs of instructions.
A 2025 research finding showed that requiring JSON output can reduce reasoning accuracy by up to 27% compared to natural language — because the model devotes attention to syntax instead of thinking. For complex reasoning tasks, let the model think in prose first, then ask it to format the answer.
Real-World Use Cases
Data extraction from documents
Invoices → {vendor, amount, due_date, line_items[]}. Resumes → {name, skills[], experience[]}. Clinical notes → structured diagnoses and medications.
Meeting note structuring
Transform a transcript into {attendees[], decisions[], action_items[{owner, task, due_date}]} — no more manually scanning notes for who committed to what.
Product catalog generation
Given unstructured product descriptions, extract name, price, specifications, and tags for database ingestion in e-commerce systems.
Report generation
Define report sections as a schema: executive summary, key findings, recommendations, data tables. Every report follows the same structure regardless of content.
Form population
Extract structured fields from unstructured documents to auto-fill loan applications, insurance claims, or HR onboarding forms.
Putting It All Together
Here is a complete, well-structured prompt that applies all the rules from this page:
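A sketch of such a prompt, written for a product-extraction task — the four fields are illustrative stand-ins for your own schema, and the placeholder at the end marks where your source text goes:

```python
prompt = """Extract every product mentioned in the text below.

Return a JSON array of objects. Each object has exactly these fields:
- "name" (string): the product name
- "price" (number): the price in USD
- "in_stock" (boolean): whether the product is available
- "category" (string): the product category

Example output:
[{"name": "Widget", "price": 9.99, "in_stock": true, "category": "tools"}]

Return ONLY the JSON array. No introduction, no explanation, no markdown
code fences, no extra fields. If a value is not mentioned, use null.

Text:
<the source text goes here>
"""
print(prompt)
```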
This prompt hits all five universal rules: names the format (JSON array), defines the schema (four typed fields), provides an example, forbids extras, and specifies null handling. Use it as a template and adapt the schema to your needs.
Checklist: Do You Understand This?
- Can you explain why structured output is more useful than free-form text for automation?
- Can you name the five universal rules for prompting any structured format?
- Can you write a prompt that reliably produces JSON with a defined schema?
- Can you list three common pitfalls and their fixes?
- Can you explain the difference between prompt-based and API-level schema enforcement?
- Can you describe the "tool use trick" for getting structured output?
- Given a real task, can you choose the right output format from the table above?