Beginner

Prompt Patterns

A prompt pattern is a reusable strategy for structuring your interaction with an AI model. Each pattern is designed for a specific type of task — from simple questions to complex multi-step reasoning. Knowing which pattern to reach for is what separates effective AI users from those who get inconsistent results.

Pattern Overview

Here are the major prompt patterns, ordered from simplest to most advanced. You do not need to memorize them all — start with the first three, then learn the others as you encounter tasks that need them.

Pattern	Core Idea	Best For
Zero-Shot	Just ask — no examples	Simple, well-defined tasks
Few-Shot	Show examples first	Consistent formatting, domain-specific behavior
Chain-of-Thought	Think step by step	Math, logic, analytical reasoning
Chain-of-Questions	Ask sub-questions first	Multi-hop factual questions
Self-Consistency	Multiple paths, majority vote	Math, logic with uncertain answers
Tree of Thoughts	Explore and backtrack	Creative problem-solving, puzzles
ReAct	Think, act, observe	Tool-using agents, research tasks
Rubric-Based	Define success criteria	Evaluation, quality control
Critique & Refine	Generate, critique, improve	Writing, code, iterative improvement
Prompt Chaining	Break into sequential steps	Complex multi-stage workflows
Meta-Prompting	AI writes the prompt	Optimizing prompts at scale

Zero-Shot

No examples — just ask

→

Few-Shot

2–5 examples first

→

Chain-of-Thought

"Think step by step"

→

ReAct

Reason + take action

Prompt patterns from simplest to most powerful — start with zero-shot and escalate when you need better results

Zero-Shot Prompting

Zero-shot means giving the model a task with no examples. You simply describe what you want and let the model figure it out from its training.

Example:

Classify this review as Positive, Negative, or Neutral:

"The battery life is incredible but the screen is dim."

When to use it: Simple tasks where the model's default behavior is sufficient — classification, translation, summarization, straightforward Q&A. Modern large models (GPT-4, Claude, Gemini) have strong zero-shot capabilities out of the box.

When to skip it: If the model gives inconsistent formatting, misinterprets the task, or you need domain-specific output styles — switch to few-shot.

Few-Shot Prompting

Few-shot means providing 1-8 examples of input-output pairs before your actual query. The model uses these examples to learn the pattern — tone, format, depth, and structure — through what is called in-context learning.

Example:

Review: "Great product, fast shipping!" → Sentiment: Positive

Review: "Broke after one week." → Sentiment: Negative

Review: "It works as described." → Sentiment: Neutral

Review: "The battery is amazing but the screen is dim." → Sentiment: ?

Why it works: Examples calibrate the model far more precisely than instructions alone. Showing two examples of your exact desired format produces higher consistency than paragraphs of explanation.

Best practices:

1-3 examples is the sweet spot for most tasks
Make examples diverse — cover different edge cases
Only show desired behavior — never include "bad examples"
Keep example formatting identical to what you want in the output

When to skip it: When you are token-constrained (examples consume context window), when zero-shot already gives good results, or when too many examples cause the model to overfit to the demonstrated patterns.

Chain-of-Thought (CoT)

Chain-of-thought prompting instructs the model to reason step by step before giving a final answer. This forces the model to show its work, which reduces errors on reasoning-heavy tasks.

Zero-shot CoT (simplest form):

Roger has 5 tennis balls. He buys 2 cans of 3 balls each.

How many tennis balls does he have now?

Let's think step by step.

Model response:

Step 1: Roger starts with 5 balls.

Step 2: He buys 2 cans × 3 balls = 6 balls.

Step 3: 5 + 6 = 11 balls total.

Answer: 11

When to use it: Multi-step math, logic puzzles, analytical reasoning, code debugging, any task where the answer depends on intermediate steps.

Important: Diminishing Returns (2025 Research)

A 2025 study from Wharton's Generative AI Lab found that chain-of-thought is not universally beneficial with modern models:

Non-reasoning models

Modest improvements (4-14% gains), but CoT introduced more variability — sometimes causing errors on questions the model previously answered correctly. Response time increased 35-600%.

Reasoning models (o3, o4, Gemini 2.5)

Minimal gains (2-3%) and in some cases decreased performance. These models already reason internally without explicit prompting — adding "think step by step" is redundant.

Takeaway: Use CoT selectively for genuinely complex reasoning tasks, not as a blanket technique. For simple questions or with reasoning models, skip it.

Chain-of-Questions (Self-Ask)

Instead of reasoning through statements, the model explicitly generates sub-questions it needs to answer before tackling the main question. This is particularly effective for multi-hop factual queries where the answer requires combining information from different domains.

Example:

Q: Who was president when the inventor of the telephone was born?

Sub-Q1: Who invented the telephone? → Alexander Graham Bell

Sub-Q2: When was Bell born? → 1847

Sub-Q3: Who was US president in 1847? → James K. Polk

Final answer: James K. Polk

When to use it: Compositional questions, multi-hop reasoning, any question that naturally decomposes into sub-questions. Especially useful when combined with search tools (the model can look up each sub-answer).

Self-Consistency

Self-consistency generates multiple reasoning paths for the same question (using randomness in generation), then selects the most common final answer via majority voting. It is essentially "ask the same question several times and go with the consensus."

Example (5 reasoning paths):

Q: "When I was 6, my sister was half my age. Now I'm 70. How old is she?"

Path 1 → 67 Path 2 → 67 Path 3 → 35 Path 4 → 67 Path 5 → 67

Majority vote: 67 (correct)

When to use it: Math problems, logic puzzles, commonsense reasoning — any task where the model might take a wrong reasoning path but the correct answer is more likely overall. Benchmark improvements are significant: 12-18% gains on standard math datasets.

Trade-off: Requires multiple API calls per question, so it costs 3-5x more. Use it when accuracy matters more than cost.

Tree of Thoughts (ToT)

Tree of Thoughts extends chain-of-thought by maintaining multiple parallel reasoning branches. At each step, the model generates several candidate "thoughts," evaluates which ones are most promising, and can backtrack from dead ends — something standard CoT cannot do.

Example — Game of 24 (use 4, 5, 6, 10 to make 24):

Branch A: 10 - 4 = 6 → 6 + 6 = 12 → dead end, backtrack

Branch B: 5 × 4 = 20 → 20 + 10 - 6 = 24 → success!

Branch C: 6 - 5 = 1 → unlikely to reach 24, prune

When to use it: Creative problem-solving, puzzles, strategic planning, tasks where the first approach might be wrong and exploration is needed. In benchmarks, ToT achieved 74% solve rate vs. 9% for standard CoT on the Game of 24.

When to skip it: Simple linear reasoning tasks, token-sensitive applications (ToT is expensive), or when the problem does not benefit from exploring multiple paths.

ReAct (Reasoning + Acting)

ReAct alternates between three phases: Thought (reasoning about the current state), Action (calling an external tool — search, calculator, database, API), and Observation (processing the tool's result). This loop repeats until the task is complete.

Example:

Q: What is the current GDP of the country where the Eiffel Tower is?

Thought: I need to find which country has the Eiffel Tower.

Action: Search("Eiffel Tower location") → France

Thought: Now I need France's current GDP.

Action: Search("France GDP 2025") → ~$3.1 trillion

Answer: France's GDP is approximately $3.1 trillion.

ReAct is the foundational pattern behind AI agents. If you have used an AI assistant that searches the web, runs code, or calls APIs, it was using a ReAct-style loop. Agent frameworks like LangChain, CrewAI, and AutoGen are all built on this pattern.

When to use it: Tasks requiring up-to-date information, fact-checking, multi-step research, any scenario where the model needs to interact with external tools.

When to skip it: Tasks that only need the model's internal knowledge, or when tool call latency is unacceptable.

Rubric-Based Prompting

Rubric-based prompting gives the model explicit evaluation criteria that define what a good response looks like — with specific dimensions and quality levels. This pattern works both for generating content (the model aims to meet the rubric) and for evaluating content (the model scores against the rubric).

Example:

Write a product description. It will be evaluated on:

1. Accuracy — all claims must be factually correct

2. Persuasiveness — include at least 2 emotional appeals

3. Brevity — under 100 words

4. Call-to-action — end with a clear CTA

When to use it: Content quality assurance, LLM-as-a-judge evaluations, any task where "good" needs precise, measurable definition. Particularly powerful when combined with the critique-and-refine pattern.

When to skip it: Exploratory or creative tasks where rigid criteria would limit useful outputs, or quick informal interactions.

Critique & Refine (Self-Refine)

Critique and refine is a three-step loop: (1) generate an initial output, (2) critique it against specific criteria, (3) revise based on the critique. This loop can repeat multiple times, with each iteration improving quality.

Example:

Step 1 — Generate: "Write an email declining a meeting invitation."

Step 2 — Critique: "Review your email. Is it professional? Does it suggest an alternative time? Is it under 5 sentences?"

Step 3 — Refine: "Now rewrite the email addressing any issues found in the critique."

Research from Google (2025) found that self-refinement reduced code errors by 30%. The key is providing a specific checklist for the critique step — vague instructions like "make it better" are unreliable.

When to use it: Writing tasks, code generation, any output that benefits from iteration. Works especially well when you provide a rubric for the critique step.

When to skip it: Simple factual queries, when latency/cost of multiple passes is a problem, or for tasks where one pass is reliably good enough.

Prompt Chaining

Prompt chaining breaks a complex task into a sequence of smaller, focused prompts where each prompt's output feeds into the next. Unlike chain-of-thought (which reasons in a single prompt), chaining uses separate AI calls for each step.

Example — Contract analysis pipeline:

Prompt 1: "Extract all dates and monetary amounts from this contract."

→ structured data

Prompt 2: "Given these extracted terms, identify clauses with penalties over $10,000."

→ filtered list

Prompt 3: "Summarize the high-risk clauses for a legal review."

→ executive summary

When to use it: Multi-step workflows (extract → analyze → format), tasks where the model loses focus in long prompts, data transformation pipelines, or when different steps need different instructions or even different models.

When to skip it: Simple single-step tasks, when latency from multiple API calls is unacceptable, or when all the context from earlier steps is needed simultaneously in later steps.

Meta-Prompting

Meta-prompting uses the AI to generate, improve, or optimize prompts themselves. You ask the model to write a better prompt for a given task, then use that improved prompt for your actual work.

Example:

"I need to classify customer support tickets into: Billing, Technical,

Feature Request, or Complaint. Write me an optimized prompt

with the best examples and instructions for this task."

This has been industrialized with tools like DSPy, which programmatically optimizes prompts by bootstrapping few-shot examples from data. DSPy has shown accuracy improvements from 46% to 64% on evaluation tasks through automated prompt optimization.

When to use it: When you cannot get good results and want the model to suggest improvements, when building production AI systems at scale, or when optimizing for cost by finding shorter but equally effective prompts.

Emerging Patterns (2025-2026)

The field continues to evolve. Here are patterns gaining traction:

Context Engineering

The major paradigm shift of 2025. Rather than optimizing individual prompts, context engineering designs the entire information architecture surrounding the model — system prompts, dynamically injected context (RAG), conversation history management, tool definitions, and output schemas. This is now considered the real competitive advantage in production AI systems.

Adaptive Thinking

Models like Claude now offer adaptive thinking, where the model dynamically decideshow much to reason based on task complexity. Instead of you prescribing "think step by step," the model self-regulates its reasoning depth. This often outperforms explicit chain-of-thought prompting.

Skeleton of Thought

The model first generates an outline (skeleton) of its response, then fills in each section. This produces better-structured outputs and can achieve up to 2.4x speedup by enabling parallel generation of sections.

Defensive Prompting

Wrapping user inputs in structured, guarded templates that limit model misbehavior even under adversarial input. This is a standard security practice in production systems — think of it as prompt-level input validation.

How to Choose the Right Pattern

Use this decision guide:

Is the task simple and well-defined?
→ Start with zero-shot. If results are inconsistent, add examples (few-shot).

Does it require step-by-step reasoning?
→ Use chain-of-thought (but skip it for reasoning models that already think internally).

Does it combine facts from multiple domains?
→ Use chain-of-questions.

Does accuracy matter more than cost?
→ Use self-consistency (multiple paths + majority vote).

Does it need external information or tools?
→ Use the ReAct pattern.

Is the output quality critical and needs iteration?
→ Use critique & refine, ideally with a rubric.

Is the task too complex for a single prompt?
→ Use prompt chaining.

Are you building prompts for a production system?
→ Use meta-prompting or DSPy to optimize systematically.

Combining Patterns

Patterns are not mutually exclusive — in practice, you often combine them:

Few-shot + CoT: Provide examples that include step-by-step reasoning
ReAct + CoT: The thought phase of ReAct is essentially chain-of-thought
Rubric + Critique: Use the rubric as the checklist for the critique step
Prompt chaining + Few-shot: Each step in the chain uses few-shot examples tuned for that specific subtask
Self-consistency + CoT: Generate multiple CoT reasoning paths, then take the majority vote

Checklist: Do You Understand This?

Can you explain the difference between zero-shot and few-shot prompting?
Can you describe when chain-of-thought is helpful and when it is not?
Can you explain the ReAct pattern and why it matters for AI agents?
Can you name a scenario where self-consistency would be worth the extra cost?
Can you describe how critique-and-refine works and when to use it?
Given a new task, can you choose the right pattern from the decision guide?