Prompt Patterns
A prompt pattern is a reusable strategy for structuring your interaction with an AI model. Each pattern is designed for a specific type of task — from simple questions to complex multi-step reasoning. Knowing which pattern to reach for is what separates effective AI users from those who get inconsistent results.
Pattern Overview
Here are the major prompt patterns, ordered from simplest to most advanced. You do not need to memorize them all — start with the first three, then learn the others as you encounter tasks that need them.
| Pattern | Core Idea | Best For |
|---|---|---|
| Zero-Shot | Just ask — no examples | Simple, well-defined tasks |
| Few-Shot | Show examples first | Consistent formatting, domain-specific behavior |
| Chain-of-Thought | Think step by step | Math, logic, analytical reasoning |
| Chain-of-Questions | Ask sub-questions first | Multi-hop factual questions |
| Self-Consistency | Multiple paths, majority vote | Math, logic with uncertain answers |
| Tree of Thoughts | Explore and backtrack | Creative problem-solving, puzzles |
| ReAct | Think, act, observe | Tool-using agents, research tasks |
| Rubric-Based | Define success criteria | Evaluation, quality control |
| Critique & Refine | Generate, critique, improve | Writing, code, iterative improvement |
| Prompt Chaining | Break into sequential steps | Complex multi-stage workflows |
| Meta-Prompting | AI writes the prompt | Optimizing prompts at scale |
Zero-Shot Prompting
Zero-shot means giving the model a task with no examples. You simply describe what you want and let the model figure it out from its training.
When to use it: Simple tasks where the model's default behavior is sufficient — classification, translation, summarization, straightforward Q&A. Modern large models (GPT-4, Claude, Gemini) have strong zero-shot capabilities out of the box.
When to skip it: If the model gives inconsistent formatting, misinterprets the task, or you need domain-specific output styles — switch to few-shot.
Few-Shot Prompting
Few-shot means providing 1-8 examples of input-output pairs before your actual query. The model uses these examples to learn the pattern — tone, format, depth, and structure — through what is called in-context learning.
Why it works: Examples calibrate the model far more precisely than instructions alone. Showing two examples of your exact desired format produces higher consistency than paragraphs of explanation.
Best practices:
- 1-3 examples is the sweet spot for most tasks
- Make examples diverse — cover different edge cases
- Only show desired behavior — never include "bad examples"
- Keep example formatting identical to what you want in the output
When to skip it: When you are token-constrained (examples consume context window), when zero-shot already gives good results, or when too many examples cause the model to overfit to the demonstrated patterns.
Chain-of-Thought (CoT)
Chain-of-thought prompting instructs the model to reason step by step before giving a final answer. This forces the model to show its work, which reduces errors on reasoning-heavy tasks.
When to use it: Multi-step math, logic puzzles, analytical reasoning, code debugging, any task where the answer depends on intermediate steps.
Important: Diminishing Returns (2025 Research)
A 2025 study from Wharton's Generative AI Lab found that chain-of-thought is not universally beneficial with modern models:
Non-reasoning models
Modest improvements (4-14% gains), but CoT introduced more variability — sometimes causing errors on questions the model previously answered correctly. Response time increased 35-600%.
Reasoning models (o3, o4, Gemini 2.5)
Minimal gains (2-3%) and in some cases decreased performance. These models already reason internally without explicit prompting — adding "think step by step" is redundant.
Takeaway: Use CoT selectively for genuinely complex reasoning tasks, not as a blanket technique. For simple questions or with reasoning models, skip it.
Chain-of-Questions (Self-Ask)
Instead of reasoning through statements, the model explicitly generates sub-questions it needs to answer before tackling the main question. This is particularly effective for multi-hop factual queries where the answer requires combining information from different domains.
When to use it: Compositional questions, multi-hop reasoning, any question that naturally decomposes into sub-questions. Especially useful when combined with search tools (the model can look up each sub-answer).
Self-Consistency
Self-consistency generates multiple reasoning paths for the same question (using randomness in generation), then selects the most common final answer via majority voting. It is essentially "ask the same question several times and go with the consensus."
When to use it: Math problems, logic puzzles, commonsense reasoning — any task where the model might take a wrong reasoning path but the correct answer is more likely overall. Benchmark improvements are significant: 12-18% gains on standard math datasets.
Trade-off: Requires multiple API calls per question, so it costs 3-5x more. Use it when accuracy matters more than cost.
Tree of Thoughts (ToT)
Tree of Thoughts extends chain-of-thought by maintaining multiple parallel reasoning branches. At each step, the model generates several candidate "thoughts," evaluates which ones are most promising, and can backtrack from dead ends — something standard CoT cannot do.
When to use it: Creative problem-solving, puzzles, strategic planning, tasks where the first approach might be wrong and exploration is needed. In benchmarks, ToT achieved 74% solve rate vs. 9% for standard CoT on the Game of 24.
When to skip it: Simple linear reasoning tasks, token-sensitive applications (ToT is expensive), or when the problem does not benefit from exploring multiple paths.
ReAct (Reasoning + Acting)
ReAct alternates between three phases: Thought (reasoning about the current state), Action (calling an external tool — search, calculator, database, API), and Observation (processing the tool's result). This loop repeats until the task is complete.
ReAct is the foundational pattern behind AI agents. If you have used an AI assistant that searches the web, runs code, or calls APIs, it was using a ReAct-style loop. Agent frameworks like LangChain, CrewAI, and AutoGen are all built on this pattern.
When to use it: Tasks requiring up-to-date information, fact-checking, multi-step research, any scenario where the model needs to interact with external tools.
When to skip it: Tasks that only need the model's internal knowledge, or when tool call latency is unacceptable.
Rubric-Based Prompting
Rubric-based prompting gives the model explicit evaluation criteria that define what a good response looks like — with specific dimensions and quality levels. This pattern works both for generating content (the model aims to meet the rubric) and for evaluating content (the model scores against the rubric).
When to use it: Content quality assurance, LLM-as-a-judge evaluations, any task where "good" needs precise, measurable definition. Particularly powerful when combined with the critique-and-refine pattern.
When to skip it: Exploratory or creative tasks where rigid criteria would limit useful outputs, or quick informal interactions.
Critique & Refine (Self-Refine)
Critique and refine is a three-step loop: (1) generate an initial output, (2) critique it against specific criteria, (3) revise based on the critique. This loop can repeat multiple times, with each iteration improving quality.
Research from Google (2025) found that self-refinement reduced code errors by 30%. The key is providing a specific checklist for the critique step — vague instructions like "make it better" are unreliable.
When to use it: Writing tasks, code generation, any output that benefits from iteration. Works especially well when you provide a rubric for the critique step.
When to skip it: Simple factual queries, when latency/cost of multiple passes is a problem, or for tasks where one pass is reliably good enough.
Prompt Chaining
Prompt chaining breaks a complex task into a sequence of smaller, focused prompts where each prompt's output feeds into the next. Unlike chain-of-thought (which reasons in a single prompt), chaining uses separate AI calls for each step.
When to use it: Multi-step workflows (extract → analyze → format), tasks where the model loses focus in long prompts, data transformation pipelines, or when different steps need different instructions or even different models.
When to skip it: Simple single-step tasks, when latency from multiple API calls is unacceptable, or when all the context from earlier steps is needed simultaneously in later steps.
Meta-Prompting
Meta-prompting uses the AI to generate, improve, or optimize prompts themselves. You ask the model to write a better prompt for a given task, then use that improved prompt for your actual work.
This has been industrialized with tools like DSPy, which programmatically optimizes prompts by bootstrapping few-shot examples from data. DSPy has shown accuracy improvements from 46% to 64% on evaluation tasks through automated prompt optimization.
When to use it: When you cannot get good results and want the model to suggest improvements, when building production AI systems at scale, or when optimizing for cost by finding shorter but equally effective prompts.
Emerging Patterns (2025-2026)
The field continues to evolve. Here are patterns gaining traction:
Context Engineering
The major paradigm shift of 2025. Rather than optimizing individual prompts, context engineering designs the entire information architecture surrounding the model — system prompts, dynamically injected context (RAG), conversation history management, tool definitions, and output schemas. This is now considered the real competitive advantage in production AI systems.
Adaptive Thinking
Models like Claude now offer adaptive thinking, where the model dynamically decideshow much to reason based on task complexity. Instead of you prescribing "think step by step," the model self-regulates its reasoning depth. This often outperforms explicit chain-of-thought prompting.
Skeleton of Thought
The model first generates an outline (skeleton) of its response, then fills in each section. This produces better-structured outputs and can achieve up to 2.4x speedup by enabling parallel generation of sections.
Defensive Prompting
Wrapping user inputs in structured, guarded templates that limit model misbehavior even under adversarial input. This is a standard security practice in production systems — think of it as prompt-level input validation.
How to Choose the Right Pattern
Use this decision guide:
→ Start with zero-shot. If results are inconsistent, add examples (few-shot).
→ Use chain-of-thought (but skip it for reasoning models that already think internally).
→ Use chain-of-questions.
→ Use self-consistency (multiple paths + majority vote).
→ Use the ReAct pattern.
→ Use critique & refine, ideally with a rubric.
→ Use prompt chaining.
→ Use meta-prompting or DSPy to optimize systematically.
Combining Patterns
Patterns are not mutually exclusive — in practice, you often combine them:
- Few-shot + CoT: Provide examples that include step-by-step reasoning
- ReAct + CoT: The thought phase of ReAct is essentially chain-of-thought
- Rubric + Critique: Use the rubric as the checklist for the critique step
- Prompt chaining + Few-shot: Each step in the chain uses few-shot examples tuned for that specific subtask
- Self-consistency + CoT: Generate multiple CoT reasoning paths, then take the majority vote
Checklist: Do You Understand This?
- Can you explain the difference between zero-shot and few-shot prompting?
- Can you describe when chain-of-thought is helpful and when it is not?
- Can you explain the ReAct pattern and why it matters for AI agents?
- Can you name a scenario where self-consistency would be worth the extra cost?
- Can you describe how critique-and-refine works and when to use it?
- Given a new task, can you choose the right pattern from the decision guide?