Human-in-the-Loop
Human-in-the-loop (HITL) is the intentional design of AI workflows to include human oversight at critical decision points. It is not a workaround for unreliable AI — it is a first-class design primitive for building trustworthy, accountable, and safe systems. This page covers when to add human checkpoints, the five HITL interaction patterns, and how to implement them in production frameworks.
Why HITL Is a Design Choice, Not a Failure
A fully autonomous agent that never asks for human input is not always the goal. For tasks that are irreversible, high-stakes, customer-facing, or compliance-sensitive, a human checkpoint is the correct design. HITL workflows are how trust is built over time: start with more human oversight, measure agent reliability, and automate checkpoints away only as confidence grows.
Design HITL checkpoints for:
- Irreversible actions (delete, send, charge, publish)
- Actions with high blast radius (affects many users or records)
- Decisions where agent confidence is demonstrably low
- Compliance-required sign-offs (financial, medical, legal)
- First-time action patterns (agent doing something it has not done before)
- Customer-facing communications where tone and accuracy matter
Skip HITL when:
- Action is fully reversible and low-impact (read, search, draft)
- Agent reliability for this action type is demonstrated and measured
- Volume is too high for human review to be economically viable
- Latency requirements make synchronous human approval impossible
- Humans are rubber-stamping every approval without reading — this degrades safety
The Five HITL Interaction Patterns
1. Approve / Reject
Agent proposes an action with all relevant context. Human approves (agent proceeds as planned) or rejects (agent stops or tries an alternative). This is the most common HITL pattern.
Example: agent drafts a contract clause → legal reviewer approves before it is emailed to client
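As a minimal sketch in plain Python (ProposedAction, approve_reject_gate, and ask_human are illustrative names, not from any framework), the gate proceeds only on an explicit yes:

```python
from dataclasses import dataclass

@dataclass
class ProposedAction:
    """What the agent wants to do, with enough context for a reviewer."""
    name: str
    args: dict
    reasoning: str

def approve_reject_gate(action: ProposedAction, ask_human) -> bool:
    """Present the proposed action to a human; proceed only on explicit approval."""
    answer = ask_human(
        f"Agent proposes {action.name}({action.args}) because: {action.reasoning}. Approve? [y/n]"
    )
    return answer.strip().lower() == "y"

# Example: a scripted reviewer that rejects the destructive action
action = ProposedAction("delete_record", {"id": 42}, "record flagged as duplicate")
approved = approve_reject_gate(action, ask_human=lambda prompt: "n")
# approved is False: the agent must stop or propose an alternative
```

Anything other than an explicit "y" counts as a rejection, so the gate fails closed by default.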
2. Approve with Edit
Human can modify the agent's proposed action before it executes. The modified version is passed back to the agent. This is more powerful than binary approve/reject because it allows correction without full restart.
Example: agent generates an email draft → human edits the subject line and tone → agent sends the edited version
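A sketch of the edit variant (the function name and the dict-shaped draft are invented for illustration): the reviewer callback may return a modified payload, and None signals rejection.

```python
from typing import Optional

def approve_with_edit_gate(action: dict, ask_human) -> Optional[dict]:
    """Human may return an edited version of the action; None means reject."""
    return ask_human(action)  # a real UI would return the edited dict, or None

draft = {"subject": "Re: invoice", "body": "..."}
# Reviewer keeps the body but rewrites the subject line
edited = approve_with_edit_gate(draft, ask_human=lambda a: {**a, "subject": "Invoice #42"})
# edited["subject"] == "Invoice #42"; the agent sends the edited version, not the draft
```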
3. Provide Input (Clarification)
Agent pauses because it lacks information needed to proceed. Human provides the missing input. Agent resumes with that input injected into context. This is different from approval — the agent is not proposing an action, it is requesting a fact.
Example: agent encounters an ambiguous identifier → asks human "did you mean project A or project B?" → human answers → agent continues
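The clarification pattern can be sketched in a few lines (run_with_clarification and ask_human are illustrative names): the agent requests a fact, injects the answer into its context, and resumes.

```python
def run_with_clarification(task: dict, ask_human) -> str:
    """Agent pauses when a required fact is ambiguous, resumes with the human's answer."""
    project = task.get("project")
    if project not in ("A", "B"):
        # Pattern 3: the agent is requesting a fact, not proposing an action
        project = ask_human("Ambiguous identifier: did you mean project A or project B?")
    return f"running task for project {project}"

result = run_with_clarification({"project": None}, ask_human=lambda q: "A")
# result == "running task for project A"
```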
4. Override / Escalate
Agent detects it cannot safely complete the task and escalates to a human to take over. Unlike the patterns above, the human does not just approve — they complete the task themselves or delegate differently.
Example: agent processing a medical data request realises it crosses a compliance boundary → escalates to compliance officer rather than attempting to proceed
5. Async Review (Monitor)
Agent completes the task without waiting for approval, but every action is logged and delivered to a human reviewer asynchronously. Human can audit, flag, or reverse actions after the fact. This is not a checkpoint — it is an audit trail with rollback capability.
Example: AI agent triages and tags 500 support tickets/hour → human reviewer scans a dashboard and can re-tag or escalate after the fact
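Async review hinges on capturing an undo hook at execution time, so the reviewer can reverse later. A sketch, assuming an in-memory log (AuditTrail and its methods are invented names):

```python
import datetime

class AuditTrail:
    """Async review (pattern 5): log every action; support after-the-fact reversal."""
    def __init__(self):
        self.log = []

    def record(self, action: str, undo):
        self.log.append({"action": action, "undo": undo,
                         "at": datetime.datetime.now(datetime.timezone.utc)})

    def reverse(self, index: int):
        entry = self.log[index]
        entry["undo"]()  # rollback hook captured when the action executed
        entry["reversed"] = True

# Agent tags a ticket without waiting; a reviewer reverses it later
tags = {"ticket-1": "billing"}
trail = AuditTrail()
trail.record("tag ticket-1 as billing", undo=lambda: tags.pop("ticket-1"))
trail.reverse(0)
# tags is now empty: the reviewer rolled the action back
```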
Implementation: Framework Patterns
LangGraph — interrupt()
LangGraph provides a first-class interrupt() primitive that pauses graph execution at any node, persists state to a checkpoint store, and waits for external input. When the human responds, the graph resumes from the interrupted node; earlier nodes are not replayed, though code in that node before the interrupt() call does re-run on resume.
LangGraph HITL flow:
- Graph node calls interrupt(value) with the proposed action and context
- Graph execution suspends; state is checkpointed to a store (Redis, Postgres, memory)
- Application delivers the interrupt value to a human via UI, Slack, email, etc.
- Human responds: approve, edit, or reject
- Application calls graph.update_state(thread_id, human_response)
- Graph resumes from the checkpoint; the human response is in state for the next node
The graph is stateless between suspend and resume — only the checkpoint store maintains state. This makes HITL workflows resumable across server restarts.
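The suspend/resume lifecycle can be modeled in a few lines of plain Python. This is not LangGraph code; it only illustrates why keeping all state in a checkpoint store makes resumption survive restarts (checkpoints, Interrupted, run_node, and resume are all invented names):

```python
# A toy model of suspend/resume semantics: the only state that survives between
# suspend and resume lives in the checkpoint store.
checkpoints: dict = {}  # stand-in for Redis/Postgres

class Interrupted(Exception):
    """Raised when a node suspends; carries the payload to show the human."""
    def __init__(self, thread_id, payload):
        self.thread_id, self.payload = thread_id, payload

def run_node(thread_id: str, state: dict) -> str:
    if "human_response" not in state:
        checkpoints[thread_id] = state  # persist state before suspending
        raise Interrupted(thread_id, {"proposed": state["proposed"]})
    return f"executed {state['proposed']} ({state['human_response']})"

def resume(thread_id: str, human_response: str) -> str:
    state = checkpoints.pop(thread_id)  # reload from the store (possibly a new process)
    state["human_response"] = human_response
    return run_node(thread_id, state)

try:
    run_node("t1", {"proposed": "send_email"})
except Interrupted as i:
    pass  # deliver i.payload to a human via UI, Slack, email, etc.
result = resume("t1", "approved")
# result == "executed send_email (approved)"
```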
OpenAI Agents SDK
The OpenAI Agents SDK (March 2025) provides a human_input_callback hook. When a tool flagged as requiring approval is called, the SDK invokes the callback, passing the tool name and arguments. The callback returns the human's decision, which the SDK uses to continue or abort.
HumanLayer SDK
HumanLayer is an open-source SDK designed specifically for HITL middleware. It provides decorators that intercept tool function calls, route them to configurable approval channels (Slack, email, web UI), and block until a response is received. It integrates with LangChain, LlamaIndex, and plain Python agents.
HumanLayer pattern (conceptual):
@require_approval(channel="slack", channel_id="#approvals")
def send_email(to: str, subject: str, body: str):
    ...
The decorator intercepts the call, posts to Slack with approve/reject buttons, and blocks until a human responds. No graph framework required.
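A rough sketch of how such a decorator can work (this is illustrative middleware, not HumanLayer's actual implementation; the approval channel here is just a callable rather than Slack):

```python
import functools

def require_approval(ask_human):
    """Intercept a tool call, block on a human decision, and only then run the tool."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            verdict = ask_human(f"Approve {fn.__name__}{kwargs or args}? [approve/reject]")
            if verdict != "approve":
                return {"status": "rejected", "tool": fn.__name__}
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@require_approval(ask_human=lambda prompt: "approve")
def send_email(to: str, subject: str, body: str):
    return {"status": "sent", "to": to}

result = send_email(to="a@example.com", subject="Hi", body="...")
# result == {"status": "sent", "to": "a@example.com"}
```

The tool body never runs unless the channel returns an approval, which is the property that makes decorator-based interception safe by construction.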
Design Rules for HITL Systems
What to always include in an approval request:
- What the agent plans to do — specific action with arguments, not just category
- Why it plans to do it — the reasoning step that led to this action
- Estimated impact — how many records, users, or systems are affected
- Reversibility — can this be undone, and how?
- Deadline — when will the request expire (timeout = cancel, not proceed)
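The five fields above can be captured in one small structure; a sketch with illustrative names:

```python
from dataclasses import dataclass
import datetime

@dataclass
class ApprovalRequest:
    """One field per item an approval request should always include."""
    action: str      # what: the specific action with arguments, not just a category
    reasoning: str   # why: the reasoning step that led to this action
    impact: str      # estimated blast radius (records, users, systems)
    reversible: str  # can it be undone, and how
    expires_at: datetime.datetime  # deadline; expiry means cancel, never proceed

    def is_expired(self, now: datetime.datetime) -> bool:
        return now >= self.expires_at

req = ApprovalRequest(
    action="delete_records(ids=[101, 102])",
    reasoning="records flagged as duplicates by dedup pass",
    impact="2 records, 1 downstream report",
    reversible="yes, via nightly backup restore",
    expires_at=datetime.datetime(2025, 1, 1, 12, 0, tzinfo=datetime.timezone.utc),
)
# after the deadline, the request must resolve to cancel, not proceed
expired = req.is_expired(datetime.datetime(2025, 1, 2, tzinfo=datetime.timezone.utc))
```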
HITL failure modes to avoid
- Approval fatigue: too many checkpoints → humans rubber-stamp without reading → HITL provides false safety
- Timeout = proceed: if a timeout triggers execution instead of cancellation, anyone who can delay review (an overloaded reviewer or a deliberate attacker) gets a window in which harmful actions run unapproved
- Insufficient context in request: reviewer cannot make an informed decision, approves by default
- No escalation path: reviewer cannot delegate or escalate; bottleneck on single person
- HITL on wrong actions: adding approval to low-risk reads while leaving destructive actions ungated
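The "timeout = cancel" rule above amounts to failing closed; a standard-library sketch (names are illustrative):

```python
import queue

REJECT = "reject"

def wait_for_decision(decisions: "queue.Queue", timeout_s: float) -> str:
    """Fail closed: if no human decision arrives in time, the answer is reject."""
    try:
        return decisions.get(timeout=timeout_s)
    except queue.Empty:
        return REJECT  # timeout = cancel, never proceed

q = queue.Queue()
decision = wait_for_decision(q, timeout_s=0.01)
# decision == "reject": the empty queue timed out, so the action is cancelled
```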
Synchronous vs Asynchronous HITL
| Dimension | Synchronous | Asynchronous |
|---|---|---|
| Agent waits for | Human to respond before continuing | Nothing — continues while human reviews |
| Best for | Irreversible actions; user-facing conversational agents | Audit/monitoring; high-volume tasks with rollback support |
| Human latency impact | Directly delays task completion | No delay to task; review happens afterward |
| Framework support | LangGraph interrupt(), HumanLayer, Temporal signal | Logging + dashboard (LangSmith, Langfuse) + rollback APIs |
Checklist: Do You Understand This?
- What are the five HITL interaction patterns and what is the key difference between "Approve/Reject" and "Provide Input"?
- How does LangGraph's
interrupt()work — what happens to graph state when execution is suspended? - What is approval fatigue and why does it undermine the safety guarantee of HITL?
- Why should a timeout on an approval request result in cancellation rather than proceeding?
- What five pieces of information should always be included in a HITL approval request?
- When would you choose asynchronous HITL (monitor/audit) over synchronous HITL (blocking approval)?