Human-in-the-Loop
Human-in-the-loop (HITL) is the intentional design of AI workflows to include human oversight at critical decision points. It is not a workaround for unreliable AI — it is a first-class design primitive for building trustworthy, accountable, and safe systems. This page covers when to add human checkpoints, the five HITL interaction patterns, and how to implement them in production frameworks.
Why HITL Is a Design Choice, Not a Failure
A fully autonomous agent that never asks for human input is not always the goal. For tasks that are irreversible, high-stakes, customer-facing, or compliance-sensitive, a human checkpoint is the correct design. HITL workflows are how trust is built over time: start with more human oversight, measure agent reliability, and automate checkpoints away only as confidence grows.
Design HITL checkpoints for:
- Irreversible actions (delete, send, charge, publish)
- Actions with high blast radius (affects many users or records)
- Decisions where agent confidence is demonstrably low
- Compliance-required sign-offs (financial, medical, legal)
- First-time action patterns (agent doing something it has not done before)
- Customer-facing communications where tone and accuracy matter
Skip HITL when:
- Action is fully reversible and low-impact (read, search, draft)
- Agent reliability for this action type is demonstrated and measured
- Volume is too high for human review to be economically viable
- Latency requirements make synchronous human approval impossible
- Humans are rubber-stamping every approval without reading — this degrades safety
The Five HITL Interaction Patterns
1. Approve / Reject
Agent proposes an action with all relevant context. Human approves (agent proceeds as planned) or rejects (agent stops or tries an alternative). This is the most common HITL pattern.
Example: agent drafts a contract clause → legal reviewer approves before it is emailed to client
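As a minimal sketch in plain Python (ProposedAction, approve_reject_gate, and ask_human are illustrative names, not from any framework), the gate proceeds only on an explicit yes:

```python
from dataclasses import dataclass

@dataclass
class ProposedAction:
    """What the agent wants to do, with enough context for a reviewer."""
    name: str
    args: dict
    reasoning: str

def approve_reject_gate(action: ProposedAction, ask_human) -> bool:
    """Present the proposed action to a human; proceed only on explicit approval."""
    answer = ask_human(
        f"Agent proposes {action.name}({action.args}) because: {action.reasoning}. Approve? [y/n]"
    )
    return answer.strip().lower() == "y"

# Example: a scripted reviewer that rejects the destructive action
action = ProposedAction("delete_record", {"id": 42}, "record flagged as duplicate")
approved = approve_reject_gate(action, ask_human=lambda prompt: "n")
# approved is False: the agent must stop or propose an alternative
```

Anything other than an explicit "y" counts as a rejection, so the gate fails closed by default.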
2. Approve with Edit
Human can modify the agent's proposed action before it executes. The modified version is passed back to the agent. This is more powerful than binary approve/reject because it allows correction without full restart.
Example: agent generates an email draft → human edits the subject line and tone → agent sends the edited version
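A sketch of the edit variant (the function name and the dict-shaped draft are invented for illustration): the reviewer callback may return a modified payload, and None signals rejection.

```python
from typing import Optional

def approve_with_edit_gate(action: dict, ask_human) -> Optional[dict]:
    """Human may return an edited version of the action; None means reject."""
    return ask_human(action)  # a real UI would return the edited dict, or None

draft = {"subject": "Re: invoice", "body": "..."}
# Reviewer keeps the body but rewrites the subject line
edited = approve_with_edit_gate(draft, ask_human=lambda a: {**a, "subject": "Invoice #42"})
# edited["subject"] == "Invoice #42"; the agent sends the edited version, not the draft
```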
3. Provide Input (Clarification)
Agent pauses because it lacks information needed to proceed. Human provides the missing input. Agent resumes with that input injected into context. This is different from approval — the agent is not proposing an action, it is requesting a fact.
Example: agent encounters an ambiguous identifier → asks human "did you mean project A or project B?" → human answers → agent continues
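The clarification pattern can be sketched in a few lines (run_with_clarification and ask_human are illustrative names): the agent requests a fact, injects the answer into its context, and resumes.

```python
def run_with_clarification(task: dict, ask_human) -> str:
    """Agent pauses when a required fact is ambiguous, resumes with the human's answer."""
    project = task.get("project")
    if project not in ("A", "B"):
        # Pattern 3: the agent is requesting a fact, not proposing an action
        project = ask_human("Ambiguous identifier: did you mean project A or project B?")
    return f"running task for project {project}"

result = run_with_clarification({"project": None}, ask_human=lambda q: "A")
# result == "running task for project A"
```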
4. Override / Escalate
Agent detects it cannot safely complete the task and escalates to a human to take over. Unlike the patterns above, the human does not just approve — they complete the task themselves or delegate differently.
Example: agent processing a medical data request realises it crosses a compliance boundary → escalates to compliance officer rather than attempting to proceed
5. Async Review (Monitor)
Agent completes the task without waiting for approval, but every action is logged and delivered to a human reviewer asynchronously. Human can audit, flag, or reverse actions after the fact. This is not a checkpoint — it is an audit trail with rollback capability.
Example: AI agent triages and tags 500 support tickets/hour → human reviewer scans a dashboard and can re-tag or escalate after the fact
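Async review hinges on capturing an undo hook at execution time, so the reviewer can reverse later. A sketch, assuming an in-memory log (AuditTrail and its methods are invented names):

```python
import datetime

class AuditTrail:
    """Async review (pattern 5): log every action; support after-the-fact reversal."""
    def __init__(self):
        self.log = []

    def record(self, action: str, undo):
        self.log.append({"action": action, "undo": undo,
                         "at": datetime.datetime.now(datetime.timezone.utc)})

    def reverse(self, index: int):
        entry = self.log[index]
        entry["undo"]()  # rollback hook captured when the action executed
        entry["reversed"] = True

# Agent tags a ticket without waiting; a reviewer reverses it later
tags = {"ticket-1": "billing"}
trail = AuditTrail()
trail.record("tag ticket-1 as billing", undo=lambda: tags.pop("ticket-1"))
trail.reverse(0)
# tags is now empty: the reviewer rolled the action back
```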
Implementation: Framework Patterns
LangGraph — interrupt()
LangGraph provides a first-class interrupt() primitive that pauses graph execution at any node, persists state to a checkpoint store, and waits for external input. When the human responds, the graph resumes from the interrupted node; earlier nodes are not replayed, though code in that node before the interrupt() call does re-run on resume.
LangGraph HITL flow:
- Graph node calls interrupt(value) with the proposed action and context
- Graph execution suspends; state is checkpointed to a store (Redis, Postgres, memory)
- Application delivers the interrupt value to a human via UI, Slack, email, etc.
- Human responds: approve, edit, or reject
- Application calls graph.update_state(thread_id, human_response)
- Graph resumes from the checkpoint; the human response is in state for the next node
The graph is stateless between suspend and resume — only the checkpoint store maintains state. This makes HITL workflows resumable across server restarts.
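The suspend/resume lifecycle can be modeled in a few lines of plain Python. This is not LangGraph code; it only illustrates why keeping all state in a checkpoint store makes resumption survive restarts (checkpoints, Interrupted, run_node, and resume are all invented names):

```python
# A toy model of suspend/resume semantics: the only state that survives between
# suspend and resume lives in the checkpoint store.
checkpoints: dict = {}  # stand-in for Redis/Postgres

class Interrupted(Exception):
    """Raised when a node suspends; carries the payload to show the human."""
    def __init__(self, thread_id, payload):
        self.thread_id, self.payload = thread_id, payload

def run_node(thread_id: str, state: dict) -> str:
    if "human_response" not in state:
        checkpoints[thread_id] = state  # persist state before suspending
        raise Interrupted(thread_id, {"proposed": state["proposed"]})
    return f"executed {state['proposed']} ({state['human_response']})"

def resume(thread_id: str, human_response: str) -> str:
    state = checkpoints.pop(thread_id)  # reload from the store (possibly a new process)
    state["human_response"] = human_response
    return run_node(thread_id, state)

try:
    run_node("t1", {"proposed": "send_email"})
except Interrupted as i:
    pass  # deliver i.payload to a human via UI, Slack, email, etc.
result = resume("t1", "approved")
# result == "executed send_email (approved)"
```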
OpenAI Agents SDK
The OpenAI Agents SDK (March 2025) provides a human_input_callback hook. When a tool flagged as requiring approval is called, the SDK invokes the callback, passing the tool name and arguments. The callback returns the human's decision, which the SDK uses to continue or abort.
HumanLayer SDK
HumanLayer is an open-source SDK designed specifically for HITL middleware. It provides decorators that intercept tool function calls, route them to configurable approval channels (Slack, email, web UI), and block until a response is received. It integrates with LangChain, LlamaIndex, and plain Python agents.
HumanLayer pattern (conceptual):
@require_approval(channel="slack", channel_id="#approvals")
def send_email(to: str, subject: str, body: str):
    ...
The decorator intercepts the call, posts to Slack with approve/reject buttons, and blocks until a human responds. No graph framework required.
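A rough sketch of how such a decorator can work (this is illustrative middleware, not HumanLayer's actual implementation; the approval channel here is just a callable rather than Slack):

```python
import functools

def require_approval(ask_human):
    """Intercept a tool call, block on a human decision, and only then run the tool."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            verdict = ask_human(f"Approve {fn.__name__}{kwargs or args}? [approve/reject]")
            if verdict != "approve":
                return {"status": "rejected", "tool": fn.__name__}
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@require_approval(ask_human=lambda prompt: "approve")
def send_email(to: str, subject: str, body: str):
    return {"status": "sent", "to": to}

result = send_email(to="a@example.com", subject="Hi", body="...")
# result == {"status": "sent", "to": "a@example.com"}
```

The tool body never runs unless the channel returns an approval, which is the property that makes decorator-based interception safe by construction.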
Design Rules for HITL Systems
What to always include in an approval request:
- What the agent plans to do — specific action with arguments, not just category
- Why it plans to do it — the reasoning step that led to this action
- Estimated impact — how many records, users, or systems are affected
- Reversibility — can this be undone, and how?
- Deadline — when will the request expire (timeout = cancel, not proceed)
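The five fields above can be captured in one small structure; a sketch with illustrative names:

```python
from dataclasses import dataclass
import datetime

@dataclass
class ApprovalRequest:
    """One field per item an approval request should always include."""
    action: str      # what: the specific action with arguments, not just a category
    reasoning: str   # why: the reasoning step that led to this action
    impact: str      # estimated blast radius (records, users, systems)
    reversible: str  # can it be undone, and how
    expires_at: datetime.datetime  # deadline; expiry means cancel, never proceed

    def is_expired(self, now: datetime.datetime) -> bool:
        return now >= self.expires_at

req = ApprovalRequest(
    action="delete_records(ids=[101, 102])",
    reasoning="records flagged as duplicates by dedup pass",
    impact="2 records, 1 downstream report",
    reversible="yes, via nightly backup restore",
    expires_at=datetime.datetime(2025, 1, 1, 12, 0, tzinfo=datetime.timezone.utc),
)
# after the deadline, the request must resolve to cancel, not proceed
expired = req.is_expired(datetime.datetime(2025, 1, 2, tzinfo=datetime.timezone.utc))
```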
HITL failure modes to avoid
- Approval fatigue: too many checkpoints → humans rubber-stamp without reading → HITL provides false safety
- Timeout = proceed: if a timeout triggers execution instead of cancellation, anyone who can delay review (an overloaded reviewer or a deliberate attacker) gets a window in which harmful actions run unapproved
- Insufficient context in request: reviewer cannot make an informed decision, approves by default
- No escalation path: reviewer cannot delegate or escalate; bottleneck on single person
- HITL on wrong actions: adding approval to low-risk reads while leaving destructive actions ungated
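The "timeout = cancel" rule above amounts to failing closed; a standard-library sketch (names are illustrative):

```python
import queue

REJECT = "reject"

def wait_for_decision(decisions: "queue.Queue", timeout_s: float) -> str:
    """Fail closed: if no human decision arrives in time, the answer is reject."""
    try:
        return decisions.get(timeout=timeout_s)
    except queue.Empty:
        return REJECT  # timeout = cancel, never proceed

q = queue.Queue()
decision = wait_for_decision(q, timeout_s=0.01)
# decision == "reject": the empty queue timed out, so the action is cancelled
```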
Synchronous vs Asynchronous HITL
| Dimension | Synchronous | Asynchronous |
|---|---|---|
| Agent waits for | Human to respond before continuing | Nothing — continues while human reviews |
| Best for | Irreversible actions; user-facing conversational agents | Audit/monitoring; high-volume tasks with rollback support |
| Human latency impact | Directly delays task completion | No delay to task; review happens afterward |
| Framework support | LangGraph interrupt(), HumanLayer, Temporal signal | Logging + dashboard (LangSmith, Langfuse) + rollback APIs |
Checklist: Do You Understand This?
- What are the five HITL interaction patterns and what is the key difference between "Approve/Reject" and "Provide Input"?
- How does LangGraph's
interrupt()work — what happens to graph state when execution is suspended? - What is approval fatigue and why does it undermine the safety guarantee of HITL?
- Why should a timeout on an approval request result in cancellation rather than proceeding?
- What five pieces of information should always be included in a HITL approval request?
- When would you choose asynchronous HITL (monitor/audit) over synchronous HITL (blocking approval)?