Post-Incident Review
A post-incident review (PIR) — sometimes called a post-mortem or retrospective — is a structured analysis conducted after an AI incident is resolved. Its purpose is not to assign blame, but to understand what happened, why it happened, and what must change so it does not happen again. For AI systems, the PIR must account for causes that are inherently probabilistic, distributed across training-time and deployment-time decisions, and often the result of systemic gaps rather than individual errors.
Blameless Culture in AI Incidents
A blameless postmortem culture, popularised by Google SRE, is even more important for AI incidents than for conventional software incidents. This is because AI failures frequently arise from:
- Complex training pipelines with many contributors, none of whom individually caused the failure
- Statistical behaviour that no single engineer could have predicted from code review alone
- Data labelling or collection decisions made months before the incident by people not involved in deployment
- Organisational decisions (cut scope of evaluation, skip bias testing to meet deadline) that look reasonable at the time but contributed to the failure
Blameless does not mean consequence-free. If policies were deliberately violated or reasonable precautions were knowingly skipped, accountability is appropriate. Blameless means the review focuses on systems, processes, and decisions — not on individual people as the root cause.
Five-Whys Adapted for AI Failures
The five-whys technique — asking "why?" repeatedly until a root cause is reached — must be adapted for AI systems to avoid stopping prematurely at proximate technical causes.
| Why level | Proximate answer | AI-adapted deeper question |
|---|---|---|
| Why 1 | The model produced incorrect outputs | What type of failure? Hallucination, drift, bias, adversarial manipulation? |
| Why 2 | The model was not validated on this input type | Was this input type foreseeable? Was it in scope? Did we have test data for it? |
| Why 3 | The evaluation process did not include this scenario | Why not? Was it an oversight, a resource constraint, or a process gap? |
| Why 4 | There was no checklist item requiring evaluation coverage of edge cases | Is this a single checklist gap or a systematic gap in the evaluation process? |
| Why 5 | The evaluation process was not reviewed against the AI risk taxonomy | Root cause: the AI governance framework did not connect risk taxonomy to evaluation requirements |
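The five-whys chain above can be captured as structured data rather than free text, which makes the root cause machine-readable for the incident register. The sketch below is a hypothetical structure, not a standard schema; the class and field names are assumptions for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class WhyStep:
    """One level of a five-whys chain."""
    question: str
    answer: str
    is_root_cause: bool = False

@dataclass
class FiveWhysChain:
    """Hypothetical record of a five-whys analysis for one incident."""
    incident_id: str
    steps: list = field(default_factory=list)

    def add(self, question, answer, is_root_cause=False):
        self.steps.append(WhyStep(question, answer, is_root_cause))

    def root_cause(self):
        # The chain is incomplete until some step is marked as the root cause.
        for step in self.steps:
            if step.is_root_cause:
                return step.answer
        return None

# The example chain from the table above (incident ID is illustrative).
chain = FiveWhysChain("INC-2024-017")
chain.add("Why did the incident occur?",
          "The model produced incorrect outputs")
chain.add("Why were the outputs incorrect?",
          "The model was not validated on this input type")
chain.add("Why was it not validated?",
          "The evaluation process did not include this scenario")
chain.add("Why was the scenario missing?",
          "No checklist item required edge-case evaluation coverage")
chain.add("Why was there no such requirement?",
          "The governance framework did not connect risk taxonomy "
          "to evaluation requirements",
          is_root_cause=True)
```

Storing chains this way lets later reviews query which incidents terminated at the same root cause.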
Systemic vs One-Off Causes
Every PIR must determine whether the cause was systemic (affects the full AI development lifecycle and will produce more incidents if not addressed) or one-off (an isolated circumstance unlikely to recur). Getting this wrong in either direction is costly: treating systemic causes as one-offs means the incident recurs; treating one-offs as systemic triggers expensive process overhaul that provides no benefit.
Indicators of a systemic cause
- Similar incidents have occurred before (check the incident register)
- The root cause is a process, policy, or tooling gap that affects all AI systems — not just this one
- The failure would have occurred with any model trained under the same process
- Multiple teams reported the same type of near-miss in the past 12 months
Indicators of a one-off cause
- The failure required a rare combination of circumstances that are unlikely to recur
- It was caused by a specific external event (a sudden change in user behaviour, a third-party data outage)
- No similar incidents in the history of this or comparable systems
- The root cause was already addressed as part of remediation with no further process changes needed
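The systemic-vs-one-off indicators above can be encoded as a simple decision heuristic. This is a minimal sketch under the assumption that any single systemic indicator is enough to treat the cause as systemic; the function name, parameters, and thresholds are illustrative, not a prescribed rule.

```python
def classify_cause(prior_similar_incidents: int,
                   process_level_gap: bool,
                   affects_all_systems: bool,
                   near_misses_last_12mo: int) -> str:
    """Heuristic sketch: classify a root cause as systemic or one-off
    using the indicators listed above. Parameters are assumptions."""
    systemic_signals = [
        prior_similar_incidents > 0,   # incident register shows recurrence
        process_level_gap,             # any model trained this way would fail
        affects_all_systems,           # process/policy/tooling gap, not model-specific
        near_misses_last_12mo >= 2,    # multiple teams saw the same near-miss
    ]
    return "systemic" if any(systemic_signals) else "one-off"
```

Erring toward "systemic" when signals conflict is the cheaper mistake only if process overhauls are scoped narrowly; the document's warning about over-reaction still applies.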
PIR Process and Output
PIR meeting participants
- Model owner (required)
- On-call responders who handled the incident
- Representatives from teams whose systems or decisions contributed to the failure
- AI governance function (required for P1/P2 incidents)
- Legal/compliance if regulatory obligations were triggered
- A PIR facilitator who was not directly involved in the incident — to maintain objectivity
PIR document structure
- Incident summary: What happened, when, who was affected, severity classification
- Timeline: Detailed sequence of events from first signal to resolution
- Impact assessment: Quantified harm, including the number of affected individuals, financial impact, regulatory exposure, and reputational damage
- Root cause analysis: Five-whys or equivalent; contributing factors; systemic vs one-off determination
- What went well: Detection mechanisms, response actions, and decisions that worked as intended
- Action items: Specific, owned, time-bound improvements — see below
- Sign-off: AI risk owner and governance function confirm the PIR is complete and action items are tracked
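A sign-off gate can mechanically verify that every required section of the PIR document is present before the risk owner approves it. The sketch below assumes the PIR is held as a dictionary keyed by section; the field names mirror the structure above but are otherwise illustrative.

```python
# Required PIR sections, per the document structure above.
REQUIRED_SECTIONS = [
    "incident_summary", "timeline", "impact_assessment",
    "root_cause_analysis", "what_went_well", "action_items", "sign_off",
]

def missing_sections(pir: dict) -> list:
    """Return the missing or empty sections; an empty list means the
    PIR is structurally ready for sign-off. Keys are illustrative."""
    return [s for s in REQUIRED_SECTIONS if not pir.get(s)]
```

This checks only presence, not quality; the facilitator and governance function still review content.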
Action Items: Turning PIRs into Governance Improvements
A PIR that identifies systemic causes but does not produce binding action items is a waste of time. Every systemic root cause must map to at least one action item with:
| Field | Content |
|---|---|
| Action | Specific, observable change — not "improve bias testing" but "add demographic parity gate to CI/CD pipeline for all classification models by Q3" |
| Owner | Named individual, not a team. One person is accountable. |
| Due date | Specific date. P1 systemic fixes: 30 days. P2: 60 days. Others: 90 days default. |
| Verification | How will completion be verified? Code review merged? New test suite passing? Policy document reviewed and signed off? |
| Linked incidents | If this action item addresses the root cause of multiple incidents, link them all — useful for tracking whether the fix actually worked |
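The action-item fields above lend themselves to automated validation when items are logged. The sketch below encodes the table's rules (specific action, named individual owner, severity-based due-date windows, mandatory verification); the vagueness check and all parameter names are assumptions for illustration, not a real tracker's API.

```python
from datetime import date, timedelta

# Due-date windows from the table above: days allowed per severity.
SLA_DAYS = {"P1": 30, "P2": 60}
DEFAULT_SLA_DAYS = 90

def validate_action_item(action: str, owner: str, due_date: date,
                         opened_on: date, severity: str,
                         verification: str) -> list:
    """Minimal sketch of checks an action-item tracker might enforce.
    Returns a list of problems; empty means the item is acceptable."""
    errors = []
    # Crude vagueness check, illustrative only: bare "improve X" is not
    # a specific, observable change.
    if not action or action.lower().startswith("improve"):
        errors.append("action must be a specific, observable change")
    if not owner or " team" in owner.lower():
        errors.append("owner must be a named individual, not a team")
    allowed = timedelta(days=SLA_DAYS.get(severity, DEFAULT_SLA_DAYS))
    if due_date > opened_on + allowed:
        errors.append(f"due date exceeds {allowed.days}-day window for {severity}")
    if not verification:
        errors.append("verification criterion is required")
    return errors
```

A real tracker would also link the item to its incidents so recurrence checks can confirm the fix worked.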
Tracking Recurrence
The ultimate measure of a PIR's effectiveness is whether the same or similar incident recurs. Track this formally:
- Tag incidents with root cause categories in the incident register — allows pattern detection across incidents over time
- At each PIR, check: have we had incidents with the same root cause before? If yes, the previous PIR's action items either were not completed or were not effective
- 30/60/90 day follow-up reviews on P1 and P2 action items — confirm implementation was completed and that early metrics show the fix is working
- Annual governance review: examine the incident register for patterns — which root cause categories account for most incidents? Prioritise systemic improvements accordingly.
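The recurrence checks above reduce to a frequency count over root-cause tags in the incident register. The sketch below assumes the register is a list of dictionaries with `id` and `root_cause_category` fields; that format, and the category names, are assumptions for illustration.

```python
from collections import Counter

def recurrence_report(incident_register: list) -> dict:
    """Count incidents per root-cause category. Categories appearing
    more than once suggest a previous PIR's action items were not
    completed or were not effective."""
    counts = Counter(i["root_cause_category"] for i in incident_register)
    return {cat: n for cat, n in counts.most_common() if n > 1}

# Illustrative register: 'eval-coverage-gap' recurs, so the earlier
# fix for that category needs re-examination.
register = [
    {"id": "INC-01", "root_cause_category": "eval-coverage-gap"},
    {"id": "INC-02", "root_cause_category": "data-drift"},
    {"id": "INC-03", "root_cause_category": "eval-coverage-gap"},
]
```

Running this at each PIR and annually gives the pattern view the governance review needs for prioritising systemic improvements.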
Checklist: Do You Understand This?
- What does "blameless" mean in the context of an AI postmortem, and why is it particularly important for AI incidents?
- Apply the five-whys technique to the scenario: "the model denied a loan to a qualified applicant from a protected group."
- What indicators suggest a root cause is systemic rather than one-off?
- What must every action item in a PIR include to be effective?
- How do you determine whether a previous PIR's remediation actually worked?