
Quality Bar & Review Standards

AI output looks polished. Fluent prose, well-formatted code, confident assertions — the surface signals of quality are all there. The problem is that those signals are present whether the content is correct or not. Setting an explicit quality bar means defining what "done and correct" actually looks like before you publish, ship, or send.

Why AI Output Needs a Quality Bar

The problem with fluency

  • Hallucinated facts read identically to correct ones — there is no visual error signal
  • AI writes in a confident, authoritative register regardless of accuracy
  • Structural quality (good headings, clear paragraphs) masks content quality problems
  • The model cannot tell you when it is uncertain — hedging language is stylistic, not calibrated
  • Code compiles and looks sensible but fails on real inputs or has security flaws

What a quality bar provides

  • Consistent evaluation criteria that do not change based on how good the output looks
  • A checklist you can follow even under pressure to "just ship it"
  • Shared standards across a team so everyone reviews to the same level
  • Accountability — you own the output, and the quality bar is the standard you hold yourself to
  • Early catch of systematic errors before they compound across multiple deliverables

The Review Mindset: Reviewer Before Editor

Most people read AI output as an editor — improving phrasing, adjusting tone, cutting length. That is the wrong first pass. Start as a reviewer: is this correct, complete, and appropriate? Only then become an editor.

  • Reviewer (first). Asks: is this correct? Is anything missing? Is it appropriate for the audience and purpose? Catches: factual errors, missing caveats, wrong scope, inappropriate tone for context.
  • Editor (second). Asks: is this clear? Is it too long? Does the structure serve the reader? Catches: wordiness, unclear sentences, poor structure, AI voice tells.
  • Owner (always). Asks: would I be comfortable defending every claim in this if challenged? Catches: anything you cannot vouch for; your name is on it.

Skipping the reviewer pass and going straight to editing is the most common quality failure. You can make hallucinated content flow beautifully and still publish it wrong.

Output Rubrics by Type

Different output types have different quality criteria. A rubric makes those criteria explicit so you apply them consistently, not just when something feels off.

Documents & reports

  • Factual accuracy: every specific claim can be traced to a verifiable source
  • Source attribution: statistics have a date, origin, and correct context
  • Logical coherence: conclusions follow from evidence; no non-sequiturs
  • Appropriate scope: answers the actual question; does not pad with adjacent topics
  • Tone: matches the audience — formal for board, practical for ops, plain for general
  • Caveats present: limitations and uncertainties are stated, not omitted

Code

  • Runs without error: executed against real inputs, not just visually inspected
  • Edge cases handled: empty input, nulls, boundary values, concurrent calls
  • No security anti-patterns: no SQL injection, no exposed secrets, input validation present
  • Readable: a team member who did not write this can understand it without a walkthrough
  • Error handling: fails gracefully with useful messages, not silent failures
  • Dependencies: no libraries imported without understanding what they do and their security posture
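The edge-case and security criteria above can be exercised as small review tests rather than by eye. A minimal sketch in Python, using a hypothetical `lookup_user` function standing in for the code under review:

```python
import sqlite3

def lookup_user(conn, username):
    """Fetch a user row by name, or None if absent.

    Uses a parameterized query (?) instead of string concatenation,
    so a malicious username cannot inject SQL.
    """
    if not isinstance(username, str) or not username:
        raise ValueError("username must be a non-empty string")
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (username,))
    return cur.fetchone()

# Review-style checks: real execution, absent user, injection attempt, empty input.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

assert lookup_user(conn, "alice") == (1, "alice")
assert lookup_user(conn, "nobody") is None
# The injection string is treated as a literal name, not executed as SQL.
assert lookup_user(conn, "x' OR '1'='1") is None
try:
    lookup_user(conn, "")
except ValueError:
    pass  # empty input fails loudly, not silently
```

Passing checks like these is the "runs without error" and "edge cases handled" bar in executable form; they do not replace a human read for readability and dependency hygiene.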

Presentations & decks

  • Data accuracy: every chart, statistic, and figure verified against source
  • Correct attribution: data points labelled with year and source, not just asserted
  • Narrative coherence: the story arc is yours, not a generic AI framing of the topic
  • No AI voice tells: no corporate filler, no reflexive hedging of every statement, no bullet overload
  • Defensible: you would be comfortable answering a question about any claim on any slide

Research & analysis

  • Sources exist: every cited paper, report, or study was verified to exist and is accessible
  • Claims match sources: the source actually says what AI claims it says
  • Methodology valid: the analytical approach is appropriate for the question being answered
  • Limitations stated: the analysis acknowledges what it cannot know, not just what it found
  • No circularity: AI was not asked to verify its own output — verification was independent

Red Flags Checklist

These patterns in AI output are reliable indicators of quality problems. When you see them, slow down — something in the content likely needs verification or rework.

Content red flags

  • Suspicious specificity: precise statistics ("67.3% of companies") without a source — AI fabricates plausible numbers
  • Hedging without substance: "there are various factors to consider" — says nothing; filler where analysis should be
  • Overly comprehensive lists: 12-bullet lists that cover everything but say nothing specific about your situation
  • Generic advice: recommendations that could apply to any situation — not calibrated to your context or constraints
  • Confident future predictions: specific claims about what will happen — AI cannot predict; this is confabulation

Style red flags (AI voice tells)

  • Agentless constructions: "It is important to note that..." / "It should be mentioned..." leave no subject and no accountability
  • Corporate filler openers: "In today's rapidly evolving landscape..." / "In conclusion, it is clear that..."
  • Transition sentence clusters: "Now that we have covered X, let us turn to Y" — reader does not need narration
  • Symmetric structure at all costs: three points in every section, each exactly two sentences — artificial balance imposed on uneven ideas
  • Emphatic adjectives: "crucial", "fundamental", "transformative", "comprehensive" used generically — signals filler
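The phrase-level tells above can be partially automated as a first-pass scan. A minimal sketch; the phrase list is illustrative, not a standard detector, and a hit is a prompt to reread, not proof of a problem:

```python
import re

# Illustrative red-flag phrases drawn from the checklist; extend with your own.
RED_FLAG_PHRASES = [
    r"it is important to note",
    r"it should be mentioned",
    "in today's rapidly evolving",
    r"in conclusion, it is clear",
    r"now that we have covered",
]

def flag_voice_tells(text):
    """Return (phrase, position) pairs for each red-flag phrase found in text."""
    hits = []
    for pattern in RED_FLAG_PHRASES:
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.group(0), match.start()))
    return hits

draft = ("It is important to note that, in today's rapidly "
         "evolving landscape, results vary.")
for phrase, pos in flag_voice_tells(draft):
    print(f"flagged at {pos}: {phrase}")
```

Content red flags (fabricated statistics, generic advice) cannot be caught this way; they need the reviewer pass.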

Acceptance Criteria Templates

Acceptance criteria are the definition of "done and correct" for a specific deliverable. Agree on them before starting AI-assisted work — not after seeing the output.

  • Customer-facing article. Must pass: all facts verified; all statistics attributed; read by a non-author before publish. Must not: include AI voice tells; assert specific statistics without a source link.
  • Internal analysis. Must pass: conclusions traceable to cited data; limitations section present; methodology stated. Must not: cite sources that have not been opened and verified; omit contrary evidence.
  • Production code. Must pass: tests pass on CI; security review complete; reviewed by at least one other engineer. Must not: import libraries not vetted for licence and security; leave TODOs in shipped paths.
  • Board or exec presentation. Must pass: every data point verified; narrative reviewed by a subject matter expert; defensible under questioning. Must not: use unattributed statistics; include forward-looking claims framed as fact.
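Criteria agreed up front are easier to enforce if recorded in a structured form rather than prose. A minimal sketch; the class and field names here are invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class AcceptanceCriteria:
    """Definition of 'done and correct' for one deliverable, agreed before work starts."""
    deliverable: str
    must_pass: list = field(default_factory=list)
    must_not: list = field(default_factory=list)

    def unmet(self, passed, violated):
        """Return outstanding criteria: must-pass items not yet satisfied,
        plus must-not items that were observed in the output."""
        missing = [c for c in self.must_pass if c not in passed]
        broken = [c for c in self.must_not if c in violated]
        return missing + broken

criteria = AcceptanceCriteria(
    deliverable="Internal analysis",
    must_pass=["conclusions traceable to cited data", "limitations section present"],
    must_not=["unverified sources cited"],
)
# → ['conclusions traceable to cited data']
print(criteria.unmet(passed={"limitations section present"}, violated=set()))
```

The value is less in the code than in the shape: a deliverable ships only when `unmet` is empty, which makes "looks good" irrelevant to the decision.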

Review Workflow

Who reviews, and at what depth, should scale with the stakes of the output — not with how confident the AI output looks.

Self-review (always)

  • Apply the reviewer-before-editor sequence
  • Check every factual claim against the red flags list
  • Verify sources independently — do not ask AI to verify its own output
  • Read aloud to catch AI voice tells your eyes skip over

Peer review (consequential)

  • Required for anything customer-facing or published externally
  • Reviewer applies the same output rubric — not just a general read
  • Share the acceptance criteria with the reviewer, not just the output
  • Peer is checking for correctness, not just style — agree on that upfront

SME review (specialist content)

  • Required when content requires domain expertise to evaluate
  • Legal, medical, financial, security, or technical specialist review
  • SME is verifying domain accuracy — not just reading for comprehension
  • Document that SME review occurred before publication or deployment

The review shortcut trap

The most common shortcut is skipping formal review because the output "looks good". AI output almost always looks good. That is the point. The quality bar is not a check for whether the output is polished — it is a check for whether it is correct. Polished and correct are independent dimensions. Do not conflate them.

Checklist: Do You Understand This?

  • Why does fluent, well-formatted AI output not indicate that the content is correct?
  • What is the reviewer-before-editor sequence — and what does skipping the reviewer pass risk?
  • Name four rubric criteria for evaluating AI-assisted research or analysis.
  • Name three red flags that indicate potential content quality problems in AI output.
  • What is the difference between a self-review, a peer review, and an SME review — when does each apply?
  • Write an acceptance criteria statement for an AI-assisted internal analysis document.