
Quality Bar & Review Standards

AI output looks polished. Fluent prose, well-formatted code, confident assertions — the surface signals of quality are all there. The problem is that those signals are present whether the content is correct or not. Setting an explicit quality bar means defining what "done and correct" actually looks like before you publish, ship, or send.

Why AI Output Needs a Quality Bar

The problem with fluency

  • Hallucinated facts read identically to correct ones — there is no visual error signal
  • AI writes in a confident, authoritative register regardless of accuracy
  • Structural quality (good headings, clear paragraphs) masks content quality problems
  • The model cannot tell you when it is uncertain — hedging language is stylistic, not calibrated
  • Code compiles and looks sensible but fails on real inputs or has security flaws

What a quality bar provides

  • Consistent evaluation criteria that do not change based on how good the output looks
  • A checklist you can follow even under pressure to "just ship it"
  • Shared standards across a team so everyone reviews to the same level
  • Accountability — you own the output, and the quality bar is the standard you hold yourself to
  • Early catch of systematic errors before they compound across multiple deliverables

The Review Mindset: Reviewer Before Editor

Most people read AI output as an editor — improving phrasing, adjusting tone, cutting length. That is the wrong first pass. Start as a reviewer: is this correct, complete, and appropriate? Only then become an editor.

  • Reviewer (first). Asks: is this correct? Is anything missing? Is it appropriate for the audience and purpose? Catches: factual errors, missing caveats, wrong scope, inappropriate tone for context.
  • Editor (second). Asks: is this clear? Is it too long? Does the structure serve the reader? Catches: wordiness, unclear sentences, poor structure, AI voice tells.
  • Owner (always). Asks: would I be comfortable defending every claim in this if challenged? Catches: anything you cannot vouch for; your name is on it.

Skipping the reviewer pass and going straight to editing is the most common quality failure. You can make hallucinated content flow beautifully and still publish it wrong.

Output Rubrics by Type

Different output types have different quality criteria. A rubric makes those criteria explicit so you apply them consistently, not just when something feels off.

Documents & reports

  • Factual accuracy: every specific claim can be traced to a verifiable source
  • Source attribution: statistics have a date, origin, and correct context
  • Logical coherence: conclusions follow from evidence; no non-sequiturs
  • Appropriate scope: answers the actual question; does not pad with adjacent topics
  • Tone: matches the audience — formal for board, practical for ops, plain for general
  • Caveats present: limitations and uncertainties are stated, not omitted

Code

  • Runs without error: executed against real inputs, not just visually inspected
  • Edge cases handled: empty input, nulls, boundary values, concurrent calls
  • No security anti-patterns: no SQL injection, no exposed secrets, input validation present
  • Readable: a team member who did not write this can understand it without a walkthrough
  • Error handling: fails gracefully with useful messages, not silent failures
  • Dependencies: no libraries imported without understanding what they do and their security posture
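The edge-case and security criteria above can be exercised as small review tests rather than by eye. A minimal sketch in Python, using a hypothetical `lookup_user` function standing in for the code under review:

```python
import sqlite3

def lookup_user(conn, username):
    """Fetch a user row by name, or None if absent.

    Uses a parameterized query (?) instead of string concatenation,
    so a malicious username cannot inject SQL.
    """
    if not isinstance(username, str) or not username:
        raise ValueError("username must be a non-empty string")
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (username,))
    return cur.fetchone()

# Review-style checks: real execution, absent user, injection attempt, empty input.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

assert lookup_user(conn, "alice") == (1, "alice")
assert lookup_user(conn, "nobody") is None
# The injection string is treated as a literal name, not executed as SQL.
assert lookup_user(conn, "x' OR '1'='1") is None
try:
    lookup_user(conn, "")
except ValueError:
    pass  # empty input fails loudly, not silently
```

Passing checks like these is the "runs without error" and "edge cases handled" bar in executable form; they do not replace a human read for readability and dependency hygiene.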

Presentations & decks

  • Data accuracy: every chart, statistic, and figure verified against source
  • Correct attribution: data points labelled with year and source, not just asserted
  • Narrative coherence: the story arc is yours, not a generic AI framing of the topic
  • No AI voice tells: no corporate filler, no reflexive hedging of every statement, no bullet overload
  • Defensible: you would be comfortable answering a question about any claim on any slide

Research & analysis

  • Sources exist: every cited paper, report, or study was verified to exist and is accessible
  • Claims match sources: the source actually says what AI claims it says
  • Methodology valid: the analytical approach is appropriate for the question being answered
  • Limitations stated: the analysis acknowledges what it cannot know, not just what it found
  • No circularity: AI was not asked to verify its own output — verification was independent

Red Flags Checklist

These patterns in AI output are reliable indicators of quality problems. When you see them, slow down — something in the content likely needs verification or rework.

Content red flags

  • Suspicious specificity: precise statistics ("67.3% of companies") without a source — AI fabricates plausible numbers
  • Hedging without substance: "there are various factors to consider" — says nothing; filler where analysis should be
  • Overly comprehensive lists: 12-bullet lists that cover everything but say nothing specific about your situation
  • Generic advice: recommendations that could apply to any situation — not calibrated to your context or constraints
  • Confident future predictions: specific claims about what will happen — AI cannot predict; this is confabulation

Style red flags (AI voice tells)

  • Agentless constructions: "It is important to note that..." / "It should be mentioned..." leave no subject and no accountability
  • Corporate filler openers: "In today's rapidly evolving landscape..." / "In conclusion, it is clear that..."
  • Transition sentence clusters: "Now that we have covered X, let us turn to Y" — reader does not need narration
  • Symmetric structure at all costs: three points in every section, each exactly two sentences — artificial balance imposed on uneven ideas
  • Emphatic adjectives: "crucial", "fundamental", "transformative", "comprehensive" used generically — signals filler
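The phrase-level tells above can be partially automated as a first-pass scan. A minimal sketch; the phrase list is illustrative, not a standard detector, and a hit is a prompt to reread, not proof of a problem:

```python
import re

# Illustrative red-flag phrases drawn from the checklist; extend with your own.
RED_FLAG_PHRASES = [
    r"it is important to note",
    r"it should be mentioned",
    "in today's rapidly evolving",
    r"in conclusion, it is clear",
    r"now that we have covered",
]

def flag_voice_tells(text):
    """Return (phrase, position) pairs for each red-flag phrase found in text."""
    hits = []
    for pattern in RED_FLAG_PHRASES:
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.group(0), match.start()))
    return hits

draft = ("It is important to note that, in today's rapidly "
         "evolving landscape, results vary.")
for phrase, pos in flag_voice_tells(draft):
    print(f"flagged at {pos}: {phrase}")
```

Content red flags (fabricated statistics, generic advice) cannot be caught this way; they need the reviewer pass.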

Acceptance Criteria Templates

Acceptance criteria are the definition of "done and correct" for a specific deliverable. Agree on them before starting AI-assisted work — not after seeing the output.

  • Customer-facing article. Must pass: all facts verified; all statistics attributed; read by a non-author before publish. Must not: include AI voice tells; assert specific statistics without a source link.
  • Internal analysis. Must pass: conclusions traceable to cited data; limitations section present; methodology stated. Must not: cite sources that have not been opened and verified; omit contrary evidence.
  • Production code. Must pass: tests pass on CI; security review complete; reviewed by at least one other engineer. Must not: import libraries not vetted for licence and security; leave TODOs in shipped paths.
  • Board or exec presentation. Must pass: every data point verified; narrative reviewed by a subject matter expert; defensible under questioning. Must not: use unattributed statistics; include forward-looking claims framed as fact.
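Criteria agreed up front are easier to enforce if recorded in a structured form rather than prose. A minimal sketch; the class and field names here are invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class AcceptanceCriteria:
    """Definition of 'done and correct' for one deliverable, agreed before work starts."""
    deliverable: str
    must_pass: list = field(default_factory=list)
    must_not: list = field(default_factory=list)

    def unmet(self, passed, violated):
        """Return outstanding criteria: must-pass items not yet satisfied,
        plus must-not items that were observed in the output."""
        missing = [c for c in self.must_pass if c not in passed]
        broken = [c for c in self.must_not if c in violated]
        return missing + broken

criteria = AcceptanceCriteria(
    deliverable="Internal analysis",
    must_pass=["conclusions traceable to cited data", "limitations section present"],
    must_not=["unverified sources cited"],
)
# → ['conclusions traceable to cited data']
print(criteria.unmet(passed={"limitations section present"}, violated=set()))
```

The value is less in the code than in the shape: a deliverable ships only when `unmet` is empty, which makes "looks good" irrelevant to the decision.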

Review Workflow

Who reviews, and at what depth, should scale with the stakes of the output — not with how confident the AI output looks.

Self-review (always)

  • Apply the reviewer-before-editor sequence
  • Check every factual claim against the red flags list
  • Verify sources independently — do not ask AI to verify its own output
  • Read aloud to catch AI voice tells your eyes skip over

Peer review (consequential)

  • Required for anything customer-facing or published externally
  • Reviewer applies the same output rubric — not just a general read
  • Share the acceptance criteria with the reviewer, not just the output
  • Peer is checking for correctness, not just style — agree on that upfront

SME review (specialist content)

  • Required when content requires domain expertise to evaluate
  • Legal, medical, financial, security, or technical specialist review
  • SME is verifying domain accuracy — not just reading for comprehension
  • Document that SME review occurred before publication or deployment

The review shortcut trap

The most common shortcut is skipping formal review because the output "looks good". AI output almost always looks good. That is the point. The quality bar is not a check for whether the output is polished — it is a check for whether it is correct. Polished and correct are independent dimensions. Do not conflate them.

Checklist: Do You Understand This?

  • Why does fluent, well-formatted AI output not indicate that the content is correct?
  • What is the reviewer-before-editor sequence — and what does skipping the reviewer pass risk?
  • Name four rubric criteria for evaluating AI-assisted research or analysis.
  • Name three red flags that indicate potential content quality problems in AI output.
  • What is the difference between a self-review, a peer review, and an SME review — when does each apply?
  • Write an acceptance criteria statement for an AI-assisted internal analysis document.