🧠 All Things AI
Intermediate

Types of AI Bias

Bias in AI systems is not a single problem — it is a family of distinct problems that can arise at different stages of the AI lifecycle. Understanding where each type of bias originates helps teams intervene at the right point. A model that performs well on aggregate metrics can still contain significant bias that only surfaces when outputs are disaggregated by subgroup, context, or deployment scenario.

Data Bias

Data bias originates in the training dataset before any model is trained. It is the most common root cause of bias in production AI systems.

Historical bias

Training data reflects past human decisions that were themselves biased. A hiring model trained on past promotion decisions will encode the historical underrepresentation of certain groups in leadership roles — regardless of how fair the algorithm is.

Representation bias

Certain groups appear at very low frequency in training data, so the model learns weak representations for them. A medical imaging model trained predominantly on lighter skin tones performs significantly worse on darker skin tones.

Measurement bias

The proxy used to measure the target variable is a better measure for some groups than others. Using arrest records as a proxy for criminal recidivism is unreliable because arrest rates are themselves influenced by policing patterns that differ by demographic.

Aggregation bias

A single model is trained on data from multiple distinct groups where the relationship between features and outcomes differs across groups. A diabetes prediction model trained on pooled global data may not generalise well to specific ethnic populations with different biomarker patterns.
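
A toy numerical sketch of this pooling effect (all feature values and slopes below are hypothetical): each group has a different feature-outcome slope, and a single model fit to the pooled data is accurate for neither.

```python
# Toy illustration of aggregation bias: the feature-outcome slope differs
# between two groups, so one pooled fit misrepresents both. All numbers
# are hypothetical.

def fit_slope(xs, ys):
    """Least-squares slope for a through-the-origin linear fit."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

xs_a, ys_a = [1, 2, 3, 4], [2.0, 4.0, 6.0, 8.0]  # group A: true slope 2.0
xs_b, ys_b = [1, 2, 3, 4], [0.5, 1.0, 1.5, 2.0]  # group B: true slope 0.5

pooled = fit_slope(xs_a + xs_b, ys_a + ys_b)
print(fit_slope(xs_a, ys_a))  # 2.0
print(fit_slope(xs_b, ys_b))  # 0.5
print(pooled)                 # 1.25, accurate for neither group
```

Here the pooled slope simply splits the difference because the groups are equally sized; with unequal group sizes it would skew toward the majority group, which is the usual failure mode in practice.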

Model Bias

Model bias arises during the training process itself — from the choice of objective function, architecture, or regularisation methods.

  • Objective misalignment: Optimising for aggregate accuracy incentivises the model to perform well on the majority group and ignore minority groups. A model can achieve 90% accuracy on a 90/10 class split while scoring 0% on the minority class, simply by always predicting the majority class.
  • Inductive bias: Every model architecture encodes assumptions about what patterns are learnable. These can interact with group-correlated features in ways that amplify disparities.
  • Spurious correlations: The model learns associations between sensitive attributes (or their proxies) and target labels — even when the sensitive attribute is not an input feature — because the training data contains such correlations.
  • Label noise disparity: Annotation errors or disagreements are not evenly distributed across groups. If annotators systematically disagree on examples involving one group, the model receives noisier signal for that group.
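
The objective-misalignment arithmetic can be checked directly. A minimal sketch, assuming a hypothetical 90/10 split and a trivial predictor that always outputs the majority class:

```python
# A degenerate majority-class predictor on an imbalanced test set.
# The 90/10 split is hypothetical.
labels = [0] * 90 + [1] * 10   # 90% majority class, 10% minority class
preds = [0] * 100              # always predict the majority class

accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
minority_recall = sum(
    p == 1 for p, y in zip(preds, labels) if y == 1
) / sum(y == 1 for y in labels)

print(f"aggregate accuracy: {accuracy:.0%}")   # 90%
print(f"minority recall:    {minority_recall:.0%}")  # 0%
```

Any loss that rewards only aggregate accuracy is perfectly content with this predictor, which is why per-group metrics need to enter either the objective or the evaluation gate.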

Evaluation Bias

Evaluation bias occurs when the benchmarks or test sets used to assess model quality do not represent the deployment population, leading teams to falsely believe bias has been mitigated.

Benchmark mismatch

Evaluation datasets were collected in a different context than the deployment setting. An NLP benchmark using news articles may not reflect the language patterns of the communities who will use the deployed system.

Aggregate masking

Reporting only overall accuracy hides per-subgroup performance disparities. The model "passes" the evaluation threshold on aggregates while performing poorly on underrepresented subgroups.
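
A minimal sketch of disaggregated evaluation (the subgroup names and counts are hypothetical): the aggregate clears a 95% threshold while one subgroup sits at 50%.

```python
# Disaggregating accuracy by subgroup. The records are hypothetical
# (subgroup, was_the_prediction_correct) pairs from a test set.
from collections import defaultdict

records = (
    [("majority", True)] * 930 + [("majority", False)] * 20
    + [("minority", True)] * 25 + [("minority", False)] * 25
)

overall = sum(ok for _, ok in records) / len(records)
print(f"overall accuracy: {overall:.1%}")  # 95.5%

by_group = defaultdict(list)
for group, ok in records:
    by_group[group].append(ok)

for group, oks in sorted(by_group.items()):
    print(f"{group}: {sum(oks) / len(oks):.1%}")
```

The same aggregate number is compatible with a 97.9%/50% split across subgroups, so a quality gate that only checks `overall` cannot detect the disparity.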

Deployment Bias

Deployment bias occurs when a model is applied in contexts it was not designed for, or when the real-world usage pattern diverges from the validated use case.

Common scenarios and their bias mechanisms:

  • Model used in a new geography: Training distribution does not match local demographic, language, or social norms.
  • Feature availability differs by group: Missing data for some groups causes the model to fall back on biased proxies or defaults.
  • Feedback loop: Model outputs influence future training data — decisions made by a biased model become "ground truth" for the next model generation.
  • Scope creep: A model validated for low-stakes recommendations is reused for consequential decisions (credit, employment) without re-validation.
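
The feedback-loop mechanism lends itself to a toy simulation (the initial rates and the 5% per-generation drift are hypothetical): once a model's approvals feed the next model's training data, an initial disparity compounds instead of correcting.

```python
# Toy feedback-loop simulation: each generation's approval decisions become
# the next generation's training labels, reinforcing the initial disparity.
# The starting rates and the 5% drift factor are hypothetical.
rate_a, rate_b = 0.50, 0.40  # initial approval rates for groups A and B
history = []
for generation in range(5):
    # The more-approved group contributes more positive training examples,
    # nudging the next model to approve it even more (and vice versa).
    rate_a = min(1.0, rate_a * 1.05)
    rate_b = max(0.0, rate_b * 0.95)
    history.append((rate_a, rate_b))
    print(f"gen {generation}: A={rate_a:.2f}  B={rate_b:.2f}  gap={rate_a - rate_b:.2f}")
```

After five generations the gap has roughly tripled with no change in the underlying population, which is why feedback loops are singled out as especially dangerous in production systems.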

Human and Cognitive Bias in the AI Pipeline

AI systems are built by humans, and human cognitive biases shape the decisions made throughout the development lifecycle — what data to collect, how to label it, what objective to optimise, and which fairness definition to adopt. Key examples:

  • Confirmation bias: Teams select evaluation metrics that confirm the model is working, rather than metrics designed to surface failure modes.
  • Automation bias: Once deployed, human reviewers defer to model outputs rather than exercising independent judgment — particularly for edge cases where the model is most likely to fail.
  • In-group bias: Homogeneous development teams may not anticipate or test for harms that disproportionately affect groups not represented on the team.
  • Framing effects: How risk and accuracy are presented to stakeholders influences acceptance decisions. A "98% accurate" framing conceals that the remaining 2% of errors may fall disproportionately on protected groups.

Bias in Generative AI

Large language models and image generation models introduce additional bias patterns:

Stereotype perpetuation

LLMs trained on internet-scale data absorb and reproduce social stereotypes — associating professions, traits, and capabilities with particular demographic groups in generated text.

Representation disparity in images

Image generation models over-represent certain demographic groups for neutral prompts like "doctor" or "engineer", reflecting skews in the training data.

RLHF-introduced values bias

Reinforcement Learning from Human Feedback reflects the preferences of rater pools, which are typically non-representative of global diversity in values and norms.

Language bias

Performance degrades significantly for non-English languages and dialects. This compounds other harms: speakers of under-resourced languages receive worse outputs even when they interact with the system in their own language.

Checklist: Do You Understand This?

  • Name the four main types of data bias and give one example of each.
  • Why does optimising for aggregate accuracy fail to address bias against minority groups?
  • What is a feedback loop bias and why is it particularly dangerous in production systems?
  • How does evaluation bias allow a biased model to pass quality gates undetected?
  • What bias type is introduced when human annotators do not reflect the diversity of end users?
  • Why do LLMs perpetuate stereotypes even when the training pipeline has no explicit bias mitigation failures?