Types of AI Bias
Bias in AI systems is not a single problem — it is a family of distinct problems that can arise at different stages of the AI lifecycle. Understanding where each type of bias originates helps teams intervene at the right point. A model that performs well on aggregate metrics can still contain significant bias that only surfaces when outputs are disaggregated by subgroup, context, or deployment scenario.
Data Bias
Data bias originates in the training dataset before any model is trained. It is the most common root cause of bias in production AI systems.
Historical bias
Training data reflects past human decisions that were themselves biased. A hiring model trained on past promotion decisions will encode the historical underrepresentation of certain groups in leadership roles — regardless of how fair the algorithm is.
Representation bias
Certain groups appear at very low frequency in training data, so the model learns weak representations for them. A medical imaging model trained predominantly on lighter skin tones performs significantly worse on darker skin tones.
Measurement bias
The proxy used to measure the target variable is a better measure for some groups than others. Using arrest records as a proxy for criminal recidivism is unreliable because arrest rates are themselves influenced by policing patterns that differ by demographic.
Aggregation bias
A single model is trained on data from multiple distinct groups where the relationship between features and outcomes differs across groups. A diabetes prediction model trained on pooled global data may not generalise well to specific ethnic populations with different biomarker patterns.
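As a toy illustration of why pooling can mislead, consider two groups whose feature-to-outcome relationship has a different slope: a single model fit to the pooled data recovers neither. The data and the `slope` helper below are hypothetical, assuming a one-feature linear setting:

```python
# Group A and group B follow different linear relationships between a
# biomarker x and the outcome y (hypothetical numbers).
xs_a, ys_a = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]   # group A: y = 2.0 * x
xs_b, ys_b = [1.0, 2.0, 3.0], [0.5, 1.0, 1.5]   # group B: y = 0.5 * x

def slope(xs, ys):
    # Least-squares slope of a line through the origin: sum(x*y) / sum(x*x).
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

pooled = slope(xs_a + xs_b, ys_a + ys_b)
print(slope(xs_a, ys_a))  # 2.0  — correct for group A alone
print(slope(xs_b, ys_b))  # 0.5  — correct for group B alone
print(pooled)             # 1.25 — wrong for both groups
```

The pooled estimate sits between the two true slopes, so it systematically over-predicts for one group and under-predicts for the other, which is exactly the aggregation-bias failure described above.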
Model Bias
Model bias arises during the training process itself — from the choice of objective function, architecture, or regularisation methods.
- Objective misalignment: Optimising for aggregate accuracy incentivises the model to perform well on the majority group and ignore minority groups. A model that always predicts the majority class on a 90/10 class split achieves 90% accuracy while having 0% accuracy on the minority class.
- Inductive bias: Every model architecture encodes assumptions about what patterns are learnable. These can interact with group-correlated features in ways that amplify disparities.
- Spurious correlations: The model learns associations between sensitive attributes (or their proxies) and target labels — even when the sensitive attribute is not an input feature — because the training data contains such correlations.
- Label noise disparity: Annotation errors or disagreements are not evenly distributed across groups. If annotators systematically disagree on examples involving one group, the model receives noisier signal for that group.
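The majority-class failure mode can be demonstrated in a few lines; the numbers below are illustrative:

```python
# A 90/10 class split where the "model" always predicts the majority class.
labels = [0] * 90 + [1] * 10   # 90 majority examples, 10 minority examples
preds = [0] * 100              # degenerate model: always predicts class 0

accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
minority_recall = (
    sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    / sum(y == 1 for y in labels)
)
print(accuracy)         # 0.9 — looks acceptable in aggregate
print(minority_recall)  # 0.0 — the minority class is entirely ignored
```

This is why aggregate accuracy alone cannot serve as a fairness check: a metric that never looks at the minority class in isolation cannot detect that the model has abandoned it.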
Evaluation Bias
Evaluation bias occurs when the benchmarks or test sets used to assess model quality do not represent the deployment population, leading teams to falsely believe bias has been mitigated.
Benchmark mismatch
Evaluation datasets were collected in a different context than the deployment setting. An NLP benchmark using news articles may not reflect the language patterns of the communities who will use the deployed system.
Aggregate masking
Reporting only overall accuracy hides per-subgroup performance disparities. The model "passes" the evaluation threshold on aggregates while performing poorly on underrepresented subgroups.
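A minimal disaggregated evaluation makes the masking visible. The `accuracy_by_group` helper and the toy data are hypothetical:

```python
from collections import defaultdict

def accuracy_by_group(preds, labels, groups):
    """Return overall accuracy plus accuracy broken down per subgroup."""
    correct, total = defaultdict(int), defaultdict(int)
    for p, y, g in zip(preds, labels, groups):
        correct[g] += (p == y)
        total[g] += 1
    per_group = {g: correct[g] / total[g] for g in total}
    overall = sum(correct.values()) / sum(total.values())
    return overall, per_group

# 95 majority-group examples the model gets right, 5 minority examples it gets wrong.
preds = [1] * 95 + [0] * 5
labels = [1] * 100
groups = ["majority"] * 95 + ["minority"] * 5

overall, per_group = accuracy_by_group(preds, labels, groups)
print(overall)    # 0.95 — clears a 90% quality gate
print(per_group)  # {'majority': 1.0, 'minority': 0.0}
```

The model passes the aggregate threshold while failing completely on the minority subgroup, which is why per-subgroup reporting belongs in every evaluation pipeline.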
Deployment Bias
Deployment bias occurs when a model is applied in contexts it was not designed for, or when the real-world usage pattern diverges from the validated use case.
| Scenario | Bias mechanism |
|---|---|
| Model used in a new geography | Training distribution does not match local demographic, language, or social norms |
| Feature availability differs by group | Missing data for some groups causes the model to fall back on biased proxies or defaults |
| Feedback loop | Model outputs influence future training data — decisions made by a biased model become "ground truth" for the next model generation |
| Scope creep | Model validated for low-stakes recommendations reused for consequential decisions (credit, employment) without re-validation |
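The feedback-loop row can be simulated in a few lines: even when true underlying rates are identical across groups, an initial skew in the recorded data reproduces itself generation after generation. All numbers below are invented for illustration:

```python
# Equal true offence rates, but historical records over-sample group B.
TRUE_RATE = {"A": 0.05, "B": 0.05}
recorded = {"A": 40, "B": 60}   # biased "ground truth" from past decisions

for generation in range(5):
    total = sum(recorded.values())
    # "Model": allocate 100 patrols in proportion to recorded incidents.
    patrols = {g: round(100 * recorded[g] / total) for g in recorded}
    # Detections scale with where patrols are sent, not with behaviour,
    # so each generation's new "data" inherits the allocation's skew.
    for g in recorded:
        recorded[g] += round(patrols[g] * 20 * TRUE_RATE[g])

print(patrols)   # {'A': 40, 'B': 60} — the initial skew never corrects itself
print(recorded)  # recorded incidents stay in the same 40:60 ratio
```

Despite identical true rates, the system never converges to an even allocation, because the model's own outputs determine what gets observed and recorded next.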
Human and Cognitive Bias in the AI Pipeline
AI systems are built by humans, and human cognitive biases shape the decisions made throughout the development lifecycle — what data to collect, how to label it, what objective to optimise, and which fairness definition to adopt. Key examples:
- Confirmation bias: Teams select evaluation metrics that confirm the model is working, rather than metrics designed to surface failure modes.
- Automation bias: Once deployed, human reviewers defer to model outputs rather than exercising independent judgment — particularly for edge cases where the model is most likely to fail.
- In-group bias: Homogeneous development teams may not anticipate or test for harms that disproportionately affect groups not represented on the team.
- Framing effects: How risk and accuracy are presented to stakeholders influences acceptance decisions — a "98% accurate" framing conceals that the remaining 2% of errors may fall disproportionately on protected groups.
Bias in Generative AI
Large language models and image generation models introduce additional bias patterns:
Stereotype perpetuation
LLMs trained on internet-scale data absorb and reproduce social stereotypes — associating professions, traits, and capabilities with particular demographic groups in generated text.
Representation disparity in images
Image generation models over-represent certain demographic groups for neutral prompts like "doctor" or "engineer", reflecting skews in the training data.
RLHF-introduced values bias
Reinforcement Learning from Human Feedback aligns model behaviour with the preferences of its rater pools, which are typically not representative of the global diversity of values and norms.
Language bias
Performance degrades significantly for non-English languages and dialects, so speakers of under-resourced languages receive systematically worse outputs even when a model nominally supports their language.
Checklist: Do You Understand This?
- Name the four main types of data bias and give one example of each.
- Why does optimising for aggregate accuracy fail to address bias against minority groups?
- What is a feedback loop bias and why is it particularly dangerous in production systems?
- How does evaluation bias allow a biased model to pass quality gates undetected?
- What bias type is introduced when human annotators do not reflect the diversity of end users?
- Why do LLMs perpetuate stereotypes even when the training pipeline has no explicit bias mitigation failures?