Safety & Risk
AI systems face threats that traditional software security does not cover — prompt injection, jailbreaks, data exfiltration through model outputs, and policy violations through adversarial inputs. This section covers how to identify these risks systematically, defend against the most common attacks, and enforce behaviour policies at scale.
In This Section
Threat Modeling
Systematically identifying AI-specific threats for your system — attack surfaces, threat actors, and prioritising mitigations by risk level.
Prompt Injection & Exfiltration
How prompt injection attacks work, indirect injection via retrieved content, and the defences that reduce exfiltration risk.
Jailbreak Resistance
Common jailbreak patterns, why system prompt hardening alone is insufficient, and layered defences for production AI systems.
Policy Enforcement
Enforcing output and behaviour policies at scale — guardrail layers, input/output classifiers, and monitoring for policy violations in production.