Vendor Risk Management
An enterprise AI programme typically depends on 8-15 vendors: a frontier model provider, a managed AI platform, a vector database, an observability tool, guardrails, and several others. Each vendor is a dependency risk — a security incident, pricing change, model deprecation, or acquisition at any one of them can affect your production AI systems. Systematic vendor risk management is not bureaucracy; it is the difference between a managed disruption and a production emergency.
AI Vendor Categories and Risk Levels
| Category | Examples | Risk level | Key risk |
|---|---|---|---|
| Frontier model API | OpenAI, Anthropic, Google | Critical | All AI capability depends on availability and DPA terms |
| Managed AI platform | AWS Bedrock, Azure AI Foundry, GCP Vertex | Critical | Data processing; multi-model access; vendor lock-in risk |
| Open-weight model | Meta Llama, Mistral, Qwen | High | No vendor SLA; licence terms; you own security and compliance |
| Vector database | Pinecone, Weaviate, Qdrant | High | Data residency; retention; availability SLA |
| Observability / tracing | Langfuse, LangSmith, Datadog LLM | Medium | Prompt content in logs; data processing terms |
| Guardrails / safety | Lakera Guard, Azure AI Content Safety | Medium | Latency dependency; accuracy SLA |
Due Diligence Checklist
Complete this checklist before onboarding any vendor that processes your data or contributes to your production AI stack.
Security certifications
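- SOC 2 Type II (preferred over Type I: Type I assesses control design at a point in time, while Type II tests whether the controls operated effectively over a period)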
- ISO 27001 for international requirements
- HIPAA BAA if processing PHI
- FedRAMP if US government data
- Most recent penetration test report (less than 12 months old)
Data handling
- Data Processing Agreement (DPA) signed before any data is shared
- Subprocessor list reviewed and approved
- Data residency options meet your regulatory requirements
- Training on customer data: opt-out confirmed in writing
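- Incident notification SLA: GDPR gives you, as controller, 72 hours to notify the supervisory authority after becoming aware of a breach, so the vendor's SLA for notifying you should sit well inside that window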
- Data deletion on contract termination: confirmed timeline
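The checklist above can be tracked as structured data so that gaps are visible before onboarding. The following is a minimal sketch, not a prescribed tool: `VendorAssessment`, its field names, and `onboarding_gaps` are hypothetical, and it covers only a subset of the checklist items. Treating SOC 2 Type II as a gap when it is absent is an assumption; the checklist itself lists it as preferred.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class VendorAssessment:
    """Hypothetical record of one vendor's due diligence answers."""
    name: str
    soc2_type2: bool
    dpa_signed: bool
    subprocessors_reviewed: bool
    training_opt_out_in_writing: bool
    deletion_timeline_confirmed: bool
    last_pen_test: Optional[date] = None
    processes_phi: bool = False
    hipaa_baa: bool = False


def onboarding_gaps(v: VendorAssessment, today: date) -> List[str]:
    """Return the checklist items that are still open for this vendor."""
    gaps: List[str] = []
    if not v.soc2_type2:
        # Assumption: a missing Type II report is flagged rather than tolerated.
        gaps.append("SOC 2 Type II report missing")
    if v.last_pen_test is None or today - v.last_pen_test > timedelta(days=365):
        gaps.append("pen test report missing or older than 12 months")
    if not v.dpa_signed:
        gaps.append("DPA not signed")
    if not v.subprocessors_reviewed:
        gaps.append("subprocessor list not reviewed")
    if not v.training_opt_out_in_writing:
        gaps.append("training opt-out not confirmed in writing")
    if not v.deletion_timeline_confirmed:
        gaps.append("deletion timeline on termination not confirmed")
    if v.processes_phi and not v.hipaa_baa:
        gaps.append("HIPAA BAA required for PHI")
    return gaps


# Usage with an invented vendor: the DPA is unsigned and the pen test is stale.
vendor = VendorAssessment(
    name="ExampleVectorDB",
    soc2_type2=True,
    dpa_signed=False,
    subprocessors_reviewed=True,
    training_opt_out_in_writing=True,
    deletion_timeline_confirmed=True,
    last_pen_test=date(2023, 1, 1),
)
for gap in onboarding_gaps(vendor, today=date(2025, 1, 1)):
    print(f"{vendor.name}: {gap}")
```

An empty list from `onboarding_gaps` is one possible gate for onboarding; the record itself can double as evidence for the annual reassessment.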
Data Processing Agreement Key Clauses
| Clause | What to look for |
|---|---|
| Training data use | Explicit opt-out from using your data to train or improve models; confirm this is the enterprise tier default |
| Data retention | How long prompts and responses are retained; confirm it matches your data minimisation requirements |
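| Deletion on request | Timeline for deleting your data on request; it should let you meet GDPR erasure requests, which must be answered without undue delay and within one month |
| Breach notification | Vendor notifies you without undue delay, with a contractual SLA shorter than the 72 hours GDPR gives you to notify the supervisory authority; confirm the notification goes to your legal team or DPO, not just a generic email address |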
| Subprocessors | List of subprocessors must be available; you must be notified before new subprocessors are added |
Vendor Concentration Risk
Over-reliance on a single provider is a reliability risk
Relying on a single frontier model provider for all AI capability means that provider's outages, price changes, model deprecations, or DPA changes directly impact your production systems. Mitigations:
- Provider abstraction layer (for example LiteLLM, or a unified endpoint on AWS Bedrock): swap providers with minimal application code changes, while still re-testing prompts and evaluations against the new model
- Multi-provider failover for critical use cases
- Open-weight model as fallback for non-sensitive use cases
- Document the migration path to a different primary provider before you need it
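The abstraction and failover mitigations above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not any library's API: each provider is modelled as a plain Python callable that takes a prompt and returns text, and `complete_with_failover`, `AllProvidersFailed`, and the two stand-in providers are hypothetical names. In a real system each callable would wrap a vendor SDK or an abstraction library.

```python
from typing import Callable, List, Sequence, Tuple

# A provider is a (name, callable) pair; the callable maps a prompt to a completion.
Provider = Tuple[str, Callable[[str], str]]


class AllProvidersFailed(RuntimeError):
    """Raised when every configured provider has failed for a request."""


def complete_with_failover(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try providers in priority order; return (provider_name, completion)."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real wrapper would catch the SDK's specific errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stand-in providers for illustration only: the primary simulates an outage.
def primary_down(prompt: str) -> str:
    raise TimeoutError("simulated outage")


def fallback_ok(prompt: str) -> str:
    return f"echo: {prompt}"


used, text = complete_with_failover(
    "hello", [("primary", primary_down), ("fallback", fallback_ok)]
)
print(used, text)  # prints: fallback echo: hello
```

Because the application only sees the `Provider` shape, changing the primary vendor means changing the list order and the wrapper, not the calling code. Returning the provider name lets you log which vendor actually served each request, which is useful evidence when reviewing concentration risk.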
Ongoing Vendor Monitoring
- Subscribe to vendor security bulletins and status pages
- Track DPA change notifications — providers update DPAs and notify by email; document reviews and approvals
- Monitor model deprecation announcements: providers typically give 3-6 months' notice, so schedule the migration and regression testing as soon as a notice arrives
- Track pricing changes — AI pricing moves rapidly; cost models built on today's prices need quarterly review
- Annual vendor security reassessment — request updated SOC 2 reports and confirm DPA still matches your data processing
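The review cadences above (annual security reassessment, quarterly pricing review, deprecation lead time) lend themselves to a simple scheduled check. The sketch below is one possible shape: `VendorRecord`, `due_actions`, and the 90-day migration lead time are assumptions chosen for illustration, and the dates are invented.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class VendorRecord:
    """Hypothetical monitoring record for one vendor."""
    name: str
    last_security_review: date
    last_pricing_review: date
    model_retirement_date: Optional[date] = None  # from a deprecation notice, if any


def due_actions(
    v: VendorRecord,
    today: date,
    migration_lead: timedelta = timedelta(days=90),  # assumed lead time
) -> List[str]:
    """List the monitoring actions that are overdue or imminent."""
    actions: List[str] = []
    if today - v.last_security_review > timedelta(days=365):
        actions.append("annual security reassessment overdue")
    if today - v.last_pricing_review > timedelta(days=91):
        actions.append("quarterly pricing review overdue")
    if v.model_retirement_date is not None and v.model_retirement_date - today <= migration_lead:
        actions.append("model retirement within migration lead time")
    return actions


record = VendorRecord(
    name="ExampleModelAPI",
    last_security_review=date(2024, 1, 1),
    last_pricing_review=date(2025, 2, 1),
    model_retirement_date=date(2025, 5, 1),
)
for action in due_actions(record, today=date(2025, 3, 1)):
    print(f"{record.name}: {action}")
```

Running a check like this on a schedule turns the monitoring list into tickets with owners, rather than relying on someone noticing a vendor email.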
Checklist: Do You Understand This?
- What makes a frontier model API a "Critical" risk category vendor — and what does that imply for due diligence?
- What is the difference between SOC 2 Type I and Type II — and which should you require?
- What five clauses in a Data Processing Agreement are most important for an AI vendor relationship?
- What is vendor concentration risk — and what architectural mitigations reduce it?
- Why do you need to monitor DPA change notifications on an ongoing basis, not just at onboarding?
- What is the risk of using an open-weight model (like Llama) compared to a managed frontier model API?