
Vendor Risk Management

An enterprise AI programme typically depends on 8-15 vendors: a frontier model provider, a managed AI platform, a vector database, an observability tool, guardrails, and several others. Each vendor is a dependency risk — a security incident, pricing change, model deprecation, or acquisition at any one of them can affect your production AI systems. Systematic vendor risk management is not bureaucracy; it is the difference between a managed disruption and a production emergency.

AI Vendor Categories and Risk Levels

| Category | Examples | Risk level | Key risk |
| --- | --- | --- | --- |
| Frontier model API | OpenAI, Anthropic, Google | Critical | All AI capability depends on availability and DPA terms |
| Managed AI platform | AWS Bedrock, Azure AI Foundry, GCP Vertex | Critical | Data processing; multi-model access; vendor lock-in risk |
| Open-weight model | Meta LLaMA, Mistral, Qwen | High | No vendor SLA; licence terms; you own security and compliance |
| Vector database | Pinecone, Weaviate, Qdrant | High | Data residency; retention; availability SLA |
| Observability / tracing | Langfuse, LangSmith, Datadog LLM | Medium | Prompt content in logs; data processing terms |
| Guardrails / safety | Lakera Guard, Azure AI Content Safety | Medium | Latency dependency; accuracy SLA |
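One way to make these categories actionable is to keep them in a small vendor register that can be queried by risk level. The sketch below is illustrative only: the `Vendor` and `RiskLevel` types and the `vendors_at_or_above` helper are hypothetical names, and the entries are a sample of the vendors listed above rather than a complete inventory.

```python
from dataclasses import dataclass
from enum import IntEnum


class RiskLevel(IntEnum):
    """Ordered so that levels can be compared (CRITICAL > HIGH > MEDIUM)."""
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


@dataclass(frozen=True)
class Vendor:
    name: str
    category: str
    risk: RiskLevel
    key_risk: str


# Sample entries only; a real register would list every vendor in the stack.
REGISTER = [
    Vendor("Anthropic", "Frontier model API", RiskLevel.CRITICAL,
           "Availability and DPA terms"),
    Vendor("AWS Bedrock", "Managed AI platform", RiskLevel.CRITICAL,
           "Data processing and vendor lock-in"),
    Vendor("Qdrant", "Vector database", RiskLevel.HIGH,
           "Data residency, retention, availability SLA"),
    Vendor("Langfuse", "Observability / tracing", RiskLevel.MEDIUM,
           "Prompt content in logs"),
]


def vendors_at_or_above(register, minimum):
    """Return vendors whose risk level is at least `minimum`, highest first."""
    selected = [v for v in register if v.risk >= minimum]
    return sorted(selected, key=lambda v: v.risk, reverse=True)


if __name__ == "__main__":
    for vendor in vendors_at_or_above(REGISTER, RiskLevel.HIGH):
        print(f"{vendor.risk.name:<8} {vendor.name} ({vendor.category})")
```

Ordering the risk levels as integers keeps the filter to a single comparison, which is convenient when the same register drives review frequency or escalation rules.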

Due Diligence Checklist

Complete this checklist before onboarding any vendor that processes your data or contributes to your production AI stack.

Security certifications

  • SOC 2 Type II (preferred over Type I: Type II tests that controls operated effectively over a period, while Type I only assesses their design at a point in time)
  • ISO 27001 for international requirements
  • HIPAA BAA if processing PHI
  • FedRAMP if US government data
  • Most recent pen test report (less than 12 months)

Data handling

  • Data Processing Agreement (DPA) signed before any data is shared
  • Subprocessor list reviewed and approved
  • Data residency options meet your regulatory requirements
  • Training on customer data: opt-out confirmed in writing
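  • Incident notification SLA (GDPR gives you, as controller, 72 hours to notify the regulator, so the vendor must notify you well within that window)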
  • Data deletion on contract termination: confirmed timeline
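The checklist can be treated as an onboarding gate: a vendor is not approved while any item is open. The following is a minimal sketch under that assumption. The `DueDiligenceRecord` fields and the `open_items` function are hypothetical names, and the conditional items (ISO 27001, HIPAA BAA, FedRAMP) are left out because they only apply to some organisations.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class DueDiligenceRecord:
    vendor: str
    soc2_type2: bool
    dpa_signed: bool
    subprocessors_approved: bool
    residency_ok: bool
    training_opt_out_in_writing: bool
    deletion_timeline_confirmed: bool
    last_pen_test: Optional[date] = None  # None if no report was provided


# "Less than 12 months" approximated as 365 days.
MAX_PEN_TEST_AGE = timedelta(days=365)


def open_items(record: DueDiligenceRecord, today: date) -> List[str]:
    """Return the checklist items that still block onboarding."""
    checks = [
        (record.soc2_type2, "SOC 2 Type II report"),
        (record.dpa_signed, "Signed DPA"),
        (record.subprocessors_approved, "Subprocessor list approved"),
        (record.residency_ok, "Data residency meets requirements"),
        (record.training_opt_out_in_writing, "Training opt-out in writing"),
        (record.deletion_timeline_confirmed, "Deletion timeline confirmed"),
    ]
    gaps = [label for passed, label in checks if not passed]
    if record.last_pen_test is None or today - record.last_pen_test > MAX_PEN_TEST_AGE:
        gaps.append("Pen test report less than 12 months old")
    return gaps


if __name__ == "__main__":
    record = DueDiligenceRecord(
        vendor="ExampleVectorDB",  # hypothetical vendor
        soc2_type2=True,
        dpa_signed=False,
        subprocessors_approved=True,
        residency_ok=True,
        training_opt_out_in_writing=True,
        deletion_timeline_confirmed=True,
        last_pen_test=date(2023, 1, 15),
    )
    for item in open_items(record, today=date(2024, 6, 1)):
        print("OPEN:", item)
```

Passing `today` explicitly rather than calling `date.today()` inside the function keeps the pen-test age check deterministic and easy to test.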

Data Processing Agreement Key Clauses

| Clause | What to look for |
| --- | --- |
| Training data use | Explicit opt-out from using your data to train or improve models; confirm this is the enterprise tier default |
| Data retention | How long prompts and responses are retained; confirm it matches your data minimisation requirements |
| Deletion on request | Timeline for deleting your data on request; should align with GDPR erasure obligations (typically 30 days) |
| Breach notification | Vendor must notify you without undue delay and well inside the 72 hours GDPR gives you to notify the regulator; confirm the notification goes to your legal/DPO contact, not just a generic email address |
| Subprocessors | List of subprocessors must be available; you must be notified before new subprocessors are added |
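The five clauses above can be compared against internal policy once the vendor's terms are recorded in a structured form. This is a sketch only: the `DpaTerms` fields, the `dpa_gaps` function, and all three thresholds are hypothetical values for illustration and should be set with your legal team.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DpaTerms:
    trains_on_customer_data: bool
    retention_days: int
    deletion_days: int
    breach_notice_hours: int
    notifies_before_new_subprocessors: bool


# Hypothetical policy thresholds, not legal advice.
MAX_RETENTION_DAYS = 30
MAX_DELETION_DAYS = 30
MAX_BREACH_NOTICE_HOURS = 48  # chosen to sit inside the 72-hour GDPR window


def dpa_gaps(terms: DpaTerms) -> List[str]:
    """Return the clauses where the vendor's terms fall short of policy."""
    gaps = []
    if terms.trains_on_customer_data:
        gaps.append("Training data use")
    if terms.retention_days > MAX_RETENTION_DAYS:
        gaps.append("Data retention")
    if terms.deletion_days > MAX_DELETION_DAYS:
        gaps.append("Deletion on request")
    if terms.breach_notice_hours > MAX_BREACH_NOTICE_HOURS:
        gaps.append("Breach notification")
    if not terms.notifies_before_new_subprocessors:
        gaps.append("Subprocessors")
    return gaps


if __name__ == "__main__":
    proposed = DpaTerms(
        trains_on_customer_data=False,
        retention_days=90,
        deletion_days=30,
        breach_notice_hours=72,
        notifies_before_new_subprocessors=True,
    )
    print(dpa_gaps(proposed))
```

Recording the terms this way also gives you a baseline to diff against when a provider later updates its DPA.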

Vendor Concentration Risk

Over-reliance on a single provider is a reliability risk

Relying on a single frontier model provider for all AI capability means that provider's outages, price changes, model deprecations, or DPA changes directly impact your production systems. Mitigations:

  • Provider abstraction layer (LiteLLM, AWS Bedrock unified endpoint) — swap providers without application code changes
  • Multi-provider failover for critical use cases
  • Open-weight model as fallback for non-sensitive use cases
  • Document the migration path to a different primary provider before you need it
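The failover idea in the list above can be sketched without committing to any particular SDK. In this example each provider is just a callable that takes a prompt and returns text. The `complete_with_failover` function and the two stub providers are hypothetical, and a production version would wrap real client calls and catch narrower error types.

```python
from typing import Callable, Sequence, Tuple

# A provider is a (name, callable) pair; the callable wraps a real client.
Provider = Tuple[str, Callable[[str], str]]


class AllProvidersFailed(RuntimeError):
    """Raised when every configured provider has failed."""


def complete_with_failover(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try each provider in order; return (provider_name, response)."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stub providers for illustration only.
def primary(prompt: str) -> str:
    raise TimeoutError("primary provider unavailable")


def secondary(prompt: str) -> str:
    return f"[secondary] echo: {prompt}"


if __name__ == "__main__":
    name, text = complete_with_failover(
        "Summarise this contract", [("primary", primary), ("secondary", secondary)]
    )
    print(name, "->", text)
```

Because application code only sees `complete_with_failover`, changing the primary provider becomes a configuration change to the provider list rather than a code change, which is the property an abstraction layer is meant to give you.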

Ongoing Vendor Monitoring

  • Subscribe to vendor security bulletins and status pages
  • Track DPA change notifications — providers update DPAs and notify by email; document reviews and approvals
  • Monitor model deprecation announcements: providers typically give 3-6 months' notice, so schedule the migration as soon as a retirement date is published
  • Track pricing changes — AI pricing moves rapidly; cost models built on today's prices need quarterly review
  • Annual vendor security reassessment — request updated SOC 2 reports and confirm DPA still matches your data processing
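Several of the monitoring tasks above reduce to dated follow-ups: a model retirement date, a quarterly pricing review, an annual reassessment. A minimal sketch of a tracker that surfaces what is due soon follows. The `TrackedItem` type, the `due_within` function, and the vendor and model names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Sequence


@dataclass(frozen=True)
class TrackedItem:
    vendor: str
    description: str
    due: date


def due_within(items: Sequence[TrackedItem], today: date, window_days: int) -> List[TrackedItem]:
    """Return items due on or before today + window_days, earliest first.

    Overdue items are included so that they are not silently dropped.
    """
    cutoff = today + timedelta(days=window_days)
    upcoming = [item for item in items if item.due <= cutoff]
    return sorted(upcoming, key=lambda item: item.due)


if __name__ == "__main__":
    items = [
        TrackedItem("ExampleModelCo", "example-model-v1 retirement", date(2025, 3, 1)),
        TrackedItem("ExampleVectorDB", "Annual SOC 2 reassessment", date(2025, 9, 1)),
        TrackedItem("ExampleModelCo", "Quarterly pricing review", date(2024, 12, 15)),
    ]
    for item in due_within(items, today=date(2025, 1, 1), window_days=90):
        print(item.due.isoformat(), item.vendor, "-", item.description)
```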

Checklist: Do You Understand This?

  • What makes a frontier model API a "Critical" risk category vendor — and what does that imply for due diligence?
  • What is the difference between SOC 2 Type I and Type II — and which should you require?
  • What five clauses in a Data Processing Agreement are most important for an AI vendor relationship?
  • What is vendor concentration risk — and what architectural mitigations reduce it?
  • Why do you need to monitor DPA change notifications on an ongoing basis, not just at onboarding?
  • What is the risk of using an open-weight model (like LLaMA) compared to a managed frontier model API?