
AI Centers of Excellence

An AI Center of Excellence (CoE) centralises expertise that would otherwise be rediscovered in parallel across teams — security reviews, model evaluations, reusable components, and training programmes. Done well, a CoE accelerates adoption while enforcing governance. Done poorly, it becomes a bottleneck that slows everything down while providing the illusion of oversight.

CoE vs Federated vs Hybrid

| Model | How it works | Advantages | Disadvantages |
| --- | --- | --- | --- |
| Centralised CoE | All AI work goes through the CoE; other teams are consumers | Maximum consistency; strong governance; expertise concentrated | Bottleneck; slow to respond to team needs; CoE becomes a single point of failure |
| Federated | Each team builds independently; no central coordination | Fast; teams have full autonomy; no bottleneck | Fragmentation; duplicated effort; inconsistent security and quality; no shared learning |
| Hybrid (recommended) | CoE provides platform, standards, and a fast path for low-risk work; teams build on top with self-service | Speed for teams plus consistency and shared expertise; CoE is an enabler, not a gatekeeper | Requires investment to build the self-service platform; governance boundaries must be clearly defined |

CoE Core Responsibilities

Platform and standards

  • Approved model register — which models are approved for which use cases
  • Shared infrastructure: LLM gateway (LiteLLM), observability (Langfuse), evaluation harnesses
  • Reusable component catalog: approved system prompts, RAG configurations, agent templates
  • Security baseline: PII handling standards, prompt injection defences, audit logging schema
  • Cost management: centralised budget visibility, chargeback model, guardrail defaults
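The approved model register and guardrail defaults above can be expressed directly in tooling rather than in a wiki page. A minimal sketch of a register lookup is below; the model names, tiers, and budget figures are illustrative placeholders, not a real CoE's policy:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelEntry:
    tier: str                  # e.g. "general" or "restricted" (illustrative tiers)
    allows_pii: bool           # whether this tier's DPA covers PII
    monthly_budget_usd: int    # default cost guardrail per team

# Hypothetical register contents — real entries would come from the CoE's
# intake and vendor-assessment process.
REGISTER = {
    "gpt-4o-mini": ModelEntry(tier="general", allows_pii=False, monthly_budget_usd=500),
    "claude-sonnet": ModelEntry(tier="restricted", allows_pii=True, monthly_budget_usd=2000),
}

def check_usage(model: str, handles_pii: bool) -> bool:
    """Return True if a proposed usage complies with the register."""
    entry = REGISTER.get(model)
    if entry is None:
        return False                       # unapproved model: reject
    if handles_pii and not entry.allows_pii:
        return False                       # PII not permitted on this tier
    return True
```

Embedding a check like this in the LLM gateway means the register is enforced automatically, rather than relying on teams to read and remember it.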

Governance and enablement

  • Intake and risk triage for new AI use cases
  • Security review for high-risk use cases before production
  • Training programme for practitioners (prompt engineering, responsible use)
  • Tool evaluation — assess new AI tools and models before teams adopt
  • Community of practice: Slack channel, lunch-and-learns, internal case studies

CoE Team Composition

| Role | Responsibilities | Full-time or embedded |
| --- | --- | --- |
| CoE Lead | Strategy, stakeholder management, governance process ownership, escalation path | Full-time |
| AI/ML Engineer | Platform engineering, evaluation harnesses, component catalog, model selection and evaluation | Full-time (1-2 engineers minimum) |
| Security representative | AI security standards, threat model reviews, vendor security assessment, DPA review | Embedded from the security team; 20-30% allocation |
| Legal/compliance representative | Regulatory requirements (EU AI Act, GDPR), data processing agreements, approved-use definitions | Embedded from legal; 10-20% allocation |
| Product representative | Connects CoE standards to product team needs; prevents the CoE from building an ivory tower | Rotating; one product manager embedded per quarter |

Avoiding the CoE Bottleneck

A CoE that reviews everything will eventually approve nothing quickly.

The most common CoE failure mode is becoming a required approval gate for every AI task, regardless of risk. This creates a queue, review fatigue sets in, and approvals become rubber stamps — the worst of both worlds (slow AND no real oversight). The solution is risk-tiered self-service: low-risk use cases (internal productivity tools, pre-approved patterns) have a self-service fast path. The CoE only reviews medium-risk and high-risk use cases. Define the fast-path criteria clearly and publish them.

  • Fast path criteria (no CoE review needed): internal tool only, uses an approved model tier, no PII, not customer-facing, uses a catalog component without modification
  • CoE review required: customer-facing, processes PII, financial or medical decision support, agentic with irreversible actions, new model or vendor not yet approved
  • Target review SLA: 3 business days for standard reviews; 1 day for fast path registration

Maturity Stages

Stage 1 — Ad hoc

Teams adopt AI independently; no shared standards; duplicated effort; security gaps unknown

Stage 2 — CoE formed

CoE team established; approved model register created; intake process defined; first security standards published

Stage 3 — Platform built

Shared LLM gateway; observability; component catalog; training programme running; teams onboarded

Stage 4 — Federated with guardrails

Teams build independently on the CoE platform; self-service for low-risk; CoE focuses on standards and high-risk reviews; fast path reduces bottleneck

Stage 5 — Optimising

CoE drives cost optimisation, cross-team learning, and advanced capability development; governance is embedded in tooling, not process

Checklist: Do You Understand This?

  • What is the primary failure mode of a centralised CoE — and what does it look like in practice?
  • What is the hybrid model, and how does it balance governance with speed?
  • Name five core responsibilities of an AI CoE.
  • What criteria define a "fast path" use case that bypasses full CoE review?
  • Why is a rotating product representative on the CoE important — what failure does it prevent?
  • Describe the maturity progression from ad hoc AI adoption to a fully federated model with guardrails.