
Encryption for AI Systems

AI systems create encryption challenges that traditional software does not: prompts carry sensitive user context across networks, vector embeddings encode semantic meaning that may be partially reversible, and model weights represent significant intellectual property. Standard encryption practices still apply, but they must be applied deliberately to each data type in the pipeline.

Where Data Moves and What to Encrypt

Data type | In transit | At rest
User prompts | TLS 1.2+ mandatory; mTLS for internal services | Encrypt conversation logs; consider pseudonymisation before storage
Model responses | TLS 1.2+ for all API calls | Encrypt response cache and stored outputs
Vector embeddings | TLS for vector DB API calls | Verify vector DB provider encrypts at rest; not default on all providers
Fine-tuning datasets | TLS for upload; verify provider's storage encryption | Encrypt local copies; AES-256 minimum
Model weights (self-hosted) | TLS for download from registry | Encrypt at rest; restrict read access to inference service only
API keys and secrets | Never transmitted in URL parameters or log output | Vault (HashiCorp Vault, AWS Secrets Manager, Azure Key Vault); never in config files

Encryption in Transit

Requirements

  • TLS 1.2 minimum; TLS 1.3 preferred for all external API calls
  • Certificate validation: never disable certificate checks in production (common in test code that reaches prod)
  • mTLS for internal service-to-service communication in sensitive pipelines
  • HTTP/2 for streaming model responses (reduces connection overhead)
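The TLS-floor requirement above can be enforced in code rather than left to library defaults. A minimal sketch using Python's standard `ssl` module (the commented-out mTLS paths are illustrative placeholders, not real files):

```python
import ssl

# Build a client-side TLS context that refuses anything below TLS 1.2.
# create_default_context() enables certificate and hostname verification,
# which must stay switched on in production.
context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2

# For internal mTLS, the client also presents its own certificate.
# Paths here are placeholders for illustration only:
# context.load_cert_chain(certfile="client.crt", keyfile="client.key")
```

This context can be passed to `http.client`, `urllib`, or most HTTP libraries; setting `minimum_version` once at startup prevents a silent downgrade to TLS 1.0/1.1 on a misconfigured endpoint.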

Common mistakes

  • Disabling TLS verification with verify=False or equivalent — common in dev, leaks to prod
  • Logging full API URLs containing API keys as query parameters
  • Using HTTP (not HTTPS) for internal services on the assumption the network is trusted
  • Not pinning TLS certificates for high-sensitivity connections to provider APIs
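The safe counterparts to these mistakes fit in a few lines. A hedged sketch (the endpoint URL, environment variable name, and helper functions are hypothetical): the key comes from the environment, travels only in a header, and the log line never touches the headers.

```python
import os
from urllib.parse import urlencode

def build_chat_request(prompt: str) -> dict:
    """Assemble a request dict for a hypothetical chat-completion endpoint.

    The API key comes from the environment, travels only in a header,
    and never appears in the URL, so it cannot leak via access logs.
    """
    api_key = os.environ["LLM_API_KEY"]  # fail fast if the secret is absent
    return {
        "url": "https://api.example.com/v1/chat",  # no ?api_key=... here
        "headers": {"Authorization": f"Bearer {api_key}"},
        "json": {"prompt": prompt},
        "verify": True,  # never ship verify=False to production
    }

def safe_log_line(request: dict) -> str:
    """Log the URL and payload size only; the headers carry the secret."""
    return f"POST {request['url']} bytes={len(urlencode(request['json']))}"
```

Keeping the secret out of both the URL and the log path addresses the two leak channels in the list above in one place.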

Encryption at Rest

Key management service | Best for | Key rotation
AWS KMS | AWS-native deployments; integrates with S3, RDS, EBS, Bedrock | Automatic annual rotation; manual on-demand
Azure Key Vault | Azure-native; integrates with Azure AI, Cosmos DB, Blob Storage | Configurable rotation policy; automatic notification on expiry
HashiCorp Vault | Multi-cloud; self-hosted; dynamic secrets; fine-grained policies | Dynamic secret leases with auto-revocation; configurable TTL
Google Cloud KMS | GCP-native; integrates with Vertex AI, BigQuery, Cloud Storage | Automatic rotation; supports CMEK (customer-managed encryption keys)

AES-256 is the standard for data at rest. Use your cloud provider's managed encryption (server-side encryption) by default, then add customer-managed keys (CMEK/CMK) for regulated data where you need control over key lifecycle.
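In AWS terms, the CMEK step means switching an upload from default server-side encryption (SSE-S3) to SSE-KMS with your own key. A sketch that only builds the keyword arguments for boto3's `put_object` (no AWS call is made; the bucket name and key ARN are placeholders):

```python
def kms_upload_args(bucket: str, object_key: str, body: bytes, kms_key_arn: str) -> dict:
    """Build put_object keyword arguments for SSE-KMS with a customer-managed key.

    Passing these to boto3's s3_client.put_object(**args) asks S3 to encrypt
    the object under the named KMS key instead of the default AWS-managed key,
    giving you control over the key's lifecycle and access policy.
    """
    return {
        "Bucket": bucket,
        "Key": object_key,
        "Body": body,
        "ServerSideEncryption": "aws:kms",   # SSE-KMS rather than default SSE-S3
        "SSEKMSKeyId": kms_key_arn,          # the customer-managed key (CMEK)
    }

# Placeholder bucket and ARN for illustration only.
args = kms_upload_args(
    "finetune-datasets",
    "corpus/v1.jsonl",
    b"...",
    "arn:aws:kms:eu-west-1:123456789012:key/example-key-id",
)
```

With this shape, rotating or revoking the KMS key immediately affects every object encrypted under it, which is the lifecycle control regulated workloads need.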

What Model Providers See

Your prompts are visible to the provider

When you call the OpenAI, Anthropic, or Google APIs, your prompts and responses transit the provider's infrastructure and are processed there. TLS protects them in transit, but the provider's servers decrypt them for inference. This means:

  • Never include secrets, credentials, or encryption keys in prompts
  • Never include data your contracts prohibit sharing with third parties
  • The Data Processing Agreement governs what the provider can do with prompt data
  • Enterprise API tiers (Azure OpenAI, AWS Bedrock, GCP Vertex AI) offer stronger data isolation and DPA terms than direct consumer APIs
  • Self-hosted open-weight models (Ollama, vLLM) eliminate provider visibility entirely
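A backstop for the first rule is scrubbing obvious secret shapes before a prompt leaves your infrastructure. A minimal regex sketch; the patterns are illustrative and far from exhaustive, and a real deployment would use a maintained secret-scanning library:

```python
import re

# Illustrative patterns for common credential shapes. Scrubbing is a
# backstop only; the rule remains "never put secrets in prompts".
SECRET_PATTERNS = [
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),                # AWS access key ID
    re.compile(r"\bsk-[A-Za-z0-9_-]{20,}\b"),           # "sk-" style API keys
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private keys
]

def scrub_prompt(prompt: str) -> str:
    """Replace anything matching a known secret shape with a redaction tag."""
    for pattern in SECRET_PATTERNS:
        prompt = pattern.sub("[REDACTED]", prompt)
    return prompt
```

Running this at the API-client boundary means a credential pasted into user input is redacted before it reaches the provider's decryption point.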

Vector Database Encryption

Vector embeddings encode semantic meaning. While reversing an embedding to exact source text is difficult, research shows partial reconstruction is possible, making unencrypted embeddings a privacy risk for sensitive documents.

Encrypted at rest by default

  • Pinecone (managed) — AES-256, SOC 2
  • Weaviate Cloud — encrypted at rest
  • Qdrant Cloud — encrypted at rest
  • pgvector on RDS/Aurora — inherits database encryption

Self-hosted — you manage encryption

  • Chroma — no built-in encryption; encrypt the underlying storage volume
  • Weaviate self-hosted — encrypt the storage backend separately
  • Qdrant self-hosted — use encrypted filesystem or block storage

Checklist: Do You Understand This?

  • What is the single most common encryption mistake in AI development code that reaches production?
  • Why should API keys never appear in system prompts — even if the system prompt is encrypted in transit?
  • What is the difference between server-side encryption and customer-managed encryption keys (CMEK)?
  • Under what circumstances would you choose a self-hosted LLM specifically to eliminate provider visibility of prompt data?
  • What encryption approach is correct for a self-hosted Qdrant vector database containing customer support transcripts?
  • What two things does a Data Processing Agreement govern for AI API calls — and which enterprise API tier offers stronger DPA terms?