Encryption for AI Systems
AI systems create new encryption challenges compared to traditional software: prompts carry sensitive user context across networks, vector embeddings encode semantic meaning that may be partially reversible, and model weights represent significant intellectual property. Standard encryption practices still apply, but they must be applied deliberately to each data type in the pipeline.
Where Data Moves and What to Encrypt
| Data type | In transit | At rest |
|---|---|---|
| User prompts | TLS 1.2+ mandatory; mTLS for internal services | Encrypt conversation logs; consider pseudonymisation before storage |
| Model responses | TLS 1.2+ for all API calls | Encrypt response cache and stored outputs |
| Vector embeddings | TLS for vector DB API calls | Verify vector DB provider encrypts at rest; not default on all providers |
| Fine-tuning datasets | TLS for upload; verify provider's storage encryption | Encrypt local copies; AES-256 minimum |
| Model weights (self-hosted) | TLS for download from registry | Encrypt at rest; restrict read access to inference service only |
| API keys and secrets | Never transmit in URL parameters or log output | Store in a secrets manager (HashiCorp Vault, AWS Secrets Manager, Azure Key Vault); never in config files |
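As a minimal illustration of the last row, a sketch that reads an API key from the process environment (populated by a secrets manager at deploy time, e.g. a Vault agent or Secrets Manager sidecar) rather than from a config file; the variable name `OPENAI_API_KEY` is just an example:

```python
import os

def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Fetch an API key injected into the environment by the secrets
    manager at startup. Failing fast here beats a hard-coded fallback
    sitting in a config file checked into version control."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} not set; check secrets-manager injection")
    return key
```

Failing loudly when the key is absent also makes misconfigured deployments visible immediately, instead of surfacing later as an opaque 401 from the provider.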
Encryption in Transit
Requirements
- TLS 1.2 minimum; TLS 1.3 preferred for all external API calls
- Certificate validation: never disable certificate checks in production (common in test code that reaches prod)
- mTLS for internal service-to-service communication in sensitive pipelines
- HTTP/2 for streaming model responses (reduces connection overhead)
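The first two requirements can be expressed directly with Python's standard `ssl` module; this is a sketch of a client-side context for outbound API calls, enforcing a TLS 1.2 floor with certificate and hostname verification left on:

```python
import ssl

def make_client_context() -> ssl.SSLContext:
    """Client TLS context for outbound API calls: TLS 1.2 minimum
    (1.3 negotiated where both sides support it), certificate and
    hostname checks enabled."""
    ctx = ssl.create_default_context()            # loads system CA bundle
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse TLS 1.0/1.1
    # create_default_context already enables these; stated explicitly
    # so a later edit cannot silently weaken the context:
    ctx.check_hostname = True
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx
```

Passing this context to your HTTP client makes the "never disable certificate checks" rule a property of the code rather than a convention.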
Common mistakes
- Disabling TLS verification with `verify=False` or equivalent — common in dev, leaks to prod
- Logging full API URLs containing API keys as query parameters
- Using HTTP (not HTTPS) for internal services on the assumption the network is trusted
- Not pinning TLS certificates for high-sensitivity connections to provider APIs
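Two of the mistakes above (keys in query parameters, keys in logs) can be guarded with a few lines of standard library code; a sketch, where the set of sensitive parameter names is an assumption to extend for your own services:

```python
from urllib.parse import urlsplit, parse_qs

SENSITIVE_PARAMS = {"api_key", "key", "token", "access_token"}  # extend as needed

def assert_no_secrets_in_url(url: str) -> None:
    """Refuse a request whose URL carries a credential: keys belong in the
    Authorization header, which is encrypted under TLS and normally kept
    out of access logs, unlike the URL itself."""
    params = parse_qs(urlsplit(url).query)
    leaked = SENSITIVE_PARAMS & set(params)
    if leaked:
        raise ValueError(f"credential in URL query string: {sorted(leaked)}")

def redact_url(url: str) -> str:
    """Strip the query string before a URL is written to any log."""
    return urlsplit(url)._replace(query="").geturl()
```

Calling `assert_no_secrets_in_url` just before the HTTP client fires, and `redact_url` in the logging path, turns both mistakes into test failures instead of incidents.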
Encryption at Rest
| Key management service | Best for | Key rotation |
|---|---|---|
| AWS KMS | AWS-native deployments; integrates with S3, RDS, EBS, Bedrock | Automatic annual rotation; manual on-demand |
| Azure Key Vault | Azure-native; integrates with Azure AI, Cosmos DB, Blob Storage | Configurable rotation policy; automatic notification on expiry |
| HashiCorp Vault | Multi-cloud; self-hosted; dynamic secrets; fine-grained policies | Dynamic secret leases with auto-revocation; configurable TTL |
| Google Cloud KMS | GCP-native; integrates with Vertex AI, BigQuery, Cloud Storage | Automatic rotation; supports CMEK (customer-managed encryption keys) |
AES-256 is the standard for data at rest. Use your cloud provider's managed encryption (server-side encryption) by default, then add customer-managed keys (CMEK/CMK) for regulated data where you need control over key lifecycle.
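The CMEK pattern above is envelope encryption: a data key encrypts the data, and the KMS-held customer key encrypts the data key. A sketch of the data-key half using AES-256-GCM from the third-party `cryptography` package; in production the data key would be generated and wrapped by your KMS (e.g. a GenerateDataKey call on AWS), which is elided here:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_blob(data: bytes, key: bytes) -> bytes:
    """AES-256-GCM authenticated encryption; the 12-byte random nonce is
    prepended to the ciphertext so decryption is self-contained."""
    nonce = os.urandom(12)
    return nonce + AESGCM(key).encrypt(nonce, data, None)

def decrypt_blob(blob: bytes, key: bytes) -> bytes:
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, None)

# A 256-bit data key; under CMEK this key itself would be stored only in
# wrapped (KMS-encrypted) form alongside the data, never in plaintext.
data_key = AESGCM.generate_key(bit_length=256)
```

GCM is authenticated, so tampering with stored ciphertext causes decryption to fail rather than silently returning corrupted plaintext.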
What Model Providers See
Your prompts are visible to the provider
When you call OpenAI, Anthropic, or Google APIs, your prompts and responses are transmitted to, and temporarily processed on, the provider's infrastructure. TLS protects them in transit, but the provider's servers decrypt them for inference. This means:
- Never include secrets, credentials, or encryption keys in prompts
- Never include data your contracts prohibit sharing with third parties
- The Data Processing Agreement governs what the provider can do with prompt data
- Enterprise API tiers (Azure OpenAI, AWS Bedrock, GCP Vertex AI) offer stronger data isolation and DPA terms than direct consumer APIs
- Self-hosted open-weight models (Ollama, vLLM) eliminate provider visibility entirely
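A lightweight guard for the first two bullets is to scan prompts for credential-shaped substrings before they leave your service; a sketch with illustrative regexes (the OpenAI-style `sk-` and AWS access key ID patterns are assumptions to tune to the secret formats you actually issue):

```python
import re

# Illustrative patterns only; a real deployment should use a dedicated
# secret scanner matched to its own credential formats.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{20,}"),               # OpenAI-style API key
    re.compile(r"AKIA[0-9A-Z]{16}"),                  # AWS access key ID
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
]

def scrub_prompt(prompt: str) -> str:
    """Redact credential-shaped substrings before the prompt is sent to a
    third-party model API; the provider decrypts and processes whatever
    you send, so redaction must happen client-side."""
    for pattern in SECRET_PATTERNS:
        prompt = pattern.sub("[REDACTED]", prompt)
    return prompt
```

Running this at the last hop before the provider SDK call means a secret pasted into a user message or accidentally templated into a system prompt never reaches third-party infrastructure.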
Vector Database Encryption
Vector embeddings encode semantic meaning. While reversing an embedding to exact source text is difficult, research shows partial reconstruction is possible, making unencrypted embeddings a privacy risk for sensitive documents.
Encrypted at rest by default
- Pinecone (managed) — AES-256, SOC 2
- Weaviate Cloud — encrypted at rest
- Qdrant Cloud — encrypted at rest
- pgvector on RDS/Aurora — inherits database encryption
Self-hosted — you manage encryption
- Chroma — no built-in encryption; encrypt the underlying storage volume
- Weaviate self-hosted — encrypt the storage backend separately
- Qdrant self-hosted — use encrypted filesystem or block storage
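Beyond volume-level encryption, a common pattern for self-hosted stores is field-level encryption: keep the vector in plaintext so similarity search still works, but encrypt the source-text payload client-side. A sketch using AES-256-GCM from the `cryptography` package; the record shape is an assumption for illustration, not any vector database's actual schema:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def make_record(vector: list[float], text: str, key: bytes) -> dict:
    """The vector stays plaintext for similarity search; the payload text
    is AES-256-GCM encrypted, so a copied storage volume leaks embedding
    geometry but not transcript contents. Nonce prepended to ciphertext."""
    nonce = os.urandom(12)
    ciphertext = nonce + AESGCM(key).encrypt(nonce, text.encode(), None)
    return {"vector": vector, "payload": {"text_enc": ciphertext}}

def read_record(record: dict, key: bytes) -> str:
    blob = record["payload"]["text_enc"]
    return AESGCM(key).decrypt(blob[:12], blob[12:], None).decode()
```

The trade-off: the embedding itself remains partially reversible (as noted above), so field-level encryption reduces rather than eliminates the exposure and still belongs on top of an encrypted volume.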
Checklist: Do You Understand This?
- What is the single most common encryption mistake in AI development code that reaches production?
- Why should API keys never appear in system prompts — even if the system prompt is encrypted in transit?
- What is the difference between server-side encryption and customer-managed encryption keys (CMEK)?
- Under what circumstances would you choose a self-hosted LLM specifically to eliminate provider visibility of prompt data?
- What encryption approach is correct for a self-hosted Qdrant vector database containing customer support transcripts?
- What two things does a Data Processing Agreement govern for AI API calls — and which enterprise API tier offers stronger DPA terms?