
Encryption for AI Systems

AI systems create encryption challenges that traditional software does not: prompts carry sensitive user context across networks, vector embeddings encode semantic meaning that may be partially reversible, and model weights represent significant intellectual property. Standard encryption practices still apply, but they must be applied deliberately to each data type in the pipeline.

Where Data Moves and What to Encrypt

Data type | In transit | At rest
User prompts | TLS 1.2+ mandatory; mTLS for internal services | Encrypt conversation logs; consider pseudonymisation before storage
Model responses | TLS 1.2+ for all API calls | Encrypt response cache and stored outputs
Vector embeddings | TLS for vector DB API calls | Verify vector DB provider encrypts at rest; not default on all providers
Fine-tuning datasets | TLS for upload; verify provider's storage encryption | Encrypt local copies; AES-256 minimum
Model weights (self-hosted) | TLS for download from registry | Encrypt at rest; restrict read access to inference service only
API keys and secrets | Never transmitted in URL parameters or log output | Vault (HashiCorp Vault, AWS Secrets Manager, Azure Key Vault); never in config files

Encryption in Transit

Requirements

  • TLS 1.2 minimum; TLS 1.3 preferred for all external API calls
  • Certificate validation: never disable certificate checks in production (common in test code that reaches prod)
  • mTLS for internal service-to-service communication in sensitive pipelines
  • HTTP/2 for streaming model responses (reduces connection overhead)
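The TLS-floor requirement above can be enforced in code rather than left to library defaults. A minimal sketch using Python's standard `ssl` module (the commented-out mTLS paths are illustrative placeholders, not real files):

```python
import ssl

# Build a client-side TLS context that refuses anything below TLS 1.2.
# create_default_context() enables certificate and hostname verification,
# which must stay switched on in production.
context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2

# For internal mTLS, the client also presents its own certificate.
# Paths here are placeholders for illustration only:
# context.load_cert_chain(certfile="client.crt", keyfile="client.key")
```

This context can be passed to `http.client`, `urllib`, or most HTTP libraries; setting `minimum_version` once at startup prevents a silent downgrade to TLS 1.0/1.1 on a misconfigured endpoint.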

Common mistakes

  • Disabling TLS verification with verify=False or equivalent — common in dev, leaks to prod
  • Logging full API URLs containing API keys as query parameters
  • Using HTTP (not HTTPS) for internal services on the assumption the network is trusted
  • Not pinning TLS certificates for high-sensitivity connections to provider APIs
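The safe counterparts to these mistakes fit in a few lines. A hedged sketch (the endpoint URL, environment variable name, and helper functions are hypothetical): the key comes from the environment, travels only in a header, and the log line never touches the headers.

```python
import os
from urllib.parse import urlencode

def build_chat_request(prompt: str) -> dict:
    """Assemble a request dict for a hypothetical chat-completion endpoint.

    The API key comes from the environment, travels only in a header,
    and never appears in the URL, so it cannot leak via access logs.
    """
    api_key = os.environ["LLM_API_KEY"]  # fail fast if the secret is absent
    return {
        "url": "https://api.example.com/v1/chat",  # no ?api_key=... here
        "headers": {"Authorization": f"Bearer {api_key}"},
        "json": {"prompt": prompt},
        "verify": True,  # never ship verify=False to production
    }

def safe_log_line(request: dict) -> str:
    """Log the URL and payload size only; the headers carry the secret."""
    return f"POST {request['url']} bytes={len(urlencode(request['json']))}"
```

Keeping the secret out of both the URL and the log path addresses the two leak channels in the list above in one place.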

Encryption at Rest

Key management service | Best for | Key rotation
AWS KMS | AWS-native deployments; integrates with S3, RDS, EBS, Bedrock | Automatic annual rotation; manual on-demand
Azure Key Vault | Azure-native; integrates with Azure AI, Cosmos DB, Blob Storage | Configurable rotation policy; automatic notification on expiry
HashiCorp Vault | Multi-cloud; self-hosted; dynamic secrets; fine-grained policies | Dynamic secret leases with auto-revocation; configurable TTL
Google Cloud KMS | GCP-native; integrates with Vertex AI, BigQuery, Cloud Storage | Automatic rotation; supports CMEK (customer-managed encryption keys)

AES-256 is the standard for data at rest. Use your cloud provider's managed encryption (server-side encryption) by default, then add customer-managed keys (CMEK/CMK) for regulated data where you need control over key lifecycle.
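In AWS terms, the CMEK step means switching an upload from default server-side encryption (SSE-S3) to SSE-KMS with your own key. A sketch that only builds the keyword arguments for boto3's `put_object` (no AWS call is made; the bucket name and key ARN are placeholders):

```python
def kms_upload_args(bucket: str, object_key: str, body: bytes, kms_key_arn: str) -> dict:
    """Build put_object keyword arguments for SSE-KMS with a customer-managed key.

    Passing these to boto3's s3_client.put_object(**args) asks S3 to encrypt
    the object under the named KMS key instead of the default AWS-managed key,
    giving you control over the key's lifecycle and access policy.
    """
    return {
        "Bucket": bucket,
        "Key": object_key,
        "Body": body,
        "ServerSideEncryption": "aws:kms",   # SSE-KMS rather than default SSE-S3
        "SSEKMSKeyId": kms_key_arn,          # the customer-managed key (CMEK)
    }

# Placeholder bucket and ARN for illustration only.
args = kms_upload_args(
    "finetune-datasets",
    "corpus/v1.jsonl",
    b"...",
    "arn:aws:kms:eu-west-1:123456789012:key/example-key-id",
)
```

With this shape, rotating or revoking the KMS key immediately affects every object encrypted under it, which is the lifecycle control regulated workloads need.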

What Model Providers See

Your prompts are visible to the provider

When you call the OpenAI, Anthropic, or Google APIs, your prompts and responses transit the provider's infrastructure and are processed there. TLS protects them in transit, but the provider's servers decrypt them for inference. This means:

  • Never include secrets, credentials, or encryption keys in prompts
  • Never include data your contracts prohibit sharing with third parties
  • The Data Processing Agreement governs what the provider can do with prompt data
  • Enterprise API tiers (Azure OpenAI, AWS Bedrock, GCP Vertex AI) offer stronger data isolation and DPA terms than direct consumer APIs
  • Self-hosted open-weight models (Ollama, vLLM) eliminate provider visibility entirely
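A backstop for the first rule is scrubbing obvious secret shapes before a prompt leaves your infrastructure. A minimal regex sketch; the patterns are illustrative and far from exhaustive, and a real deployment would use a maintained secret-scanning library:

```python
import re

# Illustrative patterns for common credential shapes. Scrubbing is a
# backstop only; the rule remains "never put secrets in prompts".
SECRET_PATTERNS = [
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),                # AWS access key ID
    re.compile(r"\bsk-[A-Za-z0-9_-]{20,}\b"),           # "sk-" style API keys
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private keys
]

def scrub_prompt(prompt: str) -> str:
    """Replace anything matching a known secret shape with a redaction tag."""
    for pattern in SECRET_PATTERNS:
        prompt = pattern.sub("[REDACTED]", prompt)
    return prompt
```

Running this at the API-client boundary means a credential pasted into user input is redacted before it reaches the provider's decryption point.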

Vector Database Encryption

Vector embeddings encode semantic meaning. While reversing an embedding to exact source text is difficult, research shows partial reconstruction is possible, making unencrypted embeddings a privacy risk for sensitive documents.

Encrypted at rest by default

  • Pinecone (managed) — AES-256, SOC 2
  • Weaviate Cloud — encrypted at rest
  • Qdrant Cloud — encrypted at rest
  • pgvector on RDS/Aurora — inherits database encryption

Self-hosted — you manage encryption

  • Chroma — no built-in encryption; encrypt the underlying storage volume
  • Weaviate self-hosted — encrypt the storage backend separately
  • Qdrant self-hosted — use encrypted filesystem or block storage

Checklist: Do You Understand This?

  • What is the single most common encryption mistake in AI development code that reaches production?
  • Why should API keys never appear in system prompts — even if the system prompt is encrypted in transit?
  • What is the difference between server-side encryption and customer-managed encryption keys (CMEK)?
  • Under what circumstances would you choose a self-hosted LLM specifically to eliminate provider visibility of prompt data?
  • What encryption approach is correct for a self-hosted Qdrant vector database containing customer support transcripts?
  • What two things does a Data Processing Agreement govern for AI API calls — and which enterprise API tier offers stronger DPA terms?