Bedrock Overview
Amazon Bedrock is a fully managed AWS service that lets you access foundation models from multiple providers through a single API — no GPU clusters to provision, no model weights to download, no inference servers to manage. You call an API, AWS runs the model, you pay per token.
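As a minimal sketch of what "call an API" looks like in practice — assuming boto3 and the Converse API; the model ID, region, and prompt below are illustrative:

```python
def build_converse_request(model_id: str, prompt: str, max_tokens: int = 512) -> dict:
    """Assemble kwargs for the bedrock-runtime client's converse() call."""
    return {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens},
    }

# With AWS credentials configured and model access granted in the account:
#   import boto3
#   client = boto3.client("bedrock-runtime", region_name="us-east-1")
#   resp = client.converse(**build_converse_request(
#       "anthropic.claude-3-5-sonnet-20240620-v1:0", "Summarize VPC endpoints."))
#   print(resp["output"]["message"]["content"][0]["text"])
```

The request is plain JSON over HTTPS; there is no SDK-managed model state on your side.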
What Bedrock Is
Bedrock is a model-as-a-service layer inside AWS. It sits between your application code and the underlying model infrastructure. You never touch the hardware — AWS handles everything from GPU provisioning to model serving to autoscaling.
Bedrock is not just a model router. It bundles several managed capabilities on top of raw inference: Agents (multi-step reasoning), Knowledge Bases (managed RAG), Guardrails (content safety), Batch Inference (async bulk processing), Model Evaluation, and Fine-tuning. All integrated, all within the AWS security model.
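These capabilities surface as extra request parameters rather than separate services. For example, attaching a Guardrail to a Converse request is one more field — a sketch, where the guardrail identifier and version are hypothetical placeholders:

```python
def with_guardrail(request: dict, guardrail_id: str, version: str) -> dict:
    """Return a copy of a Converse request with a guardrail config attached."""
    req = dict(request)  # shallow copy; the original request is untouched
    req["guardrailConfig"] = {
        "guardrailIdentifier": guardrail_id,  # hypothetical guardrail ID
        "guardrailVersion": version,
    }
    return req

base = {
    "modelId": "anthropic.claude-3-5-sonnet-20240620-v1:0",
    "messages": [{"role": "user", "content": [{"text": "Hi"}]}],
}
guarded = with_guardrail(base, "gr-example123", "1")
```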
What you get
- Access to 50+ models from Anthropic, Amazon, Meta, Mistral, Cohere, AI21, and Stability AI
- IAM-based access control — same roles and policies as the rest of AWS
- VPC endpoints — traffic never leaves the AWS network
- CloudWatch metrics and CloudTrail audit logs out of the box
- No customer data used for model training (contractual guarantee)
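The IAM point is concrete: model access can be scoped with an ordinary policy document. A sketch — the region and model ARN are illustrative (note that foundation-model ARNs have no account-ID segment):

```python
import json

# Hypothetical policy allowing invocation of a single foundation model.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream",
            ],
            # Foundation-model ARNs omit the account ID (note the "::").
            "Resource": "arn:aws:bedrock:us-east-1::foundation-model/"
                        "anthropic.claude-3-5-sonnet-20240620-v1:0",
        }
    ],
}

print(json.dumps(policy, indent=2))
```

Attach such a policy to a role and that role can call exactly one model — there is no equivalent mechanism when your app holds a raw provider API key.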
What you don't get
- Model customisation deeper than fine-tuning (no RLHF, no pre-training)
- Guaranteed low latency (on-demand inference runs on shared capacity)
- Access to every model version on the day it launches
- Direct access to model weights or internal activations
Region Availability
Bedrock is available in most major AWS regions: US East (N. Virginia), US West (Oregon), EU West (Ireland), EU Central (Frankfurt), AP Southeast (Singapore, Sydney), AP Northeast (Tokyo), and more. Model availability varies by region — not every model is available in every region. Claude models on Bedrock have the widest regional coverage among third-party providers.
Cross-region inference (GA 2024) lets you route requests to the nearest region with available capacity, reducing latency and improving throughput for high-volume workloads. You pass a cross-region inference profile ID instead of a model ID.
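In code the switch is small: a geography-prefixed inference profile ID goes wherever the model ID would. A sketch — the prefix scheme below ("us.", "eu.", "apac.") is assumed for illustration; check which profiles actually exist in your account:

```python
def to_inference_profile_id(model_id: str, geography: str = "us") -> str:
    """Prefix a base model ID with a geography code to form a
    cross-region inference profile ID (e.g. 'us.' or 'eu.')."""
    return f"{geography}.{model_id}"

profile_id = to_inference_profile_id("anthropic.claude-3-5-sonnet-20240620-v1:0")
# Pass profile_id as modelId in converse() / invoke_model()
# in place of the bare model ID.
```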
Pricing Models
Bedrock has two pricing modes:
On-Demand
Pay per 1,000 tokens, with input and output tokens priced separately. No commitment.
- Best for: variable traffic, prototyping, low-volume production
- Shared capacity — latency may spike at peak AWS load
- No minimum spend
Provisioned Throughput
Reserve model units (MUs) for dedicated capacity. Hourly billing.
- Best for: consistent high-volume production workloads
- Guaranteed tokens-per-minute throughput
- 1-month or 6-month commitment for best rates
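One way to choose between the two modes is a break-even estimate: monthly on-demand cost at your traffic volume versus the flat hourly charge for a model unit. A sketch — every rate below is a placeholder, not a published Bedrock price:

```python
def on_demand_monthly(in_tokens: int, out_tokens: int,
                      in_rate_per_1k: float, out_rate_per_1k: float) -> float:
    """Monthly on-demand cost from token volume and per-1K-token rates."""
    return in_tokens / 1000 * in_rate_per_1k + out_tokens / 1000 * out_rate_per_1k

def provisioned_monthly(hourly_rate: float, hours: int = 730) -> float:
    """Flat monthly cost of one model unit billed hourly (~730 h/month)."""
    return hourly_rate * hours

# Placeholder rates -- substitute the current Bedrock price list.
od = on_demand_monthly(2_000_000_000, 500_000_000, 0.003, 0.015)
pt = provisioned_monthly(40.0)
cheaper = "provisioned" if pt < od else "on-demand"
```

The comparison only holds if provisioned capacity is actually utilized; idle reserved model units cost the same as busy ones.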
When to Choose Bedrock
Choose Bedrock when your infrastructure is already in AWS and you need enterprise-grade security without managing model infrastructure yourself. The IAM integration, VPC support, and CloudTrail logging are difficult to replicate with direct API calls.
Bedrock makes less sense if you need bleeding-edge model versions the day they launch (direct Anthropic API gets new features first), if you're doing cost-sensitive high-volume inference (direct provider APIs are often cheaper), or if you need capabilities Bedrock doesn't expose yet (extended thinking token budgets, streaming tool use, etc.).
Bedrock vs Direct API — Quick Rule
If your app is in AWS and security / compliance matters → Bedrock. If you need maximum feature velocity or the lowest cost per token → direct provider API. Many teams use both: Bedrock for production enterprise workloads, direct API for prototyping.
Checklist: Do You Understand This?
- Can you explain what "fully managed" means in the context of Bedrock?
- What are the two Bedrock pricing modes and when would you use each?
- What security capabilities does Bedrock provide that direct API calls don't?
- Can you name three managed capabilities Bedrock bundles beyond raw inference?