Batch Inference
Batch inference lets you process large volumes of requests asynchronously. Instead of sending requests one by one and waiting for each response, you upload a file of requests to S3, submit a batch job, and Bedrock processes them in the background. Results land in another S3 path when the job completes.
When to Use It
Good for batch
- Nightly document classification
- Bulk embedding generation
- Dataset annotation or enrichment
- Evaluation runs across many test cases
- Report generation on fixed schedules
Not for batch
- Real-time user-facing requests
- Tasks where response time matters
- Interactive chat or streaming
- Anything that needs a result in under a minute
Pricing Benefit
Batch inference is priced at 50% of on-demand inference cost. For high-volume workloads where latency is not a concern, this halves your inference costs compared to real-time calls. The tradeoff is completion time — jobs may take minutes to hours depending on queue depth and job size. Maximum job duration is 24 hours.
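The saving composes directly with per-token pricing. A quick sketch of the arithmetic (the per-1K-token rate below is made up purely for illustration, not a real Bedrock price):

```python
def batch_cost(tokens, on_demand_price_per_1k):
    """Batch inference is billed at 50% of the on-demand rate."""
    return tokens / 1000 * on_demand_price_per_1k * 0.5

# 10M tokens at a hypothetical $0.001 per 1K tokens:
# on-demand would be $10.00; batch is half of that.
print(batch_cost(10_000_000, 0.001))
```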
Input Format — JSONL
Input is a JSONL file (one JSON object per line) stored in S3. Each line has a recordId (your identifier for matching results back to inputs) and a modelInput matching the model's InvokeModel body format:
```jsonl
// input.jsonl — one request per line
{"recordId": "doc-001", "modelInput": {"anthropic_version": "bedrock-2023-05-31", "max_tokens": 256, "messages": [{"role": "user", "content": "Classify sentiment: Great product!"}]}}
{"recordId": "doc-002", "modelInput": {"anthropic_version": "bedrock-2023-05-31", "max_tokens": 256, "messages": [{"role": "user", "content": "Classify sentiment: Terrible experience."}]}}
{"recordId": "doc-003", "modelInput": {"anthropic_version": "bedrock-2023-05-31", "max_tokens": 256, "messages": [{"role": "user", "content": "Classify sentiment: It was okay."}]}}
```

Output is also JSONL, with matching recordId values so you can join results back to inputs. Failed records include error details; the job doesn't fail completely if individual records error.
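As a concrete sketch of both sides of the format (the file name, document IDs, and texts are illustrative, and the output-line shape assumes the successful/failed record structure described above):

```python
import json

# --- Build the input file: one {recordId, modelInput} object per line ---
texts = {
    "doc-001": "Great product!",
    "doc-002": "Terrible experience.",
}

with open("input.jsonl", "w") as f:
    for record_id, text in texts.items():
        f.write(json.dumps({
            "recordId": record_id,
            "modelInput": {
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": 256,
                "messages": [
                    {"role": "user", "content": f"Classify sentiment: {text}"}
                ],
            },
        }) + "\n")

# --- Join output lines back to inputs by recordId ---
def join_results(output_lines):
    """Split output JSONL lines into successes and failures, keyed by recordId."""
    ok, failed = {}, {}
    for line in output_lines:
        rec = json.loads(line)
        if "modelOutput" in rec:            # successful record
            ok[rec["recordId"]] = rec["modelOutput"]
        else:                               # failed record carries error details
            failed[rec["recordId"]] = rec.get("error")
    return ok, failed
```

Because results arrive keyed by recordId rather than in input order, the join step is what lets you write outputs back onto the original dataset.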
Workflow

The flow has four steps: upload the input JSONL to S3, create the batch job, poll the job status until it completes, and read the results from the output S3 prefix. Creating the job:
```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Create the batch job
job = bedrock.create_model_invocation_job(
    jobName="sentiment-batch-001",
    # Full Bedrock model ID for Claude 3.5 Haiku (verify availability in your region)
    modelId="anthropic.claude-3-5-haiku-20241022-v1:0",
    inputDataConfig={
        "s3InputDataConfig": {
            "s3Uri": "s3://my-bucket/input.jsonl",
            "s3InputFormat": "JSONL",
        }
    },
    outputDataConfig={
        "s3OutputDataConfig": {"s3Uri": "s3://my-bucket/output/"}
    },
    roleArn="arn:aws:iam::ACCOUNT:role/BedrockBatchRole",
)
print(job["jobArn"])
```

Limits
Maximum records per job: 50,000. Maximum file size: 512 MB. Maximum job duration: 24 hours. Not all models support batch inference — check the Bedrock documentation for the current supported model list. Claude Haiku models are the most cost-effective choice for large batch jobs.
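Creating the job covers only the submission step of the workflow; you still have to wait for completion before reading results. A minimal polling sketch, assuming boto3's get_model_invocation_job call and the terminal statuses listed in the Bedrock API (the terminal-status set here is an assumption to verify against current docs):

```python
import time

# Terminal job statuses (assumed set; confirm against the Bedrock API reference)
TERMINAL = {"Completed", "PartiallyCompleted", "Failed", "Stopped", "Expired"}

def wait_for_job(bedrock, job_arn, poll_seconds=60):
    """Poll a batch job until it reaches a terminal state; return that state."""
    while True:
        status = bedrock.get_model_invocation_job(jobIdentifier=job_arn)["status"]
        if status in TERMINAL:
            return status
        time.sleep(poll_seconds)
```

Usage is `status = wait_for_job(bedrock, job["jobArn"])`; once the status is Completed (or PartiallyCompleted), the result JSONL files sit under the output prefix given at job creation.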
Checklist: Do You Understand This?
- What is the pricing advantage of batch inference over on-demand?
- What format does the input file use, and what does each line contain?
- Can you describe the four steps to run a batch job?
- What types of workloads are well suited to batch inference?