
Batch Inference

Batch inference lets you process large volumes of requests asynchronously. Instead of sending requests one by one and waiting for each response, you upload a file of requests to S3, submit a batch job, and Bedrock processes them in the background. Results land in another S3 path when the job completes.

When to Use It

Good for batch

  • Nightly document classification
  • Bulk embedding generation
  • Dataset annotation or enrichment
  • Evaluation runs across many test cases
  • Report generation on fixed schedules

Not for batch

  • Real-time user-facing requests
  • Tasks where response time matters
  • Interactive chat or streaming
  • Anything that needs a result in under a minute

Pricing Benefit

Batch inference is priced at 50% of on-demand inference cost. For high-volume workloads where latency is not a concern, this halves your inference costs compared to real-time calls. The tradeoff is completion time — jobs may take minutes to hours depending on queue depth and job size. Maximum job duration is 24 hours.
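To make the discount concrete, here is a small cost sketch. The per-token rates below are hypothetical placeholders for illustration, not current Bedrock pricing:

```python
# Hypothetical on-demand rates in $ per 1,000 tokens (placeholders, not real pricing)
ON_DEMAND_INPUT = 0.0008
ON_DEMAND_OUTPUT = 0.004
BATCH_DISCOUNT = 0.5  # batch is priced at 50% of on-demand

def job_cost(input_tokens: int, output_tokens: int, batch: bool = False) -> float:
    """Estimate cost for a job at the placeholder rates above."""
    cost = (input_tokens / 1000) * ON_DEMAND_INPUT + (output_tokens / 1000) * ON_DEMAND_OUTPUT
    return cost * BATCH_DISCOUNT if batch else cost

on_demand = job_cost(10_000_000, 2_000_000)
batch = job_cost(10_000_000, 2_000_000, batch=True)
print(f"on-demand: ${on_demand:.2f}, batch: ${batch:.2f}")  # prints "on-demand: $16.00, batch: $8.00"
```

At 10M input and 2M output tokens, batch saves half the spend; the same arithmetic scales linearly with volume, which is why the discount matters most for the high-volume workloads listed above.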

Input Format — JSONL

Input is a JSONL file (one JSON object per line) stored in S3. Each line has a recordId (your identifier for matching results) and a modelInput matching the model's InvokeModel body format:

// input.jsonl — one request per line
{"recordId": "doc-001", "modelInput": {"anthropic_version": "bedrock-2023-05-31", "max_tokens": 256, "messages": [{"role": "user", "content": "Classify sentiment: Great product!"}]}}
{"recordId": "doc-002", "modelInput": {"anthropic_version": "bedrock-2023-05-31", "max_tokens": 256, "messages": [{"role": "user", "content": "Classify sentiment: Terrible experience."}]}}
{"recordId": "doc-003", "modelInput": {"anthropic_version": "bedrock-2023-05-31", "max_tokens": 256, "messages": [{"role": "user", "content": "Classify sentiment: It was okay."}]}}

Output is also JSONL, with matching recordId values so you can join results back to inputs. Failed records include error details — the job doesn't fail completely if individual records error.
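Both sides of this format can be handled with the standard library. A minimal sketch, assuming the output lines carry a modelOutput key on success and an error key on failure (the helper names and prompt template are this sketch's own):

```python
import json

def write_input_jsonl(path: str, texts: dict[str, str]) -> None:
    """Write a batch input file: one recordId/modelInput object per line."""
    with open(path, "w") as f:
        for record_id, text in texts.items():
            line = {
                "recordId": record_id,
                "modelInput": {
                    "anthropic_version": "bedrock-2023-05-31",
                    "max_tokens": 256,
                    "messages": [{"role": "user", "content": f"Classify sentiment: {text}"}],
                },
            }
            f.write(json.dumps(line) + "\n")

def read_results(output_path: str) -> dict[str, dict]:
    """Index output records by recordId so they can be joined back to inputs."""
    results = {}
    with open(output_path) as f:
        for line in f:
            rec = json.loads(line)
            # failed records carry error details instead of a model output
            results[rec["recordId"]] = rec.get("modelOutput", rec.get("error"))
    return results
```

Keying everything by recordId is what lets a partially failed job still be useful: successful records join cleanly, and failed ones surface their error payloads for retry.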

Workflow

1. Upload JSONL: put your input.jsonl in an S3 bucket Bedrock can read.
2. Create Job: call CreateModelInvocationJob with the model ID, input S3 URI, and output S3 URI.
3. Poll Status: GetModelInvocationJob returns InProgress, Completed, or Failed.
4. Download Results: the output JSONL appears at the output S3 path when the job is Completed.

In Python with boto3, step 2 looks like this:

import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Create the batch job
job = bedrock.create_model_invocation_job(
    jobName="sentiment-batch-001",
    modelId="anthropic.claude-3-5-haiku-20241022-v1:0",  # exact ID varies by region; check your model list
    inputDataConfig={
        "s3InputDataConfig": {
            "s3Uri": "s3://my-bucket/input.jsonl",
            "s3InputFormat": "JSONL",
        }
    },
    outputDataConfig={
        "s3OutputDataConfig": {"s3Uri": "s3://my-bucket/output/"}
    },
    roleArn="arn:aws:iam::ACCOUNT:role/BedrockBatchRole",
)

print(job["jobArn"])
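Step 3 is then a simple poll loop. A minimal sketch with the status lookup passed in as a callable so it can wrap the boto3 call; the get_status parameter, interval, and terminal-state set here are this sketch's assumptions:

```python
import time

# states after which the job will not change further (assumed terminal set)
TERMINAL_STATES = {"Completed", "Failed", "Stopped", "Expired", "PartiallyCompleted"}

def wait_for_job(get_status, interval_s: float = 30.0, timeout_s: float = 86400.0) -> str:
    """Poll get_status() until the job reaches a terminal state or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        if status in TERMINAL_STATES:
            return status
        time.sleep(interval_s)
    raise TimeoutError("batch job did not finish within the timeout")
```

With the job created above, this would be invoked as wait_for_job(lambda: bedrock.get_model_invocation_job(jobIdentifier=job["jobArn"])["status"]). The 24-hour default timeout mirrors the maximum job duration.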

Limits

Maximum records per job: 50,000. Maximum file size: 512 MB. Maximum job duration: 24 hours. Not all models support batch inference — check the Bedrock documentation for the current supported model list. Claude Haiku models are the most cost-effective choice for large batch jobs.
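The 50,000-record cap means larger datasets must be split across several jobs. A sketch of the chunking, with the constant mirroring the limit above:

```python
MAX_RECORDS_PER_JOB = 50_000  # per-job record limit from the docs

def split_into_jobs(records: list, max_per_job: int = MAX_RECORDS_PER_JOB) -> list[list]:
    """Split a record list into chunks that each fit within one batch job."""
    return [records[i : i + max_per_job] for i in range(0, len(records), max_per_job)]
```

Each chunk would then get its own input.jsonl and its own CreateModelInvocationJob call; remember the 512 MB file-size limit may force smaller chunks than the record limit alone suggests.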

Checklist: Do You Understand This?

  • What is the pricing advantage of batch inference over on-demand?
  • What format does the input file use, and what does each line contain?
  • Can you describe the four steps to run a batch job?
  • What types of workloads are well suited to batch inference?