Batch API
The OpenAI Batch API is an asynchronous bulk inference endpoint designed for workloads where you have many requests to process and do not need results in real time. In exchange for accepting a 24-hour turnaround window, you get 50% off standard model pricing — a significant cost reduction for large-scale data processing pipelines.
What the Batch API Is
Rather than sending requests one by one to the synchronous API, you prepare a JSONL file containing all your requests, upload it, create a batch job, and then retrieve the results when the job completes. OpenAI processes your requests using available compute capacity without the latency guarantees of the real-time endpoints — hence the lower price.
Critically, batch jobs run against a separate quota from your real-time API limits. Running a batch of 100,000 requests does not consume your synchronous RPM or TPM limits — your real-time applications continue unaffected while the batch runs in the background.
Pricing
All supported models are priced at 50% of their standard synchronous rate via the Batch API. Examples:
| Model | Standard Input | Batch Input | Standard Output | Batch Output |
|---|---|---|---|---|
| GPT-5 | $1.25/1M tokens | $0.625/1M tokens | $10.00/1M tokens | $5.00/1M tokens |
| GPT-4o | $2.50/1M tokens | $1.25/1M tokens | $10.00/1M tokens | $5.00/1M tokens |
| o4-mini | $1.10/1M tokens | $0.55/1M tokens | $4.40/1M tokens | $2.20/1M tokens |
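To make the discount concrete, here is a small sketch comparing standard and batch cost for a hypothetical workload. The rates are hard-coded from the table above; always verify current pricing on the OpenAI pricing page, and the workload sizes are illustrative:

```python
def cost_usd(input_tokens: int, output_tokens: int,
             input_rate: float, output_rate: float) -> float:
    """Cost in USD, given per-1M-token rates."""
    return (input_tokens / 1e6) * input_rate + (output_tokens / 1e6) * output_rate

# Hypothetical workload: 100M input tokens, 20M output tokens on GPT-4o
standard = cost_usd(100_000_000, 20_000_000, 2.50, 10.00)
batch = cost_usd(100_000_000, 20_000_000, 1.25, 5.00)

print(f"standard: ${standard:,.2f}")  # standard: $450.00
print(f"batch:    ${batch:,.2f}")     # batch:    $225.00
```

Because both input and output rates are halved, the batch total is exactly half the synchronous total regardless of the input/output mix.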
Supported Models
The Batch API supports the following model families: GPT-4o, GPT-4.1, GPT-5, o3, o4-mini, and embedding models (text-embedding-3-small, text-embedding-3-large). Check platform.openai.com/docs/guides/batch for the current complete list, as new models are added regularly.
How It Works (4 Steps)
Step 1 — Prepare and Upload Your JSONL File
Each line of the JSONL file is one request. Every request must include a custom ID for matching results:
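A file in this format can also be generated programmatically. A minimal sketch, where the prompts and filename are illustrative:

```python
import json

prompts = [
    "Classify the sentiment: 'This product is amazing!'",
    "Classify the sentiment: 'Terrible experience, would not recommend.'",
]

with open("requests.jsonl", "w") as f:
    for i, prompt in enumerate(prompts, start=1):
        request = {
            "custom_id": f"req-{i:03d}",  # must be unique within the batch
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-5",
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        f.write(json.dumps(request) + "\n")  # one JSON object per line
```

Running this produces a file equivalent to the example below.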
{"custom_id": "req-001", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-5", "messages": [{"role": "user", "content": "Classify the sentiment: 'This product is amazing!'"}]}}
{"custom_id": "req-002", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-5", "messages": [{"role": "user", "content": "Classify the sentiment: 'Terrible experience, would not recommend.'"}]}}# Upload the file
with open("requests.jsonl", "rb") as f:
batch_file = client.files.create(file=f, purpose="batch")Step 2 — Create the Batch Job
```python
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)
```

Step 3 — Poll for Completion
```python
import time

while True:
    status = client.batches.retrieve(batch.id)
    # "expired" is also terminal: requests that miss the completion
    # window are not charged and the batch stops there
    if status.status in ["completed", "failed", "expired", "cancelled"]:
        break
    time.sleep(60)  # check every minute
```

Step 4 — Download Results
if status.status == "completed":
result_file = client.files.content(status.output_file_id)
# result_file.text contains the JSONL results
for line in result_file.text.strip().split("\n"):
result = json.loads(line)
print(result["custom_id"], result["response"]["body"]["choices"][0]["message"]["content"])Use Cases
Ideal Batch API Use Cases
- Bulk sentiment classification of customer reviews
- Nightly report generation from aggregated data
- Content generation pipelines (product descriptions, summaries)
- Offline evaluation runs for model comparison
- Large-scale data extraction and transformation
- Embedding generation for an entire document corpus
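For the embedding use case above, the same JSONL pattern applies with the embeddings endpoint instead of chat completions. A sketch, where the document set and model choice are illustrative (check the Batch API docs for per-batch size limits before submitting a large corpus):

```python
import json

documents = {
    "doc-001": "Quarterly revenue grew 12% year over year.",
    "doc-002": "The new onboarding flow reduced churn noticeably.",
}

# One /v1/embeddings request per document, keyed by document ID
lines = []
for doc_id, text in documents.items():
    lines.append(json.dumps({
        "custom_id": doc_id,
        "method": "POST",
        "url": "/v1/embeddings",
        "body": {"model": "text-embedding-3-small", "input": text},
    }))

with open("embeddings_batch.jsonl", "w") as f:
    f.write("\n".join(lines) + "\n")
```

The resulting file is uploaded and submitted exactly as in Steps 1–2, with `endpoint="/v1/embeddings"` when creating the batch.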
Not Suitable For
- Real-time user-facing interactions
- Tasks where results are needed immediately
- Streaming responses
- Low-latency conversational applications
Checklist
- What is the pricing discount for Batch API versus standard synchronous API calls?
- Why does running a large batch job not impact your real-time API rate limits?
- What is the maximum turnaround window for a batch job?
- What file format do you use to submit batch requests, and what must each request include?
- Name three use cases that are well-suited for the Batch API.