🧠 All Things AI
Advanced

Batch API

The OpenAI Batch API is an asynchronous bulk inference endpoint designed for workloads where you have many requests to process and do not need results in real time. In exchange for accepting a 24-hour turnaround window, you get 50% off standard model pricing — a significant cost reduction for large-scale data processing pipelines.

What the Batch API Is

Rather than sending requests one by one to the synchronous API, you prepare a JSONL file containing all your requests, upload it, create a batch job, and then retrieve the results when the job completes. OpenAI processes your requests using available compute capacity without the latency guarantees of the real-time endpoints — hence the lower price.

Critically, batch jobs run against a separate quota from your real-time API limits. Running a batch of 100,000 requests does not consume your synchronous RPM or TPM limits — your real-time applications continue unaffected while the batch runs in the background.

Pricing

All supported models are priced at 50% of their standard synchronous rate via the Batch API. Examples:

Model      Standard Input      Batch Input          Standard Output      Batch Output
GPT-5      $1.25 / 1M tokens   $0.625 / 1M tokens   $10.00 / 1M tokens   $5.00 / 1M tokens
GPT-4o     $2.50 / 1M tokens   $1.25 / 1M tokens    $10.00 / 1M tokens   $5.00 / 1M tokens
o4-mini    $1.10 / 1M tokens   $0.55 / 1M tokens    $4.40 / 1M tokens    $2.20 / 1M tokens
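As a back-of-the-envelope check on the discount, a short script can compare batch and synchronous cost for a hypothetical workload. The prices come from the table above; the token counts in the example are illustrative assumptions, not measurements:

```python
# Sketch: estimate batch vs. synchronous cost for a hypothetical workload.
# Prices ($ per 1M tokens) are from the table above.

PRICES = {
    # model: (standard_in, batch_in, standard_out, batch_out)
    "gpt-5":   (1.25, 0.625, 10.00, 5.00),
    "gpt-4o":  (2.50, 1.25,  10.00, 5.00),
    "o4-mini": (1.10, 0.55,   4.40, 2.20),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int, batch: bool) -> float:
    """Return the dollar cost of a workload at standard or batch rates."""
    std_in, b_in, std_out, b_out = PRICES[model]
    in_rate, out_rate = (b_in, b_out) if batch else (std_in, std_out)
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Illustrative workload: 100M input tokens, 20M output tokens on gpt-5
standard = estimate_cost("gpt-5", 100_000_000, 20_000_000, batch=False)
batched = estimate_cost("gpt-5", 100_000_000, 20_000_000, batch=True)
print(f"standard: ${standard:,.2f}, batch: ${batched:,.2f}")
# → standard: $325.00, batch: $162.50
```

Because both input and output rates are halved, the batch total is always exactly half the synchronous total for the same token counts.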

Supported Models

The Batch API supports the following model families: GPT-4o, GPT-4.1, GPT-5, o3, o4-mini, and embedding models (text-embedding-3-small, text-embedding-3-large). Check platform.openai.com/docs/guides/batch for the current complete list, as new models are added regularly.

How It Works (4 Steps)

Step 1 — Prepare and Upload Your JSONL File

Each line of the JSONL file is one request. Every request must include a unique custom_id, which is how you match results back to inputs (output order is not guaranteed to match input order):

{"custom_id": "req-001", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-5", "messages": [{"role": "user", "content": "Classify the sentiment: 'This product is amazing!'"}]}}
{"custom_id": "req-002", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-5", "messages": [{"role": "user", "content": "Classify the sentiment: 'Terrible experience, would not recommend.'"}]}}
# Upload the file (assumes the openai Python SDK is installed and
# OPENAI_API_KEY is set in the environment)
from openai import OpenAI

client = OpenAI()

with open("requests.jsonl", "rb") as f:
    batch_file = client.files.create(file=f, purpose="batch")
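For more than a handful of requests, the JSONL file is easier to generate programmatically. A minimal sketch, assuming the sentiment-classification task from the examples above (the helper name and the id scheme are illustrative; only the request shape — custom_id, method, url, body — follows the format shown):

```python
import json

def build_sentiment_requests(texts: list[str]) -> list[dict]:
    """Build one /v1/chat/completions batch request per input text.

    The request shape mirrors the JSONL examples above; the custom_id
    scheme (req-001, req-002, ...) and prompt wording are illustrative.
    """
    return [
        {
            "custom_id": f"req-{i:03d}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-5",
                "messages": [
                    {"role": "user", "content": f"Classify the sentiment: {text!r}"}
                ],
            },
        }
        for i, text in enumerate(texts, start=1)
    ]

requests = build_sentiment_requests([
    "This product is amazing!",
    "Terrible experience, would not recommend.",
])
with open("requests.jsonl", "w") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")
```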

Step 2 — Create the Batch Job

batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h"
)

Step 3 — Poll for Completion

import time

while True:
    status = client.batches.retrieve(batch.id)
    if status.status in ["completed", "failed", "expired", "cancelled"]:
        break
    time.sleep(60)  # Check every minute

Step 4 — Download Results

import json

if status.status == "completed":
    result_file = client.files.content(status.output_file_id)
    # result_file.text contains the JSONL results
    for line in result_file.text.strip().split("\n"):
        result = json.loads(line)
        print(result["custom_id"], result["response"]["body"]["choices"][0]["message"]["content"])
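Not every request in a completed batch necessarily succeeds: each output line carries a response status code, and failed requests are also written to a separate error file (the batch object exposes an error_file_id alongside output_file_id). A sketch of partitioning successes from failures when parsing the output JSONL — the sample lines in the test are fabricated for illustration:

```python
import json

def split_results(jsonl_text: str) -> tuple[dict, list]:
    """Partition batch output lines into successes and failures.

    Returns ({custom_id: response body} for 200s, [raw lines] for the
    rest). The line shape (custom_id, response.status_code,
    response.body) follows the batch output format.
    """
    ok, failed = {}, []
    for line in jsonl_text.strip().split("\n"):
        result = json.loads(line)
        response = result.get("response")
        if response and response.get("status_code") == 200:
            ok[result["custom_id"]] = response["body"]
        else:
            failed.append(result)
    return ok, failed
```

In practice you would call split_results(result_file.text) after the download in Step 4, then retry or log the failed custom_ids.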

Use Cases

Ideal Batch API Use Cases

  • Bulk sentiment classification of customer reviews
  • Nightly report generation from aggregated data
  • Content generation pipelines (product descriptions, summaries)
  • Offline evaluation runs for model comparison
  • Large-scale data extraction and transformation
  • Embedding generation for an entire document corpus
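The last bullet, embedding generation for a whole corpus, follows the same four-step flow; only the endpoint and request body change. A sketch of the JSONL construction (the helper name and the text-embedding-3-small model choice are assumptions; document keys double as custom_ids so embeddings can be matched back to documents):

```python
import json

def build_embedding_requests(docs: dict[str, str]) -> str:
    """Render one /v1/embeddings batch request per document as JSONL.

    Keys of `docs` become custom_ids; text-embedding-3-small is an
    illustrative model choice from the supported-models list.
    """
    lines = [
        json.dumps({
            "custom_id": doc_id,
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {"model": "text-embedding-3-small", "input": text},
        })
        for doc_id, text in docs.items()
    ]
    return "\n".join(lines)

jsonl = build_embedding_requests({
    "doc-1": "First document.",
    "doc-2": "Second document.",
})
```

Upload, batch creation, polling, and download then proceed exactly as in Steps 1 through 4, with endpoint="/v1/embeddings" in the batches.create call.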

Not Suitable For

  • Real-time user-facing interactions
  • Tasks where results are needed immediately
  • Streaming responses
  • Low-latency conversational applications

Checklist

  • What is the pricing discount for Batch API versus standard synchronous API calls?
  • Why does running a large batch job not impact your real-time API rate limits?
  • What is the maximum turnaround window for a batch job?
  • What file format do you use to submit batch requests, and what must each request include?
  • Name three use cases that are well-suited for the Batch API.