Responses API
The Responses API, launched in 2025, is OpenAI's recommended API surface for building agentic and multi-turn applications. It replaces the Assistants API with a dramatically simpler object model while adding first-class support for built-in tools, structured outputs, and MCP server connections. The Assistants API is being shut down on August 26, 2026 — migration is mandatory for any existing Assistants-based applications.
Object Model
The Assistants API required managing five distinct object types: Assistants, Threads, Messages, Runs, and Run Steps. Tracking state across these objects added significant complexity to application code. The Responses API replaces this with a flat, intuitive model:
Assistants API (deprecated)
- Assistant object (configuration)
- Thread object (conversation container)
- Message objects (individual turns)
- Run object (execution instance)
- Run Step objects (tool call tracking)
- Complex polling loop required
Responses API (current)
- Single Response object
- Items array (messages + tool calls + tool outputs)
- Simple create call with optional previous_response_id
- Streaming supported natively
- No polling required
Basic Usage
Creating a response in Python with the Responses API:

from openai import OpenAI

client = OpenAI()

response = client.responses.create(
    model="gpt-5",
    input="Summarise the key trends in large language models in 2025."
)
print(response.output_text)

For multi-turn conversations, pass the previous response ID to maintain context:
follow_up = client.responses.create(
    model="gpt-5",
    input="What are the implications for enterprise adoption?",
    previous_response_id=response.id
)

Built-in Tools
The Responses API ships with four built-in tools that can be enabled without any external integration:
- web_search: Live web search integrated directly into the model's reasoning
- file_search: Retrieval from OpenAI-hosted vector stores (see Storage page)
- code_interpreter: Python execution in a sandbox for calculations and data manipulation
- computer_use: Computer vision and interaction for GUI automation tasks
Enable a tool by including it in the tools array of the request. The model decides when to invoke tools based on the query and available tools — you do not need to write routing logic.
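As a minimal sketch of the tools array described above, the request below is built as a plain dict so its shape is easy to inspect; in real code you would pass these keyword arguments straight to client.responses.create(**payload). The example query is illustrative only.

```python
# Sketch of a tool-enabled request. The tool type strings follow the
# built-in tool names listed above; the payload would be passed as
# client.responses.create(**payload).
payload = {
    "model": "gpt-5",
    "input": "What was the UK CPI figure released this morning?",
    "tools": [
        {"type": "web_search"},        # live web search
        {"type": "code_interpreter"},  # sandboxed Python execution
    ],
}

# The model decides per-request whether to invoke a tool; no routing
# logic is needed on the application side.
print([tool["type"] for tool in payload["tools"]])
```

Because tool selection happens inside the model, adding or removing an entry from this list is the only change needed to grant or revoke a capability.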
MCP Server Support
From February 2026, the Responses API supports connecting to MCP (Model Context Protocol) servers. This means you can connect your application to any MCP-compatible server — databases, internal APIs, file systems, SaaS tools — and the model can call tools exposed by those servers during inference. MCP dramatically expands what the model can reach without custom function calling code for every integration.
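An MCP server is attached the same way as a built-in tool, via an entry in the tools array. A hedged sketch follows: the "mcp" tool type and the server_label, server_url, and allowed_tools field names are assumptions about the connection shape, and the server itself is hypothetical; check the official documentation for exact parameter names.

```python
# Hypothetical MCP server connection. The field names below are
# assumptions about the tools-array shape, and the URL and tool names
# are placeholders, not a real server.
mcp_tool = {
    "type": "mcp",
    "server_label": "internal-crm",               # hypothetical label
    "server_url": "https://mcp.example.com/sse",  # hypothetical endpoint
    "allowed_tools": ["lookup_customer", "create_ticket"],
}

payload = {
    "model": "gpt-5",
    "input": "Find the account owner for Acme Corp and open a ticket.",
    "tools": [mcp_tool],
}
print(payload["tools"][0]["server_label"])
```

Restricting allowed_tools to a named subset is a sensible default, since an MCP server may expose far more capability than a given application should reach.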
Pricing Notes
Model pricing applies at standard rates. Two additional charges apply when using hosted tools:
- File search calls: $2.50 per 1,000 calls
- Vector store storage: $0.10/GB/day (first 1 GB/day free)
Web search and code interpreter do not carry per-call charges beyond the model token costs.
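The hosted-tool charges above can be sanity-checked with a small cost calculation. The rates are copied from the list above; the call volume and storage figures are made-up inputs for illustration.

```python
# Estimate hosted-tool costs from the rates listed above.
FILE_SEARCH_PER_CALL = 2.50 / 1000  # $2.50 per 1,000 calls
STORAGE_PER_GB_DAY = 0.10           # $0.10 per GB per day
FREE_STORAGE_GB = 1.0               # first 1 GB/day is free

def hosted_tool_cost(search_calls: int, stored_gb: float, days: int = 30) -> float:
    """Return the hosted-tool cost in dollars for one billing period."""
    search_cost = search_calls * FILE_SEARCH_PER_CALL
    billable_gb = max(stored_gb - FREE_STORAGE_GB, 0.0)
    storage_cost = billable_gb * STORAGE_PER_GB_DAY * days
    return search_cost + storage_cost

# Example: 20,000 file_search calls plus 5 GB stored for 30 days.
# Search: 20,000 x $0.0025 = $50.00; storage: (5 - 1) x $0.10 x 30 = $12.00.
print(round(hosted_tool_cost(20_000, 5.0), 2))  # 62.0
```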
Structured Outputs
The Responses API supports structured outputs — constrained generation that guarantees the model returns a valid JSON object matching a schema you define. This eliminates the need for fragile output parsing:
response = client.responses.create(
    model="gpt-5",
    input="Extract the company name and founding year from: ...",
    text={
        "format": {
            "type": "json_schema",
            "name": "company_info",
            "schema": {
                "type": "object",
                "properties": {
                    "company_name": {"type": "string"},
                    "founding_year": {"type": "integer"}
                },
                "required": ["company_name", "founding_year"],
                "additionalProperties": False
            },
            "strict": True
        }
    }
)

Migrating from Assistants API
The Assistants API shuts down on August 26, 2026. Migration steps:
- Replace Assistant objects with system input in the Responses API
- Replace Thread + Run management with previous_response_id chaining
- Replace tool polling loops with streaming or synchronous responses.create calls
- Re-register vector stores: existing Assistants vector stores need migration
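The chaining step above can be sketched as two request payloads: where the Assistants API needed a thread, a run, and a polling loop, each turn here is a single create call, with context carried by previous_response_id. The response ID below is a placeholder, not a real value.

```python
# Each turn is one responses.create call; no thread or run objects.
# In real code: response = client.responses.create(**first_turn)
first_turn = {
    "model": "gpt-5",
    "input": "Summarise our Q3 board minutes.",
}

# Placeholder for the id returned on the first turn (response.id).
previous_id = "resp_placeholder_123"

# The follow-up turn threads context via previous_response_id instead
# of appending a Message to a Thread and polling a Run.
second_turn = {
    "model": "gpt-5",
    "input": "Draft an email to shareholders based on that summary.",
    "previous_response_id": previous_id,
}
print("previous_response_id" in second_turn)  # True
```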
Full migration guide: platform.openai.com/docs/guides/migrate-to-responses
Checklist
- What is the key simplification the Responses API makes over the Assistants API object model?
- How do you maintain conversation context across multiple Responses API calls?
- What are the four built-in tools the Responses API provides?
- What does MCP server support in the Responses API enable?
- When does the Assistants API shut down, and where is the migration guide?