Codex Agent
OpenAI Codex is an agentic software engineering product — not to be confused with the original 2021 Codex text-completion model that powered GitHub Copilot's first generation. The current Codex is a full software engineering agent built on GPT-5-Codex, capable of taking a task description and autonomously building, testing, debugging, and refactoring across a real codebase.
What Codex Is
Codex is OpenAI's answer to the question: what happens when you give a highly capable language model access to a real development environment and ask it to get work done? Rather than answering coding questions conversationally, Codex acts as a delegate — you hand it a task, it executes that task across your codebase, writes and runs tests, reads error output, adjusts its approach, and reports back when done (or when it needs clarification).
The underlying model is GPT-5-Codex — a variant of GPT-5 specifically optimised for software engineering tasks. It is tuned for code comprehension across entire repositories, test writing, multi-file refactoring, and interpreting compiler and test output.
Capabilities
What Codex Can Do
- Build full projects from a written specification
- Add features to an existing codebase
- Write, run, and fix failing unit tests
- Debug issues by reading stack traces and runtime errors
- Perform large-scale refactors across many files simultaneously
- Conduct code reviews and produce structured feedback
- Handle long sessions without losing context
- Reason across web sources, cloud environments, and IDE context
Task Execution Model
You give Codex a task — written in natural language — and it executes autonomously. You can interrupt mid-task and redirect it if you want to change the approach. It maintains context over long sessions and can handle tasks that span hours of real development work. It is designed to be used like a junior engineer you can delegate to: you describe what you want, it does the work, you review the output.
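In practice, delegation looks like handing the agent a one-line brief from your repository. A minimal sketch using the Codex CLI — the task wording, repository state, and any flags are illustrative, not prescribed; check `codex --help` for the options available in your installed version:

```shell
# From the root of the repository you want Codex to work in,
# describe the task in natural language. (Illustrative task text.)
codex "Add input validation to the signup endpoint and write unit tests for it"

# Codex then edits files, runs the test suite, and iterates on failures.
# You can interrupt mid-session to redirect its approach, and review
# the resulting diff and test output when it reports back.
```

The key design point is that the prompt is a goal, not a patch: the agent decides which files to touch and verifies its own work by running the tests.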
Codex vs ChatGPT Coding
The distinction is important for understanding when to use each:
| Dimension | ChatGPT (coding questions) | Codex Agent |
|---|---|---|
| Interaction model | Conversational — ask, receive answer | Delegation — assign task, agent executes |
| Code execution | Sandbox only (Code Interpreter) | Real codebase, real environment |
| Scope | Single function, snippet, explanation | Entire feature, refactor, project build |
| Test integration | Writes tests, does not run them | Writes and runs tests, fixes failures |
| Session length | Short turn-by-turn | Long autonomous sessions |
Performance Benchmarks
As of early 2026, GPT-5.3-Codex leads Terminal-Bench 2.0 at 77.3%, compared to Claude Code's 65.4%. On SWE-Bench Pro (real-world GitHub issue resolution), it achieves approximately 56.8%. These are among the highest scores recorded on agentic coding benchmarks to date.
Benchmark scores should be treated as directional indicators rather than guarantees — performance varies significantly by codebase, language, and task type.
Access and Availability
Codex is included with ChatGPT Plus, Pro, Business, Edu, and Enterprise plans — there is no separate pricing or add-on required. It is accessible from:
- chatgpt.com: Via the web interface with a dedicated Codex tab
- VS Code extension: Integrated into the editor for in-IDE task delegation
- Codex CLI: Terminal-based access (see the Codex CLI page for details)
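As one concrete entry point, the CLI is distributed via npm — the package name below matches the open-source Codex CLI at the time of writing, but confirm against OpenAI's current install instructions before relying on it:

```shell
# Install the Codex CLI globally (requires a recent Node.js).
npm install -g @openai/codex

# Launch it in a project directory; the first run walks you through
# signing in with the ChatGPT account that carries your plan.
codex
```

No separate API key purchase is needed on the plans listed above; the CLI authenticates against your existing ChatGPT subscription.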
Checklist
- What is the underlying model that powers Codex Agent, and what is it optimised for?
- How does the interaction model of Codex differ from using ChatGPT for coding help?
- What can Codex do with tests that ChatGPT coding mode cannot?
- On which ChatGPT plans is Codex included?
- What does Codex's Terminal-Bench 2.0 score indicate relative to competitors?