
Codex Agent

OpenAI Codex is an agentic software engineering product — not to be confused with the original 2021 Codex text-completion model that powered GitHub Copilot's first generation. The current Codex is a full software engineering agent built on GPT-5-Codex, capable of taking a task description and autonomously building, testing, debugging, and refactoring across a real codebase.

What Codex Is

Codex is OpenAI's answer to the question: what happens when you give a highly capable language model access to a real development environment and ask it to get work done? Rather than answering coding questions conversationally, Codex acts as a delegate: you hand it a task, and it executes that task across your codebase, writes and runs tests, reads error output, adjusts its approach, and reports back when done (or when it needs clarification).

The underlying model is GPT-5-Codex — a variant of GPT-5 specifically optimised for software engineering tasks. It is tuned for code comprehension across entire repositories, test writing, multi-file refactoring, and interpreting compiler and test output.

Capabilities

What Codex Can Do

  • Build full projects from a written specification
  • Add features to an existing codebase
  • Write, run, and fix failing unit tests
  • Debug issues by reading stack traces and runtime errors
  • Perform large-scale refactors across many files simultaneously
  • Conduct code reviews and produce structured feedback
  • Handle long sessions without losing context
  • Reason across web sources, cloud environments, and IDE context

Task Execution Model

You give Codex a task — written in natural language — and it executes autonomously. You can interrupt mid-task and redirect it if you want to change the approach. It maintains context over long sessions and can handle tasks that span hours of real development work. It is designed to be used like a junior engineer you can delegate to: you describe what you want, it does the work, you review the output.
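The execute-test-fix loop described above can be sketched in a few lines. This is an illustrative toy, not Codex's actual implementation; `run_checks`, `agent_loop`, and the candidate patches are invented for the example.

```python
# Hypothetical sketch of an agentic execute-test-fix loop (NOT Codex's real
# internals): the agent applies a candidate patch, runs the checks, reads the
# failure message, and retries until the checks pass or attempts run out.

def run_checks(code):
    """Stand-in for a test runner: returns (passed, error_message)."""
    env = {}
    try:
        exec(code, env)
        assert env["add"](2, 3) == 5, "add(2, 3) should be 5"
        return True, ""
    except AssertionError as e:
        return False, str(e)

def agent_loop(candidate_patches, max_attempts=3):
    """Try candidate patches in order, stopping at the first that passes."""
    for attempt, code in enumerate(candidate_patches[:max_attempts], start=1):
        passed, error = run_checks(code)
        if passed:
            return f"done after {attempt} attempt(s)"
        # A real agent would feed `error` back into the model here
        # to generate the next candidate patch.
    return "needs clarification"

buggy = "def add(a, b):\n    return a - b"   # first attempt fails the checks
fixed = "def add(a, b):\n    return a + b"   # revised attempt passes

print(agent_loop([buggy, fixed]))  # done after 2 attempt(s)
```

The key property this sketch shares with the real agent is the feedback loop: test output flows back into the next attempt, rather than the user relaying errors by hand.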

Codex vs ChatGPT Coding

The distinction is important for understanding when to use each:

| Dimension | ChatGPT (coding questions) | Codex Agent |
| --- | --- | --- |
| Interaction model | Conversational — ask, receive answer | Delegation — assign task, agent executes |
| Code execution | Sandbox only (Code Interpreter) | Real codebase, real environment |
| Scope | Single function, snippet, explanation | Entire feature, refactor, project build |
| Test integration | Writes tests, does not run them | Writes and runs tests, fixes failures |
| Session length | Short turn-by-turn | Long autonomous sessions |

Performance Benchmarks

As of early 2026, GPT-5.3-Codex leads Terminal-Bench 2.0 at 77.3%, compared to Claude Code's 65.4%. On SWE-Bench Pro (real-world GitHub issue resolution), it achieves approximately 56.8%. These are among the highest scores recorded on agentic coding benchmarks at the time of writing.
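To make the Terminal-Bench 2.0 margin explicit, here is the arithmetic on the two scores quoted above (a calculation on the reported numbers, not an additional measurement):

```python
# Gap between the Terminal-Bench 2.0 scores quoted above (early 2026).
codex_score = 77.3        # GPT-5.3-Codex, as reported
claude_code_score = 65.4  # Claude Code, as reported

absolute_gap = round(codex_score - claude_code_score, 1)
relative_gap = round(absolute_gap / claude_code_score * 100, 1)

print(absolute_gap)  # 11.9  (percentage points)
print(relative_gap)  # 18.2  (% higher relative to Claude Code's score)
```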

Benchmark scores should be treated as directional indicators rather than guarantees — performance varies significantly by codebase, language, and task type.

Access and Availability

Codex is included with ChatGPT Plus, Pro, Business, Edu, and Enterprise plans — there is no separate pricing or add-on required. It is accessible from:

  • chatgpt.com: Via the web interface with a dedicated Codex tab
  • VS Code extension: Integrated into the editor for in-IDE task delegation
  • Codex CLI: Terminal-based access (see the Codex CLI page for details)

Checklist

  • What is the underlying model that powers Codex Agent, and what is it optimised for?
  • How does the interaction model of Codex differ from using ChatGPT for coding help?
  • What can Codex do with tests that ChatGPT coding mode cannot?
  • On which ChatGPT plans is Codex included?
  • What does Codex's Terminal-Bench 2.0 score indicate relative to competitors?