OpenAI Codex CLI: The Rust-Powered Terminal Agent Taking on Claude Code

OpenAI Codex CLI is an open-source (Apache 2.0), Rust-built terminal coding agent that has accumulated over 72,000 GitHub stars since its release. It pairs GPT-5.4’s 272K default context window (configurable up to 1M tokens) with operating-system-level sandboxing via Apple Seatbelt on macOS and Landlock/seccomp on Linux. That last detail matters: Codex CLI is the only major AI coding agent that enforces security at the kernel level rather than through application-layer hooks. Combined with codex exec for CI pipelines, MCP client and server support, and a GitHub Action for automated PR review, it has become the most infrastructure-ready competitor to Claude Code in 2026.
Architecture and the Rust Rewrite
Codex CLI started life as a Node.js/TypeScript project in mid-2025. By late 2025, OpenAI had rewritten the core in Rust (the codex-rs crate), and as of early 2026 Rust accounts for roughly 95% of the codebase. This was not a vanity rewrite. The motivations were practical: eliminate the Node.js runtime dependency, get lower memory consumption with no garbage collection pauses, and gain native access to platform sandboxing APIs without FFI overhead.

The resulting binary is self-contained. No Node.js, no Python, no Docker required at runtime. You can install it through several channels:
- `npm install -g @openai/codex` (an npm wrapper that downloads the Rust binary)
- `brew install --cask codex` on macOS
- Direct binary download from the releases page
The project moves fast. As of March 2026, it has shipped over 640 tagged releases (roughly one per day since launch), accumulated 5,075+ commits from 400+ contributors, and reached 9,000 forks. The latest release at the time of writing is v0.118.0, published March 31, 2026. That release cadence suggests a large, well-resourced engineering team iterating aggressively. The full history is tracked on the Codex CLI changelog.
The default model is GPT-5.4, which ships with a 272K standard context window. Users can push this to 1M tokens via model_context_window and model_auto_compact_token_limit in the config. Previous defaults included GPT-5.3-Codex and GPT-5.2-Codex, and you can still select these or any other OpenAI model. Beyond the terminal, Codex also offers a desktop app mode via codex app and editor integrations for VS Code, Cursor, and Windsurf.
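The context-window settings live in Codex's TOML config. A minimal sketch - the key names come from the text above, but the model string and numeric values here are illustrative, not documented defaults:

```toml
# ~/.codex/config.toml - illustrative values
model = "gpt-5.4"

# Raise the window from the 272K default toward the 1M maximum.
model_context_window = 1000000

# Auto-compact the conversation before the window fills (illustrative threshold).
model_auto_compact_token_limit = 900000
```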
OS-Level Sandboxing - The Security Architecture That Sets It Apart
Most AI coding agents rely on application-layer safety mechanisms - permission prompts, command allowlists, hook-based interception. Codex CLI goes a level deeper: it enforces restrictions through the operating system kernel, so the model cannot bypass them regardless of what commands it tries to execute.
There are three sandbox permission modes:
| Mode | Behavior |
|---|---|
| Read-only (suggest) | Agent can read files and propose changes but cannot modify anything |
| Workspace-write (default) | Agent can write files within the project directory; network is blocked |
| Full access (danger) | No restrictions - intended for trusted environments only |
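The sandbox tier can also be pinned in config rather than chosen per session. A sketch, assuming a `sandbox_mode` key in config.toml with values matching the tiers above - treat the exact key and value names as assumptions to verify against the config reference:

```toml
# ~/.codex/config.toml - pin the default sandbox tier (key names assumed)
sandbox_mode = "workspace-write"

# Per-tier options; this table name and key are illustrative.
[sandbox_workspace_write]
network_access = false   # keep the network blocked, matching the default behavior
```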
The implementation is platform-specific:
- macOS: Uses Apple’s Seatbelt framework via `sandbox-exec` with custom profiles for each permission level. When restricted read access is enabled, Codex appends curated macOS platform policies instead of broadly allowing `/System`, keeping tool compatibility while maintaining isolation.
- Linux: Combines Bubblewrap (`bwrap`) for filesystem namespacing with seccomp for syscall filtering. Landlock LSM provides an additional filesystem access control layer. Network is blocked by default in standard sandbox modes. Codex vendors its own copy of Bubblewrap for consistent behavior across distributions.
- Windows: Runs the Linux sandbox implementation through WSL.
For debugging, codex debug seatbelt and codex debug landlock let you test arbitrary commands through the sandbox before running them in a real session - useful for diagnosing why a particular tool fails under sandboxing.
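For example, to check whether a build tool survives the sandbox before trusting it in a session - the subcommand names come from the text above, while the argument layout is an assumption:

```shell
# Run a command under the Seatbelt policy without starting a session (macOS).
codex debug seatbelt -- cargo build

# Same idea on Linux, exercising the Landlock/seccomp sandbox.
codex debug landlock -- cargo build
```

If the command fails here but works in a normal shell, the sandbox policy is the culprit rather than the tool itself.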

Compare this to Claude Code’s approach. Claude Code uses application-layer safety through its hooks system, which provides 17 lifecycle event interception points. Hooks can inspect and block commands before execution, but they operate at the application level - a sufficiently determined or malformed command could theoretically circumvent them. Codex’s kernel-level approach is harder to bypass but less flexible for custom policies. The tradeoff is real: Claude Code’s hooks let teams write nuanced, project-specific rules (block certain API calls but allow others, for example), while Codex’s model is more binary - you pick a sandbox tier and the OS enforces it. OpenAI documents the full isolation model in their sandboxing architecture documentation.
CI, GitHub Integration, and the codex exec Pipeline
The codex exec command (short form codex e) runs Codex in non-interactive mode for scripted and CI workflows. It accepts prompt-plus-stdin, so you can pipe input from another process and pass a separate instruction on the command line. This turns Codex into something you can embed in automated pipelines, not just use interactively.
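In practice that looks like piping context in on stdin and passing the instruction as an argument - a sketch of the prompt-plus-stdin pattern described above; the specific commands piped in are illustrative:

```shell
# Pipe a failing test log into Codex non-interactively and ask for a diagnosis.
pytest 2>&1 | codex exec "Diagnose the failing tests and propose a patch"

# Short form, as it might appear in a CI script.
git diff origin/main... | codex e "Review this diff for bugs and style issues"
```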
The Codex GitHub Action (openai/codex-action@v1) wraps this into a GitHub Actions workflow step. It installs the CLI, starts the Responses API proxy when you provide an API key, and runs codex exec with configurable permissions. Practical use cases include:
- Automatically applying patches when tests fail in CI
- Posting code review comments on every PR
- Running Codex-driven quality checks as a merge gate
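A minimal workflow for the PR-review case might look like the following - the action name is from above, but the input names (`prompt`, `openai-api-key`) are assumptions to check against the action’s README:

```yaml
# .github/workflows/codex-review.yml - illustrative
name: codex-review
on: [pull_request]

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: openai/codex-action@v1
        with:
          prompt: "Review this pull request and flag bugs or risky changes"
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
```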
The automatic PR review feature deserves its own mention. When enabled in Codex settings, a separate Codex agent instance reviews every new pull request without requiring an @codex review comment. It posts inline review comments just like a human reviewer would.
Multi-agent coordination has also matured. Sub-agents now use readable path-based addresses like /root/agent_a with structured inter-agent messaging, enabling workflows where one Codex instance orchestrates others - for example, one agent writes code while another runs tests and a third reviews the diff.
Enterprise teams get proxy support with custom CA certificates and structured network policies, so Codex works behind corporate firewalls without TLS interception errors. There is also a growing plugins system: Codex syncs product-scoped plugins at startup, and users can browse, install, and remove them through the /plugins interface.
MCP, AGENTS.md, and the Extensibility Stack
Codex CLI supports Model Context Protocol (MCP) as both client and server, which makes it a flexible building block in larger agentic architectures.
On the client side, you can connect Codex to external MCP servers for additional tools and context. Configuration lives in ~/.codex/config.toml or per-project config files. Local servers get a longer startup window, and failed handshakes surface warnings in the TUI. You can manage servers with the codex mcp CLI commands, and Codex launches them automatically when a session starts.
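An external MCP server entry in that config might look like this - the `mcp_servers` table name follows Codex's TOML config conventions, but treat the exact schema as an assumption, and the server package here is hypothetical:

```toml
# ~/.codex/config.toml - attach an external MCP server (illustrative)
[mcp_servers.docs]
command = "npx"
args = ["-y", "@example/docs-mcp-server"]   # hypothetical server package

[mcp_servers.docs.env]
DOCS_API_KEY = "dummy-key"   # secrets are better injected from the environment
```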
On the server side, Codex CLI itself can be invoked by other agents. This enables embedding Codex capabilities inside larger workflows orchestrated by the OpenAI Agents SDK or any MCP-compatible orchestrator.
For project-specific instructions, Codex reads AGENTS.md files from the project root - analogous to Claude Code’s CLAUDE.md and Gemini CLI’s GEMINI.md. AGENTS.md is intentionally simpler than its counterparts. The same config, AGENTS.md, skills, and MCP setup are shared across all Codex surfaces: CLI, VS Code integration, and the desktop app.
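Because AGENTS.md is intentionally simple, a useful one is often just a handful of plain-markdown rules at the project root. This example is entirely illustrative:

```markdown
# AGENTS.md

- Run `cargo test` before proposing any change.
- Match the existing error-handling style; do not introduce new patterns.
- Never modify files under `vendor/`.
```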
Recent updates added MCP install keyword suggestions and a server submenu in the “Add context” menu, making tool discovery faster. Codex also supports web search integration for real-time information retrieval during coding sessions and image input for attaching screenshots and design files as visual context.
Pricing, Performance, and How It Stacks Up
Codex CLI is free and open-source. You pay for the models behind it. As of April 2026, OpenAI has moved Codex pricing to a token-based model aligned with standard API rates.
Benchmark Comparison
| Benchmark | Codex CLI (GPT-5.3) | Claude Code (Opus 4.6) | Winner |
|---|---|---|---|
| Terminal-Bench 2.0 | 77.3% | 65.4% | Codex CLI |
| SWE-Bench Verified | 75.2% | 80.9% | Claude Code |
| Blind code quality (head-to-head) | 25% | 67% | Claude Code |
The benchmarks tell a split story. Codex CLI leads on terminal-native task performance by a 12-point margin on Terminal-Bench 2.0. Claude Code leads on real-world bug fixing (SWE-Bench) and wins 67% of blind code quality comparisons. MorphLLM’s benchmark comparison and Builder.io’s analysis both explore these differences in depth.
Token efficiency favors Codex CLI significantly - it uses roughly 3-4x fewer tokens per task than Claude Code, making it meaningfully cheaper per operation at scale.
GPT-5.4 Pricing
| Tier | Price |
|---|---|
| Input tokens | $2.50 / 1M tokens |
| Cached input tokens | $1.25 / 1M tokens |
| Output tokens | $15.00 / 1M tokens |
| Long-context input (>272K) | $5.00 / 1M tokens |
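To see what those rates mean per request, a quick back-of-envelope calculation - the dollar rates come from the table above, while the token counts in the example are made up:

```python
# Cost estimator for the GPT-5.4 rates quoted in the pricing table.
# All rates are dollars per 1M tokens.
RATES = {
    "input": 2.50,         # fresh input tokens
    "cached_input": 1.25,  # cached input tokens
    "output": 15.00,       # output tokens
    "long_input": 5.00,    # input tokens beyond the 272K window
}

def estimate_cost(input_tokens, cached_tokens=0, output_tokens=0, long_input_tokens=0):
    """Return the dollar cost of one request at the table's rates."""
    return (
        input_tokens * RATES["input"]
        + cached_tokens * RATES["cached_input"]
        + output_tokens * RATES["output"]
        + long_input_tokens * RATES["long_input"]
    ) / 1_000_000

# A hypothetical agentic turn: 50K fresh input, 200K cached context, 8K output.
print(f"${estimate_cost(50_000, cached_tokens=200_000, output_tokens=8_000):.4f}")
```

Running the numbers this way makes the cached-input discount concrete: at half the fresh-input rate, reusing a large cached context dominates the cost of a long session far less than the output tokens do.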
ChatGPT subscribers can use Codex CLI without separate API billing. Plus ($20/month) subscribers get 30-150 messages per 5-hour window, while Pro ($200/month) subscribers get 300-1,500 messages. On API billing, average developer spend runs $100-200/month depending on usage intensity and model selection.
GPT-5.4 generates at 240+ tokens per second, with the lighter Spark model exceeding 1,000 tokens per second. That raw speed, combined with the token efficiency advantage, means Codex CLI sessions tend to feel faster and cost less per task than comparable Claude Code sessions - though Claude Code often produces higher-quality output that requires fewer iterations.
What To Optimize For
The choice between Codex CLI and Claude Code comes down to priorities:
- Codex CLI makes more sense if you value OS-level sandboxing, CI/CD integration, open-source licensing, token efficiency, and raw terminal speed.
- Claude Code is the better fit if you prioritize code quality, complex reasoning across large codebases, flexible hook-based policies, and a mature skill ecosystem.
Neither tool is standing still. Codex CLI’s daily release cadence and growing contributor base suggest the code quality gap may narrow over time, and Claude Code’s agent teams feature and expanding MCP support show Anthropic is working to match Codex on infrastructure readiness. The full Codex pricing page and Blake Crosley’s deep dive are useful resources for making a more detailed comparison against your specific workflow.