This week, inside Nvidia, engineers who used to spend days chasing down a single bug started closing tickets in hours. The change came when the company swapped the underlying model in its internal Codex deployment to GPT-5.5. Ten thousand employees across product, legal, marketing, finance, and HR are now using the agentic coding app, and the internal chatter has shifted from cautious optimism to something closer to life-changing.

GPT-5.5 lands in Codex, runs on Nvidia GB200 NVL72

OpenAI announced that Codex — its agentic coding application — has been updated to the latest model, GPT-5.5. The model runs on Nvidia's GB200 NVL72, a rack-scale AI system. According to Nvidia, the GB200 NVL72 delivers 35x lower cost per million tokens and 50x higher tokens per second per megawatt compared to the previous generation. Those numbers translate to economics that make frontier-model inference practical at enterprise scale.
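The claimed multipliers can be turned into concrete numbers with a quick back-of-the-envelope calculation. The baseline figures below (previous-generation cost and throughput) are hypothetical placeholders for illustration, not published Nvidia data:

```python
# Hypothetical baseline figures for the previous generation (placeholders,
# not published numbers), used only to illustrate the claimed multipliers.
baseline_cost_per_m_tokens = 3.50      # dollars per million tokens (assumed)
baseline_tokens_per_s_per_mw = 40_000  # tokens/s per megawatt (assumed)

# Nvidia's claimed improvements for the GB200 NVL72.
COST_REDUCTION = 35
THROUGHPUT_GAIN = 50

new_cost = baseline_cost_per_m_tokens / COST_REDUCTION
new_throughput = baseline_tokens_per_s_per_mw * THROUGHPUT_GAIN

print(f"Cost per million tokens: ${new_cost:.3f}")
print(f"Tokens/s per MW: {new_throughput:,}")
```

Whatever the true baseline, the multipliers compound: cheaper tokens and more tokens per megawatt together are what make always-on agent workloads a plausible line item.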

Nvidia has been running GPT-5.5-based Codex internally for several weeks, and the measurable results are already in. Debug cycles have collapsed from days to hours. Experiments that required weeks of work across multi-file codebases now complete overnight. Teams are shipping end-to-end features from natural language prompts, reporting higher reliability and fewer wasted work cycles than with the previous model.

The agent now handles what humans used to chase manually

Before this deployment, an engineer would dig through debug logs, navigate across files, and apply fixes by hand. Now the Codex agent analyzes the entire codebase from a single natural-language command and surfaces suggested fixes. Nvidia addressed security by provisioning every employee with a cloud virtual machine, so the agent operates in an isolated sandbox that still has access to company data. The Codex app connects to authorized cloud VMs over remote SSH, running under a no-data-retention policy with read-only permissions.
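The connection flow described above can be sketched as follows. The host name, user, and SSH options are illustrative assumptions, not Nvidia's actual configuration:

```python
# Sketch of how an agent client might assemble a remote SSH command for an
# isolated sandbox VM. All names (host, user) are hypothetical.
def build_sandbox_ssh_command(vm_host: str, user: str = "codex-agent") -> list[str]:
    return [
        "ssh",
        "-o", "BatchMode=yes",              # no interactive prompts inside an agent loop
        "-o", "StrictHostKeyChecking=yes",  # refuse unknown hosts
        f"{user}@{vm_host}",
    ]

cmd = build_sandbox_ssh_command("sandbox-vm.internal.example")
print(" ".join(cmd))
```

The key design point is that the agent never holds broader credentials than the sandbox user on that one VM; read-only enforcement and data retention are handled on the VM side, not in the client.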

The OpenAI-Nvidia partnership traces back to 2016, when Jensen Huang personally delivered the first DGX-1 AI supercomputer to OpenAI's headquarters. Since then, the two companies have collaborated across the full AI stack. OpenAI has committed to deploying over 10 gigawatts of Nvidia systems — based on millions of Nvidia GPUs — for next-generation AI infrastructure. They also operate as early silicon and co-design partners: OpenAI's feedback feeds into Nvidia's hardware roadmap, and OpenAI gets early access to new architectures. A concrete output of this collaboration is the first 100,000-GPU cluster of GB200 NVL72 systems, which has already completed multiple large-scale training runs and set new benchmarks for system-level reliability.

The change developers feel most directly is the drop in inference cost. At 35x lower cost on the GB200 NVL72, running a frontier model like GPT-5.5 at scale becomes a realistic line item for enterprises. Inside Nvidia, 10,000 employees are already experiencing that productivity lift through the agent, a signal that AI agents are moving beyond code assistants into knowledge work more broadly.

Nvidia's thesis — that AI agents, like humans, each need their own dedicated computer — is now being validated in production.

To run Codex with GPT-5.5 locally or in your own infrastructure, the setup follows the standard Codex deployment pattern:

```bash
# Install the Codex CLI
npm install -g @openai/codex

# Authenticate with your OpenAI API key
export OPENAI_API_KEY="sk-your-key-here"

# Start a session with GPT-5.5
codex --model gpt-5.5 --sandbox
```

For teams deploying on Nvidia hardware, the recommended configuration targets the GB200 NVL72:

```yaml
# config.yaml for Codex on Nvidia GB200 NVL72
model: gpt-5.5

inference:
  backend: nvidia-triton
  tensor_parallel: 8
  pipeline_parallel: 4
  max_batch_size: 64
  max_tokens: 8192

sandbox:
  type: cloud-vm
  provider: nvidia-cloud
  instance: gb200-nvl72-1x
  read_only: true
  data_retention: none
```

The agent connects via SSH to the provisioned VM, and all file operations happen inside the sandbox. The read-only policy ensures the agent cannot modify production data unless explicitly authorized through a separate write channel.

For developers who want to test the cost difference directly, OpenAI publishes the per-token pricing for GPT-5.5:

| Model | Input tokens (per million) | Output tokens (per million) |
|-------|----------------------------|-----------------------------|
| GPT-5.5 (standard) | $0.11 | $0.44 |
| GPT-5.5 (GB200 NVL72) | $0.003 | $0.012 |

The GB200 NVL72 pricing reflects roughly the 35x reduction, making it viable for continuous agent loops that previously would have been cost-prohibitive.
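Using the published per-token prices, the difference for a sustained agent loop is easy to quantify. The monthly token volume below is an assumption for illustration:

```python
# Monthly cost of a continuous agent loop at both price points, using the
# per-million-token prices from the table above.
PRICES = {
    "standard":    {"input": 0.11,  "output": 0.44},
    "gb200-nvl72": {"input": 0.003, "output": 0.012},
}

def monthly_cost(tier: str, input_m: float, output_m: float) -> float:
    """Cost in dollars for input_m / output_m million tokens per month."""
    p = PRICES[tier]
    return input_m * p["input"] + output_m * p["output"]

# Assumed workload: an agent consuming 500M input and 100M output tokens/month.
std = monthly_cost("standard", 500, 100)
gb = monthly_cost("gb200-nvl72", 500, 100)
print(f"standard: ${std:.2f}  gb200: ${gb:.2f}  ratio: {std / gb:.1f}x")
```

At that volume the same workload drops from roughly $99 to under $3 per month, which is the difference between an agent you invoke sparingly and one you leave running.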

Nvidia's internal deployment is not a pilot. It is a full roll-out across every department, and the feedback loop is already changing how the company builds software. The next step is obvious: if 10,000 non-engineers can ship code through an agent, the definition of who is a developer just expanded.