Rene Haas just confirmed the Vera CPU thesis on yesterday’s Arm Q4 call. He didn’t mean to
His framing: GPUs are reticle-limited. CPUs are not. The ratio shift is happening in core count, not chip count
His exact words: “256 Vera CPU chips, 88 cores per chip, a 200-kilowatt liquid-cooled rack designed to sit in a data center adjacent to a Vera Rubin system”
That is not a host CPU. That is a dedicated agentic orchestration
Two days ago NVIDIA’s own engineers published the receipt. They traced a real 33-minute Claude Code session:
283 inference requests
58 main-agent turns coordinating 225 sub-agent invocations
Context grew from 15K to 156K tokens before compaction dropped it to 20K
Main agent alone processed ~3.5 million input tokens in the first 40 turns
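A quick sanity check on those trace numbers, as a minimal sketch. It assumes the 283 requests are simply main-agent turns plus sub-agent invocations, and it treats ~3.5 million as exactly 3,500,000. The variable names are mine, not NVIDIA's:

```python
# Figures quoted from the traced session; names are illustrative only.
main_agent_turns = 58
sub_agent_invocations = 225

# Assumption: every inference request is either a main-agent turn
# or a sub-agent invocation.
total_requests = main_agent_turns + sub_agent_invocations

# "~3.5 million input tokens in the first 40 turns" (approximate figure).
input_tokens_first_40_turns = 3_500_000
avg_input_tokens_per_turn = input_tokens_first_40_turns / 40

# Context size before and after compaction, in tokens.
peak_context = 156_000
post_compaction_context = 20_000
compaction_factor = peak_context / post_compaction_context

print(total_requests)                # 283
print(avg_input_tokens_per_turn)     # 87500.0
print(round(compaction_factor, 1))   # 7.8
```

The counts add up to the 283 quoted, and the main agent averages roughly 87,500 input tokens per turn over those first 40 turns.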
Anthropic’s own number: agentic systems consume up to 15x more tokens than chat. Coding agents sustain 95 to 98 percent prompt cache hit rates. Without caching, costs would be 6x higher
This is what’s happening between GPU calls. File reads. Tool invocations. Sub-agent spawns. Compaction. KV cache management. None of it runs on the GPU
That’s why 12,000 GPUs need 400,000 CPU cores. The 33-to-1 ratio isn’t a forecast. It’s a measurement
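The ratio is plain division. The sketch below reproduces it and then, purely as an illustration, converts the core count into 88-core chips and 256-chip racks. That second step is my assumption: nothing in the source says the 400,000 cores are Vera cores.

```python
import math

gpus = 12_000
cpu_cores = 400_000

# Cores per GPU, as quoted in the post.
cores_per_gpu = cpu_cores / gpus
print(round(cores_per_gpu, 1))  # 33.3

# Hypothetical conversion, assuming every core sits on an 88-core chip
# packed 256 chips to a rack (figures from elsewhere in the post).
cores_per_chip = 88
chips_per_rack = 256
chips_needed = math.ceil(cpu_cores / cores_per_chip)
racks_needed = math.ceil(chips_needed / chips_per_rack)
print(chips_needed, racks_needed)  # 4546 18
```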
NVIDIA states it in the blog directly: this won’t be resolved by adding more compute FLOPs and memory capacity
Translation: the GPU-only path is exhausted. The agentic chapter requires a platform, not a chip
Their seven-chip answer:
Vera Rubin NVL72 — capacity and prefill
Vera CPU — tool execution, KV cache offload
Groq 3 LPX — SRAM-first decode, low-jitter generation
NVLink 6, ConnectX-9, BlueField-4, Spectrum-X — fabric
Result they claim: 400 tokens per second per user on a trillion-parameter MoE at 400K context. Vera spec: 88 Olympus cores, 176 threads, 1.8 TB/s NVLink-C2C, 1.2 TB/s LPDDR5X, 227 billion transistors. A 256-CPU rack delivers 45,056 threads and 400 TB of memory
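The rack figures follow from the per-chip spec. A minimal check, where the per-CPU memory number is derived from the rack total and is not a spec stated in the source:

```python
# Per-chip and per-rack figures as quoted in the post.
cpus_per_rack = 256
cores_per_cpu = 88
threads_per_cpu = 176
rack_memory_tb = 400

rack_cores = cpus_per_rack * cores_per_cpu
rack_threads = cpus_per_rack * threads_per_cpu

# Implied by dividing the rack total evenly; an inference, not a quoted spec.
memory_per_cpu_tb = rack_memory_tb / cpus_per_rack

print(rack_cores)          # 22528
print(rack_threads)        # 45056
print(memory_per_cpu_tb)   # 1.5625
```

The 45,056-thread figure checks out: 256 chips at 176 threads each.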
One detail nobody is talking about. The blog’s second author was previously Head of Agents at Groq. The third was previously at @GroqInc and Intel. NVIDIA didn’t license the LPX architecture. They absorbed the team that built it
Haas isn’t pitching a competing thesis. He’s confirming this one from the other side of the table. Arm data center royalties doubled year-on-year. He expects them to double again
Things feel slow right now because we’re between platforms. The speedup ships in H2 2026. The architectural argument is over. Deployment is the only variable left
I cover this in The Quiet Architect and The Fourth Piece
$ARM $NVDA