I told ya…
3 days back this beat was hot.
Today it's Notorious.
Recognize a real Don when you see one… The diligent old ox, as Chidevs call it. They don't gas. It officially dropped… and yesterday's frontier just got hoof-printed into the dirt. Nice and quiet.
The nums:
• FrontierSWE: 74.4%, trailing Slopus Maximus 4.8 by 0.7pp… that's noise, not a gap
• Terminal-Bench 2.1: 81.0, up from 63.5 on 5.1 (+17.5pp)
• DeepSWE: 46.2, up from 18.0 (+28.2pp)
• PostTrainBench: 34.3, beating Slopus 4.7 and GPT-5.5
• Every LHT eval: #2 global, #1 OSS
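For anyone who wants to check the pp math on the nums, here is a minimal sketch. It only re-derives the deltas from the scores quoted in this post; the helper names (pp_delta, relative_gain) are made up for illustration and nothing here is re-measured.

```python
# Scores as quoted in the post: (previous version, current version).
# These are copied from the bullets, not independently measured.
scores = {
    "Terminal-Bench 2.1": (63.5, 81.0),
    "DeepSWE": (18.0, 46.2),
}

def pp_delta(prev: float, cur: float) -> float:
    """Absolute change in percentage points, rounded to one decimal."""
    return round(cur - prev, 1)

def relative_gain(prev: float, cur: float) -> float:
    """Change relative to the previous score, in percent, rounded to one decimal."""
    return round((cur - prev) / prev * 100, 1)

for name, (prev, cur) in scores.items():
    print(f"{name}: +{pp_delta(prev, cur)}pp, {relative_gain(prev, cur)}% relative")
```

Percentage points and relative percent are different things: a jump from a low base looks far bigger in relative terms, which is why the post sticks to pp.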
The arch:
• IndexShare: indexer reused every-4th sparse attn layer, per-token FLOPs ↓ 2.9× at 1M ctx
• MTP layer: speculative decoding acceptance ↑ 20%
• Effort ctrl: Non-Thinking → High → Max, latency vs depth on demand. At comparable token budgets, sits between Opus 4.7 and 4.8; at Max, punches deep into frontier territory
• 1M ctx engineered for agentic pressure. 740K log lines, 4-contract clause conflicts, 8h sustained trajectories w/o ctx rot
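Why the MTP acceptance bump matters, as a rough sketch: in speculative decoding, more accepted draft tokens means more tokens per verification pass. The model below is the textbook simplification (each drafted token accepted independently with the same probability, first rejection ends the run). The baseline acceptance rate and draft lengths are hypothetical placeholders, and reading "↑ 20%" as a relative gain is an assumption; the post doesn't say which it is.

```python
def expected_tokens_per_step(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per verification pass.

    Assumes each drafted token is accepted independently with probability
    accept_rate and the first rejection ends the run. The target model always
    contributes one token, so the result is 1 + a + a^2 + ... + a^draft_len.
    """
    return sum(accept_rate ** i for i in range(draft_len + 1))

# Hypothetical baseline; the post gives no absolute acceptance rate.
base = 0.60
# Assumption: "acceptance up 20%" read as a relative improvement, capped at 1.0.
boosted = min(base * 1.20, 1.0)

for draft_len in (1, 2, 4):  # hypothetical draft lengths
    before = expected_tokens_per_step(base, draft_len)
    after = expected_tokens_per_step(boosted, draft_len)
    print(f"draft_len={draft_len}: {before:.2f} -> {after:.2f} tokens/pass "
          f"({after / before:.2f}x)")
```

The takeaway under these assumptions: the payoff from a higher acceptance rate grows with draft length, so the same bump is worth more the further ahead the drafter looks.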
The edge:
• MIT, no geo-fences, weights open
• Ascend + MindSpore, zero NVDA dep, infra decoupling as a feature
• While they gatekeep APIs and hike prices, Zhipu ships 20h eng sessions at a fraction of the cost on silicon they can't embargo.
The honest gaps:
• NL2Repo: -20pp vs Opus 4.8
• Tool-Decathlon: -11pp
• SWE-Marathon: 13% vs 26%… ultra-LHT (compilers, kernels, prod svcs) still frontier turf
• No VLM, T2T only
• HLE w/ tools: 54.7 vs 57.9
The rush-ox isn't yet the frontierest in the industry… it just goes. Now it gaps. 2-mo iter cycles, concrete eng gains, no hype. If this velocity holds, the 0.7pp gap to Opus 4.8 evaporates by 5.3.
The unvarnished raw-dog: OSS LHT agentic coding just became deployable, affordable, and NVDissident. The workhorse is now a warhorse.
Anthrop-sick wanted containment,
they got proliferation.
The asymmetry is on point.
No miss, anon.
🐉
z.ai/blog/glm-5.2