Google just dropped a major upgrade to Gemini 3’s Deep Think mode – and the benchmark numbers are striking.
The model scored 84.6% on ARC-AGI-2, well ahead of Opus 4.6 (68.8%) and GPT-5.2 (52.9%), and set a new high of 48.4% on Humanity’s Last Exam.
It also posted gold-medal-level results on the 2025 Physics and Chemistry Olympiads and reached 3,455 Elo on Codeforces – nearly 1,000 points above Opus 4.6.
Alongside the upgrade, Google introduced Aletheia, a math agent designed to autonomously solve open problems and verify proofs.
Deep Think is live for AI Ultra subscribers, with API access rolling out to researchers through an early-access program.
After much of 2026’s attention shifted toward Anthropic and OpenAI, Google is reminding the field it’s still very much in the race – especially in math and science reasoning.