Holy moly, Anthropic is getting very serious about recursive self-improvement!
One word: acceleration.
Insane blog article.
Tl;dr:
â˘We are close to an AI capable of fully autonomously designing and building its own successor
â˘They stress this isnât here yet and isnât inevitable, but could arrive sooner than most institutions are ready for
â˘Anthropic engineers now ship on average 8x as much code per quarter as they did in 2021â2025
â˘Task length AI can reliably complete is doubling roughly every 4 months (up from every 7 months)
â˘Opus 3 (Mar 2024) handled ~4-minute tasks; Sonnet 3.7 (a year later) ~90-minute tasks; Opus 4.6 (a year after that) 12-hour tasks
â˘SWE-bench went from low single digits to saturated in two years; CORE-bench (research reproduction) went ~20% to saturated in 15 months
â˘METR found Claude Mythos Preview could work âat leastâ 16 hours, at the top of what they can currently measure
â˘As of May 2026, Claude authored 80% of code merged into Anthropicâs codebase (low single digits before Claude Code launched in Feb 2025)
â˘A March 2026 poll of 130 research staff: median respondent estimated ~4x output with Mythos Preview
â˘One April 2026 example: Claude shipped 800 fixes cutting a class of API errors 1,000x, work an engineer estimated would have taken a human four years
â˘Claude-written code quality: worse than human in late 2025, roughly at parity now, expected to be strictly better within the year
â˘On the hardest open-ended tasks, Claudeâs success rate hit 76% in May 2026, up 50 points in six months
â˘Code-speedup test: Opus 4 averaged ~3x speedup (May 2025), Mythos Preview ~52x (April 2026); a skilled human needs 4â8 hours to hit 4x
â˘In an AI-safety research project, Claude agents recovered 97% of a performance gap (vs ~23% for two human researchers in a week), over 800 compute-hours and ~$18K
â˘On picking the better ânext stepâ in research sessions, the best model beat the human choice 51% (Nov 2025, Opus 4.5) rising to 64% (April 2026, Mythos Preview)
â˘Human comparative advantage, for now: research taste and judgment, i.e. choosing which problems matter and when an approach is a dead end
Three possible futures
â˘The trend stalls (S-curve), but todayâs capabilities still diffuse widely; they consider this least likely
â˘Compounding efficiency gains, with humans still setting direction; 100-person firms doing the work of 10,000 ; they think this is the likely path
â˘Full recursive self-improvement, where AI builds its successors and pace is set by compute; the alignment outcome here is what theyâre least certain about
Our internal data shows Claude is accelerating AI developmentâa possible path to recursive self-improvement, or AI autonomously building a more capable successor.
Itâs happening faster than we thought, and the implications deserve greater attention.
anthropic.com/institute/recuâŚ