Code with Claude Tokyo, main stage.
Theo Chu's 19-minute Research talk, "The capability curve," lands on one uncomfortable idea: the model in front of you is the wrong thing to optimize for
Watch the slope, not where it sits today
FrontierCode Diamond: until Tuesday, Opus 4.8 was the strongest public model anywhere, at 13.4%. Fable 5, from the same lab, just posted 29.3%. The score more than doubled
SWE-bench Pro tells the same story: Fable 5 at 80.3%, GPT-5.5 trailing at 58.6%
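A quick check of the arithmetic behind those two comparisons, using only the scores quoted above. The variable names are illustrative, not from the talk.

```python
# Scores quoted in the talk recap, in percent.
opus_48_frontiercode = 13.4
fable_5_frontiercode = 29.3
fable_5_swebench_pro = 80.3
gpt_55_swebench_pro = 58.6

# Relative jump on FrontierCode Diamond (new score / old score).
frontiercode_ratio = fable_5_frontiercode / opus_48_frontiercode

# Absolute gap on SWE-bench Pro, in percentage points.
swebench_gap = fable_5_swebench_pro - gpt_55_swebench_pro

print(f"FrontierCode Diamond: {frontiercode_ratio:.2f}x")  # 2.19x
print(f"SWE-bench Pro gap: {swebench_gap:.1f} points")     # 21.7 points
```

So the FrontierCode jump is a bit over 2x, and the SWE-bench Pro lead is about 22 points.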
Stripe pointed Fable 5 at a 50M-line codebase and got a full migration in a single day. By hand: two months, minimum
Here's what kills most builders: every workaround you write for today's model has a half-life of weeks. The next release deletes it. The curve eats your scaffolding
Stop building for the snapshot. Build for the slope. ↓
Code with Claude Tokyo: Michael Cohen and Jessica Yan on getting agents to production in days, not months
24-day workflow → 5 days. 3–6 months of infra → days. 24/7 agent → ~$58/mo. Task success → +10 pts
Netflix, Notion, Rakuten already in prod ↓
The agent logic was never the hard part. Sandboxing, state, retries, orchestration: that's the 3–6 months teams burn before shipping a single line.
Claude Managed Agents turns it into a checkbox: hosted infra, secure sandboxing, state that survives disconnects, multi-agent orchestration out of the box.
What that looks like in production:
• Rakuten: a 24-day workflow down to 5, across five business functions, each live in under a week. A 79% cut.
• Netflix: multi-agent orchestration running live, with a lead agent fanning work out to specialists.
• Notion: dozens of agent sessions in parallel.
• Sentry: agents resolving production errors automatically.
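For readers who haven't seen the lead-agent pattern from the Netflix example, here is a minimal sketch of the fan-out shape in plain Python asyncio. It is not the Claude Managed Agents API: the specialist roles, function names, and task string are all hypothetical, and each specialist is a stub where a real one would call a model.

```python
import asyncio

# Hypothetical specialist roles; the talk recap does not name any.
SPECIALISTS = ["search", "summarize", "review"]


async def run_specialist(role: str, task: str) -> str:
    """Stand-in for a specialist agent; a real one would call a model."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"{role}: done with '{task}'"


async def lead_agent(task: str) -> list[str]:
    """Fan the task out to every specialist concurrently, then collect results."""
    jobs = [run_specialist(role, task) for role in SPECIALISTS]
    # gather() returns results in the same order as the jobs were passed in.
    return await asyncio.gather(*jobs)


results = asyncio.run(lead_agent("triage bug report"))
for line in results:
    print(line)
```

The sketch covers only the fan-out itself. Sandboxing, durable state, and retries are the parts the talk says a managed platform takes over.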
The trajectory underneath, from Anthropic's own data:
Claude Code's longest autonomous runs nearly doubled in 3 months, 25 → 45 min. The leash keeps getting longer.
You were about to spend a quarter building infra that's now a line of config.
Stop shipping plumbing. Ship agents.