Nathan C. Frey

jeremy retweeted

Nathan C. Frey

@nc_frey

Jun 9

Today we’re launching Claude Fable 5, a Mythos-class model made safe for general use. Fable 5 is far better than any model we’ve ever released on long-running tasks. Disentangling bio capabilities from risks is hard, so Fable 5 ships with safeguards that block responses in biology. These queries will receive responses from Claude Opus 4.8. We're investing in lab-grounded red-teaming so our biosafety calibration reflects actual threat models. That’s how we'll drive down false positive rates without lowering the bar on risks. We also plan to open a trusted access program soon for select life science organizations to access Mythos-class models for biology and chemistry use. Personally, Fable 5 has completely changed the way I work. Claude’s vision is now the best in the industry. I routinely let Fable cook for hours at a time on complex tasks without checking in, and it makes sensible choices. I’m excited to see what users do with this model. As always, my advice is to try a bunch of hard tasks that have never been possible before and see what you find. This is also the first model release that my team has had a small part in, just two months after joining @AnthropicAI. I’m insanely proud of our team and grateful to everyone at Anthropic who has jumped in to collaborate with us, especially to folks who are brand new to the wild world of biology and drug development. More soon.

Claude

@claudeai

Jun 9

Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.

Gabe Pereyra

jeremy retweeted

Gabe Pereyra

@gabepereyra

Jun 9

The first prompt I always run on a new model to test legal capabilities is “Draft an S-1”. It immediately gives a good sense of the model's general capabilities. You can tell from the length of the S-1, its formatting, structure and writing how big a jump in general and legal performance the new model is likely to be. We’ve found that length alone correlates extremely well with how well the model will work in our legal agents. Screenshots show the SpaceX S-1 drafted by Fable 5, by Opus 4.8, and the actual one filed. It is very clear that Fable 5 is a big step up from Opus 4.8, which is already significantly stronger than most other models at this task. The formatting and structure are significantly better, and this is also reflected in our LAB benchmark (13% vs. 10%). Super impressed in early testing of this model both on benchmark and in product. Huge congrats to the @anthropic team because this is very clearly a big step forward in model capabilities.

ALT Real

ALT Opus 4.8

ALT Fable 5

Claude

@claudeai

Jun 9

Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.

ClaudeDevs

jeremy retweeted

ClaudeDevs

@ClaudeDevs

Jun 9

Claude Fable 5 changed how we work on the Claude Code team day to day. We used to verify that Claude did the work right. Now we verify that it's doing the right work. Here’s the 3 biggest changes:

jeremy

@jerhadf

Jun 10

I recommend staring at this plot for a long while. fable 5’s TTC scaling is incredible. this is my favorite plot from the release.

Deniz Birlikci

@denizbirlikci

Jun 9

Fable 5 has the most impressive inference-time scaling curve in FrontierCode Diamond.

jeremy

@jerhadf

Jun 10

a few things I like: 1. even at low effort, fable 5 performs as well as opus 4.8 max at half the cost per task, and outperforms all other models. fable 5 low effort is amazingly good, fast, and cost-effective. 2. fable 5’s ceiling is starkly higher than any other model. if you want max performance, xhigh is amazing. it performs 2x better than any other model on these hard SWE tasks. 3. fable 5 performs the ~same as mythos 5 here. for almost all use cases, the fallback rate to opus 4.8 is very low.

Vals AI

jeremy retweeted

Vals AI

@ValsAI

Jun 9

Six months ago, no model could crack 20% on Vibe Code Bench. This week, Claude Fable 5 hit 90.4%. How did we get here?

Scott Wu

jeremy retweeted

Scott Wu

@ScottWu46

Jun 9

A new top scorer just one day after our benchmark released! Especially strong on the hardest tasks: 13.4% -> 29.3% on FrontierCode Diamond compared to Opus 4.8.

Cognition

@cognition

Jun 9

Claude Fable 5 is now available in Devin. Fable 5 earns the #1 spot on FrontierCode, our benchmark for real-world engineering tasks that grades mergeability and quality:

jeremy

@jerhadf

Jun 10

fable 5 for building worlds. keep imagining

Matt Shumer

@mattshumer_

Jun 9

Fable has solved 3D worldbuilding... utterly insane. This is all completely custom-built ThreeJs, running in the browser.

ASM

jeremy retweeted

ASM

@ASM65617010

Jun 9

Claude Mythos 5 scores 59% on Humanity’s Last Exam, with no tools. As a contributor of HLE, I would never have expected such a score barely a year and a half after the benchmark’s release.

jeremy

@jerhadf

Jun 10

many such cases — fable 5 is surprisingly cost-effective in terms of cost per successful task

Ceoz

@Ceoz_1

Jun 9

Replying to @VictorTaelin

Fr, now we agree, 100%. What has been your token usage experience? In my experience it has been much cheaper (as crazy as this sounds) than 4.8 was, simply because it uses fewer tokens to accomplish more, it's comically efficient. It literally crushed a problem that took /goal with 5.5 XHIGH 11 hours, in 7 minutes, and proposed the exact test suite I was considering to it, it was magical, tho, Opus 4.8 could never, that's why before Fable I considered OpenAI ahead overall.

Logan Graham

jeremy retweeted

Logan Graham

@logangraham

Jun 9

Fable 5 is the same underlying model as Mythos 5, but with cybersecurity and biology blocks. Mythos is the first model that's made me feel that we've entered the next phase of model progress. For years, we've talked about cybersecurity / self-improvement / autonomy / model-dominated coding / biology implications of model progress. Some of these are issues to defend against; some are areas to advance. Mythos has made me & our team feel like we've seen the earliest glimpse of the world we've been talking about. Also, we published a lot of cyber eval results in the system card, including some evals we designed recently, as well as details of safeguards. In most cases, Mythos 5 ~= Mythos Preview. We found it ticked up on the new ExploitBench eval, and we opted to put that in the eval table so people can calibrate/update on advances in cyber capabilities to be prepared for. (We don't want to compete on offensive capabilities and don't try to.) But overall, Mythos 5 is an efficient model, about equal to Mythos Preview in most cases. I'd really like more people to design new security evals! The better models get, the more our limited evals only see a small part of the picture. In terms of where we go from here, here are some current thoughts: 1/ It's important we get Mythos cyber capabilities to defenders. We just have to do it safely and cautiously. We're working on an expanded trusted access program. We're working with government and industry to do this. I sort of envision the next 1-2 years being a large scale effort to make the world resilient design & implement new approaches to security. 2/ I think cybersecurity will start merging with AI security and alignment. Let's say you're a defender and you want to use a model -- will it break out of its sandbox? Will it stop where you tell it to stop? This is one reason I'm excited about working on cybersecurity. In the limit, it's the same thing as AI security. 3/ I really want people to develop new evals for... defensive cybersecurity, hardware security, autonomously running a business, advanced biology, and other parts of national security. Our internal eval ship rate is way, way up because Mythos makes it easy to iterate, especially on the engineering aspect of building evals. (Sometimes, we ask new hires to make a new eval on their first day, and another on the next). I’m excited we’re making this available as Fable 5, because I think the world spending time with the model is the most important way to calibrate.

Claude

@claudeai

Jun 9

Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.

Amol Avasare

jeremy retweeted

Amol Avasare

@TheAmolAvasare

Jun 9

To celebrate the Fable 5 launch, we just reset 5-hour and weekly limits for all users across our products! Enjoy 🚀

Boris Cherny

jeremy retweeted

Boris Cherny

@bcherny

Jun 9

Fable 5 is the biggest step up I’ve felt in our models since Opus 4.5 back in November. After 4.5 came out I uninstalled my IDE when I realized that I’d been doing 100% of my coding in a terminal for a few weeks. With Fable, it’s felt like Claude has stepped up from being a coding agent to a thought and design partner in building the product. Fable has judgement, taste, and dimensionality in a way that previous models didn’t, leading me to trust it more with the most complex work. I think the first time I had this realization was when I asked Fable to debug something. It is the first model I have used that was so methodical and precise, taking measurements and adding logs then verifying that it truly fixed the issue before declaring victory. There’s nothing in claude code’s prompting telling the model to do that, it’s just part of its personality. It really has this “big model smell” that I haven’t felt before.

jeremy

@jerhadf

Jun 9

Fable 5 made an entire high-quality CAD editor that can produce 3D-printer-ready blueprints

Prithvi Rajasekaran

@rgb_prithvi

Jun 9

I'm incredibly excited about our launch of Claude Fable 5 today! To test out the model, I gave it a complex long-running coding task: build a full-blown CAD editor, with CadQuery (a Python CAD framework) as the substrate. youtube.com/watch?v=tpjJeH1p…

Andrej Karpathy

jeremy retweeted

Andrej Karpathy

@karpathy

Jun 9

This is a super exciting release - Claude Fable 5 is the same underlying model as Mythos but with added safeguards. The benchmarks are great and it's SOTA on everything by a margin but I'll add that *qualitatively* also, this is a major-version-bump-deserving step change forward (imo of the same order as Claude 4.5 was in November), peaking especially for long problem-solving sessions on very difficult problems. You can give it a lot more ambitious tasks than what you're used to, the model "gets it" and it will just go, and it's never felt this tempting to stop looking at the code at all (but don't do this in prod!). The model still has quirks that people will run into and the safeguards are configured to be a little too trigger happy for launch, which can hopefully be tuned over time. I feel a lot of things changing as working software increasingly comes out on a tap. The Jevons paradox kicks in and I feel my own demand for software growing substantially. You can ask for anything - explainers, visualizers, dashboards, bespoke single-use apps (e.g. a full wandb that is hyper-specific just for your project), you can 10X your test suite, auto-optimize code, run giant research projects with custom HTML for the results, anything! "Free your mind" (Matrix ref). Really looking forward to all the things people build!

Claude

@claudeai

Jun 9

Replying to @claudeai

Fable 5 is state-of-the-art on nearly all tested benchmarks, with exceptional performance in software engineering, knowledge work, scientific research, and vision. The longer and more complex the task, the larger Fable 5’s lead over our other models.

ALT Benchmark table titled Mythos 5 & Fable 5, comparing Claude Mythos 5 and Fable 5 against Claude Mythos Preview, Claude Opus 4.8, GPT 5.5, and Gemini 3.1 Pro.

Mike Krieger

jeremy retweeted

Mike Krieger

@mikeyk

Jun 9

Claude Fable 5 is out today. The first Mythos-class model everyone can use & the first model I hand off whole projects to. This weekend I built a self-maintaining, proactive media tracker for myself, over 2 days with Fable taking large chunks at a time

Cursor

jeremy retweeted

Cursor

@cursor_ai

Jun 9

Claude Fable 5 is now available in Cursor. It sets a new state of the art on CursorBench at 72.9%, 8 points above the previous best.

Sholto Douglas

jeremy retweeted

Sholto Douglas

@_sholtodouglas

Jun 9

My favorite chart from our system card - FrontierCode is an excellent eval, and it accurately reflects the step up I feel when using Fable!

Claude

@claudeai

Jun 9

Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.

jeremy

@jerhadf

Jun 9

Fable 5 is the best model I've ever used. I’ve been spending most of my time in the last few months helping to bring Mythos-level models to general availability safely. These models changed everything. So stoked that anyone can use Fable today! Can't wait to see what you all build with it.

Claude

@claudeai

Jun 9

Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.

jeremy

@jerhadf

Jun 9

This chart is pretty wild.

jeremy

@jerhadf

Jun 9

And this one!
