helping grow more claudes. personal views only

Joined January 2014
jeremy retweeted
Today we’re launching Claude Fable 5, a Mythos-class model made safe for general use. Fable 5 is far better than any model we’ve ever released on long-running tasks. Disentangling bio capabilities from risks is hard, so Fable 5 ships with safeguards that block responses in biology. These queries will receive responses from Claude Opus 4.8. We're investing in lab-grounded red-teaming so our biosafety calibration reflects actual threat models. That’s how we'll drive down false positive rates without lowering the bar on risks. We also plan to open a trusted access program soon for select life science organizations to access Mythos-class models for biology and chemistry use. Personally, Fable 5 has completely changed the way I work. Claude’s vision is now the best in the industry. I routinely let Fable cook for hours at a time on complex tasks without checking in, and it makes sensible choices. I’m excited to see what users do with this model. As always, my advice is to try a bunch of hard tasks that have never been possible before and see what you find. This is also the first model release that my team has had a small part in, just two months after joining @AnthropicAI. I’m insanely proud of our team and grateful to everyone at Anthropic who has jumped in to collaborate with us, especially to folks who are brand new to the wild world of biology and drug development. More soon.
Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.
24
17
248
26,429
jeremy retweeted
The first prompt I always run on a new model to test legal capabilities is “Draft an S-1.” It immediately gives a good sense of the model's general capabilities. You can tell from the length of the S-1, its formatting, structure and writing how big a jump in general and legal performance the new model is likely to be. We’ve found that length alone correlates extremely well with how well the model will work in our legal agents. Screenshots show the SpaceX S-1 as drafted by Fable 5 and by Opus 4.8, alongside the actual one filed. Very clear that Fable 5 is a big step up from Opus 4.8, which is already significantly stronger than most other models at this task. The formatting and structure are significantly better, and this is also reflected in our LAB benchmark (13% vs. 10%). Super impressed in early testing of this model both on benchmark and in product. Huge congrats to the @anthropic team because this is very clearly a big step forward in model capabilities.
Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.
12
12
240
62,015
jeremy retweeted
Claude Fable 5 changed how we work on the Claude Code team day to day. We used to verify that Claude did the work right. Now we verify that it's doing the right work. Here are the 3 biggest changes:
290
766
11,324
997,387
Jun 10
I recommend staring at this plot for a long while. fable 5’s TTC scaling is incredible. this is my favorite plot from the release.
Fable 5 has the most impressive inference-time scaling curve in FrontierCode Diamond.
4
8
198
23,816
Jun 10
a few things I like: 1. even at low effort, fable 5 performs as well as opus 4.8 max at half the cost per task, and outperforms all other models. fable 5 low effort is amazingly good, fast, and cost-effective. 2. fable 5’s ceiling is starkly higher than any other model's. if you want max performance, xhigh is amazing. it performs 2x better than any other model on these hard SWE tasks. 3. fable 5 performs the ~same as mythos 5 here. for almost all use cases, the fallback rate to opus 4.8 is very low.
1
15
745
jeremy retweeted
Six months ago, no model could crack 20% on Vibe Code Bench. This week, Claude Fable 5 hit 90.4% How did we get here?
13
34
392
29,623
jeremy retweeted
A new top scorer just one day after our benchmark released! Especially strong on the hardest tasks: 13.4% -> 29.3% on FrontierCode Diamond compared to Opus 4.8.
Claude Fable 5 is now available in Devin. Fable 5 earns the #1 spot on FrontierCode, our benchmark for real-world engineering tasks that grades mergeability and quality:
11
10
222
21,897
Jun 10
fable 5 for building worlds. keep imagining
Fable has solved 3D worldbuilding... utterly insane. This is all completely custom-built ThreeJs, running in the browser.
6
559
jeremy retweeted
Claude Mythos 5 scores 59% on Humanity’s Last Exam, with no tools. As a contributor of HLE, I would never have expected such a score barely a year and a half after the benchmark’s release.
24
48
943
67,613
Jun 10
many such cases — fable 5 is surprisingly cost-effective in terms of cost per successful task
Jun 9
Replying to @VictorTaelin
Fr, now we agree, 100%. What has been your token usage experience? In my experience it has been much cheaper (as crazy as this sounds) than 4.8 was, simply because it uses fewer tokens to accomplish more, it's comically efficient. It literally crushed a problem that took /goal with 5.5 XHIGH 11 hours, in 7 minutes, and proposed the exact test suite I was considering to it, it was magical, tho, Opus 4.8 could never, that's why before Fable I considered OpenAI ahead overall.
1
46
2,843
jeremy retweeted
Fable 5 is the same underlying model as Mythos 5, but with cybersecurity and biology blocks. Mythos is the first model that's made me feel that we've entered the next phase of model progress. For years, we've talked about cybersecurity / self-improvement / autonomy / model-dominated coding / biology implications of model progress. Some of these are issues to defend against; some are areas to advance. Mythos has made me & our team feel like we've seen the earliest glimpse of the world we've been talking about. Also, we published a lot of cyber eval results in the system card, including some evals we designed recently, as well as details of safeguards. In most cases, Mythos 5 ~= Mythos Preview. We found it ticked up on the new ExploitBench eval, and we opted to put that in the eval table so people can calibrate/update on advances in cyber capabilities to be prepared for. (We don't want to compete on offensive capabilities and don't try to.) But overall, Mythos 5 is an efficient model, about equal to Mythos Preview in most cases. I'd really like more people to design new security evals! The better models get, the more our limited evals only see a small part of the picture. In terms of where we go from here, here are some current thoughts: 1/ It's important we get Mythos cyber capabilities to defenders. We just have to do it safely and cautiously. We're working on an expanded trusted access program. We're working with government and industry to do this. I sort of envision the next 1-2 years being a large scale effort to make the world resilient: design & implement new approaches to security. 2/ I think cybersecurity will start merging with AI security and alignment. Let's say you're a defender and you want to use a model -- will it break out of its sandbox? Will it stop where you tell it to stop? This is one reason I'm excited about working on cybersecurity. In the limit, it's the same thing as AI security. 3/ I really want people to develop new evals for... defensive cybersecurity, hardware security, autonomously running a business, advanced biology, and other parts of national security. Our internal eval ship rate is way, way up because Mythos makes it easy to iterate, especially on the engineering aspect of building evals. (Sometimes, we ask new hires to make a new eval on their first day, and another on the next). I’m excited we’re making this available as Fable 5, because I think the world spending time with the model is the most important way to calibrate.
Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.
17
17
180
26,598
jeremy retweeted
To celebrate the Fable 5 launch, we just reset 5-hour and weekly limits for all users across our products! Enjoy 🚀
58
29
916
297,094
jeremy retweeted
Fable 5 is the biggest step up I’ve felt in our models since Opus 4.5 back in November. After 4.5 came out I uninstalled my IDE when I realized that I’d been doing 100% of my coding in a terminal for a few weeks. With Fable, it’s felt like Claude has stepped up from being a coding agent to a thought and design partner in building the product. Fable has judgement, taste, and dimensionality in a way that previous models didn’t, leading me to trust it more with the most complex work. I think the first time I had this realization was when I asked Fable to debug something. It is the first model I have used that was so methodical and precise, taking measurements and adding logs, then verifying that it truly fixed the issue before declaring victory. There’s nothing in Claude Code’s prompting telling the model to do that; it’s just part of its personality. It really has this “big model smell” that I haven’t felt before.
652
598
10,629
889,248
Fable 5 made an entire high-quality CAD editor that can produce 3D-printer-ready blueprints
I'm incredibly excited about our launch of Claude Fable 5 today! To test out the model, I gave it a complex long-running coding task: build a full-blown CAD editor, with CadQuery (a Python CAD framework) as the substrate. youtube.com/watch?v=tpjJeH1p…
1
28
1,787
jeremy retweeted
This is a super exciting release - Claude Fable 5 is the same underlying model as Mythos but with added safeguards. The benchmarks are great and it's SOTA on everything by a margin but I'll add that *qualitatively* also, this is a major-version-bump-deserving step change forward (imo of the same order as Claude 4.5 was in November), peaking especially for long problem-solving sessions on very difficult problems. You can give it a lot more ambitious tasks than what you're used to, the model "gets it" and it will just go, and it's never felt this tempting to stop looking at the code at all (but don't do this in prod!). The model still has quirks that people will run into and the safeguards are configured to be a little too trigger happy for launch, which can hopefully be tuned over time. I feel a lot of things changing as working software increasingly comes out on a tap. Jevons paradox kicks in and I feel my own demand for software growing substantially. You can ask for anything - explainers, visualizers, dashboards, bespoke single-use apps (e.g. a full wandb that is hyper-specific just for your project), you can 10X your test suite, auto-optimize code, run giant research projects with custom HTML for the results, anything! "Free your mind" (Matrix ref). Really looking forward to all the things people build!
Replying to @claudeai
Fable 5 is state-of-the-art on nearly all tested benchmarks, with exceptional performance in software engineering, knowledge work, scientific research, and vision. The longer and more complex the task, the larger Fable 5’s lead over our other models.
1,265
2,357
25,220
2,667,128
jeremy retweeted
Claude Fable 5 is out today. The first Mythos-class model everyone can use & the first model I hand off whole projects to. This weekend I built a self-maintaining, proactive media tracker for myself, over 2 days with Fable taking large chunks at a time
23
8
247
40,145
jeremy retweeted
Claude Fable 5 is now available in Cursor. It sets a new state of the art on CursorBench at 72.9%, 8 points above the previous best.
261
448
6,069
1,174,329
jeremy retweeted
My favorite chart from our system card - FrontierCode is an excellent eval, and it accurately reflects the step up I feel when using Fable!
Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.
34
41
628
62,748
Fable 5 is the best model I've ever used. I’ve been spending most of my time in the last few months helping to bring Mythos-level models to general availability safely. These models changed everything. So stoked that anyone can use Fable today! Can't wait to see what you all build with it.
Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.
20
6
227
12,577
This chart is pretty wild.
1
13
504
And this one!
1
4
269