Head of the Frontier Red Team @anthropicai. 🌎 Make things radically good.

Joined June 2009
Fable 5 is the same underlying model as Mythos 5, but with cybersecurity and biology blocks. Mythos is the first model that's made me feel that we've entered the next phase of model progress.
For years, we've talked about cybersecurity / self-improvement / autonomy / model-dominated coding / biology implications of model progress. Some of these are issues to defend against; some are areas to advance. Mythos has made me & our team feel like we've seen the earliest glimpse of the world we've been talking about.
Also, we published a lot of cyber eval results in the system card, including some evals we designed recently, as well as details of safeguards. In most cases, Mythos 5 ~= Mythos Preview. We found it ticked up on the new ExploitBench eval, and we opted to put that in the eval table so people can calibrate/update on advances in cyber capabilities and prepare for them. (We don't want to compete on offensive capabilities and don't try to.) But overall, Mythos 5 is an efficient model, about equal to Mythos Preview in most cases.
I'd really like more people to design new security evals! The better models get, the more our limited evals only see a small part of the picture.
In terms of where we go from here, here are some current thoughts:
1/ It's important we get Mythos cyber capabilities to defenders. We just have to do it safely and cautiously. We're working on an expanded trusted access program. We're working with government and industry to do this. I sort of envision the next 1-2 years being a large scale effort to make the world resilient: to design & implement new approaches to security.
2/ I think cybersecurity will start merging with AI security and alignment. Let's say you're a defender and you want to use a model -- will it break out of its sandbox? Will it stop where you tell it to stop? This is one reason I'm excited about working on cybersecurity. In the limit, it's the same thing as AI security.
3/ I really want people to develop new evals for... defensive cybersecurity, hardware security, autonomously running a business, advanced biology, and other parts of national security. Our internal eval ship rate is way, way up because Mythos makes it easy to iterate, especially on the engineering aspect of building evals. (Sometimes, we ask new hires to make a new eval on their first day, and another on the next.)
I’m excited we’re making this available as Fable 5, because I think the world spending time with the model is the most important way to calibrate.
Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.
17
17
180
26,491
New post on Red today: Our team @AnthropicAI found that Mythos Preview is meaningfully better at developing N-days. It took us a couple thousand $ and a few hours to convert patches into exploits. We publish research like this because we think it's important the world knows what models are/will be capable of. In a year, Mythos will probably look trivial. We want to help the world to start preparing. I'm excited to share a lot more blue team / defensive work. I feel like people are aware of the issue now, and the team's task is now to "solve it all" -- we have some exciting / interesting / creative defensive research lined up.
27
64
663
81,493
This is good! I started red teaming LLMs for biorisks/weapons risks in my bedroom in Nov 2022. In 2023 a lot of people said we were overreacting -- 'models won't be better than internet search'. True then, but the important point is they were going to get a lot better. Now I think there's clear reason for caution. The ultimate solution is inventing and deploying biodefenses. This is one step. I fully expect humanity to conquer pandemics in the 21st century.
Sam Altman, Dario Amodei, Demis Hassabis and many others have signed a letter urging Congress to increase security on orders of synthetic nucleic acids - and the equipment needed to make them - as models continue to become increasingly bio-capable.
18
13
198
26,139
We're expanding Glasswing today. To solve such a big/complex/urgent problem, we need Mythos-level capabilities in as many defenders' hands as possible. That's why we're working on safeguards to scale that safely ASAP. 11 of my reflections from the past 2 months of Glasswing 🧵:
We’re expanding Project Glasswing. We’ve extended access to Claude Mythos Preview to approximately 150 additional organizations, based in more than fifteen countries. Read more about this expansion and our future plans for Project Glasswing: anthropic.com/news/expanding…
17
40
514
147,270
- Cyber safeguards are a hard urgent techno-philosophical problem.
- ...but powerful models w/o safeguards may come soon (~3, 6, 18 months max?). We need to scale access to defensive tools urgently.
- Industry, government, maintainers, and researchers are taking this moment very seriously.
6
4
77
9,216
- Glasswing has been a rallying mission for Anthropic. Our team is incredible.
- ...and we want to do whatever we can to kickstart the orgs/tools/initiatives that cyberdefenders need. Glasswing will probably seem very tiny within 6 months. Working on the latter right now!
4
77
4,921
Within ~2 years, there might be >0.5 Manhattan Projects worth of philanthropic $$$ to spend on the biggest challenges, like cyber or biodefense. @nanransohoff is right: the hard part is finding enough good organizations to spend the $. You should build one!
New blog post: The third wave of American philanthropy Hundreds of billions of dollars in new philanthropic capital will soon become liquid. The OpenAI Foundation holds 26% of OpenAI, worth about $220B at today’s valuation. Anthropic’s seven co-founders have pledged to give away 80% of their wealth and have instituted the most aggressive donor matching program for employees in tech history. How much does this all add up to? And how meaningful is that in the context of philanthropy today? I was doing some simple napkin math to wrap my head around the scale of what’s coming, and radicalized myself in the process. I had dramatically underappreciated the scale of the philanthropic capital that’s about to become available and the corresponding gap in talent and organizations that will be needed to make the most of it. This piece aims to directionally sketch the scale of what’s coming, the gap in operational capacity needed to absorb it, and what we can do to fill it. (Link to full post in reply)
16
20
242
28,413
Also, I'm pretty bullish that industry philanthropy can together nearly solve biodefense & cyberdefense via innovation. And this scale of money is about the scale needed to do it.
5
37
2,981
Also, that's 0.5 Manhattan Projects per year, and that's a very conservative estimate.
8
1,862
A lot of people have been wondering about Mythos, Glasswing, and the vulns we / our partners are fixing. Today, I’m excited for us to start sharing more. (For context, I lead Glasswing @AnthropicAI.)
Two independent evaluations this week, from XBOW and the UK AISI, confirm what we've been seeing internally: Claude Mythos Preview is a step change in autonomous cybersecurity capabilities. We need to start preparing fast for a world of models with this level of capabilities.
The UK AI Security Institute tested the model we shipped at the launch of Project Glasswing and found Mythos Preview is the first model to solve both of their end-to-end cyber ranges, including one (Cooling Tower) which no model had ever cleared. But attackers (and defenders) have sophistication & cost constraints – Mythos is also the only model that clears every one of their tasks estimated over 8 hours under their deliberately low 2.5M-token cap.
XBOW tested it on their offensive security benchmarks, finding "token-for-token, unprecedented precision." It's the only model to succeed at subtle V8 sandbox work.
Other Glasswing partners shared similar stories. In a few weeks of testing, Mythos Preview has helped them find many thousands of (estimated) high/critical severity vulnerabilities, sometimes double what they'd normally find in a year.
I don't share this to boost Mythos. In fact, this is not about Mythos. It’s about preparing for the coming world of models being better, faster, cheaper, and more creative than some of the best human experts at dual use capabilities. Clearly, we need them supporting defenders as widely as can be done safely – and especially the least resourced ones.
Within a year, Mythos will probably look quite dumb (relative to other new models). And others may release openly available or unguardrailed models of Mythos-level capabilities. We started Project Glasswing because capabilities like Mythos Preview's won't stay rare, or stay in careful hands. We are bringing it to defenders as fast as we responsibly can, while working to figure out, for example, the right safeguards and patching & disclosure processes. Also, to be clear, compute has never been a limiter in our rollout.
Expect a fuller update on our Glasswing work in the coming days.
XBOW report: xbow.com/blog/mythos-offensi…
UK AISI report: aisi.gov.uk/blog/how-fast-is…
Replying to @AISecurityInst
Our cyber range results illustrate this step-up. Since our first Mythos evaluation, we received access to a newer Mythos Preview checkpoint. On a 32-step corporate network attack we estimate takes a human expert ~20 hours, this checkpoint completes the full attack in 6/10 attempts.
72
221
1,430
673,976
I'm optimistic this eventually favors defense over offense. We wanted to start this transition cautiously. I've honestly been inspired by what orgs have been able to do with Mythos. More to come!
With the help of Claude Mythos Preview, the Firefox team fixed more security bugs in April than in the past 15 months combined.
11
19
245
49,168
The team I lead is part of The Anthropic Institute. The way I think about the Institute is "applied weird blue sky research". So far we've got a good track record of tackling some of the most important ideas -- cyber, self-improvement, robots, national security.
We’re sharing the research agenda of The Anthropic Institute, or TAI. TAI will focus on four areas: 1) Economic diffusion 2) Threats and resilience 3) AI systems in the wild 4) AI-driven R&D Read the full agenda: anthropic.com/research/anthr…
10
8
226
27,004
Something I've seen from Mythos / Glasswing is that one of the things companies/orgs/maintainers need most is a first model-driven boost in finding & fixing vulns. That's a big reason we're scaling Claude Security today for more users: x.com/claudeai/status/204989…

Apr 30
Claude Security is now in public beta for Claude Enterprise customers. Claude scans your codebase for vulnerabilities, validates each finding to cut false positives, and suggests patches you can review and approve.
7
2
89
16,228
Some main learnings so far: 1) Use models in security today to get a glimpse of the future. 2) Start finding and fixing things. 3) Figure out how to scale it when more powerful models arrive. Claude Security is a great way to do that!
1
1
18
1,793
Also, if you're a security researcher / leader really motivated by the mission of "solve the whole AI cyber problem", you should apply to Anthropic. We're looking e.g. for vulnerability researchers, senior security researchers and engineers, AI security research leaders, etc.
Privileged to help lead this. Thankful to our partners. Mythos is an extraordinary model. But it is not about the model. It's about what the world needs to do to prepare for a future of models that are extremely good at cybersecurity. This is the start.
39
31
544
60,525
Seeing this on Slack that day was one of the first "oh, I guess we're just seeing it now" moments for those who think about AI security.
Mythos sandbox escape and many more wild instances are in the Model Card
2
7
165
19,555