I think the $800B question is: does the jailbreak expose capability that is > GPT 5.5? If it doesn't, then this all seems suspect. If it does, then hopefully Anthropic will sort it out and resubmit. I'd rather have caution for our nation's security than hope as a strategy, though.
This episode shows at least 3 things:
(1) we don't have the right regime and safeguards in place to prevent this kind of thing, so it gets handled reactively (which is terrible for the trustworthiness of the American AI supply). We need a proactive strategy with objective standards (even if they are classified).
(2) the idea that the EO should be voluntary, rather than mandatory with clear and objective tests that are consistent for all parties, is clearly wrong. This looks arbitrary even if it is the right decision for national security. We need a real AI Lab / Frontier safety plan that doesn't conflate it with an industry giveaway (like most preemption proposals do). If Obernolte/Trahan can dump the preemption or the untenable Developer/Deployer idea, it could be a good start on that.
(3) we are fortunate that the Admin is taking risk a lot more seriously now than the guidance coming out in March and before suggested. If they go too far, Anthropic gets delayed and it's not fatal. If they miss it, China and others get something faster that could do real harm. But as Sacks says, this needs to get dealt with very fast to be viewed by the world as "growing pains" in regulating something new rather than as arbitrary.
This is proof that "let them cook" is no longer a strategy that any sensible person can support.
Those gloating over Anthropic are inconsistent here, because if Anthropic hadn't flagged these risks and had just released Mythos like everyone had before, we would have already handed these capabilities to our adversaries. You can't have it both ways.
It's time for AI to grow up and for the US Gov to find the right regime to regulate it while allowing compelling applications that are pro-innovation and pro-human to flourish. We cannot go back to AI Amnesty and Moratorium thinking. It's logically and politically untenable.
I’ve had a number of conversations with folks inside and outside government about the current situation with Anthropic, and here is what I believe to be true:
— As we know, Anthropic publicly released its Mythos-class models earlier this week under the commercial name Fable.
— Fable is Mythos with guardrails. But if those guardrails fail, then you’ve exposed Mythos and its advanced cyber capabilities to people who shouldn’t have them. (Keep in mind that Anthropic itself widely promoted the idea that Mythos was a cyberweapon and needed to be regulated as such. They asked for government regulation of Mythos and championed the guardrails on Fable. If there is a vulnerability — big or small — it is Anthropic’s responsibility to patch.)
— A highly credible trusted partner of both Anthropic and the USG who was testing Fable came forward with a jailbreak of those guardrails. The Admin asked Dario to fix the jailbreak or de-deploy the model. Dario refused.
— In their blog post, Anthropic defended its decision by saying the jailbreak isn’t serious. That is not what the trusted partner and the USG believe; nor is that kind of minimizing language consistent with Anthropic’s brand as the AI safety company. It’s difficult to fathom how they could claim that a jailbreak enabling the use of a cyberweapon is not “serious.”
— In the past, Anthropic has always said that safety must be the top priority and taken super seriously. In this case, Anthropic prioritized the continued offering of the consumer model over safety.
— In reaction, the Admin issued the export control. The Admin did this reluctantly. It has been very surprised that Anthropic hasn’t wanted to cooperate with a reasonable safety request (i.e., fixing the jailbreak issue). Anthropic’s reaction is very much at odds with its branding and ethos as a safe AI research community.
— The Admin’s hope now is that Anthropic remediates the safety issue, the export control is lifted, and Fable goes back into general release. The Admin wants all of this to happen as soon as possible. It is frankly bewildered that Anthropic hasn’t wanted to comply with safety requests that it previously said were its highest priority.
— Those trying to misdirect and tie this action to the prior DoW/Anthropic issues are wrong. The Admin values Anthropic’s technical capabilities and feels that this issue, while serious, should be easily resolved. The ball is in Anthropic’s court.