Anthropic spent years telling Washington its frontier models were dangerous. This week, Washington took the warning seriously — and used it to pull Anthropic's two most capable models offline.
Most takes are picking a team. The more honest read is that both sides are right about each other and wrong about themselves.
What happened, stripped of spin:
Anthropic released Fable 5, a guardrailed, public version of its far more capable Mythos cyber model, on June 9. Dario Amodei published a policy essay the next day arguing that frontier models are now of national strategic consequence and calling for binding government rules. Within about 48 hours, Amazon's researchers reportedly jailbroke Fable back toward Mythos-level capability. Amazon's CEO took the finding directly to Treasury Secretary Bessent and other officials. The administration asked Anthropic to fix the jailbreak or pull the model. Anthropic refused. Commerce then issued an export-control directive barring access by all foreign nationals, including Anthropic's own foreign-born employees. Anthropic, unable to filter by citizenship in real time, disabled both models for everyone and called the order a misunderstanding.
That's the event everyone's fighting over. Here's the strongest version of each side.
Anthropic's case:
The whole posture is "race to the top." They believe frontier models are genuinely dangerous, which is exactly why they keep asking for binding rules on themselves. But read the essay: it asks for a rules-based regime — mandatory third-party testing, a government power to block models that fail, scoped to frontier-scale systems, with protections against politically motivated decisions. They asked for the FAA. What they got was the opposite of the FAA: a verbal, evidence-light, same-day directive triggered by a competitor's phone call — and that competitor is also Anthropic's largest investor — with no published standard and a scope so broad it functioned as a backdoor global shutdown rather than a targeted fix. On substance, they say the jailbreak surfaced only a few previously known, minor vulnerabilities, and that the same underlying capability is already available in models like GPT-5.5. So singling out Fable secures nothing.
The government's case — and it's actually two different cases:
Bessent's is systemic risk. A model that can autonomously find and weaponize software flaws is not a chatbot, and Treasury had already been warning bank CEOs about exactly this class of risk. If a jailbreak turns "safe" Fable back into full Mythos, then the safeguards Anthropic marketed as the precondition for releasing it failed in precisely the way that matters for banks and critical infrastructure. When you don't yet know who accessed the model or whether a hostile state has a copy, you lock it down until your defenses are hardened — because being wrong in the permissive direction is far costlier than a few weeks of disruption.
Sacks's is different: regulatory capture. By his account, a trusted partner found a jailbreak, the administration gave Anthropic a simple choice — patch it or pull it — and Anthropic chose to keep its consumer product live. His longstanding view is that Anthropic's safety advocacy is a sophisticated strategy to write rules that raise barriers for everyone else. So when the rules finally bit Anthropic, it balked. His framing is that this is narrow, temporary, and reversible: Anthropic can end it tomorrow by doing what it told the rest of the industry to accept.
The wrinkle worth understanding:
Anthropic's GPT-5.5 claim isn't "the same jailbreak works on GPT-5.5." It's that the capability the jailbreak unlocks — reading a codebase and finding flaws — is already on the market, sometimes with no jailbreak at all. If that's true, pulling Fable just hands market share to an unrestricted competitor and secures nothing. If it's false — if jailbroken Fable reaches a Mythos-class ceiling GPT-5.5 can't touch — then controlling the single most capable asset first is consistent, not arbitrary, and it's the same logic as chip-export policy. That one empirical question decides who's right. The claim comes from an interested party with an IPO in the pipeline; the counter-case comes from a government that won't show its technical work. Neither side has put the comparison on the table.
Where I land:
The process was the real failure. Running national AI-safety policy through a standardless, same-day directive triggered by a competitor's call is a bad mechanism: arbitrary, and just as capturable in the other direction. And the directive bars every foreign national (allied British, Canadian, and Australian researchers included, alongside Anthropic's own staff) over a single disputed report. Critics have flagged the incoherence: an administration weighing looser chip exports to China is locking every allied researcher out of a model. That reads less like precision security than leverage in a separate fight.
But the hypocrisy charge isn't empty. True, "we want a kill switch governed by due process" and "we reject a kill switch pulled arbitrarily" are not contradictory positions. Yet Anthropic did refuse a direct safety request on a model it had itself branded as dangerous, and "trust us, it's minor" is exactly the self-assessment its own essay argues no company should be left to make.
So both undercut their own principles. The government reached for a blunt, capturable instrument. Anthropic fell back on its own judgment the moment independent review became inconvenient. And the question that actually decides it (how serious the jailbreak really was, and who reached Mythos-level capability) is the one neither side has answered in public, while both confidently assert opposite answers.
That's not a safety regime yet. It's two parties improvising, and asking the rest of us to trust the improvisation.