In March 2024, I gave a presentation to Senate staffers with two simple claims: 1) AI agents were going to be incredibly capable at offensive cyber tasks, and 2) there were no possible protections against jailbreaks. Throughout 2024 and 2025, I've repeated those claims.
I received three kinds of reactions: polite curiosity, complete indifference, and... insistence that it didn't matter at all! Interestingly, the "didn't matter at all" camp came almost entirely from highly paid employees of publicly traded companies.
The crazy thing is that the absence of defenses against jailbreaks is a widely accepted fact in the AI security community! We've been publishing work on this topic for years, along with many other prominent security researchers. Zico Kolter, who was on the board of OpenAI, published work on universal jailbreaks in 2023!
I've realized my strong suit isn't public communications or convincing the broader public of the implications of technical findings. But if you'd like to see into the future, follow work from my lab :)
Parsing this evening's events:
- The U.S. government approved the release of Fable 5 to the public, clearly under the presumption that the model's cybersecurity capabilities cannot be accessed by hackers, authoritarian regimes, etc.
- Recently (today?), "another company" showed the U.S. government that a jailbreak of Fable 5 *is possible*. Yes, a minor jailbreak - but how can a non-technical government official be assured that the CCP won't discover other, more dangerous jailbreaks in this model?
- Anthropic states, completely correctly, that: "We suspect that perfect jailbreak resistance is not currently possible for any model provider. Every safeguard used in the industry is vulnerable to non-universal jailbreaks (which can elicit some cyber information in specific circumstances), and it is likely that universal jailbreaks will eventually be found in the future. We stated this clearly when we released Fable 5."
- My best guess is that the U.S. government did not fully realize this at the time when the release of Fable 5 was approved.
- Per Axios, the government contacted Anthropic and asked to "pause releasing the... models but was unsuccessful" - i.e., Anthropic told the government to pound sand.
- Per Axios, this "prompt[ed] the export control letter".
- Per Axios, the U.S. government is *NOT* looking to restrict access to Fable to U.S. nationals forever. "The model needs to remain locked down until the U.S. government's national security apparatus is hardened", which "could happen in a few weeks".
- I interpret Anthropic's reaction as challenging the government: "we believe the government should have the ability to block unsafe deployments, as part of a statutory process that is transparent, fair, clear, and grounded in technical facts. This action does not adhere to those principles."
If the Axios article is correct, I do not think any other model providers have anything to fear based solely on this evening's events, because: (1) they would hopefully be smart enough not to flatly reject a U.S. government request to pause a model release, and (2) under the recent executive order, they will be required anyway to give the U.S. government at least 30 days to test a model for cybersecurity capabilities - time the government could also use to shore up its own cyber defenses with that same model.
I remain extremely concerned that actions by one particular U.S. lab over the last few months might be moving us closer and closer to a scenario in which at least that lab - and potentially all others - will be nationalized.