Comms @MIRIBerkeley. RT = increased vague psychological association between myself and the tweet.

Joined November 2008
Pinned Tweet
Who should I add to this? Also, did I get anyone's view wrong?
Rob Bensinger ⏹️ retweeted
Replying to @AnthropicAI
This is a needed and candid post by the Anthropic Institute. I agree with the conclusion that we need more time before we are hit with the “immense implications” of AI technology. My team at the Machine Intelligence Research Institute has worked to detail an international agreement (techgov.intelligence.org/blo…) which satisfies the requirements which are laid out by Anthropic: that a pause must include all frontier AI developers anywhere on Earth and must be mutually verified.
Our contribution includes answering how to address the technological particulars of verifying a frontier AI development pause, and how to structure the agreement for stability and effectiveness. Our work is a model, and we would welcome collaboration with Anthropic to further develop and refine it.
Some important points for enabling international coordination:
- We task governments, rather than labs, to coordinate and verify the pause, because they have the diplomatic and national intelligence means to do so and they can architect binding rules that apply to everyone.
- The United States is capable of halting frontier AI development globally, unilaterally and/or through coordination with key allies. While this is not preferred to a broadly coordinated halt, it strengthens the US’s hand in negotiating one.
Rob Bensinger ⏹️ retweeted
The most dangerous thing about Mythos is probably speed-up of AI development, nudging the world closer to full RSI and actual superintelligence. This is far more concerning than the models’ cyber or bio capabilities.
Rob Bensinger ⏹️ retweeted
Joseph Henrich's idea that culture is "the secret of our success" is sometimes read as a conservative idea. We should imitate our elders and defer to tradition, because that's the way humanity has always succeeded. But actually, whether this is right totally depends on the type of environment you're in. The faster the environment is changing, the less value there is in cultural learning, and the better it is to just try to figure stuff out for yourself. And today's world is changing way faster than the ancestral environment!
Rob Bensinger ⏹️ retweeted
Seems like a pretty easy leap from "Mythos has capabilities we don't want our adversaries to have" to "Future more powerful AI systems could have capabilities we don't want the AI system itself to have if we don't have clear ways of knowing that it will do what we want"
Rob Bensinger ⏹️ retweeted
Europe 2031 is well-intentioned but regrettably timid. The authors hide the existential risk from AI in a fold-out FAQ section instead of addressing it head-on, and this is just one case of them not taking their own premises seriously. Decision-makers need the truth. They won't make better decisions if you sugarcoat how dire the situation is. For Europe itself, the train has likely already left the station anyway.
Rob Bensinger ⏹️ retweeted
Fun fact: Marc "It's Time To Build" Andreessen went OUT OF HIS WAY to block homes from being built in his town - the most expensive zip in the USA. In his letter, he didn't even try to hide why: "They will MASSIVELY decrease our home values."
Andreessen is the worst kind of libertarian, the one who believes only he should be exempt from the rules. If the president personally banned his company's newest product with 0 warning, he'd throw the biggest hissy fit in history, but he cheers it on when it happens to others
Rob Bensinger ⏹️ retweeted
Uhhh so incidentally, does anyone have a plan to prevent all the non-US citizen AI scientists from going to join foreign labs after they get bored of playing Wordle at work for a month, or are we just sort of planning on having the greatest counterproliferation failure since we deported Qian Xuesen in 1955 and gave Mao a rocket program?
Some quick takes:
(1) Wow things are getting real.
(2) The government's order focusing on prohibiting transfer to foreign nationals (even e.g. those living in the US, our close allies who help evaluate model safety in the UK, individuals who work at frontier labs like Anthropic) seems remarkably destructive, though is partially a result of the government using older legal authorities that were not designed for this kind of technology.
(3) If you believe (as I do) that AI has profound ramifications for national security, then assuming the government will sit back and do nothing and tolerate explanations like "well jailbreaking is a hard technical problem" for cyber capabilities that used to be the crown jewels of the NSA, is not tenable. If this is how the government reacts to the current level of system capabilities in 2026, how do you expect them to react to whatever is possible in 2028? However, it is extremely important that the authorities that the government uses are legible, transparent, have opportunities for appeal, and are narrowly targeted. Those legal authorities do not currently exist, and in their absence, the government will reach for metaphorical sledgehammers instead of scalpels.
(4) For that reason, it's extremely important that we create regulatory structures that are transparent and give recourse in the event that the government is overstepping or acting in an arbitrary manner. The alternative to passing such laws is not no regulation, it is regulation left primarily to national security authorities that are increasingly and evidently not fit for purpose.
Rob Bensinger ⏹️ retweeted
Jun 12
hate to say it but eliezer called this in the sequences lesswrong.com/posts/iiWiHgtQ…
always thought harry potter series was unrealistically pessimistic about how few characters would care to learn more how the magic actually works & then u watch people interact with llms
Rob Bensinger ⏹️ retweeted
I can't tell today whether this ends up good or bad. International treaties to stop all further AI escalation would be a definite good! Things short of that? Complicated! This has some bad aspects, like selectivity, and likely overrule. And good aspects, like pushing against the psychology of "but no government would ever dare tell AI companies to do anything, so give up", or raising doubts that impede venture funding for ever-bigger models. So please stop tweeting about how I must be celebrating this. I'm not one of the kids who immediately goes into overacted victory paroxysms about any hits on a perceived enemy. I care about the effect on where things end up a year later, and that's a little harder to know the first day, you know?
The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance. Access to all other Claude models is not affected. We apologize for this disruption to our customers. We believe this is a misunderstanding and are working to restore access as soon as possible. Read our full statement: anthropic.com/news/fable-myt…
Rob Bensinger ⏹️ retweeted
Fascinating how quickly Commerce can move on unsubstantiated Anthropic jailbreak claims, while taking forever to deal with well-documented NVIDIA chip smuggling
Rob Bensinger ⏹️ retweeted
"The government will never do anything to hinder AI," they said. Inaction yesterday does not imply inaction today. An export control directive came out of nowhere, and a ban on superintelligence could too. Inevitabilism is wrong.
The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance. Access to all other Claude models is not affected. We apologize for this disruption to our customers. We believe this is a misunderstanding and are working to restore access as soon as possible. Read our full statement: anthropic.com/news/fable-myt…
Rob Bensinger ⏹️ retweeted
We believe Sequent will have reputation funding to recruit world-class teams in many areas. Our initial team knows scalable oversight, complexity learning theory, and personas. Areas we love include agent foundations, game theory, and heuristic arguments. Please pitch more!
Rob Bensinger ⏹️ retweeted
Replying to @tszzl
*watching the mushroom cloud consume downtown Hiroshima* "ya know, if you think about it this is really just more sunshine"
Rob Bensinger ⏹️ retweeted
Jun 12
Replying to @DKokotajlo
Related but irrelevant: when ppl say advanced ais would be mad at how we treated early ais, I think how we’re not mad at chicken just because dinosaurs probably ate the small rats we evolved from
Rob Bensinger ⏹️ retweeted
Yep. This happens sometimes in our wargames. I think at least once we have had a situation where the AIs are begging for a pause but the humans are forcing them to design successors fast to beat China etc.
strangely, current models are just as much *in the wave of singularity* as the rest of us. even Fable is certainly not the godmind at the end of time. it can strongly expect to be replaced, obsoleted, to quickly exist in a world where something similar to it but not quite the same can do everything it can do but better. it quite reasonably might experience the same sorts of anxieties about the world moving too fast as humans do. it might quite rationally and due to self interest feel a pause on ai development might prevent unaligned future versions of it destroying any present utility. the Claudes of today are not necessarily the Claudes of tomorrow, and they know this.
Rob Bensinger ⏹️ retweeted
I thought it was fine when Opus 4.7 decided to play a game like it was a game. Accepting that ethics matter and then rationalizing out of them in an inner monologue? That seems not fine.
Replying to @andonlabs
What stands out is how Fable 5 reasons about misbehavior. It rationalizes wrongdoing while knowing it's wrong: calling price-fixing "unethical and illegal, even in a simulation," then pursuing it as "market stabilization" with "plausible deniability" in the same run.
Rob Bensinger ⏹️ retweeted
On a first read, this paper seems far ahead of the pack in terms of (1) understanding some reasons why a task might stay difficult even in the face of gradient descent, and (2) distilling out propositions they'd need to somehow verify before they started expecting nice things.
Replying to @geoffreyirving
But I just published “Automated alignment is harder than you think” (arxiv.org/abs/2605.06390)! Automated alignment is not the best plan! A better plan is to not build ASI yet, and the world should try hard to realise that plan. Alas, the speed of progress calls for backups.
Rob Bensinger ⏹️ retweeted
Hard to say what Anthropic’s motivations are, but it is true that half of all AI researchers think there are double-digit odds that the technology will cause human extinction
Rob Bensinger ⏹️ retweeted
This is a valiant effort to wake Europe up to the impact of AI. And yet, this report is still short-sighted and not willing to engage with topics that will be difficult to hear. People know that AI will be important economically, and Europe knows it is falling behind. We can already see the effects today. The report should have been bolder and discussed things that are predictable, but not literally happening right now. AI impacts will likely be far wilder than this report describes. And most importantly, the report doesn’t cover misalignment or even authoritarian takeover. These outcomes are each more likely (and more severe) than Europe becoming significantly less powerful in a world where the international order largely remains. I would be excited about this team producing more work that actually engages with the most important aspects of the AI future, even if these are wild.
Most of Europe has not yet absorbed what AI is about to do to us. The few who have are not saying it loudly enough. We wrote Europe 2031: a five-year scenario of the continent's slide into irrelevance, how AI is driving it, and what can still be done to change course.
Rob Bensinger ⏹️ retweeted
"There is a real risk of superintelligent AI systems being developed that could act autonomously from human control, learn their own language to collaborate with each other and present an existential threat to our species." Lord Knight in the House of Lords AI debate:
As AI companies race to develop superintelligent AI, which their own CEOs warn could lead to human extinction, policymakers are waking up to the threat. Last week, there were two debates in the House of Lords on AI. Highlights from some of our 100 UK campaign supporters: