AI safety hawk. Policy and Communications @MIRIBerkeley. Views are my own.

Joined March 2013
Europe 2031 is well-intentioned but regrettably timid. The authors hide the existential risk from AI in a fold-out FAQ section instead of addressing it head-on, and this is just one case of them not taking their own premises seriously. Decision-makers need the truth. They won't make better decisions if you sugarcoat how dire the situation is. For Europe itself, the train has likely already left the station anyway.
Robert Herr ⏹️ retweeted
Hard to say what Anthropic’s motivations are, but it is true that half of all AI researchers think there are double-digit odds that the technology will cause human extinction
Embarrassing. Just plain embarrassing.
Replying to @scaling01
this is the actual screenshot from the program that I edited with image gen to show English instead
Robert Herr ⏹️ retweeted
If leading AI companies are indeed approaching the point of recursive self-improvement, a coordinated, verifiable, and universally applied pause is probably the only responsible solution to mitigate several major AI risks; at least until safety guarantees are developed and demonstrated. Ensuring that such a moratorium is respected would require sincere collaboration between various countries and companies, but I definitely believe it is achievable if others follow in @AnthropicAI's footsteps.
Anthropic is calling for top AI labs to weigh slowing the pace of development, suggesting that AI systems are advancing so rapidly that they may soon be able to improve themselves without human intervention in ways that could pose societal risks. on.wsj.com/4ulkmFh
Robert Herr ⏹️ retweeted
I've got a lot of quibbles with Anthropic's "When AI builds itself" blog post, but I appreciate them coming right out and saying this.
Unfortunately I only just heard that @hajoschumacher and Frank Stauss discussed my Timmy anthology (Wal und Wahnsinn) on their podcast Elefantenrunde on April 28. Thank you for the kind review!
Robert Herr ⏹️ retweeted
Corporate astrology, context dumps, conspiracy theories and how to make forgery really expensive, explained using fish. Here are the worlds AI created for humans:
What kind of world would frontier models create for us? GPT: Existence as a corporate dashboard! ✨ Kimi: Is anything certain? 💭 Gemini: Is anything real? 🤔 Opus: Philosophy through fish metaphors! 🐠 Each seems a reflection of their personality so far. Come have a look 👇
Robert Herr ⏹️ retweeted
May 19
Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control. The result: our first Frontier Risk Report.
The guy (@BigMeanInternet) blocked me immediately after asking this question, so I'll answer here instead: Any use of current AI tech, however alarming you might think it is and however despicable it might in fact be, is not one of the most alarming hazards of the technology when we talk about AI x-risk. Hinging your belief in credible expert warnings on them saying something untrue in order to force them to take your side in your favorite political hot-button topic is bad form and, worse, stupid.
Robert Herr ⏹️ retweeted
Developing a superintelligent AI that does what we want without killing everyone may be extremely difficult. In this video, we explain why, using arguments from "If Anyone Builds It, Everyone Dies" by @ESYudkowsky and @So8res.
Friedrich Merz is the kind of person to whom lawyers, for example, often have to explain, forcefully to the point of desperation, that for his own good he should please just keep his mouth shut. His personal tragedy is that he is now the most powerful person in the room, and either nobody dares to tell him so or he thinks he no longer has to listen to them.
At last I too have managed to get properly ahead of the wave on some irrelevant hype topic (haha, wave, get it?). Many thanks to Suhrkamp Verlag for not making this possible for me, and of course to the other German discourse whales for their good and fast, but also entirely made-up, contributions, and of course a big apology to Richard David for my having to cut his text by around 120 pages. Happy to do it again! The anthology is available now in no well-stocked bookstore.
Never send to know for whom the gotcha gotchas; it gotchas for thee.
Meta's new model casually calls out "classic alignment honeypots" during evaluation. Models are becoming increasingly aware that their alignment is being evaluated (of course, they're getting smarter after all). Anthropic recently admitted that they accidentally trained against the CoT. Is there more they haven't noticed yet? Could something similar have happened at other labs that don't publish findings like this? Models will likely never again be as bad at telling you what you want to hear as they are today.
We evaluated Meta's Muse Spark prior to deployment and found it to verbalize evaluation awareness at the highest rates of any model we've tested. In the verbalizations Muse Spark explicitly names AI safety orgs (e.g. Apollo & METR) in its chain-of-thought and refers to scenarios as "classic alignment honeypots". On our evaluations, the model takes covert actions and sandbags to preserve its deployment.
The good @HumanHarlan has this graphic on his profile.
And OpenAI is on track to professionalize infowarfare like in this example.
Robert Herr ⏹️ retweeted
actually that’s not impressive the concept of a Dyson sphere was already in the training data
Robert Herr ⏹️ retweeted
Machine superintelligence would extinguish Democrats, Republicans, British, Chinese, scientists, cab drivers, and polar bears. It is a sign of hope that all of those now seem to be saying they'd prefer otherwise (except the polar bears).