Research Scientist @ Google DeepMind. Formerly Robotics, now AI Safety. Has a blog. Views are my own. "Adversarially disengaging Twitter profile"

Joined December 2012
Pinned Tweet
19 Nov 2024
I'm on Bluesky now. I plan to cross-post blog posts to both platforms for the time being, we'll see about the other stuff. bsky.app/profile/alexirpan.b…

5
2,067
1. Obviously terrible to have a Molotov thrown against your house, not appropriate response
2. Of all analogies to make, "ring of power" is a choice, given the story's theme that the only way to stop the ring's destructive power is to destroy it. blog.samaltman.com/2279512
1
6
366
You know, when I switched into safety, I was a little worried it was too early.
Between the decline of coding by hand, OpenClaw YOLOing, increasingly eval aware models, and DoD pressure to let AI be used for surveillance and autonomous weapons
yeah
It wasn't early
1
24
980
16 Nov 2025
I didn't know where this post was going when I started and I'm not sure where it went now that it ended, but that felt correct in some way. alexirpan.com/2025/11/16/aut…
1
3
501
First paper since switching into AI safety team🎉 We look at problems that could be solved if the model behaved consistently over a set of prompts, and tried training that in output space and internal activations. Both were effective. See thread or paper for details.
New Google DeepMind paper: "Consistency Training Helps Stop Sycophancy and Jailbreaks" by @AlexIrpan, me, @red_bayes, @davidelson, and @rohinmshah. (thread)
4
56
7,766
21 Oct 2025
> switch to AI safety > no safety papers to cite in reviewer profile > only get assigned robotics papers Apologies in advance as I try to crash course the past year in a few weeks...
7
820
Alex Irpan retweeted
A simple AGI safety technique: AI’s thoughts are in plain English, just read them We know it works, with OK (not perfect) transparency! The risk is fragility: RL training, new architectures, etc threaten transparency Experts from many orgs agree we should try to preserve it: 🧵
42
113
458
236,579
30 Jun 2025
AI numbers guide
ElevenLabs: AI voice generation startup
TwelveLabs: AI video understanding startup
ThirteenAI: parked domain for AI agency startup
14ai: AI agent startup
15.ai: non-commercial My Little Pony voice generation
One is more based than the rest.
7
737
"I don't play gacha games because they're a scam" vs "Let me do one more hyperparam sweep before giving up. One more prompt tuning run. I swear we'll beat baseline. I know it's gonna beat the baseline this time. It's gonna win. This time for sure."
2
1
24
1,122
27 Mar 2025
I guess Twitter's doing anime today
9
491
Alex Irpan retweeted
Q: How can we ensure robots behave properly at scale?
A: Robot constitutions 📜!
Q: How do we verify behavior in undesirable situations at scale?
A: Generation!
We release the ASIMOV Benchmark for Semantic Safety of robots at asimov-benchmark.github.io @GoogleDeepMind
1
7
44
8,820
Alex Irpan retweeted
17 Feb 2025
We're hiring! Join an elite team that sets an AGI safety approach for all of Google -- both through development and implementation of the Frontier Safety Framework (FSF), and through research that enables a future stronger FSF.
11
36
295
46,599
21 Jan 2025
I am now back from #MITMysteryHunt with no memory of anything besides Hunt from MLK weekend. Really this is probably for the best.
1
5
781