Research Scientist at Google DeepMind, PhD from MIT. Make machines empower people.

Joined August 2011
Been Kim retweeted
The program for the #ICLR2026 Workshop "From Human Cognition to AI Reasoning" is now available. We have a fantastic lineup of talks. 🔗 hc-air.github.io/hcair26 Invited Speakers: @DocRachidAlami, @_beenkim, @ced_zhang Co-Organizers: @julie_a_shah, @sarath_ssreedh, @si_tulli
Prompt engineering is still a black box. Why does changing X drastically change Y? Are there governing rules behind this evolution? Our new work proposes a simple way to uncover factors that might matter when refining prompts 👇
Thrilled to share that our paper on "Interpreting and Controlling Model Behavior via Constitutions for Atomic Concept Edits" has been accepted at AISTATS 2026! 🚀🚀 Read more about how input mutations can be mapped to interpretable behavioral insights. arxiv.org/abs/2602.00092 🧵
I got my account back! Thank you, first and foremost, to everyone (friends, GDM colleagues) who personally alerted me to this incident and retweeted that I had been hacked, as well as the folks at X who helped me regain access. While this incident was terrible (I heard the scammers made huge money out of this), I feel incredibly lucky to have folks who cared♥️♥️♥️ (details of how this happened 👇)
This scam was targeted, sophisticated, and used AI-generated content. I want to share what happened here so that no one else falls for this.
1. The scammers emailed me (bypassing my spam box), citing a recent tweet of mine with pictures (holding a NeurIPS cup) and claiming a copyright infringement investigation was underway. The human brain is gullible when we are wrongfully accused; the only thing I was thinking was how I was going to argue the case. I did not check who sent the email (it was notify@compliancereport-x.com).
2. Within minutes, the email on the account was changed, and I lost control.
3. They created a fake GitHub repo with faked commits. It turns out that on GitHub, anyone can commit anything claiming to be anyone as long as they have the email address and handle. They cited this repo, where apparently I’ve been "committing" for two weeks.
4. They struck on a Saturday morning/long weekend. They know that support response times are slower then (and that your own attention is lower).
5. They customized all the tweets, likely with AI, to mention interpretability, Google Brain, and how it all led to founding a crypto company of my own. The tweets had a vibe that actually sounded like me.
After filing a complaint with X and connecting with folks who work there, I was able to regain access in a few days. On one hand, I was relieved that the content of the tweets was so out of the ordinary that folks who know me realized my account was hacked. On the other hand, I feel terrible for those who fell for this and potentially suffered financial consequences.
As a result of this, I’m considering banning myself from checking emails on my phone. The problem was partly that I was multitasking: it was a Saturday morning with the kids, and I was busy. I’ve learned my lesson the hard way. Thank you ♥️
Been Kim retweeted
30 Oct 2022
What a privilege it is to have time as your most valuable currency.
Been Kim retweeted
Safety-oriented interpretability researchers should be focused on AI systems, not individual model artifacts. A snippet from the NeurIPS CogInterp workshop panel on Sunday:
Been Kim retweeted
This post seems to describe substantially the same view that I offer here: web.stanford.edu/~cgpotts/bl… Why are people describing the GDM post as concluding that mech-interp is a failed project? Is it the renaming of the field and constant talk of "pivoting"?

1 Dec 2025
The GDM mechanistic interpretability team has pivoted to a new approach: pragmatic interpretability. Our post details how we now do research, why now is the time to pivot, why we expect this way to have more impact, and why we think other interp researchers should follow suit.
7 Dec 2025
Tomorrow, 9:30am, #NeurIPS2025 Room 30A-E: I'll talk about "📈 Towards Pareto frontier of interpretability: 15 years of interpretability research in 15 mins" 🚅 at the mech interp workshop mechinterpworkshop.com/

6 Dec 2025
Take that @doomie Samy Bengio! Hehehe
5 Dec 2025
Our work out there in the wild 🥹
🔥 Proactive Co-Creator is officially LIVE in @GoogleAIStudio! Stop guessing prompts. Start collaborating. Use it now to remix ideas and generate images, stories, and video with an AI that proactively helps you create. 🔗 Try it here: aistudio.google.com/apps/bun… 📍 At #NeurIPS2025? Come see the live demo TODAY (Dec 3) 9AM - 1:30PM | Google Booth #1533 (Kiosk 3) 🧠 Our research @GoogleDeepMind : We’re turning theory into practice. Read the papers behind the tech: Concept Edits (Tech Report): storage.googleapis.com/conce… Proactive Agents (ICML 25'): arxiv.org/abs/2412.06771 QuestBench (NeurIPS 25'): arxiv.org/abs/2503.22674
Been Kim retweeted
Awesome @NeurIPSConf keynote this morning by @YejinChoinka on The Art of (Artificial) Reasoning – and her broader thoughts and wishes on the future of Artificial Intelligence neurips.cc/virtual/2025/invi…
5 Dec 2025
1/8 Pareto Frontier 🤠 for Human-centered AI 📈: We all want to build AI that is good for humans, but the path is often paralyzed by complexity. The response is either “oh my god, it’s too complicated😱” or a delusional “I have a warm and fuzzy feeling of understanding 🥴”. "It’s hard because it depends.🤷" is the enemy of progress. We need a Pareto Frontier for Human-centered AI. 🧵👇
5 Dec 2025
8/8 Making AI benefit humans takes a village. 🌍 But a village needs a shared language. Let's stop guessing and start measuring the frontier. A short write-up: medium.com/@beenkim/the-pare…
5 Dec 2025
Addendum: at 9:30am on Sunday at NeurIPS, I'll touch upon this in the mech interp workshop keynote mechinterpworkshop.com/
