Been Kim retweeted

Pulkit Verma @pulkit_verma

Apr 17

The program for the #ICLR2026 Workshop "From Human Cognition to AI Reasoning" is now available. We have a fantastic lineup of talks.
🔗 hc-air.github.io/hcair26
Invited Speakers: @DocRachidAlami, @_beenkim, @ced_zhang
Co-Organizers: @julie_a_shah, @sarath_ssreedh, @si_tulli

Been Kim

@_beenkim

Feb 4

Prompt engineering is still a black box. Why does changing X drastically change Y? Are there governing rules behind this evolution? Our new work proposes a simple way to uncover factors that might matter when refining prompts 👇

Neha Kalibhat @NehaKalibhat

Feb 4

Thrilled to share that our paper on "Interpreting and Controlling Model Behavior via Constitutions for Atomic Concept Edits" has been accepted at AISTATS 2026! 🚀🚀 Read more about how input mutations can be mapped to interpretable behavioral insights. arxiv.org/abs/2602.00092 🧵

Been Kim

@_beenkim

Jan 20

I got my account back! Thank you, first and foremost, to everyone (friends, GDM colleagues) who personally alerted me to this incident and retweeted that I was hacked, as well as folks at X who helped me regain access. While this incident was terrible (I heard the scammers made huge money out of this), I feel incredibly lucky to have folks who cared ♥️♥️♥️ (details of how this happened 👇)

Been Kim

@_beenkim

Jan 20

This scam was targeted, sophisticated, and used AI-generated content. I want to share what happened here so that no one else falls for this.

1. The scammers emailed me (bypassing my spam box), citing a recent tweet of mine with pictures (holding a NeurIPS cup) and claiming a copyright infringement investigation was underway. The human brain is gullible when we are wrongfully accused; the only thing I was thinking was how I was going to argue the case. I did not check who sent the email (it was notify@compliancereport-x.com).
2. Within minutes, the email on the account was changed, and I lost control.
3. They created a fake GitHub repo with faked commits. It turns out that on GitHub, anyone can commit anything claiming to be anyone as long as they have the email address and handles. They cited this repo, where apparently I’ve been "committing" for two weeks.
4. They struck on a Saturday morning/long weekend. They know that support responsiveness (and your own attention span) is lower then.
5. They customized all the tweets, likely with AI, to mention interpretability, Google Brain, and how it all led to founding a crypto company of my own. The tweets had a vibe that actually sounded like me.

After filing a complaint with X and connecting with folks who work there, I was able to regain access in a few days. On one hand, I was relieved that the content of the tweets was so out of the ordinary that folks who know me realized my account was hacked. On the other hand, I feel terrible for those who fell for this and potentially suffered financial consequences.

As a result of this, I’m considering banning myself from checking emails on my phone. The problem was partly that I was multitasking: it was a Saturday morning with the kids, and I was busy. I’ve learned my lesson the hard way. Thank you ♥️

Been Kim retweeted

Susan Zhang

@suchenzang

30 Oct 2022

What a privilege it is to have time as your most valuable currency.

Been Kim retweeted

Christopher Potts

@ChrisGPotts

10 Dec 2025

Safety-oriented interpretability researchers should be focused on AI systems, not individual model artifacts. A snippet from the NeurIPS CogInterp workshop panel on Sunday:

[Video, 0:37]

Been Kim retweeted

Christopher Potts

@ChrisGPotts

2 Dec 2025

This post seems to describe substantially the same view that I offer here: web.stanford.edu/~cgpotts/bl… Why are people describing the GDM post as concluding that mech-interp is a failed project? Is it the renaming of the field and constant talk of "pivoting"?

Neel Nanda

@NeelNanda5

1 Dec 2025

The GDM mechanistic interpretability team has pivoted to a new approach: pragmatic interpretability. Our post details how we now do research, why now is the time to pivot, why we expect this way to have more impact, and why we think other interp researchers should follow suit.

Been Kim

@_beenkim

7 Dec 2025

Tomorrow, 9:30am, #NeurIPS2025, Room 30A-E: I'll talk about "📈 Towards Pareto frontier of interpretability: 15 years of interpretability research in 15 mins" 🚅 at the mech interp workshop: mechinterpworkshop.com/

Been Kim

@_beenkim

6 Dec 2025

Take that @doomie Samy Bengio! Hehehe

Been Kim

@_beenkim

5 Dec 2025

Our work out there in the wild 🥹

Zi Wang, Ph.D. @ziwphd

3 Dec 2025

🔥 Proactive Co-Creator is officially LIVE in @GoogleAIStudio! Stop guessing prompts. Start collaborating. Use it now to remix ideas and generate images, stories, and video with an AI that proactively helps you create.
🔗 Try it here: aistudio.google.com/apps/bun…
📍 At #NeurIPS2025? Come see the live demo TODAY (Dec 3), 9AM - 1:30PM | Google Booth #1533 (Kiosk 3)
🧠 Our research @GoogleDeepMind: We’re turning theory into practice. Read the papers behind the tech:
Concept Edits (Tech Report): storage.googleapis.com/conce…
Proactive Agents (ICML '25): arxiv.org/abs/2412.06771
QuestBench (NeurIPS '25): arxiv.org/abs/2503.22674

Been Kim retweeted

Zi Wang, Ph.D. @ziwphd

3 Dec 2025

Been Kim retweeted

Stanford NLP Group

@stanfordnlp

4 Dec 2025

Awesome @NeurIPSConf keynote this morning by @YejinChoinka on The Art of (Artificial) Reasoning – and her broader thoughts and wishes on the future of Artificial Intelligence neurips.cc/virtual/2025/invi…

Been Kim

@_beenkim

5 Dec 2025

1/8 Pareto Frontier 🤠 for Human-centered AI 📈: We all want to build AI that is good for humans, but the path is often paralyzed by complexity. Either “oh my god, it’s too complicated 😱” or a delusional “I have a warm and fuzzy feeling of understanding 🥴”. "It’s hard because it depends 🤷" is the enemy of progress. We need a Pareto Frontier for Human-centered AI. 🧵👇

Been Kim

@_beenkim

5 Dec 2025

8/8 Making AI benefit humans takes a village. 🌍 But a village needs a shared language. Let's stop guessing and start measuring the frontier. 📷 A short write-up: medium.com/@beenkim/the-pare…

The Pareto Frontier of Human-Centered AI

We all want to build AI that is good for humans, but the path is often paralyzed by complexity. Not only is human evaluation a lot of work…

medium.com

Been Kim

@_beenkim

5 Dec 2025

Addendum: at 9:30am on Sunday at NeurIPS, I'll touch upon this at the mech interp workshop keynote: mechinterpworkshop.com/
