Eric Todd

Tweets

Pinned Tweet

Eric Todd @ericwtodd

Jan 22

Can you solve this algebra puzzle? 🧩 cb=c, ac=b, ab=? A small transformer can learn to solve problems like this! And since the letters don't have inherent meaning, this lets us study how context alone imparts meaning. Here's what we found:🧵⬇️

Eric Todd retweeted

Goodfire @GoodfireAI

Jun 11

Have you debugged your training data? You might not like what you find. Introducing predictive data debugging: reveal and shape what your model will learn before training. In DPO datasets, we found broken guardrails, hallucinations, and fish fart fan fiction (seriously). (1/9)

Eric Todd retweeted

Gabriel Franco @gvsfranco

Jun 10

🧠🤖 The 2026 New England Mechanistic Interpretability (NEMI) Workshop will be Aug. 14 at Boston University! Help spread the word and join the New England mech interp community! Registration and submission info in thread:👇

Eric Todd retweeted

Aaron Mueller @amuuueller

Jun 10

The New England Mechanistic Interpretability (NEMI) workshop is coming to BU on Aug. 14! Join us for talks, a panel, food, and plenty of opportunities to connect with the many great researchers in the area. Register and help spread the word!

Gabriel Franco @gvsfranco

Jun 10

Eric Todd retweeted

Amil Dravid @_AmilDravid

Jun 5

Scaling laws describe how loss changes with scale. Do neurons inside models change predictably too? We study vision and language models up to 30B params and find systematic scaling in neuron universality, specialization, and selectivity. Paper code: avdravid.github.io/rosetta-n… 1/n

Eric Todd retweeted

NDIF @ndif_team

Jun 4

Can you tell when an AI model is lying? Announcing Aletheia's Quest, an AI lie detection challenge running this summer, organized by @cadenza_labs and @ndif_team. Multiple model organisms to interrogate and probe, $50K prize pool, no local GPU required.

Eric Todd retweeted

Thomas Fel @thomas_fel_

Jun 3

At CVPR this week for a talk on neural geometry of large vision models. If you’re interested in interpretability or joining @GoodfireAI, come say hi. 🤠

How Do Vision Models Work? @ CVPR2026 (Prev: MIV) @how_cvpr2026

May 30

🧵HOW speaker spotlight @CVPR ! Next up we have @thomas_fel_ from @GoodfireAI 🔥 Thomas will talk "Neural Geometry in Large Vision Models", diving into the structure hidden inside vision models. 📅 June 4 @ Room 1Ef | 10:30–11:00 AM

Eric Todd retweeted

Rohit Gandikota @rohitgandikota

May 26

A popular way to use the latest FLUX model is to provide a reference image alongside the text prompt to guide the model. Surprisingly, in most cases, the model first writes the reference image information into the text tokens; only then does it use that to generate the image🧵👇

Chris Ge @ChrisGe05

May 26

FLUX.2's @bfl_ml text tokens aren't just holding your prompt. During image editing, they absorb reference image content, and some of that absorbed content, like color and style, causally drives the output appearance. New paper 🧵👇

Eric Todd retweeted

Rohit Gandikota @rohitgandikota

Jun 3

In 2023, we released ESD and UCE, unlearning methods for text-to-image diffusion models. After 3 years of research: Tomorrow, I will be presenting why "Unlearning is not the goal" at Machine Unlearning for Vision workshop @CVPR Hear me out 👀 🗓️: June 3rd, 2:20pm 📍: Room 1AB

Machine Unlearning for Vision @ CVPR26 @muv_workshop

Feb 17

Can AI forget? 🧠❌ Join MUV at @CVPR 26 in Denver! 🏔️ Speakers from @GoogleDeepMind, @MIT_CSAIL & more. 📝 Submit by March 15! Organizers: @SapienzaRoma, @MIT, @TU_Muenchen, @_italai and MPI. Details: machine-unlearning-for-visio… #CVPR2026 #AI #ComputerVision

Eric Todd retweeted

Goodfire @GoodfireAI

May 21

The most popular way to interpret AI is missing the bigger picture. Models think in curved shapes. But sparse autoencoders (SAEs) work with straight lines. Can they still capture models’ curved neural geometry? Yes, but not how you might think! (1/7)

Goodfire @GoodfireAI

May 7

Neural networks might speak English, but they think in shapes. Understanding their rich *neural geometry* is key to understanding how they work – and to debugging and controlling them with precision. Starting today, we’re releasing a series of posts on this research agenda. 🧵

Eric Todd retweeted

Sheridan Feucht @sheridan_feucht

May 14

Neural networks have beautiful feature geometry, but do they have mechanisms that actually interface with those structures? At @GoodfireAI this spring, we discovered one: a re-usable addition mechanism that reads/writes to Fourier features from prior work. 🧵

Goodfire @GoodfireAI

May 14

Neural networks do math by rotating shapes. We found a shape-rotating calculator hidden inside an LLM – and it’s used for more than just math! (1/6)

Eric Todd retweeted

Computational Linguistics Journal @CompLingJournal

May 13

Interpretability provides a toolset for understanding how and why LMs behave in certain ways. This survey proposes a perspective on interpretability research grounded in causal mediation analysis: doi.org/10.1162/COLI.a.572 #NLProc #CLJournal @SunJiuding @ericwtodd

Eric Todd retweeted

Zihao (Gavin) Yang @ZihaoGavinYang

May 11

1/ (New paper!) If swapping the gender in an input prompt makes the AI model give a different answer it means that it has to have a gender bias, right? Wrong. 🧵on counterfactual prompting for LLM evals: Paper: arxiv.org/abs/2605.01048

Eric Todd retweeted

David Bau @davidbau

May 6

The Teleport Contest is open. Port NetHack 5.0 from C to JavaScript, bit-exactly. Same screen, every keystroke. Any approach: LLM agents, hand-coded, transpiler, hybrid. Live leaderboard, two phases through December. mazesofmenace.ai/announcemen…

Eric Todd retweeted

David Bau @davidbau

May 3

NetHack is one of the most complex and longest-lived open source programs ever written, and after 46 years, v5.0 shipped today. nethack.org/common/index.htm… And ... it is a VERY cool large codebase to work with in the LLM era.

Eric Todd retweeted

Constanza Fierro @constanzafierro

Apr 25

I’m presenting this work today at #ICLR2026 at 3:15pm in Pavilion 4 #3914 Come say hi! ☺️

Constanza Fierro @constanzafierro

11 Nov 2025

Can we find weight directions to modify LLM's behaviors? Our new paper proposes contrastive weight steering, an alternative to activation steering for modifying behaviors using small narrow distribution data 🕹️ 🧵👇

Eric Todd retweeted

Eric Todd @ericwtodd

Apr 18

I'll be attending #ICLR2026 next week to present my work on In-Context Algebra! My poster will be on Fri, April 24 at 3:15-5:45PM at Pavilion 4 P4-#4011. If you're around, stop by and say hello! My DMs are open if you want to connect or meet up in Rio!

Eric Todd @ericwtodd

Jan 22

Eric Todd retweeted

David Bau @davidbau

Apr 20

2026 is a whirlwind year for AI. Underlying it all: the greatest scientific mystery of our age. How does a neural network think? I talked w @oliver_whang22 in NYTimes Magazine, on how AI interpretability is a tangle of structure waiting to be unraveled: nytimes.com/2026/04/15/magaz…

We Don’t Really Know How A.I. Works. That’s a Problem.

For us to trust it on certain subjects, researchers in the growing field of interpretability might need to learn how to open the black box of its brain.

nytimes.com

Eric Todd retweeted

Nikhil Prakash @nikhil07prakash

Apr 17

Excited to be attending #ICLR in person this year! I’ll be presenting 3 works across the main conference and workshops. If you’re around, please stop by, say hi, and feel free to reach out if you’d like to connect!

Eric Todd retweeted

Hadas Orgad @OrgadHadas

Apr 13

New paper: LLMs encode harmful content generation in a distinct, unified mechanism Using weight pruning, we find that harmful generation depends on a tiny subset of the weights that are shared across harm types and separate from benign capabilities. 🧵
