Computer Science PhD Student at Northeastern University

Joined December 2014
Pinned Tweet
Can you solve this algebra puzzle? 🧩 cb=c, ac=b, ab=? A small transformer can learn to solve problems like this! And since the letters don't have inherent meaning, this lets us study how context alone imparts meaning. Here's what we found:🧵⬇️
Eric Todd retweeted
Have you debugged your training data? You might not like what you find. Introducing predictive data debugging: reveal and shape what your model will learn before training. In DPO datasets, we found broken guardrails, hallucinations, and fish fart fan fiction (seriously). (1/9)
Eric Todd retweeted
🧠🤖 The 2026 New England Mechanistic Interpretability (NEMI) Workshop will be Aug. 14 at Boston University! Help spread the word and join the New England mech interp community! Registration and submission info in thread:👇
Eric Todd retweeted
The New England Mechanistic Interpretability (NEMI) workshop is coming to BU on Aug. 14! Join us for talks, a panel, food, and plenty of opportunities to connect with the many great researchers in the area. Register and help spread the word!
Eric Todd retweeted
Scaling laws describe how loss changes with scale. Do neurons inside models change predictably too? We study vision and language models up to 30B params and find systematic scaling in neuron universality, specialization, and selectivity. Paper code: avdravid.github.io/rosetta-n… 1/n
Eric Todd retweeted
Can you tell when an AI model is lying? Announcing Aletheia's Quest, an AI lie detection challenge running this summer, organized by @cadenza_labs and @ndif_team. Multiple model organisms to interrogate and probe, $50K prize pool, no local GPU required.
Eric Todd retweeted
At CVPR this week for a talk on neural geometry of large vision models. If you’re interested in interpretability or joining @GoodfireAI, come say hi. 🤠
🧵HOW speaker spotlight @CVPR ! Next up we have @thomas_fel_ from @GoodfireAI 🔥 Thomas will talk "Neural Geometry in Large Vision Models", diving into the structure hidden inside vision models. 📅 June 4 @ Room 1Ef | 10:30–11:00 AM
Eric Todd retweeted
A popular way to use the latest FLUX model is to provide a reference image alongside the text prompt to guide the model. Surprisingly, in most cases, the model first writes the reference image information into the text tokens; only then does it use that to generate the image🧵👇
FLUX.2's @bfl_ml text tokens aren't just holding your prompt. During image editing, they absorb reference image content, and some of that absorbed content, like color and style, causally drives the output appearance. New paper 🧵👇
Eric Todd retweeted
In 2023, we released ESD and UCE, unlearning methods for text-to-image diffusion models. After 3 years of research: Tomorrow, I will be presenting why "Unlearning is not the goal" at Machine Unlearning for Vision workshop @CVPR Hear me out 👀 🗓️: June 3rd, 2:20pm 📍: Room 1AB
Can AI forget? 🧠❌ Join MUV at @CVPR 26 in Denver! 🏔️ Speakers from @GoogleDeepMind, @MIT_CSAIL & more. 📝 Submit by March 15! Organizers: @SapienzaRoma, @MIT, @TU_Muenchen, @_italai and MPI. Details: machine-unlearning-for-visio… #CVPR2026 #AI #ComputerVision
Eric Todd retweeted
The most popular way to interpret AI is missing the bigger picture. Models think in curved shapes. But sparse autoencoders (SAEs) work with straight lines. Can they still capture models’ curved neural geometry? Yes, but not how you might think! (1/7)
Neural networks might speak English, but they think in shapes. Understanding their rich *neural geometry* is key to understanding how they work – and to debugging and controlling them with precision. Starting today, we’re releasing a series of posts on this research agenda. 🧵
Eric Todd retweeted
Neural networks have beautiful feature geometry, but do they have mechanisms that actually interface with those structures? At @GoodfireAI this spring, we discovered one: a re-usable addition mechanism that reads/writes to Fourier features from prior work. 🧵
Neural networks do math by rotating shapes. We found a shape-rotating calculator hidden inside an LLM – and it’s used for more than just math! (1/6)
Interpretability provides a toolset for understanding how and why LMs behave in certain ways. This survey proposes a perspective on interpretability research grounded in causal mediation analysis: doi.org/10.1162/COLI.a.572 #NLProc #CLJournal @SunJiuding @ericwtodd
Eric Todd retweeted
1/ (New paper!) If swapping the gender in an input prompt makes the AI model give a different answer it means that it has to have a gender bias, right? Wrong. 🧵on counterfactual prompting for LLM evals: Paper: arxiv.org/abs/2605.01048
Eric Todd retweeted
The Teleport Contest is open. Port NetHack 5.0 from C to JavaScript, bit-exactly. Same screen, every keystroke. Any approach: LLM agents, hand-coded, transpiler, hybrid. Live leaderboard, two phases through December. mazesofmenace.ai/announcemen…
Eric Todd retweeted
NetHack is one of the most complex and longest-lived open source programs ever written, and after 46 years, v5.0 shipped today. nethack.org/common/index.htm… And ... it is a VERY cool large codebase to work with in the LLM era.
Eric Todd retweeted
I’m presenting this work today at #ICLR2026 at 3:15pm in Pavilion 4 #3914 Come say hi! ☺️
Can we find weight directions to modify LLM's behaviors? Our new paper proposes contrastive weight steering, an alternative to activation steering for modifying behaviors using small narrow distribution data 🕹️ 🧵👇
Eric Todd retweeted
I'll be attending #ICLR2026 next week to present my work on In-Context Algebra! My poster will be on Fri, April 24 at 3:15-5:45PM at Pavilion 4 P4-#4011. If you're around, stop by and say hello! My DMs are open if you want to connect or meet up in Rio!
Eric Todd retweeted
2026 is a whirlwind year for AI. Underlying it all: the greatest scientific mystery of our age. How does a neural network think? I talked w @oliver_whang22 in NYTimes Magazine, on how AI interpretability is a tangle of structure waiting to be unraveled: nytimes.com/2026/04/15/magaz…
Eric Todd retweeted
Excited to be attending #ICLR in person this year! I’ll be presenting 3 works across the main conference and workshops. If you’re around, please stop by, say hi, and feel free to reach out if you’d like to connect!
Eric Todd retweeted
New paper: LLMs encode harmful content generation in a distinct, unified mechanism. Using weight pruning, we find that harmful generation depends on a tiny subset of the weights that are shared across harm types and separate from benign capabilities. 🧵