Tal Haklay

Tweets

Pinned Tweet

Tal Haklay @tal_haklay

May 14

Check out our new paper 🔥 It’s been so much fun working on this project!

Goodfire

@GoodfireAI

May 14

Neural networks do math by rotating shapes. We found a shape-rotating calculator hidden inside an LLM – and it’s used for more than just math! (1/6)

Tal Haklay retweeted

Yaniv Nikankin @YNikankin

Do reasoning models internally represent abstract properties of their own chain of thought (such as "which steps are important"), while not surfacing these properties in their tokens?

Tal Haklay retweeted

Santiago Aranguri

@santiaranguri

Jun 12

Happy to see our work cited in the Claude Fable & Mythos system card! Steering against eval awareness can carry confounds (e.g. making the model more friendly). Interpretability can help us understand these, and is a promising source of new methods to deal with eval awareness.

Tal Haklay retweeted

Ekdeep Singh Lubana @EkdeepL

Jun 11

Super excited about this work! This paper was driven by a claim I've been making to anyone who'll listen: "interpretability is the language of data". (1/3)

Goodfire

@GoodfireAI

Jun 11

Have you debugged your training data? You might not like what you find. Introducing predictive data debugging: reveal and shape what your model will learn before training. In DPO datasets, we found broken guardrails, hallucinations, and fish fart fan fiction (seriously). (1/9)

Tal Haklay retweeted

Hadas Orgad @OrgadHadas

Jun 10

📢 We’re looking for reviewers for the Actionable Interpretability workshop @ActInterp! If you’re interested in helping review submitted papers, please sign up here: forms.gle/VpLJpkM6zw3V8bX56 Your expertise would be greatly appreciated!

Tal Haklay retweeted

Thomas Fel

@thomas_fel_

Jun 3

At CVPR this week for a talk on neural geometry of large vision models. If you’re interested in interpretability or joining @GoodfireAI, come say hi. 🤠

How Do Vision Models Work? @ CVPR2026 (Prev: MIV) @how_cvpr2026

May 30

🧵HOW speaker spotlight @CVPR ! Next up we have @thomas_fel_ from @GoodfireAI 🔥 Thomas will talk "Neural Geometry in Large Vision Models", diving into the structure hidden inside vision models. 📅 June 4 @ Room 1Ef | 10:30–11:00 AM

Tal Haklay retweeted

Tamar Rott Shaham @TamarRottShaham

Jun 3

Looking forward to giving a keynote at the @WiCVworkshop dinner tonight! If you're attending, come say hi!

Tamar Rott Shaham @TamarRottShaham

Jun 2

On my way to Denver for #CVPR2026, DM me if you want to connect. See you at our workshop on Thursday! x.com/i/status/2017429823983…

Tal Haklay retweeted

Ekdeep Singh Lubana @EkdeepL

Jun 1

Very excited to have this paper out! We show that by having more parameters, larger models see reduced interference between updates. This allows them to retain memories of rarely observed samples of a task, eventually allowing them to learn even the tail-end of the distribution. (1/3)

Christopher Potts

@ChrisGPotts

Jun 1

We take for granted that larger models are better than smaller ones, but why is this so? Our new paper, led by Jing Huang and @EkdeepL, traces this to a data-induced competition for resources (neurons), using formal analysis, idealized tasks, and real pretraining.

Image: Title card for a research paper. The title reads "Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention." Authors listed: Jing Huang, Daniel Wurgaft, Rachit Bansal, Laura Ruis, Naomi Saphra, David Alvarez-Melis, Andrew Lampinen, Christopher Potts, and Ekdeep Singh Lubana. A Goodfire logo appears below the names. Author affiliations: Stanford University, Kempner Institute at Harvard University, MIT, and Anthropic.

Tal Haklay retweeted

Goodfire

@GoodfireAI

Jun 1

New research from Goodfire and collaborators: why do larger models learn more tasks? (spoiler: it’s bottlenecked by data)

Christopher Potts

@ChrisGPotts

Jun 1

Tal Haklay @tal_haklay

May 27

Submit your work! The 2nd Workshop on Actionable Interpretability will be held at COLM 2026 in San Francisco! Submission Deadline: June 21, 2026 @ActInterp

Tal Haklay @tal_haklay

May 27

CFP is here >> actionable-interpretability.…

Call for Papers

Deadline: June 21 2026

actionable-interpretability.github.io

Tal Haklay retweeted

Tamar Rott Shaham @TamarRottShaham

May 26

What is the role of text tokens in diffusion? Do they carry anything beyond the text prompt? We study this in FLUX.2 @bfl_ml for the task of reference-guided generation, and found that text tokens hold visual information from the reference image!

Chris Ge @ChrisGe05

May 26

FLUX.2's @bfl_ml text tokens aren't just holding your prompt. During image editing, they absorb reference image content, and some of that absorbed content, like color and style, causally drives the output appearance. New paper 🧵👇

Tal Haklay retweeted

Hadas Orgad @OrgadHadas

May 23

Tal Haklay retweeted

Goodfire

@GoodfireAI

May 21

SAEs remain useful, as long as we’re aware of their limitations. And we have new techniques in the works that recover manifolds more directly, allowing us to understand models better and control them more effectively! Read the full post here: goodfire.ai/research/can-sae…

Can SAEs Capture Neural Geometry?

AKA, how to use straight lines to capture curved geometry in neural networks

goodfire.ai

Tal Haklay retweeted

Goodfire

@GoodfireAI

May 21

This helps explain why SAEs can feel both illuminating and unsatisfying! Looking at SAE features one-by-one is like trying to understand the proverbial elephant by talking with each of the blind men: each label may be locally accurate, but the global structure is missing. (5/7)

Tal Haklay retweeted

Ryan Peters

@ryanpirl

May 21

This would provide a great explanation for why there is so much redundancy in SAE features at any given layer (an observation made by @Sauers_). For example, if you search through the Qwen3-4b transcoder feature labels provided by Neuronpedia, there are 139 features generically related to the concept of 'color' in just layer 14. There are even more if you consider specific colors such as 'blue' or 'green', and this redundancy is repeated across layers, making it very annoying to interpret raw circuit graphs without performing some form of clustering.

Goodfire

@GoodfireAI

May 21

Replying to @GoodfireAI

We now know that models think using curved shapes, not just straight lines. But SAE features can still give us a window into neural geometry. How? We show that related SAE features often “tile” manifolds, pointing to different (but overlapping) regions on the curve. (4/7)

Tal Haklay retweeted

Goodfire

@GoodfireAI

May 21

Consider the parable of the blind men encountering an elephant for the first time. Each touches a different part—the trunk, the tusk, the leg—and comes to a different conclusion about the elephant: one says it's like a tree, another says it’s like a rope, and so on. (2/7)

Tal Haklay @tal_haklay

May 22

You should read our paper — and stay tuned 👀

Goodfire

@GoodfireAI

May 21

The most popular way to interpret AI is missing the bigger picture. Models think in curved shapes. But sparse autoencoders (SAEs) work with straight lines. Can they still capture models’ curved neural geometry? Yes, but not how you might think! (1/7)
