Goodfire

Pinned Tweet

Goodfire

@GoodfireAI

May 7

Neural networks might speak English, but they think in shapes. Understanding their rich *neural geometry* is key to understanding how they work – and to debugging and controlling them with precision. Starting today, we’re releasing a series of posts on this research agenda. 🧵

Goodfire retweeted

Santiago Aranguri

@santiaranguri

Jun 12

Happy to see our work cited in the Claude Fable & Mythos system card! Steering against eval awareness can carry confounds (e.g. making the model more friendly). Interpretability can help us understand these, and is a promising source of new methods to deal with eval awareness.

Goodfire

@GoodfireAI

Jun 11

Have you debugged your training data? You might not like what you find. Introducing predictive data debugging: reveal and shape what your model will learn before training. In DPO datasets, we found broken guardrails, hallucinations, and fish fart fan fiction (seriously). (1/9)

Goodfire

@GoodfireAI

Jun 11

If you train models on preference data, you have a curriculum you've never read. Predictive data debugging lets you read it, understand it, and rewrite it. We've built it into Silico, our platform for model design. Request access to Silico here: goodfire.ai/silico (9/9)

Build AI models the way you write software

Understand and debug your AI model with Silico.

goodfire.ai

Goodfire

@GoodfireAI

Jun 11

Read the full blog post on predictive data debugging: goodfire.ai/research/predict…

Predictive Data Debugging: Reveal and Shape What Your Model Learns, Before You Train

Given a preference dataset, we can accurately predict which behaviors RL will amplify or suppress before you train, trace them back to the responsible data, and reshape the dataset and/or training...

goodfire.ai

Goodfire

@GoodfireAI

Jun 10

Cool work applying the idea from our work on RLFR to RL task generation!

Augustine Mavor-Parker

@MavorParker

Jun 10

Training a model to generate RL tasks that are neither too hard nor too easy costs many solver runs per task. PROPEL instead predicts difficulty via a probe on its activations, amortizing cost and speeding up generator optimization. New open-ended RL research from @Vmax @GoodfireAI.

Goodfire

@GoodfireAI

Jun 4

New Goodfire research: using logits to monitor for eval awareness!

Santiago Aranguri

@santiaranguri

Jun 4

Would an LLM tell you if it’s gaming your eval? Often, no. But we can still catch the model thinking about it. New research: we measure how close a model comes to saying it’s being tested. This detects eval awareness with 10× to 100× fewer samples than monitoring model outputs.🧵

Goodfire retweeted

Salesforce Ventures

@SalesforceVC

Jun 2

The idea that launched @GoodfireAI 🔥

When ChatGPT launched, most people were blinded by the possibilities. Eric Ho saw the risk it posed. "I kind of saw the next few years unfold before me, where we were about to get increasingly powerful models ... massive amounts of compute, massive amounts of intelligence, but we wouldn't understand at all how this intelligence would actually work."

Most models operate as a black box. Users can see the output, but they can't reliably see how the model reached it, why it behaved in a certain way, or whether it will behave the same way again next time.

With how quickly AI is being deployed into mission-critical environments, Eric wanted to do something to ensure AI models are functioning as intended. In a new video interview with Emily Zhao, Eric explained his aim behind founding Goodfire: to build the science and technology needed to understand AI from the inside out.

Goodfire retweeted

Ekdeep Singh Lubana

@EkdeepL

Jun 1

Very excited to have this paper out! We show that, by having more parameters, larger models see reduced interference between updates. This allows them to retain memories of rarely observed samples of a task, eventually allowing them to learn even the tail end of the distribution. (1/3)

Christopher Potts

@ChrisGPotts

Jun 1

We take for granted that larger models are better than smaller ones, but why is this so? Our new paper, led by Jing Huang and @EkdeepL, traces this to a data-induced competition for resources (neurons), using formal analysis, idealized tasks, and real pretraining.

Image description: Title card for a research paper. The title reads "Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention." Authors listed: Jing Huang, Daniel Wurgaft, Rachit Bansal, Laura Ruis, Naomi Saphra, David Alvarez-Melis, Andrew Lampinen, Christopher Potts, and Ekdeep Singh Lubana. A Goodfire logo appears below the names. Author affiliations: Stanford University, Kempner Institute at Harvard University, MIT, and Anthropic.

Goodfire

@GoodfireAI

Jun 1

New research from Goodfire and collaborators: why do larger models learn more tasks? (spoiler: it’s bottlenecked by data)

Christopher Potts

@ChrisGPotts

Jun 1

Goodfire retweeted

Can Rager

@can_rager

May 21

The "tiling" perspective explains a lot of the common problems with SAEs

Goodfire

@GoodfireAI

May 21

The most popular way to interpret AI is missing the bigger picture. Models think in curved shapes. But sparse autoencoders (SAEs) work with straight lines. Can they still capture models’ curved neural geometry? Yes, but not how you might think! (1/7)

Goodfire

@GoodfireAI

May 21

Goodfire

@GoodfireAI

May 7

Goodfire

@GoodfireAI

May 21

So instead of interpreting features in isolation, what if we searched for features that act together? We turned this idea into an unsupervised pipeline to cluster SAE features based on firing patterns. Together, a cluster of features reveals the overall geometry. (6/7)

Goodfire

@GoodfireAI

May 21

SAEs remain useful, as long as we’re aware of their limitations. And we have new techniques in the works that recover manifolds more directly, allowing us to understand models better and control them more effectively! Read the full post here: goodfire.ai/research/can-sae…

Can SAEs Capture Neural Geometry?

AKA, how to use straight lines to capture curved geometry in neural networks

goodfire.ai
