Ekdeep Singh Lubana

Tweets

Ryan Panwar retweeted

Ekdeep Singh Lubana @EkdeepL

Jun 11

Super excited about this work! This paper was driven by a claim I've been making to anyone who'll listen: "interpretability is the language of data". (1/3)

Goodfire

@GoodfireAI

Jun 11

Have you debugged your training data? You might not like what you find. Introducing predictive data debugging: reveal and shape what your model will learn before training. In DPO datasets, we found broken guardrails, hallucinations, and fish fart fan fiction (seriously). (1/9)

Ryan Panwar @RyanPanwar

Jun 11

There are a lot of weird and surprising correlations in large-scale datasets. This research helps unravel them, making post-training less like black magic.

Goodfire

@GoodfireAI

Jun 11

Ryan Panwar retweeted

Goodfire

@GoodfireAI

May 14

Neural networks do math by rotating shapes. We found a shape-rotating calculator hidden inside an LLM – and it’s used for more than just math! (1/6)

Goodfire

@GoodfireAI

May 7

Neural networks might speak English, but they think in shapes. Understanding their rich *neural geometry* is key to understanding how they work – and to debugging and controlling them with precision. Starting today, we’re releasing a series of posts on this research agenda. 🧵

Ryan Panwar @RyanPanwar

May 14

This work is extremely cool: from raw text data, LLMs have learned an incredibly elegant geometrical algorithm and applied it in multiple different settings.

Goodfire

@GoodfireAI

May 14

Replying to @GoodfireAI

The same calculator handles a wide range of tasks, including:
- arithmetic (“7 + 9”)
- weekdays (“nine days after Friday”)
- months (“six months after August”)

Llama built this mechanism from scratch in training, and uses it with striking elegance and flexibility. (4/6)
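To make the three tasks in the tweet above concrete, here is a toy Python sketch of how a single "rotation" routine could cover all of them. It is not the mechanism Goodfire reports finding in Llama. It only illustrates the general idea that adding an offset on a cycle is the same as rotating a point around a circle. The function names (`rotate_add`, `days_after`, `months_after`) are hypothetical, and the arithmetic case assumes the prompt is an addition such as 7 + 9.

```python
import cmath
import math

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]


def rotate_add(start: int, offset: int, period: int) -> int:
    """Compute (start + offset) mod period by rotating a point on the unit circle."""
    step = 2 * math.pi / period                   # angle of one unit on the cycle
    point = cmath.exp(1j * step * start)          # encode `start` as a point on the circle
    point *= cmath.exp(1j * step * offset)        # rotating by `offset` steps performs the addition
    angle = cmath.phase(point) % (2 * math.pi)    # map the phase into [0, 2*pi)
    return round(angle / step) % period           # decode the angle back to an index


def days_after(day: str, n: int) -> str:
    """Weekday that falls `n` days after `day` (hypothetical helper)."""
    return WEEKDAYS[rotate_add(WEEKDAYS.index(day), n, len(WEEKDAYS))]


def months_after(month: str, n: int) -> str:
    """Month that falls `n` months after `month` (hypothetical helper)."""
    return MONTHS[rotate_add(MONTHS.index(month), n, len(MONTHS))]


if __name__ == "__main__":
    print(rotate_add(7, 9, 100))        # addition on a cycle of length 100
    print(days_after("Friday", 9))
    print(months_after("August", 6))
```

The only thing that changes between the three tasks is the length of the cycle (100, 7, or 12 here), which is one way to read the claim that the same calculator is reused across settings.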

Ryan Panwar retweeted

Matthew Kowal

@MatthewKowal9

Apr 30

Excited that Silico, the core platform behind our research results, is finally being announced! Here's one example of how I've used it: interpreting the EchoJEPA model and visualizing different attribution methods on a temporally aligned 3D heart mesh. More results to share on this soon!

Goodfire

@GoodfireAI

Apr 30

Introducing Silico: the platform for building AI models with the precision of written software. Silico lets researchers and engineers see inside their models, debug failures, and intentionally design them from the ground up. Early access is open now. 🧵(1/10)

Ryan Panwar @RyanPanwar

Apr 30

Silico is a lab for AI scientists to dissect their own brains. It already runs autonomously for days, performing frontier AI research, finding and fixing model pathologies.

Goodfire

@GoodfireAI

Apr 30

Ryan Panwar retweeted

Eric Ho

@ericho_goodfire

Apr 30

if you have goblins in your model, silico will find them or your money back

Goodfire

@GoodfireAI

Apr 30

Ryan Panwar retweeted

Dan Balsam

@DanJBalsam

Apr 15

loved the piece on interpretability in the @nytimes this morning! the field has accomplished some pretty cool things in recent years. still, there is much work to do.

as @davidbau put it elegantly, interpretability is now where biology was in 1930: “The cell was a black box for biologists. They were slow to get off the starting block to start studying heredity. But once they did, the problem fell.”

there is so much we can learn from these alien intelligences.

Eric Ho

@ericho_goodfire

Apr 15

.@nytimes this morning

Ryan Panwar retweeted

Nick Wang @nkwang24

Apr 14

At my last job, we often got calls from parents frantically asking for their child's genetic test results. Too often, the results were inconclusive. Variant effect prediction sounds abstract but can be life-or-death for genetic disorders. Proud of the team for narrowing this gap!

Goodfire

@GoodfireAI

Apr 14

We achieved state-of-the-art performance in predicting which of 4.2 million genetic variants cause diseases by interpreting a genomics model, in a new preprint with @MayoClinic. We're now releasing an open source database for all variants in the NIH's ClinVar database. 🧵(1/8)

Ryan Panwar retweeted

Dan Balsam

@DanJBalsam

Apr 14

there is much that we can learn from these alien intelligences. i'm excited to see what the community can do with the tools we are open sourcing.

interpretability is at the center of our success here. not only do these explanations offer potential discoveries, but understanding the model was critical to the entire research path here.

very proud of the team's work and grateful to our amazing partners at @MayoClinic. excited for all the great things still to come

Goodfire

@GoodfireAI

Apr 14

Ryan Panwar retweeted

Alif Munim (d/acc)

@alifmunim

Apr 14

interpretability is hands down the coolest subfield of AI research and will change the world

Eric Ho

@ericho_goodfire

Apr 14

using interp techniques to get to SOTA performance on genetic disease prediction!

you can think of interp as the bridge between human natural language and the alien intelligence of Evo 2 (a biology model that only outputs nucleotides, i.e. ATCG). this means that now you can see directly what Evo 2 thinks of every single one of the ~2M “variants of uncertain significance” in ClinVar, which turns out to be a lot!

we also need your help in testing these predictions and are looking for wet lab collaborators. DM me if you’d like to chat!

Ryan Panwar retweeted

Trevor Campbell

@TrevorCampbell_

Apr 14

Already gave this a rip for some VUS and Suspected Pathogenic variants that I have previously done deep analysis on, and can confirm that EVEE posits many of the same findings and conclusions that I have found in terms of prediction and suggested failure mechanism.

Examples:
- VUS (for which *I* am the only ClinVar entry) for one of my heterozygous mutations in DNAH5 that could play a ~small~ factor in the overall root cause of my PCD
- My variant of CLCN1 that gives me a rare muscular disorder (which makes me look like Wolverine without needing to go to the gym, so not all bad 🤷‍♂️)

EVEE is a nice tool for variant interpretation!

Goodfire

@GoodfireAI

Apr 14

Ryan Panwar retweeted

Goodfire

@GoodfireAI

Apr 14

Ryan Panwar retweeted

Goodfire

@GoodfireAI

Apr 1

Introducing self-correcting search: a technique to let diffusion models self-correct mid-trajectory. Working with @RadicalAI, we gave MatterGen a feedback loop from its own activations, improving viable on-target candidates by ~30%. (1/8)

Ryan Panwar @RyanPanwar

Feb 28

All watched over by machines of loving grace

Helen Toner

@hlntnr

Feb 25

One thing the Pentagon is very likely underestimating: how much Anthropic cares about what *future Claudes* will make of this situation. Because of how Claude is trained, what principles/values/priorities the company demonstrates here could shape its "character" for a long time.

Ryan Panwar retweeted

Goodfire

@GoodfireAI

Feb 25

New blog post: how we built infrastructure to enable interp at trillion-parameter scale with minimal inference overhead. In a couple short years, interpretability has gone from toy models to the frontier. (1/6)

Ryan Panwar retweeted

Aaditya Prasad 🇺🇸

@_Aaditya_Prasad

Feb 11

New Paper! RL can teach our models to solve math or code, but open-ended tasks — which make verification expensive or even impossible — remain difficult to optimize. LLMs-as-Judges help, but often struggle to retrieve information even when it is present. Reinforcement Learning from Feature Rewards (RLFR) provides a solution. Extracting model beliefs via interpretability reveals a well-calibrated reward signal that permits scalable training.

Ryan Panwar retweeted

Tom McGrath

@banburismus_

Feb 5

We’re putting more computation (in the form of intelligence) into the most general object in neural network training: backprop. This essay describes how I think we can do this, why interp is key, the relevance to alignment, and how we should do it right.

Ryan Panwar @RyanPanwar

Feb 5

Artificial intelligence is grown, not designed. We’re going to change that.

Goodfire

@GoodfireAI

Feb 5

We raised a $150M Series B at a $1.25B valuation to fundamentally change the field of AI. Scaling is powerful, but we can't intentionally design what we don't understand.

Ryan Panwar @RyanPanwar

Feb 5

There is both great scientific beauty in understanding the structures that emerge in the growing of artificial neural networks and a massive opportunity to shape them into something better. Come join us to change the trajectory of intelligence! goodfire.ai/blog/our-series-…

Understanding, Learning From, and Designing AI: Our Series B

goodfire.ai
