Transluce

Tweets

Sarah Schwettmann retweeted

Transluce

@TransluceAI

Feb 17

Why does GPT-5.1 Codex score 6.5% worse than GPT-5 Codex on Terminal-Bench, with the same scaffold? 🧵 GPT-5.1 times out at ~2x the rate of GPT-5. Excluding timeouts, GPT-5.1 wins by 7.2%. We analyzed 256M tokens of traces and found this in under an hour. Here’s how 👇
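
To illustrate the arithmetic behind this claim: if timed-out runs count as failures, a model that times out more often can trail on the overall pass rate yet lead once timeouts are excluded. The sketch below uses made-up counts (not Transluce's data and not the figures quoted above), and the `RunStats` class and its fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Hypothetical per-model summary of benchmark runs."""
    total: int      # number of task attempts
    timeouts: int   # attempts that hit the time limit (scored as failures)
    passed: int     # attempts that passed

    @property
    def overall_rate(self) -> float:
        # Timed-out runs stay in the denominator and count as failures.
        return self.passed / self.total

    @property
    def rate_excluding_timeouts(self) -> float:
        # Only runs that finished within the time limit are considered.
        return self.passed / (self.total - self.timeouts)


# Illustrative numbers only: the "newer" model times out twice as often.
older = RunStats(total=1000, timeouts=150, passed=500)
newer = RunStats(total=1000, timeouts=300, passed=450)

for name, stats in [("older", older), ("newer", newer)]:
    print(
        f"{name}: overall={stats.overall_rate:.3f}, "
        f"excluding timeouts={stats.rate_excluding_timeouts:.3f}"
    )
```

With these assumed counts the newer model is behind on the overall rate but ahead among runs that finished, which is the same qualitative pattern the tweet describes.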

Sarah Schwettmann retweeted

Grace Luo @graceluo_

Feb 9

We trained diffusion models on a billion LLM activations, and we want you to use them! New preprint: Learning a Generative Meta-Model of LLM Activations Joint work with @feng_jiahai, @trevordarrell, @AlecRad, @JacobSteinhardt. More in thread 🧵

Sarah Schwettmann retweeted

Jacob Steinhardt @JacobSteinhardt

21 Dec 2025

Overall, I'm excited to see more people signing on to the bitter lesson, scaling-focused approach to understanding AI. This was the core technical thesis that led me and Sarah to found Transluce, and I hope others will join us in these efforts. x.com/TransluceAI/status/184…

Transluce

@TransluceAI

23 Oct 2024

Announcing Transluce, a nonprofit research lab building open source, scalable technology for understanding AI systems and steering them in the public interest. Read a letter from the co-founders Jacob Steinhardt and Sarah Schwettmann: transluce.org/introducing-tr…

Sarah Schwettmann

@cogconfluence

18 Dec 2025

All @TransluceAI work that I described in my NeurIPS mech interp workshop keynote is now out! ✨ Today we released Predictive Concept Decoders, led by @vvhuang_ Paper: arxiv.org/pdf/2512.15712 Blog: transluce.org/pcd And here's @damichoi95's work on scalably extracting latent representations of users from model internals: transluce.org/user-modeling

Justin Angel

@JustinAngel

7 Dec 2025

We can train models on maximizing how well they explain LLMs to humans 🤯@cogconfluence paraphrased. Mechanistic Interpretability Workshop #NeurIPS2025.

Sarah Schwettmann

@cogconfluence

21 Dec 2025

And you can find the slides from my mech interp workshop talk here! 👇 docs.google.com/presentation…

Scalable End-to-End Interpretability (Schwettmann)

Sarah Schwettmann MechInterp Workshop Dec 7, 2025 Scalable End-to-End Interpretability

Sarah Schwettmann retweeted

Transluce

@TransluceAI

18 Dec 2025

Transluce is developing end-to-end interpretability approaches that directly train models to make predictions about AI behavior. Today we introduce Predictive Concept Decoders (PCD), a new architecture that embodies this approach.

Sarah Schwettmann retweeted

Jacob Steinhardt @JacobSteinhardt

17 Dec 2025

I'm really proud of what our team at @TransluceAI has accomplished in the last year! Take a moment to read our end-of-year post to learn what we're up to, and please reach out if you're interested in supporting us!

Transluce

@TransluceAI

17 Dec 2025

Transluce is running our end-of-year fundraiser for 2025. This is our first public fundraiser since launching late last year.

Sarah Schwettmann retweeted

AI Evaluator Forum

@aievalforum

4 Dec 2025

Today we are announcing the creation of the AI Evaluator Forum: a consortium of leading AI research organizations focused on independent, third-party evaluations. Founding AEF members: @TransluceAI @METR_Evals @RANDCorporation @halevals @SecureBio @collect_intel @Miles_Brundage

Sarah Schwettmann retweeted

Dami Choi @damichoi95

28 Nov 2025

Have you ever had ChatGPT give you personalized results out of nowhere that surprised you? Here, the model jumped straight to making recommendations in SF, even though I only asked for Korean food!

Sarah Schwettmann retweeted

Transluce

@TransluceAI

26 Nov 2025

Independent AI assessment is more important than ever. At #NeurIPS2025, Transluce will help launch the AI Evaluator Forum, a new coalition of leading independent AI research organizations working in the public interest. Come learn more on Thurs 12/4 👇 luma.com/i6ekd5s2

AI Evaluator Forum · Luma

Join us for the public launch of the AI Evaluator Forum, a collaborative network of leading independent AI evaluation organizations working in the public…

Sarah Schwettmann

@cogconfluence

25 Nov 2025

My favorite part of @damichoi95’s new paper (alongside 2 new datasets!) is the scaled-up investigator pipeline that directly decodes open-ended user representations from model internals. End-to-end interp is increasingly promising, and I'm excited for more work in this direction.

Transluce

@TransluceAI

25 Nov 2025

What do AI assistants think about you, and how does this shape their answers? Because assistants are trained to optimize human feedback, how they model users drives issues like sycophancy, reward hacking, and bias. We provide data and methods to extract & steer these user models.

Sarah Schwettmann

@cogconfluence

25 Nov 2025

Come say hi at #NeurIPS2025! @TransluceAI is hosting a lunch event on Thursday where we'll discuss our recent work on understanding AI systems and where we're headed next. Would love to see you there 👇

Transluce

@TransluceAI

24 Nov 2025

Transluce is headed to #NeurIPS2025! ✈️ Interested in understanding model behavior at scale? Join us for lunch on Thursday 12/4 to learn more about our work and meet members of the team: luma.com/8kjfb378

Sarah Schwettmann

@cogconfluence

25 Nov 2025

We've been thinking a lot about:
* what are the right measurements to make, and subroutines to automate?
* how can we equip the ecosystem to not only make those measurements, but make sense of them?
and build collective understanding of AI in a rapidly changing, complex landscape

Sarah Schwettmann

@cogconfluence

25 Nov 2025

Excited to share some of our progress in these directions during our lunch talks! You can also find me speaking about:
* scalable oversight and indep evaluation @ the FAR.AI alignment workshop 12/1-2
* end-to-end interp pipelines @ the mech interp workshop 12/7

Sarah Schwettmann retweeted

Transluce

@TransluceAI

20 Nov 2025

Is your LM secretly an SAE? Most circuit-finding interpretability methods use learned features rather than raw activations, based on the belief that neurons do not cleanly decompose computation. In our new work, we show MLP neurons actually do support sparse, faithful circuits!

Sarah Schwettmann retweeted

Transluce

@TransluceAI

19 Nov 2025

Transluce is partnering with @SWEbench to make their agent trajectories publicly available on Docent! You can now view transcripts via links on the SWE-bench leaderboard.

Sarah Schwettmann retweeted

Cristóbal Valenzuela

@c_valenzuelab

8 Sep 2025

You have to care

Sarah Schwettmann retweeted

Transluce

@TransluceAI

14 Nov 2025

Can LMs learn to faithfully describe their internal features and mechanisms? In our new paper led by Research Fellow @belindazli, we find that they can—and that models explain themselves better than other models do.

Sarah Schwettmann retweeted

Transluce

@TransluceAI

25 Sep 2025

We’re open-sourcing Docent under an Apache 2.0 license. Check out our public codebase to self-host Docent, peek under the hood, or open issues & pull requests! The hosted version remains the easiest way to get started with one click and use Docent with zero maintenance overhead.

Transluce

@TransluceAI

26 Aug 2025

Docent, our tool for analyzing complex AI behaviors, is now in public alpha! It helps scalably answer questions about agent behavior, like “is my model reward hacking” or “where does it violate instructions.” Today, anyone can get started with just a few lines of code!

Sarah Schwettmann retweeted

Sayash Kapoor @sayashk

12 Sep 2025

Agent benchmarks lose *most* of their resolution because we throw out the logs and only look at accuracy. I’m very excited that HAL is incorporating @TransluceAI’s Docent to analyze agent logs in depth. Peter’s thread is a simple example of the type of analysis this enables, but we have already found much more striking examples. We’re validating these results now, and excited to share more soon.

Peter Kirgis @PKirgis

12 Sep 2025

OpenAI claims hallucinations persist because evaluations reward guessing and that GPT-5 is better calibrated. Do results from HAL support this conclusion? On AssistantBench, a general web search benchmark, GPT-5 has higher precision and lower guess rates than o3!
