Open and scalable technology for understanding AI systems.

Joined October 2024
Transluce retweeted
This paper is now a spotlight at ICML! arxiv.org/abs/2601.22594
20 Nov 2025
Is your LM secretly an SAE? Most circuit-finding interpretability methods use learned features rather than raw activations, based on the belief that neurons do not cleanly decompose computation. In our new work, we show MLP neurons actually do support sparse, faithful circuits!
Transluce retweeted
New blog post: "Building Technology to Drive AI Governance". I argue that many governance challenges are fundamentally bottlenecked by technical gaps, and consider case studies from other fields (food safety, climate change) that illustrate this dynamic.
Why does GPT-5.1 Codex score 6.5% worse than GPT-5 Codex on Terminal-Bench, with the same scaffold? 🧵 GPT-5.1 times out at ~2x the rate of GPT-5. Excluding timeouts, GPT-5.1 wins by 7.2%. We analyzed 256M tokens of traces and found this in under an hour. Here’s how 👇
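The reversal described above (behind overall, ahead once timeouts are excluded) can be sketched with a few lines of arithmetic. This is only an illustration: the counts below are hypothetical, chosen so that one model times out at twice the rate of the other, and are not the Terminal-Bench numbers. `RunCounts`, `pass_rate`, and `pass_rate_excluding_timeouts` are names made up for this sketch and are not part of Docent.

```python
from dataclasses import dataclass


@dataclass
class RunCounts:
    """Outcome counts for one model on a benchmark (hypothetical data)."""
    passed: int
    failed: int
    timed_out: int

    @property
    def total(self) -> int:
        return self.passed + self.failed + self.timed_out


def pass_rate(c: RunCounts) -> float:
    # Headline score: a timeout counts as a failed task.
    return c.passed / c.total


def pass_rate_excluding_timeouts(c: RunCounts) -> float:
    # Score over only the runs that finished within the time limit.
    return c.passed / (c.passed + c.failed)


# Hypothetical counts, NOT the real benchmark results.
model_a = RunCounts(passed=50, failed=40, timed_out=10)
model_b = RunCounts(passed=45, failed=35, timed_out=20)  # 2x the timeouts

for name, counts in [("model_a", model_a), ("model_b", model_b)]:
    print(
        f"{name}: overall={pass_rate(counts):.3f}, "
        f"excluding timeouts={pass_rate_excluding_timeouts(counts):.3f}"
    )
```

With these made-up counts, model_b trails on the headline score but leads once timed-out runs are dropped, which is the same shape of result the thread reports.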
You can replicate our full analysis with 5 min of setup. Clone our Terminal-Bench data & follow along: transluce.org/docent/blog/te…

Use Docent to analyze your own traces: docs.transluce.org/quickstar…
Read our blog: transluce.org/docent/blog/te…

We're hiring a Governance & Policy Fellow to help define how independent AI evaluation works in practice: setting standards, supporting mental health evals, and supporting government evaluators. Hybrid technical and policy background, $200K–$300K. Link in replies.
Transluce retweeted
our circuit tracing codebase from this project is public now! github.com/TransluceAI/circu… please try it out and ping me if you have any questions 😄 and expect more updates soon!
Transluce retweeted
I admire the folks at Transluce a lot. They're super smart and have a good model for how to do useful AI oversight work without being embedded in (read: beholden to) any big AI labs. Read their stuff and consider supporting!
17 Dec 2025
Transluce is running our end-of-year fundraiser for 2025. This is our first public fundraiser since launching late last year.
Transluce retweeted
Transluce is a top-tier AI safety research lab - I follow their work as closely as work from our own safety teams at Anthropic. They're also well-positioned to become a strong third-party auditor for AI labs. Consider donating if you're interested in helping them out!
Transluce retweeted
All @TransluceAI work that I described in my NeurIPS mech interp workshop keynote is now out! ✨ Today we released Predictive Concept Decoders, led by @vvhuang_ Paper: arxiv.org/pdf/2512.15712 Blog: transluce.org/pcd And here's @damichoi95's work on scalably extracting latent representations of users from model internals: transluce.org/user-modeling

We can train models to maximize how well they explain LLMs to humans 🤯 (paraphrasing @cogconfluence). Mechanistic Interpretability Workshop #NeurIPS2025.
18 Dec 2025
Transluce is developing end-to-end interpretability approaches that directly train models to make predictions about AI behavior. Today we introduce Predictive Concept Decoders (PCD), a new architecture that embodies this approach.
18 Dec 2025
Chat with a live version of our PCD at decoder.transluce.org. Try testing whether the decoder can accurately predict Llama-3.1-8B’s behavior, and check whether the decoder’s response is consistent with the encoder’s active concepts!