Transluce

Transluce

@TransluceAI

May 5

Proud to partner with @CommonSense to help develop a rigorous science around youth AI safety. Millions of kids are already using AI every day, and our understanding of these systems and their impacts has to catch up. commonsensemedia.org/press-r…

Common Sense Media Launches Youth AI Safety Institute

The first-of-its-kind AI safety lab focused on children will independently test AI products, broadly publish the results, and set clear standards to protect the safety, health, and development of a...

commonsensemedia.org

1,636

Transluce retweeted

Aryaman Arora

@aryaman2020

Apr 30

This paper is now a spotlight at ICML! arxiv.org/abs/2601.22594

Language Model Circuits Are Sparse in the Neuron Basis

The high-level concepts that a neural network uses to perform computation need not be aligned to individual neurons (Smolensky, 1986). Language model interpretability research has thus turned to...

arxiv.org

Transluce

@TransluceAI

20 Nov 2025

Is your LM secretly an SAE? Most circuit-finding interpretability methods use learned features rather than raw activations, based on the belief that neurons do not cleanly decompose computation. In our new work, we show MLP neurons actually do support sparse, faithful circuits!

316

33,256

Transluce retweeted

Jacob Steinhardt @JacobSteinhardt

Feb 18

New blog post: "Building Technology to Drive AI Governance". I argue that many governance challenges are fundamentally bottlenecked by technical gaps, and consider case studies from other fields (food safety, climate change) that illustrate this dynamic.

123

15,953

Transluce

@TransluceAI

Feb 17

Why does GPT-5.1 Codex score 6.5% worse than GPT-5 Codex on Terminal-Bench, with the same scaffold? 🧵 GPT-5.1 times out at ~2x the rate of GPT-5. Excluding timeouts, GPT-5.1 wins by 7.2%. We analyzed 256M tokens of traces and found this in under an hour. Here’s how 👇

10,121
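
The arithmetic behind the thread above can be sketched as follows. This is a minimal illustration, not Transluce's analysis: the `RunSummary` class and all task counts are made up for the example and are not Terminal-Bench data. It only shows how a model that times out about twice as often can score lower overall yet higher once timed-out runs are excluded from the denominator.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    """Hypothetical per-model outcome counts (illustrative, not real data)."""
    passed: int
    failed: int
    timed_out: int

    @property
    def total(self) -> int:
        return self.passed + self.failed + self.timed_out

    def pass_rate(self, exclude_timeouts: bool = False) -> float:
        # Excluding timeouts removes those runs from the denominator only;
        # a timed-out run never counts as a pass either way.
        denom = self.total - self.timed_out if exclude_timeouts else self.total
        return self.passed / denom if denom else 0.0


# Made-up counts: the "new" model times out twice as often as the "old" one.
old_model = RunSummary(passed=50, failed=40, timed_out=10)
new_model = RunSummary(passed=45, failed=35, timed_out=20)

for name, run in [("old", old_model), ("new", new_model)]:
    print(
        f"{name}: overall={run.pass_rate():.3f} "
        f"excluding_timeouts={run.pass_rate(exclude_timeouts=True):.3f}"
    )
```

With these assumed counts the ordering of the two models flips depending on whether timeouts are counted, which is the same pattern the thread describes.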

Transluce

@TransluceAI

Feb 17

You can replicate our full analysis with 5 min of setup. Clone our Terminal-Bench data & follow along: transluce.org/docent/blog/te…

1,268

Transluce

@TransluceAI

Feb 17

Use Docent to analyze your own traces: docs.transluce.org/quickstar… Read our blog: transluce.org/docent/blog/te…

1,046

Transluce

@TransluceAI

Jan 29

We're hiring a Governance & Policy Fellow to help define how independent AI evaluation works in practice—setting standards, supporting mental health evals, and supporting government evaluators. Hybrid technical policy background, $200K–$300K. Link in replies.

241

26,126

Transluce

@TransluceAI

Jan 29

See the full post and apply here: jobs.gem.com/transluce/am9ic…

Transluce Careers

jobs.gem.com

3,036

Transluce retweeted

Aryaman Arora

@aryaman2020

Jan 16

our circuit tracing codebase from this project is public now! github.com/TransluceAI/circu… please try it out and ping me if you have any questions 😄 and expect more updates soon!

GitHub - TransluceAI/circuits: ADAG: Transluce's MLP neuron-level circuit tracing library

ADAG: Transluce's MLP neuron-level circuit tracing library - TransluceAI/circuits

github.com

Transluce

@TransluceAI

20 Nov 2025

146

15,263

Transluce retweeted

Jacob Austin @jacobaustin132

23 Dec 2025

I admire the folks at Transluce a lot. They're super smart and have a good model for how to do useful AI oversight work without being embedded in (read: beholden to) any big AI labs. Read their stuff and consider supporting!

Transluce

@TransluceAI

17 Dec 2025

Transluce is running our end-of-year fundraiser for 2025. This is our first public fundraiser since launching late last year.

5,976

Transluce retweeted

Ethan Perez

@EthanJPerez

22 Dec 2025

Transluce is a top-tier AI safety research lab - I follow their work as closely as work from our own safety teams at Anthropic. They're also well-positioned to become a strong third-party auditor for AI labs. Consider donating if you're interested in helping them out!

Transluce

@TransluceAI

17 Dec 2025

Transluce is running our end-of-year fundraiser for 2025. This is our first public fundraiser since launching late last year.

157

14,330

Transluce retweeted

Sarah Schwettmann

@cogconfluence

18 Dec 2025

All @TransluceAI work that I described in my NeurIPS mech interp workshop keynote is now out! ✨ Today we released Predictive Concept Decoders, led by @vvhuang_ Paper: arxiv.org/pdf/2512.15712 Blog: transluce.org/pcd And here's @damichoi95's work on scalably extracting latent representations of users from model internals: transluce.org/user-modeling

Justin Angel

@JustinAngel

7 Dec 2025

We can train models to maximize how well they explain LLMs to humans 🤯 (@cogconfluence, paraphrased). Mechanistic Interpretability Workshop #NeurIPS2025.

9,984

Transluce

@TransluceAI

18 Dec 2025

Transluce is developing end-to-end interpretability approaches that directly train models to make predictions about AI behavior. Today we introduce Predictive Concept Decoders (PCD), a new architecture that embodies this approach.

165

36,620

Transluce

@TransluceAI

18 Dec 2025

Chat with a live version of our PCD at decoder.transluce.org. Try testing whether the decoder can accurately predict Llama-3.1-8B’s behavior, and check whether the decoder’s response is consistent with the encoder’s active concepts!

3,522

Transluce

@TransluceAI

18 Dec 2025

Paper: arxiv.org/abs/2512.15712 Blog: transluce.org/pcd Authors: @vvhuang_, @damichoi95, @_ddjohnson, @cogconfluence, @JacobSteinhardt If you’re excited about building scalable interpretability assistants, visit transluce.org/company

Predictive Concept Decoders: Training Scalable End-to-End...

Interpreting the internal activations of neural networks can produce more faithful explanations of their behavior, but is difficult due to the complex structure of activation space. Existing...

arxiv.org

1,561