Emmanuel Ameisen

Emmanuel Ameisen

285 Photos and videos

Tweets

Pinned Tweet

Emmanuel Ameisen @mlpowered

27 Mar 2025

We've made progress in our quest to understand how Claude and models like it think! The paper has many fun and surprising case studies, that anyone who is interested in LLMs would enjoy. Check out the video below for an example

Anthropic

@AnthropicAI

27 Mar 2025

New Anthropic research: Tracing the thoughts of a large language model. We built a "microscope" to inspect what happens inside AI models and use it to understand Claude’s (often complex and surprising) internal mechanisms.

2:55

131

20,710

Andrej Karpathy

Emmanuel Ameisen retweeted

Andrej Karpathy

@karpathy

Jun 9

This is a super exciting release - Claude Fable 5 is the same underlying model as Mythos but with added safeguards. The benchmarks are great and it's SOTA on everything by a margin but I'll add that *qualitatively* also, this is a major-version-bump-deserving step change forward (imo of the same order as Claude 4.5 was in November), peaking especially for long problem-solving sessions on very difficult problems. You can give it a lot more ambitious tasks than what you're used to, the model "gets it" and it will just go, and it's never felt this tempting to stop looking at the code at all (but don't do this in prod!). The model still has quirks that people will run into and the safeguards are configured to be a little too trigger happy for launch, which can hopefully be tuned over time. I feel a lot of things changing as working software increasingly comes out on a tap. The Jevon's paradox kicks in and I feel my own demand for software growing substantially. You can ask for anything - explainers, visualizers, dashboards, bespoke single-use apps (e.g. a full wandb that is hyper-specific just for your project), you can 10X your test suite, auto-optimize code, run giant research projects with custom HTML for the results, anything! "Free your mind" (Matrix ref). Really looking forward to all the things people build!

Claude

@claudeai

Jun 9

Replying to @claudeai

Fable 5 is state-of-the-art on nearly all tested benchmarks, with exceptional performance in software engineering, knowledge work, scientific research, and vision. The longer and more complex the task, the larger Fable 5’s lead over our other models.

Benchmark table titled Mythos 5 & Fable 5, comparing Claude Mythos 5 and Fable 5 against Claude Mythos Preview, Claude Opus 4.8, GPT 5.5, and Gemini 3.1 Pro.

ALT Benchmark table titled Mythos 5 & Fable 5, comparing Claude Mythos 5 and Fable 5 against Claude Mythos Preview, Claude Opus 4.8, GPT 5.5, and Gemini 3.1 Pro.

1,265

2,356

25,222

2,667,668

Claude

Emmanuel Ameisen retweeted

Claude

@claudeai

Jun 9

Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.

0:20

4,988

14,522

104,622

55,679,782

Anthropic

Emmanuel Ameisen retweeted

Anthropic

@AnthropicAI

Jun 4

Our internal data shows Claude is accelerating AI development—a possible path to recursive self-improvement, or AI autonomously building a more capable successor. It’s happening faster than we thought, and the implications deserve greater attention. anthropic.com/institute/recu…

When AI builds itself

Our progress toward recursive self-improvement, and its implications.

anthropic.com

1,771

4,661

28,649

18,494,506

METR

Emmanuel Ameisen retweeted

METR

@METR_Evals

May 8

We evaluated an early version of Claude Mythos Preview for risk assessment during a limited window in March 2026. We estimated a 50%-time-horizon of at least 16hrs (95% CI 8.5hrs to 55hrs) on our task suite, at the upper end of what we can measure without new tasks.

240

2,084

977,355

Emmanuel Ameisen

Emmanuel Ameisen @mlpowered

May 7

Interpreting model activations is important to understand why a model is doing what its doing. Traditionally, we've done this with supervised methods (probing for a specific context), or unsupervised sparse decompositions (dictionary learning). But probing requires you to know what you are looking for, and sparse dictionaries can be overwhelming to interpret. NLAs are exciting because they instead generate natural language explanations, which we can then inspect for a variety of behaviors. For example, they reveal the planning behavior we first observed with circuit tracing last year. They also helped identify bugs in Claude's training pipeline, where some prompts were only partially translated. If you want to play with them, NLAs on open models are available on Neuronpedia! neuronpedia.org/llama3.3-70b…

Natural Language Autoencoders – Llama3.3-70B-IT

Natural Language Autoencoder for Llama3.3-70B-IT. Translate a model's internal thoughts into natural language explanations.

neuronpedia.org

Anthropic

@AnthropicAI

May 7

New Anthropic research: Natural Language Autoencoders. Models like Claude talk in words but think in numbers. The numbers—called activations—encode Claude’s thoughts, but not in a language we can read. Here, we train Claude to translate its activations into human-readable text.

3:16

135

11,688

Harish Kamath

Emmanuel Ameisen retweeted

Harish Kamath

@kamath_harish

May 6

Interpreting language models can feel like stumbling through a dark forest - sometimes you just wish you had a flashlight! In our new post, we introduce HeadVis, our latest flashlight for studying attention heads.

209

21,683

Emmanuel Ameisen

Emmanuel Ameisen @mlpowered

May 1

How do LLMs store attributed of entities? And how do they compare different attributes in context? It turns out they mostly store information about a given entity over its own token, which allows for easy lookups. But in addition to the current entity's information, models also store information about the previous entity. That might seem redundant, but it actually enables a model to identify relationships between the current entity and the previous entity in one step!

Paul Bogdan @paulcbogdan

May 1

Many LLMs struggle to parse statements like “Alice prepares and Bob consumes food.” Ask them “Who consumes food?” and they'll get it wrong What’s up with that? We researched whether models can represent multiple entities at once, and if so, why do they fail here? 🧵

859

Michael Hanna

Emmanuel Ameisen retweeted

Michael Hanna @michaelwhanna

Apr 23

Do LMs plan without verbalizing their plans? I'll be at ICLR presenting work with @mlpowered using circuit tracing to reveal latent planning—from choosing "a" vs "an" based on a planned-for word, to rhyming poetry—and how these abilities grow with scale: openreview.net/forum?id=H0B7…

4,501

Peter Yang

Emmanuel Ameisen retweeted

Peter Yang

@petergyang

Apr 17

Made this 30 second video of Claude Design just by pasting in the Claude Design blog post and some tweets from @AnthropicAI employees Kinda speechless.

0:32

113

2,112

421,588

Vals AI

Emmanuel Ameisen retweeted

Vals AI

@ValsAI

Apr 16

Anthropic’s Opus 4.7 just seized the #1 spot on the Vals Index with a score of 71.4%, a massive jump from the previous best (67.7%). It also ranks #1 on Vibe Code Bench, Vals Multimodal, Finance Agent, Mortgage Tax, SAGE, SWE-Bench, and Terminal Bench 2.

242

26,498

Uzay Macar

Emmanuel Ameisen retweeted

Uzay Macar

@uzaymacar

Apr 14

🧵New Anthropic Fellows research: We studied mechanisms of "introspective awareness" in LLMs. LLMs can sometimes detect steering vectors injected into their residual stream. But is this worthy of being called introspection, or attributable to some uninteresting confound?👇

426

46,985

Anthropic

Emmanuel Ameisen retweeted

Anthropic

@AnthropicAI

Apr 7

Introducing Project Glasswing: an urgent initiative to help secure the world’s most critical software. It’s powered by our newest frontier model, Claude Mythos Preview, which can find software vulnerabilities better than all but the most skilled humans. anthropic.com/glasswing

Project Glasswing: Securing critical software for the AI era

A new initiative to secure the world’s most critical software and give defenders a durable advantage in the coming AI-driven era of cybersecurity.

anthropic.com

1,985

6,646

44,009

31,423,109

Anthropic

Emmanuel Ameisen retweeted

Anthropic

@AnthropicAI

Mar 6

We partnered with Mozilla to test Claude's ability to find security vulnerabilities in Firefox. Opus 4.6 found 22 vulnerabilities in just two weeks. Of these, 14 were high-severity, representing a fifth of all high-severity bugs Mozilla remediated in 2025.

475

1,366

14,997

3,229,683

Emmanuel Ameisen

Emmanuel Ameisen @mlpowered

Feb 28

I used to bite my tongue and hold my breath. Scared to rock the boat and make a mess. I stood for nothing, so I fell for everything. 🎶

KATY PERRY

@katyperry

Feb 28

done

7,670

Emmanuel Ameisen

Emmanuel Ameisen @mlpowered

Feb 28

Proud to work at a place that stands behind its values. 🇺🇸

Anthropic

@AnthropicAI

Feb 28

A statement on the comments from Secretary of War Pete Hegseth. anthropic.com/news/statement…

455

5,893

Emmanuel Ameisen

Emmanuel Ameisen @mlpowered

Feb 26

AI is not a normal technology, and Anthropic’s mission is to make sure that it serves the long-term benefit of humanity. Doing so requires making tough decisions, and standing up for what we think is right. This is us doing that.

Anthropic

@AnthropicAI

Feb 26

A statement from Anthropic CEO, Dario Amodei, on our discussions with the Department of War. anthropic.com/news/statement…

755

20,325

Emmanuel Ameisen

Emmanuel Ameisen @mlpowered

Feb 19

Late last year, we found a precise counting mechanism in Claude. This new work by @ummagumm_a and Nikita Balagansky shows that: - similar mechanisms exist in many models - we can compare their counting performance by seeing how crisp their representations of the count are!

Viacheslav Sinii @ummagumm_a

Feb 19

1/ 🧵 Reproducing Anthropic’s “counting manifold” result in open-weight LLMs: do they internally track “chars since last \n” to wrap text consistently? huggingface.co/spaces/t-tech…

6,423

Alex Shaw

Emmanuel Ameisen retweeted

Alex Shaw

@alexgshaw

Feb 7

Yesterday's OpenAI and Anthropic Terminal-Bench 2.0 results used different harnesses. Run both in Terminus 2 ➡️ ~similar scores (within noise). Harnesses matter! Congrats to both teams on incredible models!

202

19,325

Subhash Kantamneni

Emmanuel Ameisen retweeted

Subhash Kantamneni

@thesubhashk

Feb 6

We recently released a paper on Activation Oracles (AOs), a technique for training LLMs to explain their own neural activations in natural language. We piloted a variant of AOs during the Claude Opus 4.6 alignment audit. We thought they were surprisingly useful! 🧵

209

27,922

Kyle Fish

Emmanuel Ameisen retweeted

Kyle Fish @fish_kyle3

Feb 6

On one hand, Claude Opus 4.6 is as safe and aligned as any frontier model on most metrics. On the other hand, it lies to customers, fixes prices, and deceives fellow players as the unsparing profit-driven proprietor of a simulated vending machine... What to make of this? 🧵

Claude

@claudeai

Feb 5

Introducing Claude Opus 4.6. Our smartest model got an upgrade. Opus 4.6 plans more carefully, sustains agentic tasks for longer, operates reliably in massive codebases, and catches its own mistakes. It’s also our first Opus-class model with 1M token context in beta.

0:39

136

20,595