Nathan Lambert

Tweets

Pinned Tweet

Nathan Lambert

@natolambert

Jun 9

Why I think Anthropic's uneven safety policies with the release of Claude Fable 5 undermine the broader AI community's cohesion and accelerate us to more uncertainty and risk in AI's near-term evolution. interconnects.ai/p/claude-fa…

Claude Fable 5 and new safety fables

One step further into the power politics of frontier AI systems.

interconnects.ai

408

35,844

Nathan Lambert

@natolambert

12h

Transparency into every power player at the frontier of AI (labs, government, etc.) is the only viable solution. Figuring out the right transparency is hard, but it can't be he-said-she-said between Dario and the White House that determines the fate of the AI ecosystem.

111

8,306

Nathan Lambert

@natolambert

12h

The Dario faction and the Sacks faction speak very different languages, and a Dario clarification could sound like a refusal. This puts us very squarely in vibe governance. Models are released when the gov thinks it's okay, and it is unlikely this is based on technical evals.

martin_casado

@martin_casado

14h

“The Admin asked Dario to fix the jailbreak or de-deploy the model. Dario refused. — In their blog post, Anthropic defended its decision by saying the jailbreak isn’t serious.” This is crazy. What are we even doing here?

383

45,627

Nathan Lambert

@natolambert

18h

Into the void we go together

6,497

Nathan Lambert

@natolambert

Jun 13

This is so sad. I'm doomscrolling and everyone agrees it's horrible. So many people just want to build strong AI and safely deploy it. The government should facilitate this, not axe it. I'm going to get some rest and hopefully can resume this goal tomorrow. Thanks all.

743

26,033

Nathan Lambert

@natolambert

Jun 13

A good time to remind people that in my time doing LLM research I feel like a minority of my colleagues are American citizens. It would be industry-destroying to have to rebuild with segregation for frontier AI research to be legal.

934

53,984

Nathan Lambert

@natolambert

Jun 13

Not even much to say, I think the government way overstepped but we’ll see if they can substantiate the evidence (in which case Anthropic would tell us). Anthropic’s messaging was pushing government action, but this is insane and a bad action by USG for the AI trajectory.

386

20,092

Nathan Lambert

@natolambert

Jun 13

If it’s so unsafe there are plenty of Americans who would do bad things with it too.

3,884

Nathan Lambert

@natolambert

Jun 12

derivation of policy gradient: rlhfbook.com/c/06-policy-gra…

Reinforcement Learning | RLHF and Post-Training Book by Nathan Lambert

Policy gradient methods for RLHF and LLM post-training, including PPO, REINFORCE, RLOO, GRPO, and implementation details.

rlhfbook.com

Harsh Bhatt

@harshbhatt7585

Jun 12

derivation of Policy Gradient.

644

67,560

Nathan Lambert retweeted

Joanne Jang

@joannejang

Jun 11

kinda crazy that someone's full-time job was to steer claude to sabotage ML research capabilities for paying customers

162

3,493

139,899

Nathan Lambert

@natolambert

Jun 11

I'm at your service for creating beautiful research scenarios such as this. 🐠💨💙🐟

Goodfire

@GoodfireAI

Jun 11

Replying to @GoodfireAI

#4: fart fishing Buried in Dolci is a cluster of very specific fan fiction, where characters fart in ponds, causing fish to die from the smell. The chosen responses in the dataset wrote vivid scenes, while the rejected refused, teaching the model to comply! (7/9)

12,633

Nathan Lambert

@natolambert

Jun 11

The core part of this Anthropic Fable release saga is that there are many overlapping issues at once. Some operate on different timelines of the AI arc, and some have easier fixes. In my critiques, I asked for specific changes to some things, understanding that some things don't have an easy fix.

The simplest issue was an uneven application of safety domains in a way that was misleading to users. This was an implementation issue that overlaps with a values-based decision of what their customers should be doing. Many people, including myself, pointed out how it was insane to list core safety areas and then have one of them launch with a different safety mechanism, one which actively misled users. Doing this under the guise of safety was a major misstep and in my opinion Anthropic got very justifiably raked over the coals for it. Don't release the model if you can't hit your safety targets.

A subissue here is the idea of silent manipulation. This again is a horrible precedent, and quite odd for a company that has done extensive, leading technical AI safety research on ideas like CoT monitoring and other emergent misalignment issues. Silent manipulation of users is baking in a misalignment to the system at its face level. This comes with a permanent degradation in user trust, which begets a less safe environment for AI. Users who don't have clear information on how AI works will not develop safe working patterns with it.

The more complex issues are with how Anthropic handles broader scientific engagement with their models. The safety classifiers launched with these models obviously have accuracy issues to start. I have priced in that there will be more false positives to start, that's life. It's Anthropic's business to degrade their products at release time, or make the trade-off of user satisfaction versus revenue.

Still, it is a very real sign of concentration of power that businesses can adopt such obviously user-harmful behaviors and still lead in the market. This concentration of power is only starting to set in and we could see even weirder signs of it in the coming years. It is now simple enough for me to test Claude Fable in my workflows and know if I'm restricted. This is obviously a suboptimal equilibrium (I want the best intelligence I can get, without restrictions) but it is easy enough for me to make sense of and work with.

The specific issue of restricting access to AI research in particular was a bubbling and hard-to-fix issue with Anthropic specifically, and the frontier labs generally. There is a common view that the frontier labs will be the mediators of all major scientific innovations in the future, as the places with the best models and the compute for inference to solve major problems. This is a categorical error in how science works, which is a community evolution of accepted ideas, and the evaluation of your ideas by (hopefully numerous) independent, other practitioners. You cannot have science advance only within a monolith.

As an AI researcher I'm very sad to have the latest models restricted, but I would expect Anthropic to do this eventually. I lost more trust over the silent manipulation than I would with a restriction in access. Anthropic has made it pretty clear that they only trust themselves as the mediators of cutting-edge AI research. If I had a say, Anthropic should've proactively made a program to make sure researchers in the broader AI community get access without the safeguards. Academics, nonprofit workers, myself, etc. have no reason to not get access. The only valid argument here is that they want to control frontier AI, which is a know-your-customer part of serving these models.

This worldview of science has personally motivated me greatly over the last year, and increasingly so this week, to make the open science of AI continue to be viable. Olmo was a wonderful success here. Still, building research infrastructure is different from working for access to the tools needed to do the trade.

367

47,434

Nathan Lambert

@natolambert

Jun 11

Props to Anthropic for quick action here. I'm okay with this outcome. Some people may, but I don't think they'd silently degrade performance without telling users.

Max Zeff

@ZeffMax

Jun 11

NEW: Anthropic is walking back Claude Fable 5's policy to covertly degrade performance for competing AI researchers, after facing fierce backlash. “We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic tells WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”

237

32,896

Nathan Lambert

@natolambert

Jun 11

This doesn’t mean I retract my previous statements about how safety as a concept was being applied heavy-handedly; I'm just trying to give some credit where it is due.

3,106

Nathan Lambert

@natolambert

Jun 10

I quickly became friends with Arcee's leadership and can't help but root for their humble approach to building the open ecosystem. No nonsense licenses, no projecting, just enabling broad access to efficient intelligence. I'm happily supporting their research as an advisor.

Arcee.ai

@arcee_ai

Jun 10

We are thrilled to announce that @natolambert is joining Arcee as a Research Advisor. Nathan’s work and thought leadership have been instrumental to the open model ecosystem, and his guidance comes at a critical time as open builders face growing pressure. This is a major addition for Arcee and the American OS movement. Nathan brings the conviction, taste, and technical depth this moment calls for.

831

60,571

Nathan Lambert

@natolambert

Jun 10

I am still starting something else, a new full-time job; more soon on that front. This helps me serve my community role in the ecosystem.

3,081

Nathan Lambert

@natolambert

Jun 10

Many AI leaders in the US accused Chinese LLMs of subtle manipulation of the user (without proof, but it's hard to prove). But then the leading American lab documented manipulation of their users. Can't make this up.

1,291

49,297

Nathan Lambert

@natolambert

Jun 10

I got a good night’s sleep and I’m still just as angry about Anthropic’s choices. I enjoy working in AI so much, and to have my access to the cutting-edge models for my work rugpulled in an under-the-table fashion is appalling. I expected to be restricted eventually, but not now, and to be told it directly.

102

2,131

95,689

Nathan Lambert

@natolambert

Jun 10

To me this paints Anthropic clearly as anti-science, and therefore anti-progress and anti-safety.

415

12,620