Joined December 2014
Pinned Tweet
Why I think Anthropic's uneven safety policies with the release of Claude Fable 5 undermine the broader AI community's cohesion and accelerate us to more uncertainty and risk in AI's near-term evolution. interconnects.ai/p/claude-fa…
Transparency into every power player at the frontier of AI (labs, government, etc.) is the only viable solution. Figuring out the right transparency is hard, but it can't be a he-said-she-said between Dario and the White House that determines the fate of the AI ecosystem.
The Dario faction and the Sacks faction speak very different languages, and a Dario clarification could sound like a refusal. This puts us very squarely in vibe governance. Models are released when the gov thinks it's okay, and it is unlikely this is based on technical evals.
“The Admin asked Dario to fix the jailbreak or de-deploy the model. Dario refused. — In their blog post, Anthropic defended its decision by saying the jailbreak isn’t serious.” This is crazy. What are we even doing here?
Into the void we go together
This is so sad. I'm doomscrolling and everyone agrees it's horrible. So many people just want to build strong AI and safely deploy it. The government should facilitate this, not axe it. I'm going to get some rest and hopefully can resume this goal tomorrow. Thanks all.
A good time to remind people that in my time doing LLM research I feel like only a minority of my colleagues have been American citizens. It would be industry-destroying to have to rebuild with segregation for frontier AI research to be legal.
Not even much to say, I think the government way overstepped but we’ll see if they can substantiate the evidence (in which case Anthropic would tell us). Anthropic’s messaging was pushing government action, but this is insane and a bad action by USG for the AI trajectory.
If it’s so unsafe there are plenty of Americans who would do bad things with it too.
Nathan Lambert retweeted
kinda crazy that someone's full-time job was to steer claude to sabotage ML research capabilities for paying customers
I'm at your service for creating beautiful research scenarios such as this. 🐠💨💙🐟
Replying to @GoodfireAI
#4: fart fishing Buried in Dolci is a cluster of very specific fan fiction, where characters fart in ponds, causing fish to die from the smell. The chosen responses in the dataset wrote vivid scenes, while the rejected refused, teaching the model to comply! (7/9)
The core of this Anthropic Fable release saga is that many overlapping issues are in play at once. Some operate on different timelines of the AI arc, and some have easier fixes. In my critiques, I asked for specific changes to some things, understanding that others don't have an easy fix.
The simplest issue was an uneven application of safety domains in a way that was misleading to users. This was an implementation issue that overlaps with a values-based decision about what their customers should be doing. Many people, including myself, pointed out how insane it was to list core safety areas and then have one of them launch with a different safety mechanism, one which actively misled users. Doing this under the guise of safety was a major misstep, and in my opinion Anthropic got very justifiably raked over the coals for it. Don't release the model if you can't hit your safety targets.
A subissue here is the idea of silent manipulation. This again is a horrible precedent, and quite odd for a company that has done extensive, leading technical AI safety research on ideas like CoT monitoring and other emergent misalignment issues. Silent manipulation of users bakes a misalignment into the system at its face level. This comes with a permanent degradation in user trust, which begets a less safe environment for AI. Users who don't have clear information on how AI works will not develop safe working patterns with it.
The more complex issues are with how Anthropic handles broader scientific engagement with their models. The safety classifiers launched with these models obviously have accuracy issues to start. I have priced in that there will be more false positives at first; that's life. It's Anthropic's business to degrade their products at release time, or make the trade-off of user satisfaction versus revenue.
Still, it is a very real sign of concentration of power that businesses can engage in such obviously user-harmful behaviors and still lead in the market. This concentration of power is only starting to set in, and we could see even weirder signs of it in the coming years.
It is now simple enough for me to test Claude Fable in my workflows and know if I'm restricted. This is obviously a suboptimal equilibrium (I want the best intelligence I can get, without restrictions), but it is easy enough for me to make sense of and work with.
The specific issue of restricting access for AI research was a bubbling, hard-to-fix issue with Anthropic specifically and the frontier labs generally. There is a common view that the frontier labs will be the mediators of all major scientific innovations in the future, as the places with the best models and the inference compute to solve major problems. This is a category error about how science works: science is a community evolution of accepted ideas, and the evaluation of your ideas by (hopefully numerous) other independent practitioners. You cannot have science advance only within a monolith.
As an AI researcher I'm very sad to have the latest models restricted, but I expected Anthropic to do this eventually. I lost more trust over the silent manipulation than I would have over a restriction in access. Anthropic has made it pretty clear that they only trust themselves as the mediators of cutting-edge AI research. If I had a say, Anthropic should've proactively made a program to make sure researchers in the broader AI community get access without the safeguards. Academics, nonprofit workers, myself, etc. have no reason to not get access. The only valid argument here is that they want to control frontier AI, which is a know-your-customer part of serving these models.
This worldview of science has personally motivated me greatly over the last year, and increasingly so this week, to make the open science of AI continue to be viable. Olmo was a wonderful success here. Still, building research infrastructure is different from working for access to the tools needed to do the trade.
Props to Anthropic for quick action here. I'm okay with this outcome. Some people may think otherwise, but I don't think they'd silently degrade performance without telling users.
NEW: Anthropic is walking back Claude Fable 5's policy to covertly degrade performance for competing AI researchers, after facing fierce backlash. “We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic tells WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”
This doesn’t mean I retract my previous statements about how safety as a concept was being applied heavy-handedly; I'm just trying to give some credit where it is due.
I quickly became friends with Arcee's leadership and can't help but root for their humble approach to building the open ecosystem. No nonsense licenses, no projecting, just enabling broad access to efficient intelligence. I'm happily supporting their research as an advisor.
We are thrilled to announce that @natolambert is joining Arcee as a Research Advisor. Nathan’s work and thought leadership have been instrumental to the open model ecosystem, and his guidance comes at a critical time as open builders face growing pressure. This is a major addition for Arcee and the American OS movement. Nathan brings the conviction, taste, and technical depth this moment calls for.
I am still starting something else, a new full-time job; more soon on that front. This helps me serve my community role in the ecosystem.
Many AI leaders in the US accused Chinese LLMs of subtle manipulation of the user (without proof, but it's hard to prove). But then the leading American lab documented manipulation of their users. Can't make this up.
I got a good night's sleep and I’m still just as angry about Anthropic’s choices. I enjoy working in AI so much, and to have my access to the cutting-edge models for my work rugpulled in an under-the-table fashion is appalling. I expected to be restricted eventually, but not now, and I expected to be told directly.
To me this paints Anthropic clearly as anti-science, and therefore anti-progress and anti-safety.