Joined January 2010
Pinned Tweet
25 Nov 2024
been becoming more active there! haven't been posting much but have been reading a lot more, it's pretty pleasant!
28 Apr 2023
I'm on bluesky as well, @manishearth.bsky.social unsure what I plan to do there
Your language doesn’t have a word for this now, does it?
isnt that what they served at fyre festival
The American mind cannot comprehend the pub cheese and onion roll
this week has been a rollercoaster for the ai world and half the Major Posters from that world have been off trying to elect a new pope
(there was a papal conclave larp at lighthaven. apparently very good)
they gotta print the weights on a t-shirt. idk. they can print them real small. it’ll be fine
complaining about claude not showing "thinking": "claude does not think, therefore claude cannot am"
incogito ergo sumn't
yeah this is me, I think the decision was defensible but also not the ideal one
I no longer have any particular criticisms of Anthropic's actions regarding Fable. I think the decision of making guardrails invisible made sense but was ultimately incorrect, and I'm glad they've changed their minds on this.
Manish retweeted
It is a *little* awkward when your CEO is out there complaining people aren’t putting his ideas into law fast enough, yet your own policies can’t survive 24 hours of contact with reality.
We’re rolling out changes to make Fable 5’s safeguards for frontier LLM development visible. Starting this week, flagged requests will visibly fall back to Opus 4.8—the same as our safeguards for cyber and bio. You will see this every time it happens. On the API, any flagged requests will return a reason for their refusal (coming to server-side fallback in the next few days).
We wanted to deploy Fable 5 to our users quickly and safely. Visible safeguards can be probed, so they have to be robust, which takes time to get right. Invisible safeguards can be targeted more narrowly, allowing us to ship quickly with very few false positives. We went with invisible safeguards for this reason—and that was the wrong tradeoff. You should have visibility into the safeguards we have in place, and why. We’re sorry for not getting the balance right.
Making the safeguards visible makes them easier to work around, so keeping them robust to jailbreaks will unfortunately mean more false positives while we improve the classifiers. We're also tuning our bio and cyber classifiers to trigger less often on harmless requests. We know this is frustrating and we’ll do our best to keep this period as short as possible.
If you think a request has been mistakenly flagged: run /feedback in Claude Code, click thumbs-down on the fallback in Claude.ai or Cowork, or file the safeguard appeal form for API requests. Your reports help us tune these classifiers and we appreciate your feedback. support.claude.com/en/articl…
Manish retweeted
Very pleased to hear Anthropic have walked back this policy simonwillison.net/2026/Jun/1…
BREAKING NEWS: Anthropic's latest model will NOT help you if it thinks your ML research/ML engineering is interesting, and/or will secretly degrade its IQ so that the average engineer won't notice. We are already seeing Anthropic's latest model's moderation filters our GPU inference research and programming 😭
related: they confirm a pope is dead by tapping them on the head whilst deadnaming them to see if they react
I didn’t realize popes ever spoke like this, but it’s quite humanizing. Like, emphasizing “I play a character who is different from the real me.”
Manish retweeted
Cannot think of a more disastrous set of decisions to make ahead of an IPO, the reaction to data policies alone will show up in their revenue figures, to say nothing of cost control measures
Jun 10
SITUATION DETECTED: Microsoft is limiting internal employee use of Fable 5 over Anthropic's new data retention requirements, per The Verge. Fable 5 requires data retention to operate its safety classifiers, unlike other Claude models which run under Zero Data Retention rules.
Manish retweeted
Seeing a lot of Fable safeguards hate on the timeline, but "what did y'all think [AI safety] meant? vibes? papers? essays?"
The reality is that there are real tradeoffs in AI safety. Anthropic deserves credit for aggressive resolution of these tradeoffs in favor of safeguards for a model that it believes (and is in fact) is a step-change in vulnerability research capability.
It's kind of difficult to justify coercive proactive harm mitigation, especially in a libertarian-ish society, but we clearly see the value in mandatory vaccination programs or beatcop policing or surveillance cameras. We should applaud Anthropic for being one of the few institutions in American public life that actually follows through on its convictions, including in implementing really aggressive monitoring, squelching of AI development work (already accounted for in its ToS -- I think the clandestinity is cool too), and exclusionary limits on use for information security-related queries.
The whole point here is that we do not have herd immunity here: our network edge devices, authentication apps/services, and productivity software are extremely vulnerable, not sandboxed, and lack introspection capabilities. We need programs like Glasswing, better cross-company threat detection, and a more effective APT exploitation strategy before we democratize such a robust vuln research capability.
The counterfactual here is that MSS contractors use VPS to access Fable, find jailbreaks for weaker safeguards, and use the system to build an active directory exploit that enables remote access to every O365 app. Not so bueno, huh?
This is incredibly hard; Anthropic may not have calibrated every safeguard correctly this time, but there'll be learning. Model release cycles are getting more concise: they will adapt as they better understand and mitigate risks and competitive pressures manifest.
Histrionic claims of anti-competitive behavior and safetyist hysteria are victim to precisely the error that is being alleged.
mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community
also the fact that this is on purpose not visible to the user is crazy
i'm really not getting the Fable hate
they released something new and shiny and fancy but it's artificially restricted for various reasons? seems ... fine? they could have also not released it or even built it in the first place.
why do people feel entitled to Better LLMs?
(i think there are some legit reasons to complain about this too, but so much of it seems to just be entitlement)
Jun 10
I find this reaction a bit off-putting and entitled. it reminds me of a broader pattern that bothers me, but I'm not sure exactly how to put my finger on it
like... anthropic just increased the value of their product to their customers by releasing a powerful new model. they have increased consumer surplus. no one's subscription was contingent on this happening, especially not at a particular time. the release yesterday was a surprise
but for some customers, the surplus was increased less. because, worst case, they have to use the new model in incognito
but it was still increased, so... why start with a loud complaint instead of appreciation? why be negative instead of positive? doesn't feel right to me
I feel like I see this all over the place. like I think it's good when stores are wheelchair accessible, but if a new one pops up and they don't have a ramp, I am more appreciative of its existence than I am disappointed in its subpar accessibility
glosso has weird features and I'm not on board with all of them. but the majority of it is very good, so I'm focusing on the good, and I'm super appreciative that Aella's making it happen
I've seen this with events too I think, people get so critical all the time, people are putting in hard work to give other people things for free or for cheap. people are creating massive consumer surplus all the time. yet there is still a tendency for others to come at things from such a critical angle
I think it's mostly good when people do things and make things. and disproportionate criticism disincentivizes people to do things and make things