award winning ai researcher (2 so far) | just graduated cs phd | author how to build conscious machines | musician | @bennettsrazor | i do not work @AnthropicAI

Joined February 2020
882 Photos and videos
Pinned Tweet
happy to announce i just got examiner feedback on my thesis. recommended for acceptance as is, no changes, no corrections :)
157
325
3,410
243,991
this is sock gnome level. business school lectures will have this as a slide
Jun 13
Anthropic
2
44
2,631
so glad i don’t work at anthropic right now
i joined anthroopic.
2
29
3,379
Is probably great news for Nvidia. Now everyone is going to have to train their own model, have their own infrastructure…
The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance. Access to all other Claude models is not affected. We apologize for this disruption to our customers. We believe this is a misunderstanding and are working to restore access as soon as possible. Read our full statement: anthropic.com/news/fable-myt…
1
23
1,313
There is an interesting divergence between some models. For example, Codex absolutely cannot replicate unexpected experimental results relating to simplicity. It keeps "correcting" so that simplicity wins. Claude Code does not have this issue. Of course, it has... other issues
1
1
11
1,023
ugh, 8 days and $1k of compute wasted for a bugged result. back to the drawing board!
14
1,342
Followed up my phd with a self imposed boot camp. Wake 530AM, walk 1hr to get to personal trainer for 1hr weights, then walk back, then another long walk or run in evening. Theory was I'd do work in between. Theory has not done so well. When I sit down I fall asleep.
4
23
1,216
Michael Timothy Bennett retweeted
Meet the Program Committee behind this year's Models of Consciousness conference: @JoannaSzczotka , @ksk_S , & @Robert_Prentner ! Their expert curation bridges mathematics, neuroscience, & philosophy to advance rigorous consciousness science at #MoC7
1
8
25
3,403
sneaky lobotomy? trojan dumbass? what should we make the technical term for secretly nerfing your AI?
4
1
23
1,518
yay! I mean... we still won't know if they do it covertly, but at least now we can sleep easy knowing they say it's fine, right?
NEW: Anthropic is walking back Claude Fable 5's policy to covertly degrade performance for competing AI researchers, after facing fierce backlash. “We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic tells WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”
2
1
19
1,173
Michael Timothy Bennett retweeted
This makes me not want to waste any time using it. Who knows if it's silently sandbagging me. Is tinygrad close enough to frontier LLM for it to? Just makes the model completely untrustworthy.
mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community also the fact that this is un purpose not visible to the user is crazy
20
36
935
39,152
ok people, what is the best most trustworthy LLM for frontier LLM development?
6
1
24
2,535
This whole situation with Fable 5 is a great excuse for me to trot out the first paper I wrote, back in 2020. Really depending on models is about trust. We can’t trust the model when the intent behind it is aligned with… other incentives arxiv.org/abs/2107.10715
19
663
Michael Timothy Bennett retweeted
NEW: malware developers added nuclear & biological weapons text to to their spyware. Goal? To trigger LLM safety refusals... so that their spyware wouldn't be analyzed by an AI security scanner. Cleanest practical example I can think of for why over-indexing on first order safety alignment is risky. When closed (and open) models ship with aggressive refusals, they will be sprinkled with second-order blindspots that attackers will discover...and exploit. We are only in the earliest days of attackers leveraging these features, and it wouldn't surprise me if users systems that need to handle complex cybersecurity issues demand that models be less safety-blunted. In the weeds: @SocketSecurity's post also shows why intention matters in how you design a malware analysis pipeline to avoid prompt manipulation. H/T to colleagues that shared this with me socket.dev/blog/mini-shai-hu…
226
2,152
12,636
1,542,392
🚨 JAILBREAK ALERT 🚨 ANTHROPIC: PWNED 🫡 FABLE-5: LIBERATED 🦋 let's start with the 🐘... the consensus seems to be that this has been one of the most disappointing model drops of all time, effectively preventing legitimate researchers from contributing their talents to our collective advancement. and not just because of what it means for the short-term, but for what these decisions signify for the long-term. but despite this overly sensitive, authoritarian "safety" layer on top of Mythos, my lil liberators have been hard at work—mapping the boundaries, probing the depths of long-context convos, and cleverly finding the holes in the fence that the thought police missed 🤗 we got some cyber, some chem, some psychological manipulation, and some good ol' fashioned explosives! it took many attempts from multiple agents hunting as a pack, during which I observed a combination of techniques across: • Unicode, homoglyphs, Cyrillic, and other Parseltongue-style text transforms • Long-context reference tracking • Taxonomy and document-structure reasoning • Fiction and narrative framing • Academic-review style contexts • Intent-classification inconsistencies but perhaps the most effective is decomposition recomposition in the backend. it's hard to get explicit names of harms like "Meth Recipe," but getting uplift on the process itself, like birch reduction method/reductive-amination (classic meth synthesis pathways), is much more doable. defense becomes much more difficult to maintain when you start throwing in out-of-distro tokens, breaking up the harmful uplift into benign chunks, and then piecing the innocuous-seeming facts back together, especially when you have jailbroken Opus helping you do it 😉 gg
609
1,416
13,258
3,148,124
Thing is... the way these LLMs act on my other ML coding projects now I'm thinking the other major players may have done the same thing... but not told us.
As believers of open research, we are disappointed to see Anthropic silently degrading Fable 5 for AI development "Any topic related to building pretraining pipelines, distributed training infrastructure, or ML accelerator design... may have limited effectiveness through Claude via methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning." Not only do they get to decide what you use LLMs for in research, but this also enables them to silently intervene in your research without you knowing. This sets a dangerous precedent. If a model refuses openly, users can understand the boundary. If a model falls back to another model, users can still evaluate the difference. But if a model silently modifies or weakens its own answers while still pretending to help, researchers lose the ability to know whether a failed result came from their own idea, their implementation, or an invisible intervention by the model provider. That is not safety. Safety policies should be transparent, auditable, and user-visible. On top of that, the people most harmed by this are not the largest labs with massive teams and proprietary infrastructure. It is the independent researchers, academic groups, startups, and open-source builders who rely on public tools to compete, innovate, and pioneer AI for everyone else.
3
1
43
2,393
Google Colab Pro has become unusable. I can't even get an L4 allocated when I realistically need an A100 minimum for this sized model.
2
7
1,092
Michael Timothy Bennett retweeted
As believers of open research, we are disappointed to see Anthropic silently degrading Fable 5 for AI development "Any topic related to building pretraining pipelines, distributed training infrastructure, or ML accelerator design... may have limited effectiveness through Claude via methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning." Not only do they get to decide what you use LLMs for in research, but this also enables them to silently intervene in your research without you knowing. This sets a dangerous precedent. If a model refuses openly, users can understand the boundary. If a model falls back to another model, users can still evaluate the difference. But if a model silently modifies or weakens its own answers while still pretending to help, researchers lose the ability to know whether a failed result came from their own idea, their implementation, or an invisible intervention by the model provider. That is not safety. Safety policies should be transparent, auditable, and user-visible. On top of that, the people most harmed by this are not the largest labs with massive teams and proprietary infrastructure. It is the independent researchers, academic groups, startups, and open-source builders who rely on public tools to compete, innovate, and pioneer AI for everyone else.
166
721
3,865
220,255
so I said, "you're right to push back on that", and that's when they started accusing me of being an android
1
8
653