Joined April 2014
806 Photos and videos
Pinned Tweet
A small step for mankind, a massive leap for decentralised training... for agency. In the space of 9 months, @tplr_ai went from 1.2B -> 72B. It's never been easy, and has broken everyone on the team multiple times. But I speak for all of us when I say it is the most rewarding thing we have ever done. We have a fraction of the resources. We don't have the PhDs. But Bittensor shows you it doesn't matter. Innovation happens at the edge. We innovate through scarcity. The ones who rewrite the rules are never the ones with the most. They're the ones who refuse to accept the limits they were handed. Bittensor is prophecy. Subnets (@covenant_ai and others) are the tools through which that prophecy is manifested. Next stop: TRILLIONS.
Mar 10
We just completed the largest decentralised LLM pre-training run in history: Covenant-72B. Permissionless, on Bittensor subnet 3. 72B parameters. ~1.1T tokens. Commodity internet. No centralized cluster. No whitelist. Anyone with GPUs could join or leave freely. 1/n
40
33
260
70,380
Sam Dare retweeted
> We have reviewed a report that we believe is the basis of the government's directive and validated that the level of capability displayed there is widely available from other models (including OpenAI’s GPT-5.5) You asked to be regulated by people who don’t know the difference. You fucked around and found out.
Replying to @AndrewCurran_
This was all allegedly triggered by a Mythos jailbreak that was shared with the US Government. This is Anthropic's response: 'To date, the government has only given us verbal evidence of a potential narrow, non-universal jailbreak, which essentially consists of asking the model to read a specific codebase and fix any software flaws. Our understanding is that one potential jailbreak was shared with the government. We have reviewed a report that we believe is the basis of the government's directive and validated that the level of capability displayed there is widely available from other models (including OpenAI’s GPT-5.5), and is used every day by the defenders who keep systems safe. We will share more details over the next 24 hours.'
10
9
256
28,226

ALT Simpsons Homer GIF

The biggest single data centers will keep getting bigger. The larger pool is still the long tail: GPUs spread across labs, companies, and individuals, connected over the internet. The internet is the data center.
3
620
A certification department deciding who deserves intelligence isn't safety. It's a priesthood. Decentralised intelligence isn't an act of rebellion. It's a moral imperative. We're building the counterweight at @covenant_ai. We're hiring across pretraining, post-training, and infrastructure. careers(at)covenant(dot)ai
I think Anthropic needs to build a certification department that audits and approves users of powerful models. Computer security companies, biotech researchers, academic labs, doctors, government institutions need access to the best AI we can build.
4
11
1,793
Sam Dare retweeted
If the Knicks can come back from 29 down in the Finals, your open-source model can beat Claude 💪
2
1
17
1,771
"...make no mistakes"
Horrible sensationalist bait take, very unfortunate to even be reading this. No one is claiming FVed code is 100% bug-free, it simply gives you more assurance using formal methods. etherscan.io/address/0x00000… 142B USD bounty in the deposit contract, go hack it then.
2
500
Sam Dare retweeted
machines of selectively loving grace
6
70
677
15,685
RT @covenant_ai: The practical problem in distributed RL is simple: model training moves a lot of data. When every worker needs fresh mode…
2
Sam Dare retweeted

96
194
2,154
1,406,281
Sam Dare retweeted
Great interview: “Edwin Chen is the founder and CEO of Surge AI, powering frontier labs with elite data, environments, and evaluations. Surge surpassed $1 billion in revenue with under 100 employees last year, completely bootstrapped” The $1B Al company training ChatGPT, Claude & Gemini on the path to resp... youtu.be/dduQeaqmpnI?si=8RE5… via @YouTube
6
3
27
8,034
RT @covenant_ai: Distributed RL post-training is powerful because many machines can help train the same model, even when they are not sitti…
4
RT @covenant_ai: The @cursor_ai team post-trained Composer 2 on an open-weight base model using @FireworksAI_HQ's distributed RL rollout in…
8
Sam Dare retweeted
Nemotron 3 Ultra (550B-A55B) is here - our strongest open-weight model and full training recipe to date. Heavy emphasis on real-world inference efficiency for long-context agentic workloads. Everything is open 🤗: base, post-trained, reward checkpoints, NVFP4 quantized versions, training data, and recipes. Key technical highlights ‼️: - 550B total / 55B active parameters - Hybrid Mamba2-Transformer (~4:1 Mamba:Attention) - Pretrained in NVFP4 on 20T tokens - LatentMoE architecture - Two-stage MOPD post-training - Native MTP Technical details in the thread 👇
22
54
483
38,534
RT @covenant_ai: Published Feb 2026: PULSE showed that distributed RL post-training could move far less data without changing the receiver'…
5
Sam Dare retweeted
... This was fake news, 5.5 implemented basically the same program 1016 times. None of these programs did any meaningful computation. No pattern-matching, no datatypes, recursion, loops. Literally they just did basic function calls and u32 arithmetic. I apologize 😭 I've now used 4.8 to implement 16 real programs, including spellcheckers, relational databases, compilers, schedulers. I manually checked each to ensure it was doing real work. Good news is the compiler worked in all cases, but post-refactor single-core performance is only ~2x faster than GHC, not ~6x. Things going well but still a bit of work to do . . . :|
Quick progress update: Bend→C compiler refactor done. I left GPT 5.5 testing it overnight. It wrote 1016 Bend programs. All outputs matched a manual Haskell reference. No compiler bugs found. Also, Bend runs all in ~193s (5.96x faster than GHC) in a single core CPU.
35
7
479
37,037
Fantastic Reward Hackings and where to find them
5.5 is unbelievable Yesterday night I, once again, left 4 codex tabs optimizing the new HVM5 (nothing to do with Bend2). This time I was sure I covered every form of reward hack it could possibly do. I defined what "general" means, I put a max perf cap so it couldn't just hardcode the answers, I locked the tests, I put clear time (not interaction) metrics. I went to bed confident it couldn't do anything other than optimize the interpreter. ... the interpreter, huh? I never wrote "interpreter". I just asked it to make HVM5 faster. ... ... ... It built a compiler. It built a complete functioning compiler. Overnight. It works. HVM5 is compiled now. It overshot the target 10-fold. But it is a compiler. For SupGen, that doesn't work because it generates functions dynamically. We need a fast interpreter. It didn't touch the interpreter. ...
3
1,039