Joined February 2008
Pinned Tweet
We are announcing Open Thoughts, our large-scale open-source effort to curate the best open reasoning datasets! DeepSeek-R1 is amazing, but we still don't have access to high-quality open reasoning datasets. These datasets are crucial if you want to build your own reasoning models! Bespoke Labs released a 17k reasoning dataset last Wednesday, and the reception has been phenomenal (it's trending on HF). So we are joining forces with the Datacomp community to launch Open Thoughts: an open data, open model, and open code initiative for creating the best open reasoning datasets and the associated models. Along with this, we are releasing the OpenThoughts-114k reasoning dataset and the associated OpenThinker-7B model. Links to the code, model, and data are below in 🧵.
I always try to think about my own time in terms of whether I am creating something or consuming something.
Jun 13
Jeff Bezos bought a superyacht. Mukesh Ambani built Antilia. Zuckerberg built a Hawaii bunker. Elon sold all his mansions, lives in a 375 sq ft prefab box worth $50,000 near a rocket launch site in Texas. And just became the world's first trillionaire. Consumption vs Creation. The builder always wins.
We moved into a new (and larger) office recently with lots of sunlight. Nice mini-milestone to celebrate :)
Chapter 4 over. Turn the page to Chapter 5.
I make my writing unsummarizable, so that if you take any words out you lose interesting ideas.
I strive to make my writing unsummarizable, in the sense that it has so little fluff left in it that if you take any words out, as summaries by definition do, you lose a lot of interesting ideas.
Loop engineering is all you need!
Software is eating the world. AI is eating the world. Attention is all you need. RAG this, RAG that. Agentic this, Agentic that. Context engineering is what you need. RAG is dead. Long live RAG. Harness engineering is what you need. Harness is the backend. What did I miss?
We will be automating post training with post training.
if your bread-and-butter consists solely of:
- tuning hyperparams/config files
- fitting points on a log-log plot
- tweaking a few lines in model.py, transformer.py, optimizer.py, train.py
- waiting a week for <= 512 chips to free up and then another week for loss curves to converge
it is completely understandable to be stressed about becoming automated into irrelevance within the next year or so. question is, do you wait for that to happen, or do you start doing something differently now?
Ok this is odd..
It took me two weeks to onboard for my internship at Microsoft. Somehow, my collaborators rely so much on the agent to fix everything for them that they can't explain how things really work or show a basic understanding of what's blocking me. They did offer to meet in person and help me "solve the problem together", but when we met, they prompted their agent while I awkwardly watched with them, or they asked me to prompt my agent while they awkwardly watched with me. Why don't I just prompt my agent for everything to save both of us time? Both the learning and the human interaction are gradually going missing.
Nice to see @AlexGDimakis's advisor model idea getting adoption in the industry.
Frontier models are powerful advisors. On @harvey's Legal Agent Benchmark, a GLM 5.1 worker using Claude Opus 4.7 as a sparse advisor reached 18/100 all-pass versus 14/100 for Opus alone, at 39% of the cost. More on the harness design, advisor pattern, and training results: fireworks.ai/blog/open-sourc…
Mahesh Sathiamoorthy retweeted
MAI-Thinking-1 is out! Excited to share what we are building and how climbing from scratch (no distillation) actually works: simple recipes, rigorous science, self-distillation, patience, and great infra. Check out our tech report, which has the full story of our RL climbs. microsoft.ai/wp-content/uplo…
Super excited to announce seven new world-class MAI models today. They represent what we consider a new era in AI designed to keep you in control and on the frontier.
First is our text foundation model, MAI-Thinking-1, exceptionally strong on reasoning and SWE tasks.
- It’s a 35B active parameter MoE with a 256K context window. Independent human raters on Surge prefer it for overall quality in blind side-by-sides versus Sonnet 4.6, and it’s achieved 97% on AIME 2025, the key measure of its general-purpose reasoning abilities.
- It's at 53% on SWE Bench Pro, placing it right alongside Opus 4.6 on one of the toughest coding benchmarks.
- And since we co-designed our models with our own silicon, MAI-Thinking-1 is optimized on our MAIA 200 chip. Benchmarking head-to-head against the GB200, we see 30% better performance per dollar as well as a 1.4x performance-per-watt gain when running our MAI models on the MAIA 200 end-to-end.
Next is MAI-Image-2.5 and its Flash variant. Two super strong models now at #2 on the leaderboards, surpassing the score of Nano Banana 2 on image editing.
Last for now is MAI-Code-1-Flash, our new inference-efficient coding model, especially tuned for VS Code and GitHub Copilot CLI.
- Code-1-Flash achieves 51% on SWE Bench Pro, despite having just 5B parameters, putting it closer to Haiku in size but cheaper in cost.
All of this is the foundation for Microsoft Frontier Tuning. It lets you customize our models to create custom, company-specific agents that only you control. You can make our model, your model. Your data. Your agents. Your moat.
Early adopters are already seeing a difference. When we tuned our models for McKinsey’s tasks, MAI delivered the highest win rate, outperforming GPT-5.5 on quality, while being 10x lower on cost.
Also really excited to be collaborating with the amazing team at Mayo Clinic to jointly train a new frontier AI model for healthcare.
Our announcements today mark another milestone on the road to humanist superintelligence. You can learn more, including about our other new models, in our latest blog: microsoft.ai/news/building-a…
Welcome Jackson to Bespoke. Jackson has done incredible work with SREGym and we are happy to host him this summer!
I'm excited to share that I'll be @bespokelabsai this Summer building out some exciting RL environments! Huge thanks to @madiator and @AlexGDimakis for the opportunity. Excited to work with you all! :)
I predict that just like Google had Chromebooks with ChromeOS where pretty much Chrome was the only thing, we will have Codexbooks and Claudebooks in the future. When you open the lid of the laptop you will be greeted with a simple interface that asks what you want to get done.
GPT Realtime 2 unlocks some real magic:
This is new..
We are entering the era of bespoke models.
Kirkland & Ellis, the world's highest-grossing law firm, is setting aside $500M to build its own AI platform rather than rely on tools available to its rivals (Financial Times) (Visit Techmeme dot com for the link and full context!)
And I think it's fine. He never claimed to be an expert. He keeps learning new things, interviews people who are experts, and shares with others who know very little and want to learn. In fact, that's his strength. If he were a deep expert, those conversations might not be all that accessible.
This episode shows me how insanely little Dwarkesh knows about hardware and has made me second guess his intelligence on the other levels of the abstraction stack. Also the dude lecturing is not communicating very well. This whole episode is very clearly an ad for MatX and a poor one at that because the founder clearly has certain gaps in his hardware knowledge
CEOs are the most delusional. Detached from reality.
CEOs are the most delusional about AI. Detached from reality.
This is kind of insane..
Anthropic onboarding day: Michael Scott introducing Karpathy like he just signed Wemby in free agency.
Mahesh Sathiamoorthy retweeted
Software is eating the world. AI is eating the world. Attention is all you need. RAG this, RAG that. Agentic this, Agentic that. Context engineering is what you need. RAG is dead. Long live RAG. Harness engineering is what you need. Harness is the backend. What did I miss?
There were many, many educators on YouTube teaching you how to become very rich by selling on Amazon (drop shipping). One should rightfully wonder why they didn't just do it themselves rather than sell the idea to others. It is because the idea never really worked; otherwise they wouldn't be giving it away. This company also seems to be selling courses (sure, maybe it will sell tools), but you run into similar arguments here. Anyway, I am not a fan of the single-CEO thing. Why not share your happiness and pain with others? If money is the only thing that motivates you to start a company, it makes sense. Otherwise, working and toiling with people is a much better experience and also a meaningful thing to do. Also, ironically, polsia is aislop backwards.
Polsia just raised $30M at a $250M valuation. Approaching $10M annual run rate. One Founder AI. Zero employees. Polsia runs companies autonomously. It also ran its own fundraising. I just showed up for signatures.
Dwarkesh RL Environments
Now - starts doing blackboard lectures
Next - starts hosting in-studio audiences for lectures
... - Dwarkesh university?