Founder & CEO of Alembic, CUDA architect, wrangler of GPUs, 1st gen Cuban. I tweet in long delayed bursts as I’m usually too busy building.

Joined June 2008
Tomás Puig retweeted
Fable 5 refused 200 out of 200 ProgramBench tasks lmao
The Alembic racks are always a good sight.
Tomás Puig retweeted
Anthropic read The Three-Body Problem and decided the best idea in the whole trilogy was the sophon lock
Tomás Puig retweeted
We consume data we did not create. We inherit tools we did not invent. We run on chips we did not make. But when the commons bears fruit, we fence it.
Mythos will be bad ON PURPOSE on AI "frontier LLM research" tasks. This is very, very sad for the research community. Also, the fact that this is on purpose and not visible to the user is crazy.
Tomás Puig retweeted
I plan to live Anthropically. If someone asks me about something I don't like I'll just become a stupider version of myself
All the love I see Muon getting seems to come from people who don’t have to worry about cluster efficiency. Adam is still king when it comes to keeping life sane for distributed systems.
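One way to see why Adam is friendlier to distributed training: its update is purely elementwise, so each rank can update only its own slice of the parameters and optimizer state (ZeRO-style sharding) and get exactly the same result as an unsharded update, with no communication inside the optimizer step. Muon, by contrast, orthogonalizes each 2-D update matrix with a Newton-Schulz iteration, which needs the whole matrix, so a sharded setup typically has to gather it first. Below is a minimal pure-Python sketch of the Adam side, assuming standard Adam hyperparameters; the function names are hypothetical and this is an illustration, not any framework's implementation.

```python
import math


def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step over flat lists. Each output element depends only on the
    same index of the inputs, which is what makes the update shardable."""
    new_p, new_m, new_v = [], [], []
    for p, g, mi, vi in zip(params, grads, m, v):
        mi = beta1 * mi + (1 - beta1) * g          # first-moment EMA
        vi = beta2 * vi + (1 - beta2) * g * g      # second-moment EMA
        m_hat = mi / (1 - beta1 ** t)              # bias correction
        v_hat = vi / (1 - beta2 ** t)
        new_p.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_p, new_m, new_v


def sharded_adam_step(params, grads, m, v, t, num_shards, **kw):
    """Simulate ZeRO-style sharding: each 'rank' updates only its contiguous
    slice without talking to the others; the slices are concatenated after."""
    n = len(params)
    size = math.ceil(n / num_shards)
    out_p, out_m, out_v = [], [], []
    for start in range(0, n, size):
        sl = slice(start, start + size)
        p, mm, vv = adam_step(params[sl], grads[sl], m[sl], v[sl], t, **kw)
        out_p += p
        out_m += mm
        out_v += vv
    return out_p, out_m, out_v


params = [0.5, -1.2, 3.0, 0.0, 2.2, -0.7, 1.1]
grads = [0.1, -0.3, 0.05, 0.2, -0.1, 0.4, -0.25]
zeros = [0.0] * len(params)

full = adam_step(params, grads, zeros, zeros, t=1)
sharded = sharded_adam_step(params, grads, zeros, zeros, t=1, num_shards=3)
print(full[0] == sharded[0])  # True: sharding changes nothing for an elementwise update
```

The same experiment with Muon would not split this way, because orthogonalizing a slice of a matrix is not the same as slicing the orthogonalized matrix.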
Every founder I know has a list… it lives rent-free in our heads.
Anthropic’s last round was apparently a bloodbath behind the scenes. A GP at a prominent fund had dinner with Dario three times before their allocation was slashed to zero. At least four other tier-one funds got pulled at the last minute. Their crime? Passing on the Series B, the hardest round Dario ever had to raise (led by Spark). In venture, conviction is all that counts.
Tomás Puig retweeted
Replying to @elonmusk
I don’t quite get the flex here. Our in-house version of this has been finished for a while. In fact, the next step, hopefully, is that they build a hierarchical reduction tree like SHARP and NVSHMEM, with in-box, in-rack, and cross-rack mapping. Then you get all-reduce off the GPU completely and onto the network. This gets pretty hairy because you need to make sure you’re on NCCL 2.28, as auto-shrink of the SHARP tree was only allowed from that release onward. If they’re on NetworkX and not InfiniBand, then there’s no SHARP, and hiding the sync of the all-reduces becomes even more key. I’m also curious what they’re maxing their tensor parallelism (TP) and data parallelism (DP) at. There is an inherent internal pressure between optimizer steps for overtraining and TP/DP. Assuming NVL72 racks, maxing out TP realistically nets out to 64 per rack, with the additional trays used as run spares and for orchestration. In real-world practice, though, you rarely take TP above 8-16, since network traffic would crush you without perfect async. I do think the trend toward NetworkX will come back to bite people a bit, since you’re sacrificing 10-30% of GPU FLOPs by refusing to offload to the network. Unspoken secret… the hard part isn’t writing the software… the hard part is writing the network warm-up routine for the machines and NICs once that software is available.
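The rack and all-reduce arithmetic in the reply can be made explicit. A minimal sketch, assuming an NVL72 rack is 18 compute trays of 4 GPUs each, that a TP group must fit inside one rack's NVLink domain, that 2 trays are held back as spares (which reproduces the "64 a rack" figure), and a textbook ring all-reduce in which each GPU sends 2(N-1)/N times the buffer size. The in-network figure is an idealised SHARP-style reduction where each GPU sends its buffer once; real NCCL/SHARP traffic depends on algorithm selection and topology, and the helper names and the 1 GB bucket are hypothetical.

```python
from dataclasses import dataclass

GPUS_PER_TRAY = 4      # assumption: one NVL72 compute tray holds 4 GPUs
TRAYS_PER_RACK = 18    # assumption: 18 compute trays -> 72 GPUs per rack


@dataclass
class RackLayout:
    tp: int            # tensor-parallel group size (kept inside one NVLink rack)
    groups: int        # number of TP groups that fit in the usable GPUs
    spare_gpus: int    # GPUs left over for run spares / orchestration


def rack_layout(tp: int, reserved_trays: int = 2) -> RackLayout:
    """Pack TP groups into one rack after reserving whole trays as spares."""
    usable = (TRAYS_PER_RACK - reserved_trays) * GPUS_PER_TRAY
    if tp > usable:
        raise ValueError(f"TP={tp} does not fit in {usable} usable GPUs")
    groups = usable // tp
    total = TRAYS_PER_RACK * GPUS_PER_TRAY
    return RackLayout(tp=tp, groups=groups, spare_gpus=total - groups * tp)


def ring_allreduce_bytes_per_gpu(buffer_bytes: float, n: int) -> float:
    """Bytes each GPU sends in a ring all-reduce (reduce-scatter + all-gather)."""
    return 2 * (n - 1) / n * buffer_bytes


def in_network_allreduce_bytes_per_gpu(buffer_bytes: float) -> float:
    """Idealised in-network (SHARP-style) reduction: each GPU sends its buffer once."""
    return buffer_bytes


if __name__ == "__main__":
    print(rack_layout(64))   # the "max TP" case from the reply
    print(rack_layout(8))    # the more common real-world TP
    grad_bytes = 1e9         # hypothetical 1 GB gradient bucket
    for dp in (8, 64, 512):
        ring = ring_allreduce_bytes_per_gpu(grad_bytes, dp)
        sharp = in_network_allreduce_bytes_per_gpu(grad_bytes)
        print(f"DP={dp}: ring {ring / 1e9:.2f} GB vs in-network {sharp / 1e9:.2f} GB per GPU")
```

Under these assumptions, a ring all-reduce asymptotically sends about twice the gradient bytes per GPU that an idealised in-network reduction does, which is the traffic (and the GPU-side reduction work) the reply argues is worth pushing onto the switches.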
Tomás Puig retweeted
Introducing Kernels on the Hugging Face Hub ✨ What if shipping a GPU kernel was as easy as pushing a model?
- Pre-compiled for your exact GPU, PyTorch & OS
- Multiple kernel versions coexist in one process
- torch.compile compatible
- 1.7x–2.5x speedups over PyTorch baselines
Tomás Puig retweeted
We've been tricked, again. Many of the thousands of bugs and vulnerabilities Mythos found are in older software or are impossible to exploit. And the severe zero-day reports rely on just 198 manual reviews tomshardware.com/tech-indust…
Tomás Puig retweeted
That new LFM2.5-350M is super overtrained, right? And everyone was shocked about how far they pushed it? As it turns out, we have a brand new scaling law for that! 🧵 [1/n]
AI news in a nutshell today

[GIF: Dr. Strangelove war room]

As a Latino founder, it was nice to finally see a Latino Super Bowl halftime show live. P.S. The trees were actual people!
Tomás Puig retweeted
It was beautiful, all of us Latinos felt it #Halftime #SuperBowl
Every single time I have to fly @united I’m reminded why I switched all status to @Delta.
One of the speaking engagements at Davos.
I’m speaking alongside the Davos World Economic Forum in Switzerland this week. First time here and so very curious what it will be like. My talk is Tuesday and hosted at the Forbes stage on “Causal AI and the New Rules of Decision & Power”.
NVIDIA’s CES keynote is “shots fired” at all specialized, non-general-first models. First, NVIDIA creates open general models; partners then do the RL and specialization; and it all runs on GPUs. Interesting to see the codification of training, simulation, and inference at the edge. Watching RTX move from gaming to simulation is interesting, to say the least.
I really wish NVIDIA would stop using FP4 for every single metric. It’s so hard to guess what it does in real world workloads.
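A rough way to turn an FP4 headline back into something closer to a real workload is to undo the marketing multipliers. A minimal sketch, assuming each halving of precision doubles peak tensor throughput (FP4 to FP8 to BF16) and that a quoted "sparse" figure is 2x the dense one via 2:4 structured sparsity. Both ratios are assumptions that hold for some recent NVIDIA parts and not others, so check the actual datasheet; the 20 PFLOPS input is a made-up example, not a spec, and the helper name is hypothetical.

```python
# Assumed peak-throughput divisors relative to dense FP4 (verify per datasheet).
PRECISION_DIVISOR = {"fp4": 1, "fp8": 2, "bf16": 4}
SPARSITY_FACTOR = 2  # assumption: 2:4 structured sparsity doubles the headline


def normalize_headline(pflops: float, precision: str = "bf16", sparse: bool = True) -> float:
    """Convert a headline FP4 PFLOPS figure into an estimated dense peak at `precision`."""
    dense_fp4 = pflops / SPARSITY_FACTOR if sparse else pflops
    return dense_fp4 / PRECISION_DIVISOR[precision]


headline = 20.0  # hypothetical "20 PFLOPS FP4 (sparse)" marketing number
for prec in ("fp4", "fp8", "bf16"):
    print(f"{prec}: ~{normalize_headline(headline, prec):.1f} PFLOPS dense peak")
```

Even the BF16 dense number is still a peak; real training and inference workloads land at some fraction of it, which is exactly the gap an FP4-only headline hides.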
Tomás Puig retweeted
28 Dec 2025
Git worktrees are perfect for starting sandboxes for agents to propose a solution to a problem while you keep working on master or another branch. Here's the bash I use to start a new worktree/branch with "ga fix" (i.e., fizzy--fix) and then "gd" after it's done to nuke it again.
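The original bash isn't included here, but the workflow it describes can be sketched with standard `git worktree` subcommands. A minimal Python sketch: the `ga`/`gd` names come from the post, while the sibling-directory layout `<repo>--<branch>` is an assumption inferred from the `fizzy--fix` example, and every helper below is hypothetical. Command building is kept separate from execution so the commands can be inspected before anything touches the repository; `gd` uses `--force` and `branch -D`, so it discards uncommitted and unmerged work, matching "nuke it".

```python
import subprocess
from pathlib import Path


def worktree_path(repo_root: Path, branch: str) -> Path:
    """Sibling directory named <repo>--<branch>, e.g. ../fizzy--fix (assumed layout)."""
    return repo_root.parent / f"{repo_root.name}--{branch}"


def ga_commands(repo_root: Path, branch: str) -> list[list[str]]:
    """Create a new branch from HEAD and check it out in its own worktree."""
    return [["git", "-C", str(repo_root), "worktree", "add", "-b", branch,
             str(worktree_path(repo_root, branch))]]


def gd_commands(repo_root: Path, branch: str) -> list[list[str]]:
    """Remove the worktree, then force-delete its branch."""
    return [
        ["git", "-C", str(repo_root), "worktree", "remove", "--force",
         str(worktree_path(repo_root, branch))],
        ["git", "-C", str(repo_root), "branch", "-D", branch],
    ]


def run(commands: list[list[str]]) -> None:
    """Execute each command, stopping on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


def repo_root() -> Path:
    """Top-level directory of the repository containing the current directory."""
    out = subprocess.run(["git", "rev-parse", "--show-toplevel"],
                         check=True, capture_output=True, text=True)
    return Path(out.stdout.strip())
```

Usage would then be `run(ga_commands(repo_root(), "fix"))` to hand the agent a fresh sandbox, and `run(gd_commands(repo_root(), "fix"))` once its proposal has been merged or abandoned.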