Joined May 2019
Today marks one year since I began my new career as a researcher at @colfaxintl! I am incredibly grateful to my brilliant team -- Hassan, Jack, Jay, Paul, Ryo, and Frank -- as well as the collaborators we've been fortunate enough to work with.
Reuben Stern retweeted
I've defended and graduated! Perhaps the most important lesson I've learned during my time at MIT is that progress in science (and in society!) is deeply collective. In today's world --- and especially in a hyper-competitive field like AI research --- it's easy to get sucked into comparison and self-doubt. Much of this, I think, comes from a misunderstanding of how scientific progress actually works: we tend to attribute oversized credit to a small number of figures. But certainly none of the work I've done, and none of the growth I've undergone, would have been possible without the support of my mentors, collaborators, and the insights of millions of brilliant scientists before me. Along these lines, I am grateful to the amazing community around me who have supported my journey: most importantly, to my advisor @jacobandreas, the dozens of collaborators I've worked with during my PhD, my labmates, my mentees, and my co-organizers at @MITGradUnion --- all of whom have shown me, in various ways, what it means to work not out of comparison but out of love: for science, for the community around me, and for humanity. I hope to carry forward these values wherever I go.
We have enjoyed collaborating with @thinkymachines on some of the attention backend that supports this impressive work. Congrats to everyone involved!
People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/int…
All my homies love CLC
I alluded to this a few tweets ago but just pushed up a shortish blog on a subtle feature of CLC work stealing that makes cuda-graphable grouped_gemm possible with this scheduling mode: drisspg.github.io/nuggets/A-…
My colleagues Jack Carlisle and Jay Shah gave a fantastic lecture for @GPU_MODE yesterday on our categorical foundations for CuTe layout algebra! They were joined by Cris Cecka, the inventor of CuTe, and @marksaroufim as moderators. Bravi tutti! youtu.be/MVh_guNbWMA?si=zpiP…
Reuben Stern retweeted
🚀 Linear Attention is unlocking million-token context windows by dropping computational complexity from O(N^2) to O(N), but software is increasingly bottlenecking the hardware. Meet cuLA (CUDA Linear Attention): hand-written kernels using CuTe DSL & CUTLASS C++ to extract maximum performance on NVIDIA GPUs. A drop-in replacement for FLA designed to push hardware to its absolute limits.
Reuben Stern retweeted
Thank you to the companies and open-source communities behind Kimi K2.5, Ray, ThunderKittens, PyTorch, and more. We'd also like to thank Fireworks and Colfax for their collaboration and partnership.
Reuben Stern retweeted
Mar 23
PyTorch 2.11 is now available, featuring 2,723 commits from 432 contributors since PyTorch 2.10. This release prioritizes performance scaling for distributed training and next-generation hardware architectures. Highlights include a FlashAttention-4 backend for FlexAttention on Hopper and Blackwell GPUs, Differentiable Collectives for distributed training, and performance optimizations for Intel GPUs via XPU Graph. This release also delivers comprehensive operator expansion for Apple Silicon (MPS) and RNN/LSTM GPU export support. 🖇️ Read the PyTorch 2.11 release blog and release notes: pytorch.org/blog/pytorch-2-1… #PyTorch #OpenSource #AIInfrastructure
Reuben Stern retweeted
Mar 17
The frontier has increasingly shifted to hybrid models - from Qwen to Kimi-Linear and now with NVIDIA's Nemotron-3 Super - that rely on a strong linear sequence model. Today we release Mamba-3, the most powerful linear model to date. x.com/_albertgu/status/20339…

The newest model in the Mamba series is finally here 🐍 Hybrid models have become increasingly popular, raising the importance of designing the next generation of linear models. We've introduced several SSM-centric ideas to significantly increase Mamba-2's modeling capabilities without compromising on speed. The resulting Mamba-3 model has noticeable performance gains over the most popular previous linear models (such as Mamba-2 and Gated DeltaNet) at all sizes. This is the first Mamba that was student led: all credit to @aakash_lahoti @kevinyli_ @_berlinchen @caitWW9, and of course @tri_dao!
It's been great working on the FA-4 backend to FlexAttention -- check out this blog post to learn more!
FlexAttention now has a FlashAttention-4 backend. FlexAttention has enabled researchers to rapidly prototype custom attention variants, with 1000 repos adopting it and dozens of papers citing it. But users consistently hit a performance ceiling. Until now. We've added a FlashAttention-4 backend to FlexAttention on Hopper and Blackwell GPUs. PyTorch now auto-generates CuTeDSL score/mask modifications and JIT-instantiates FlashAttention-4 for your custom attention variant. The result: 1.2× to 3.2× speedups over Triton on compute-bound workloads. 🖇️ Read our latest blog here: hubs.la/Q045FHPh0 No more choosing between flexibility and performance. #PyTorch #FlexAttention #FlashAttention #OpenSourceAI
Reuben Stern retweeted
New blog post on introspection for interpretability, and why I think training models to self-explain is a promising frontier for interpretability research:
Reuben Stern retweeted
25 Sep 2025
Friends at @colfaxintl released an excellent work on the mathematical foundation of CuTe layouts. CuTe layouts are central to the modern programming models on NVIDIA GPUs. You can (almost) ditch C++, but you cannot ditch CuTe. In fact, you can (almost) ditch C++ because of a thing called Python CuTe DSL. And to use that, you must know CuTe layouts. Despite their role, CuTe layouts are highly unintuitive. I think you can only make sense of CuTe layouts with some mathematical guarantees about their peculiar behaviors. Colfax friends provided that 👇 It's amazing that these best things are free.
Amazing, Matt! Combined with one of my favorite acts of opera ever 😃
just announced! I can't wait to be reunited with the great @METOrchestra and @nezetseguin for the premiere of my orchestral "Lear Sketches" at @carnegiehall next June. on the second half: @travlingtenor and @ReneeFleming sing Act 4 of "Otello" (!) carnegiehall.org/Calendar/20…
the score fairy arrived today, and now my library of #bacewicz's orchestral works is nearly complete! @PWMedition
Reuben Stern retweeted
On Feb 26, The Harvard-Radcliffe Orchestra, led by Federico Cortese, performs Hannah Lash's "Forestallings" including the premiere of the 3rd movement commissioned by the HRO Foundation in honor of Cortese's 10 years as HRO's music director. hrofoundation.org/events/the…
Reuben Stern retweeted
Bravo, @reubenconducts for creating a wonderful introduction to the life and music of @PWMedition composer, Grażyna Bacewicz! stern-reuben.com/blog/2022/2…
Happy birthday, #GrażynaBacewicz! Today I'm celebrating the 113th birthday anniversary of the fantastic composer with a blog post I wrote introducing people to her life and work: stern-reuben.com/blog/2022/2… #bacewicz #polishmusic #classicalmusic #womencomposers
Grateful for our continued collaboration @PolishInstNY!
🎂 Happy birthday, Bacewicz! Today—Feb 5, 2022—is Grażyna Bacewicz’s 113th birthday. To celebrate, @reubenconducts prepared a magnificent blog post to walk us through Bacewicz’s biography, a guide to her works, suggested listening, and further reading. 🎻 bit.ly/3Lovhsz
Reuben Stern retweeted