ML infra & research @Google | MSCS @Stanford | 🎨 🏃‍♀️

Joined July 2025
9 Photos and videos
Jen Ha @ ICML 2026 retweeted
there is no better time in tech than now to be a jack of all trades, master of a few. just make sure to keep adding to the few year over year, such that the cumulative breadth of expertise you collect becomes an increasingly rare combo. remember, if you're top 10% in 3 different areas, that already makes you top 0.1%. keep switching it up until you get to "your best", and then switch it up again (great for a particular flavor of people who don't enjoy resting on laurels, maybe not so great for others). question all institutional value and pedigrees, all traditional career paths or corporate ladders: the college industrial complex is getting shaken up, alongside a disappearing managerial class, so if you're pursuing either make sure you are fully internally aligned with why. social/political capital in a particular institution can feel incredible, but if you're spending all your energy on complex political people games, you're not a technologist anymore, you're an unelected politician. if you're ok with that, then all's well. critical thinking is more important than ever: take nothing at face-value, question everything and everyone. the equivalent of ai slop can be found in humans operating under misaligned incentives and interests. the sooner you're clued into disambiguating the talkers/larpers from the doers, the better off you'll be figuring out where and who to invest your time in. the anxiety of job displacement is very real, since a surprising amount of white collar work/prestige is built on a performative house of cards, significantly lacking in correlation with technical breadth, depth, and skill. as long as you keep learning, keep building, keep producing receipts, you will be fine. if all that sounds ok to you, welcome to the world of technology! it's truly one of the few places you can experience child-like wonder every few years, and be constantly humbled & excited by new adventures, as scary as they may seem at first. don't give up, drink your water, get your sunlight, and take breaks as needed. tech careers are notoriously nonlinear, so you might as well embrace it and enjoy the ride!
53
330
2,798
398,858
Really cool research direction
1/4 LLMs solve research grade math problems but struggle with basic calculations. We bridge this gap by turning them to computers. We built a computer INSIDE a transformer that can run programs for millions of steps in seconds solving even the hardest Sudokus with 100% accuracy
2
184
Jen Ha @ ICML 2026 retweeted
Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to prompt engineering. Here we share what we are working on and connect with the research community frequently and openly. The name Connectionism is a throwback to an earlier era of AI; it was the name of the subfield in the 1980s that studied neural networks and their similarity to biological brains. thinkingmachines.ai/blog/def…
230
1,244
7,606
3,490,247
Stuck on Gandalf level 8. Its thinking tokens are being analyzed for hints and intent and being aggressively flagged. Driving me lowkey nuts!!🤔🤔🤔 @LakeraAI
2
2
268
idk why, but this made me very sad thinking this is why adults (complex models) are pessimistic
2
228
Did I tell Twitter that I saw @sama and @garrytan real up close? Well @elonmusk … not so close
3
291
PR’s from users makes my heart light up❤️
4
188
I’m starting to feel sorry for Gemini…
167
caltrain >>> gbus
2
180
One ring to rule them all @ouraring let’s collect some dataz👩‍🔬
1
2
183
This is such a great example of RL on JAX
6 Apr 2023
1/ 🚀 Presenting PureJaxRL: A game-changing approach to Deep Reinforcement Learning! We achieve over 4000x training speedups in RL by vectorizing agent training on GPUs with concise, accessible code. Blog post: chrislu.page/blog/meta-disco… 🧵
2
276
Jen Ha @ ICML 2026 retweeted
24 Jul 2025
✨Huge thanks for interest in Mixture-of-Recursions! Codes are officially out! It's been a long journey exploring Early-exiting with Recursive Architecture. I'll soon post my 👨‍🎓PhD thesis on Adaptive Computation too! Code: github.com/raymin0223/mixtur… Paper: arxiv.org/abs/2507.10524
6
61
279
18,137
Jen Ha @ ICML 2026 retweeted
21 Jul 2025
Huge thanks ❤️ to my awesome co-first authors @raymin0223 and @reza_byt, and to all our collaborators and supervisors who made this possible: @kim_sungnyun@jenhriver, @TalSchuster, @adamjfisch, @harhrayr, Ziwei Ji, @AaronCourville, and Se-Young Yun.
2
9
695
Jen Ha @ ICML 2026 retweeted
21 Jul 2025
Introducing our new work: 🚀Mixture-of-Recursions! 🪄We propose a novel framework that dynamically allocates recursion depth per token. 🪄MoR is an efficient architecture with fewer params, reduced KV cache memory, and 2× greater throughput— maintaining comparable performance!
10
58
324
22,421
Jen Ha @ ICML 2026 retweeted
Google & Mila just dropped Mixture-of-Recursions (MoR)! A new framework for efficient LLMs that dynamically adapts computation at the token level. Achieve large-model quality with dramatically less compute & memory.
2
7
26
1,541
Jen Ha @ ICML 2026 retweeted
16 Jul 2025
Thanks for sharing our work, @deedydas MoR is a new arch that upgrades Recursive Transformers and Early-Exiting algorithms. Simple pretraining with router, and faster inference speed and lower KV caches! Post for details and codes will be released very soon. Stay tuned! ☺️
16 Jul 2025
Google DeepMind just dropped this new LLM model architecture called Mixture-of-Recursions. It gets 2x inference speed, reduced training FLOPs and ~50% reduced KV cache memory. Really interesting read. Has potential to be a Transformers killer.
10
42
2,797
Jen Ha @ ICML 2026 retweeted
This is quite a landmark paper from @GoogleDeepMind 📌 2x faster inference because tokens exit the shared loop early. 📌 During training it cuts the heavy math, dropping attention FLOPs per layer by about half, so the same budget trains on more data. Shows a fresh way to teach LLMs to plan steps inside their own reasoning loop, instead of hard-coding a single chain. Second, it proves the mixer idea scales. By jumbling several small recursive experts and letting the model pick which one to call next, the team pushes accuracy on math and coding benchmarks without ballooning parameter count. Mixture-of-Recursions (MoR) keeps 1 stack of layers in memory, loops it for tough tokens, and still beats a much bigger vanilla model in accuracy and speed . It does this by letting a tiny router choose how many loops each token gets, then it saves cache only for the tokens that stay active. Fewer weights, fewer FLOPs, less memory, yet better perplexity across 135M‑1.7B scales. 🪐 The Big Picture Scaling Transformers usually means stacking more layers and paying the price in memory and compute. MoR flips that habit. It shares 1 compact block, runs it up to 4 times depending on token difficulty, and skips the loop early when the router says “done”. Figure 1 on the first page sketches this token‑wise staircase where simple words exit fast and tricky ones keep climbing 🧵 Read on 👇
3
25
103
8,588
Jen Ha @ ICML 2026 retweeted
16 Jul 2025
"experts" for harder tokens? "Mixture-of-Recursions (MoR): Learning Dynamic Recursive Depths for Adaptive Token-Level Computation" MoR makes one shared Transformer block loop only for tokens that need extra thought, delivering quality with half the weights & twice the speed
9
58
322
19,677
New architectures leggooooo
16 Jul 2025
📄 New Paper Alert! ✨ 🚀Mixture of Recursions (MoR): Smaller models • Higher accuracy • Greater throughput Across 135 M–1.7 B params, MoR carves a new Pareto frontier: equal training FLOPs yet lower perplexity, higher few‑shot accuracy, and more than 2x throughput. Let’s break it down! 🧵👇
3
9
625