Susan Zhang

Susan Zhang

9 Photos and videos

Tweets

Jen Ha @ ICML 2026 retweeted

Susan Zhang

@suchenzang

May 18

there is no better time in tech than now to be a jack of all trades, master of a few. just make sure to keep adding to the few year over year, such that the cumulative breadth of expertise you collect becomes an increasingly rare combo. remember, if you're top 10% in 3 different areas, that already makes you top 0.1%. keep switching it up until you get to "your best", and then switch it up again (great for a particular flavor of people who don't enjoy resting on laurels, maybe not so great for others). question all institutional value and pedigrees, all traditional career paths or corporate ladders: the college industrial complex is getting shaken up, alongside a disappearing managerial class, so if you're pursuing either make sure you are fully internally aligned with why. social/political capital in a particular institution can feel incredible, but if you're spending all your energy on complex political people games, you're not a technologist anymore, you're an unelected politician. if you're ok with that, then all's well. critical thinking is more important than ever: take nothing at face-value, question everything and everyone. the equivalent of ai slop can be found in humans operating under misaligned incentives and interests. the sooner you're clued into disambiguating the talkers/larpers from the doers, the better off you'll be figuring out where and who to invest your time in. the anxiety of job displacement is very real, since a surprising amount of white collar work/prestige is built on a performative house of cards, significantly lacking in correlation with technical breadth, depth, and skill. as long as you keep learning, keep building, keep producing receipts, you will be fine. if all that sounds ok to you, welcome to the world of technology! it's truly one of the few places you can experience child-like wonder every few years, and be constantly humbled & excited by new adventures, as scary as they may seem at first. don't give up, drink your water, get your sunlight, and take breaks as needed. tech careers are notoriously nonlinear, so you might as well embrace it and enjoy the ride!

330

2,798

398,858

Jen Ha @ ICML 2026

Jen Ha @ ICML 2026 @jenhriver

Mar 19

Really cool research direction

Christos Tzamos @ChristosTzamos

Mar 11

1/4 LLMs solve research grade math problems but struggle with basic calculations. We bridge this gap by turning them to computers. We built a computer INSIDE a transformer that can run programs for millions of steps in seconds solving even the hardest Sudokus with 100% accuracy

0:22

184

Thinking Machines

Jen Ha @ ICML 2026 retweeted

Thinking Machines

@thinkymachines

10 Sep 2025

Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference” We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to prompt engineering. Here we share what we are working on and connect with the research community frequently and openly. The name Connectionism is a throwback to an earlier era of AI; it was the name of the subfield in the 1980s that studied neural networks and their similarity to biological brains. thinkingmachines.ai/blog/def…

230

1,244

7,606

3,490,247

Jen Ha @ ICML 2026

Jen Ha @ ICML 2026 @jenhriver

17 Aug 2025

Stuck on Gandalf level 8. Its thinking tokens are being analyzed for hints and intent and being aggressively flagged. Driving me lowkey nuts!!🤔🤔🤔 @LakeraAI

268

Jen Ha @ ICML 2026

Jen Ha @ ICML 2026 @jenhriver

15 Aug 2025

idk why, but this made me very sad thinking this is why adults (complex models) are pessimistic

228

Jen Ha @ ICML 2026

Jen Ha @ ICML 2026 @jenhriver

11 Aug 2025

Did I tell Twitter that I saw @sama and @garrytan real up close? Well @elonmusk … not so close

291

Jen Ha @ ICML 2026

Jen Ha @ ICML 2026 @jenhriver

9 Aug 2025

PR’s from users makes my heart light up❤️

188

Jen Ha @ ICML 2026

Jen Ha @ ICML 2026 @jenhriver

8 Aug 2025

I’m starting to feel sorry for Gemini…

167

Jen Ha @ ICML 2026

Jen Ha @ ICML 2026 @jenhriver

8 Aug 2025

caltrain >>> gbus

180

Jen Ha @ ICML 2026

Jen Ha @ ICML 2026 @jenhriver

7 Aug 2025

One ring to rule them all @ouraring let’s collect some dataz👩‍🔬

183

Jen Ha @ ICML 2026

Jen Ha @ ICML 2026 @jenhriver

30 Jul 2025

This is such a great example of RL on JAX

Chris Lu

@_chris_lu_

6 Apr 2023

1/ 🚀 Presenting PureJaxRL: A game-changing approach to Deep Reinforcement Learning! We achieve over 4000x training speedups in RL by vectorizing agent training on GPUs with concise, accessible code. Blog post: chrislu.page/blog/meta-disco… 🧵

276

Sangmin Bae

Jen Ha @ ICML 2026 retweeted

Sangmin Bae

@raymin0223

24 Jul 2025

✨Huge thanks for interest in Mixture-of-Recursions! Codes are officially out! It's been a long journey exploring Early-exiting with Recursive Architecture. I'll soon post my 👨‍🎓PhD thesis on Adaptive Computation too! Code: github.com/raymin0223/mixtur… Paper: arxiv.org/abs/2507.10524

279

18,137

Yujin Kim

Jen Ha @ ICML 2026 retweeted

Yujin Kim @yujin301300

21 Jul 2025

📄Paper: arxiv.org/abs/2507.10524 🧑‍💻Code: github.com/raymin0223/mixtur… We actually have much more interesting experiments and findings in our paper🤩, check it out! Discussion and comments are very welcome!

Mixture-of-Recursions: Learning Dynamic Recursive Depths for...

Scaling language models unlocks impressive capabilities, but the accompanying computational and memory demands make both training and deployment expensive. Existing efficiency efforts typically...

arxiv.org

541

Yujin Kim

Jen Ha @ ICML 2026 retweeted

Yujin Kim @yujin301300

21 Jul 2025

Huge thanks ❤️ to my awesome co-first authors @raymin0223 and @reza_byt, and to all our collaborators and supervisors who made this possible: @kim_sungnyun , @jenhriver, @TalSchuster, @adamjfisch, @harhrayr, Ziwei Ji, @AaronCourville, and Se-Young Yun.

695

Yujin Kim

Jen Ha @ ICML 2026 retweeted

Yujin Kim @yujin301300

21 Jul 2025

Introducing our new work: 🚀Mixture-of-Recursions! 🪄We propose a novel framework that dynamically allocates recursion depth per token. 🪄MoR is an efficient architecture with fewer params, reduced KV cache memory, and 2× greater throughput— maintaining comparable performance!

324

22,421

DailyPapers

Jen Ha @ ICML 2026 retweeted

DailyPapers

@HuggingPapers

15 Jul 2025

Google & Mila just dropped Mixture-of-Recursions (MoR)! A new framework for efficient LLMs that dynamically adapts computation at the token level. Achieve large-model quality with dramatically less compute & memory.

1,541

Sangmin Bae

Jen Ha @ ICML 2026 retweeted

Sangmin Bae

@raymin0223

16 Jul 2025

Thanks for sharing our work, @deedydas MoR is a new arch that upgrades Recursive Transformers and Early-Exiting algorithms. Simple pretraining with router, and faster inference speed and lower KV caches! Post for details and codes will be released very soon. Stay tuned! ☺️

Deedy

@deedydas

16 Jul 2025

Google DeepMind just dropped this new LLM model architecture called Mixture-of-Recursions. It gets 2x inference speed, reduced training FLOPs and ~50% reduced KV cache memory. Really interesting read. Has potential to be a Transformers killer.

2,797

Rohan Paul

Jen Ha @ ICML 2026 retweeted

Rohan Paul

@rohanpaul_ai

16 Jul 2025

This is quite a landmark paper from @GoogleDeepMind 📌 2x faster inference because tokens exit the shared loop early. 📌 During training it cuts the heavy math, dropping attention FLOPs per layer by about half, so the same budget trains on more data. Shows a fresh way to teach LLMs to plan steps inside their own reasoning loop, instead of hard-coding a single chain. Second, it proves the mixer idea scales. By jumbling several small recursive experts and letting the model pick which one to call next, the team pushes accuracy on math and coding benchmarks without ballooning parameter count. Mixture-of-Recursions (MoR) keeps 1 stack of layers in memory, loops it for tough tokens, and still beats a much bigger vanilla model in accuracy and speed . It does this by letting a tiny router choose how many loops each token gets, then it saves cache only for the tokens that stay active. Fewer weights, fewer FLOPs, less memory, yet better perplexity across 135M‑1.7B scales. 🪐 The Big Picture Scaling Transformers usually means stacking more layers and paying the price in memory and compute. MoR flips that habit. It shares 1 compact block, runs it up to 4 times depending on token difficulty, and skips the loop early when the router says “done”. Figure 1 on the first page sketches this token‑wise staircase where simple words exit fast and tricky ones keep climbing 🧵 Read on 👇

103

8,588

alphaXiv

Jen Ha @ ICML 2026 retweeted

alphaXiv

@askalphaxiv

16 Jul 2025

"experts" for harder tokens? "Mixture-of-Recursions (MoR): Learning Dynamic Recursive Depths for Adaptive Token-Level Computation" MoR makes one shared Transformer block loop only for tokens that need extra thought, delivering quality with half the weights & twice the speed

322

19,677

Jen Ha @ ICML 2026

Jen Ha @ ICML 2026 @jenhriver

20 Jul 2025

New architectures leggooooo

Reza Bayat

@reza_byt

16 Jul 2025

📄 New Paper Alert! ✨ 🚀Mixture of Recursions (MoR): Smaller models • Higher accuracy • Greater throughput Across 135 M–1.7 B params, MoR carves a new Pareto frontier: equal training FLOPs yet lower perplexity, higher few‑shot accuracy, and more than 2x throughput. Let’s break it down! 🧵👇

625