Joined September 2009
This is going to be a super fun workshop on scaling & learning dynamics & optimization! Please consider submitting your best work there!
The High-Dimensional Learning Dynamics Workshop @ ICML 2026 🇰🇷, with a special focus on scaling laws, is coming up, July 10! Submission Deadline: May 11 AoE (extended). @Locchiu @albertobietti @JustinLin610 @inbarser @BachFrancis @ShamKakade6 @andrewgwils @blake__bordelon @_brloureiro @Qiuzihanhan
The originality and the depth of science are really impressive. High thinking, signal to flop ratio. Congrats Damien, Elliot, Courtney et al.
1/10 We built ADANA, an optimizer that gets better as you scale. It extends AdamW with log-time schedules for momentum and weight decay — same hyperparameter count, no extra engineering. Scaled from 45M to 2.6B, it saves ~40% compute vs tuned AdamW, and the gap keeps growing.🧵
Lechao Xiao retweeted
Mathematicians 🤝 AI researchers arxiv.org/abs/2601.22401. Our take on AI solving Erdős problems:
* Many "open" problems are actually just obscure: in many cases the AI didn't find something new, only rediscovered solutions buried in the literature. We present our systematic approach to reporting AI results on Erdős problems.
* The real bottleneck is still human labor, e.g. we spent lots of time filtering out technically correct but meaningless solutions (the AI missed Erdős's original intent).
* Acceleration in solving low-hanging fruit is real, but we also need to highlight the many more misses that require human auditing.
Clear research directions ahead though, and we feel optimistic about drastically increasing the signal-to-noise ratio. More to come!
Replying to @lmthang
Here's the paper link to our scaled effort for tackling Erdős problems. We started with 700 problems marked ‘Open’ in the database. Our agent #Aletheia identified potential solutions to 200 problems. Initial human grading revealed 63 correct answers, followed by deep expert evaluation and discussion to eventually arrive at meaningful proofs to 13 Erdős problems. arxiv.org/abs/2601.22401
Lechao Xiao retweeted
I am hiring a student researcher to work with our team in Montreal on LLM architecture and pre-training in spring-summer 2026. If you're excited to push the frontier of research forward, join us to help keep the TPUs warm. Fill out this form: forms.gle/1AfdyCbzjdKi2yAb7

23 Oct 2025
First scaling law: performance follows a power law of compute, with its exponent governed by science and engineering. Second scaling law: the total improvement of this law follows a power law of resource, with its exponent governed by vision and conviction.
Lechao Xiao retweeted
Great talk from @ShikaiQiu on scaling collapse, which can be used as a sensitive diagnostic for model specification (LR, etc.) at small scale that transfers to large scales! arxiv.org/abs/2507.02119
15 Jul 2025
come and enjoy the blessing from universality
14 Jul 2025
Realistic training dynamics are too complex to be described by simple scaling laws with hand-picked formulas, yet they obey precise scaling trends and universality. Join me Tue morning at the Theory and Phenomenology oral session for an alternative approach that gets us farther!
14 Jul 2025
The learning rate schedule is a big mystery in machine learning that is underinvested in, both in theory and in practice. Shikai did an awesome job in revealing that simple quadratic models can indeed capture many key insights from big ones.
12 Jul 2025
📉Learning rate decay is super effective and sometimes mysterious, but the simplest model of SGD on quadratic loss w/ noisy gradients almost perfectly predicts loss curves of transformers trained with Adam on real data, across schedules, model sizes, and token budgets. 1/4
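A minimal sketch of the kind of toy model the post describes: SGD on a quadratic loss with noisy gradients, compared under a constant and a decaying learning rate. Everything specific here is an assumption for illustration and is not taken from the paper: the curvature values, the noise level, the cosine schedule, and the function names are all hypothetical. The sketch tracks the exact expected loss rather than sampling noise, so it is deterministic.

```python
import math


def expected_loss_curve(lrs, curvatures, noise_std=1.0, init_sq=1.0):
    """Exact expected loss of SGD on L(w) = 0.5 * sum_i h_i * w_i**2.

    Each gradient coordinate is h_i * w_i plus zero-mean noise with
    standard deviation noise_std (an assumed, constant noise model).
    """
    second_moments = [init_sq for _ in curvatures]  # E[w_i^2] per coordinate
    losses = []
    for lr in lrs:
        # The step w_i <- w_i - lr * (h_i * w_i + noise) implies
        # E[w_i^2] <- (1 - lr * h_i)^2 * E[w_i^2] + (lr * noise_std)^2
        second_moments = [
            (1.0 - lr * h) ** 2 * m + (lr * noise_std) ** 2
            for h, m in zip(curvatures, second_moments)
        ]
        losses.append(0.5 * sum(h * m for h, m in zip(curvatures, second_moments)))
    return losses


def constant_schedule(peak_lr, steps):
    return [peak_lr] * steps


def cosine_schedule(peak_lr, steps):
    # Decays from peak_lr toward zero over the run.
    return [0.5 * peak_lr * (1.0 + math.cos(math.pi * t / steps)) for t in range(steps)]


# Hypothetical curvature spectrum and training budget.
curvatures = [1.0, 0.3, 0.1, 0.03]
steps = 2000

constant = expected_loss_curve(constant_schedule(0.1, steps), curvatures)
cosine = expected_loss_curve(cosine_schedule(0.1, steps), curvatures)

print(f"final expected loss, constant LR: {constant[-1]:.5f}")
print(f"final expected loss, cosine decay: {cosine[-1]:.5f}")
```

In this toy setting the constant schedule plateaus at a noise floor proportional to the learning rate, while the decayed schedule ends lower because shrinking the step size shrinks that floor. Whether this matches any particular transformer run depends on the fitting procedure in the paper, which this sketch does not reproduce.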
Lechao Xiao retweeted
8 Jul 2025
While scaling laws typically predict the final loss, we show in our ICML oral paper that good scaling rules enable accurate predictions of entire loss curves of larger models from smaller ones! w/@Locchiu, @andrewgwils, J. Pennington, A. Agarwala: arxiv.org/abs/2507.02119 1/10
29 May 2025
Really impressive work! Changing the exponent is highly nontrivial.
It's very difficult to improve the *exponent* in scaling laws for loss vs compute, especially by changing the optimizer! Our new paper shows that scaling momentum correctly can *provably* improve the scaling exponent on a theoretical model. Empirically, it works on LSTMs too!
31 Jan 2025
Very cool! Congrats and thanks for trusting our codebase! @peterjliu (creator of nanodo), @ARomanNovak, @_katieeverett (pre-)lead of the codebase.
Engineering shout-out to Nanodo & DrJax for making it possible. Nanodo (github.com/google-deepmind/n…) is a great JAX open-source codebase from @Locchiu @Mitchnw et al. DrJax (github.com/google/drjax) allows us to efficiently parallelize training. Look at their paper (arxiv.org/abs/2403.07128); those guys were already doing distributed training at 8B scale last year!
30 Jan 2025
It’s extremely exciting to see the frontiers of science and practice merge, yet equally disheartening to see them become a frontier of geopolitics. What wisdom does AI offer to humans?
21 Dec 2024
Curious to hear thoughts about levels of AGI in math. My take (unchanged since March 2023):
L1: solve IMO
L3: breakthroughs in 2 areas (e.g., analysis/PDEs, algebra/number theory, geometry/topology), each ~ one Annals/JAMS/Acta paper
L5: solve a Millennium problem
L6: ...
14 Dec 2024
“勿以恶小而为之,勿以善小而不为。惟贤惟德,能服于人” ("Do not do evil just because it is small; do not ignore doing good just because it is small. Only by being virtuous and moral can one be respected by others.") learned this in middle school, never forgot, passed it to my kids
Mitigating racial bias from LLMs is a lot easier than removing it from humans! Can’t believe this happened at the best AI conference @NeurIPSConf We have ethical reviews for authors, but missed it for invited speakers? 😡
Lechao Xiao retweeted
5 Nov 2024
If you miss the NYTimes needle, especially one that is statistically uniform (blog.alexalemi.com/a-degree-…), you can use this page, which I whipped together to reason about the correlations between the swing states tonight as results come in: alexalemi.com/random/electio…
23 Oct 2024
Lgtm
23 Oct 2024
Two amazing accounts with high signal that I have been impressed by recently on this app are @cloneofsimo and @kellerjordan0. Just an unprompted thought :)
Lechao Xiao retweeted
23 Oct 2024
Two amazing accounts with high signal that I have been impressed by recently on this app are @cloneofsimo and @kellerjordan0. Just an unprompted thought :)
12 Oct 2024
“It’s the people, not the projects …”
11 Oct 2024
My @Google colleague and longtime @UCBerkeley faculty member David Patterson has a great essay out in this month's Communications of the ACM (@TheOfficialACM): 🎉 "Life Lessons from the First Half-Century of My Career: Sharing 16 life lessons, and nine magic words." I saw an early draft when it was only 10 lessons 😀. The lessons are generally useful to people in a wide variety of fields, not just CS. cacm.acm.org/opinion/life-le…