Lechao Xiao

Lechao Xiao @Locchiu

May 10

It was a record (is it still?) arxiv.org/abs/1806.05393

Dynamical Isometry and a Mean Field Theory of CNNs: How to Train...

In recent years, state-of-the-art methods in computer vision have utilized increasingly deep convolutional neural network architectures (CNNs), with some of the most successful models employing...

arxiv.org

rohan anil

@_arohan_

May 10

Who has the deepest network of them all?

Lechao Xiao @Locchiu

May 7

This is going to be a super fun workshop on scaling & learning dynamics & optimization! Please consider submitting your best work there!

Elliot Paquette

@poseypaquet

May 7

The High-Dimensional Learning Dynamics Workshop @ ICML 2026 🇰🇷, with a special focus on scaling laws, is coming up, July 10! Submission Deadline: May 11 AoE (extended). @Locchiu @albertobietti @JustinLin610 @inbarser @BachFrancis @ShamKakade6 @andrewgwils @blake__bordelon @_brloureiro @Qiuzihanhan

Lechao Xiao @Locchiu

Feb 20

The originality and the depth of science are really impressive. High thinking, signal to flop ratio. Congrats Damien, Elliot, Courtney et al.

Damien Ferbach @damien_ferbach

Feb 19

1/10 We built ADANA, an optimizer that gets better as you scale. It extends AdamW with log-time schedules for momentum and weight decay — same hyperparameter count, no extra engineering. Scaled from 45M to 2.6B, it saves ~40% compute vs tuned AdamW, and the gap keeps growing.🧵

Lechao Xiao retweeted

trieu

@thtrieu_

Feb 2

Mathematicians 🤝 AI researchers arxiv.org/abs/2601.22401. Our take on AI solving Erdos problems:
* Many "Open" problems are actually just obscure: in many cases the AI didn't find something new, but only rediscovered solutions buried in the literature. We present our systematic approach to reporting AI results on Erdos problems.
* The real bottleneck is still human labor; e.g., we spent lots of time filtering out technically correct but meaningless solutions (the AI missed Erdos’s original intent).
* The acceleration in solving low-hanging fruit is real, but we also need to highlight the many more misses that require human auditing.
Clear research directions ahead, though, and we feel optimistic about drastically increasing the signal-to-noise ratio. More to come!

Semi-Autonomous Mathematics Discovery with Gemini: A Case Study on...

We present a case study in semi-autonomous mathematics discovery, using Gemini to systematically evaluate 700 conjectures labeled 'Open' in Bloom's Erdős Problems database. We employ a hybrid...

arxiv.org

Thang Luong

@lmthang

Feb 2

Replying to @lmthang

Here's the paper link to our scaled effort for tackling Erdős problems. We started with 700 problems marked ‘Open’ in the database. Our agent #Aletheia identified potential solutions to 200 problems. Initial human grading revealed 63 correct answers, followed by deep expert evaluation and discussion to eventually arrive at meaningful proofs to 13 Erdős problems. arxiv.org/abs/2601.22401

Lechao Xiao retweeted

Amr Khalifa

@AmrMAlameen

16 Dec 2025

I am hiring a student researcher to work with our team in Montreal on LLM architecture and pre-training in spring-summer 2026. If you're excited to push the frontier of research forward, join us and help keep the TPUs warm. Fill out this form: forms.gle/1AfdyCbzjdKi2yAb7

Lechao Xiao @Locchiu

23 Oct 2025

First scaling law: performance follows a power law of compute, with its exponent governed by science and engineering. Second scaling law: the total improvement of this law follows a power law of resources, with its exponent governed by vision and conviction.

Lechao Xiao retweeted

Andrew Gordon Wilson

@andrewgwils

15 Jul 2025

Great talk from @ShikaiQiu on scaling collapse, which can be used as a sensitive diagnostic for model specification (LR, etc.) at small scale that transfers to large scales! arxiv.org/abs/2507.02119

Lechao Xiao @Locchiu

15 Jul 2025

come and enjoy the blessing from universality

Shikai Qiu

@ShikaiQiu

14 Jul 2025

Realistic training dynamics are too complex to be described by simple scaling laws with hand-picked formulas, yet they obey precise scaling trends and universality. Join me Tue morning at the Theory and Phenomenology oral session for an alternative approach that gets us farther!

Lechao Xiao @Locchiu

14 Jul 2025

Learning rate schedule is a big mystery in machine learning that is underinvested in both theory and practice. Shikai did an awesome job in revealing that simple quadratic models can indeed capture many key insights from big ones.

Shikai Qiu

@ShikaiQiu

12 Jul 2025

📉Learning rate decay is super effective and sometimes mysterious, but the simplest model of SGD on quadratic loss w/ noisy gradients almost perfectly predicts loss curves of transformers trained with Adam on real data, across schedules, model sizes, and token budgets. 1/4

Lechao Xiao retweeted

Shikai Qiu

@ShikaiQiu

8 Jul 2025

While scaling laws typically predict the final loss, we show in our ICML oral paper that good scaling rules enable accurate predictions of entire loss curves of larger models from smaller ones! w/@Locchiu, @andrewgwils, J. Pennington, A. Agarwala: arxiv.org/abs/2507.02119 1/10

Lechao Xiao @Locchiu

29 May 2025

Really impressive work! Changing the exponent is highly nontrivial.

Damien Ferbach @damien_ferbach

26 May 2025

It's very difficult to improve the *exponent* in scaling laws for loss vs compute, especially by changing the optimizer! Our new paper shows that scaling momentum correctly can *provably* improve the scaling exponent on a theoretical model. Empirically, it works on LSTMs too!

Lechao Xiao @Locchiu

26 Feb 2025

Wow, didn’t think I’d see this conjecture solved in my lifetime terrytao.wordpress.com/2025/…

The three-dimensional Kakeya conjecture, after Wang and Zahl

There has been some spectacular progress in geometric measure theory: Hong Wang and Joshua Zahl have just released a preprint that resolves the three-dimensional case of the infamous Kakeya set con…

terrytao.wordpress.com

Lechao Xiao @Locchiu

31 Jan 2025

Very cool! Congrats and thanks for trusting our codebase! @peterjliu (creator of Nanodo), @ARomanNovak, @_katieeverett (pre-)lead of the codebase.

Arthur Douillard

@Ar_Douillard

31 Jan 2025

Engineering shout-out to Nanodo & DrJax for making it possible. Nanodo (github.com/google-deepmind/n…) is a great JAX open-source codebase from @Locchiu @Mitchnw et al. DrJax (github.com/google/drjax) allows us to efficiently parallelize training. Look at their paper (arxiv.org/abs/2403.07128); those guys were already doing distributed training at 8B scale last year!

Lechao Xiao @Locchiu

30 Jan 2025

It’s extremely exciting to see the frontiers of science and practice merge, yet equally disheartening to see them become a frontier of geopolitics. What wisdom does AI offer humanity?

Lechao Xiao @Locchiu

21 Dec 2024

Curious to hear thoughts about levels of AGI in math. My take (unchanged since March 2023):
L1: solve IMO
L3: breakthroughs in 2 areas (e.g., analysis/PDEs, algebra/number theory, geometry/topology), each ~ one Annals/JAMS/Acta paper
L5: solve a Millennium problem
L6: ...

Lechao Xiao @Locchiu

14 Dec 2024

“勿以恶小而为之，勿以善小而不为。惟贤惟德，能服于人” ("Do not do evil just because it is small; do not neglect a good deed just because it is small. Only through worthiness and virtue can one win the respect of others.") Learned this in middle school, never forgot it, and passed it on to my kids.

Jiao Sun

@sunjiao123sun_

14 Dec 2024

Mitigating racial bias from LLMs is a lot easier than removing it from humans! Can’t believe this happened at the best AI conference @NeurIPSConf We have ethical reviews for authors, but missed it for invited speakers? 😡

Lechao Xiao retweeted

Alex Alemi @alemi

5 Nov 2024

If you miss the NYTimes needle, especially one that is statistically uniform (blog.alexalemi.com/a-degree-…), you can use this page, which I whipped together to reason about the correlations between the swing states tonight as results come in: alexalemi.com/random/electio…

Lechao Xiao @Locchiu

23 Oct 2024

LGTM

rohan anil

@_arohan_

23 Oct 2024

Two amazing accounts with high signal that I have been impressed by recently on this app are @cloneofsimo and @kellerjordan0. Just an unprompted thought :)

Lechao Xiao @Locchiu

12 Oct 2024

“It’s the people, not the projects …”

Jeff Dean

@JeffDean

11 Oct 2024

My @Google colleague and longtime @UCBerkeley faculty member David Patterson has a great essay out in this month's Communications of the ACM (@TheOfficialACM):🎉 "Life Lessons from the First Half-Century of My Career Sharing 16 life lessons, and nine magic words." I saw an early draft when it was only 10 lessons 😀. The lessons are generally useful to people in a wide variety of fields, not just CS. cacm.acm.org/opinion/life-le…
