Mathieu Dagréou

Tweets

Pinned Tweet

Mathieu Dagréou @Mat_Dag

19 Apr 2023

📣📣 Preprint alert 📣📣 « A Lower Bound and a Near-Optimal Algorithm for Bilevel Empirical Risk Minimization » w. @tomamoral, @vaiter & @PierreAblin arxiv.org/abs/2302.08766 1/3

A Lower Bound and a Near-Optimal Algorithm for Bilevel Empirical...

Bilevel optimization problems, which are problems where two optimization problems are nested, have more and more applications in machine learning. In many practical cases, the upper and the lower...

Mathieu Dagréou retweeted

Fabian Schaipp @FSchaipp

May 27

"It's easier to tune the LR for method A than for B." We tried to formalize this for model-based stochastic optimization methods. We find a key quantity, called stability index, that describes how stable a (weakly) convex bound is as a function of LR. 📚arxiv.org/abs/2602.09842

Mathieu Dagréou retweeted

Michael Arbel @MichaelArbel

May 26

What do JEPA-style self-distillation dynamics actually learn — and why do they sometimes avoid collapse? In our new work with @BasileTerv987 and Jean Ponce, we tackle this question. What surprised us: These dynamics provably recover representations with nonlinear-CCA structure.

Mathieu Dagréou retweeted

Clément Bonet @Clement_Bonet_

May 2

Our work "Busemann Functions in the Wasserstein Space" was accepted at #AISTATS2026 This is a joint work with Elsa Cazelles, Lucas Drumetz and @nicolas_courty. I will be presenting it tomorrow at the poster 96, see you there! Link: openreview.net/forum?id=Xpt0…

Mathieu Dagréou retweeted

Mark Schmidt @MarkSchmidtUBC

6 Oct 2025

This is the way.

Jeremy Cohen @deepcohen

1 Oct 2025

Replying to @jasondeanlee @SebastienBubeck @tomgoldsteincs @zicokolter @atalwalkar

This is the third, last, and best paper from my PhD. By some metrics, an ML PhD student who writes just three conference papers is "unproductive." But I wouldn't have had it any other way 😉 !

Mathieu Dagréou retweeted

Konstantin Mishchenko @konstmish

3 Oct 2025

Nesterov dropped a new paper last week on what functions can be optimized with gradient descent. The idea is simple: we know GD can optimize both nonsmooth (bounded grads) and smooth (Lipschitz grads) functions, but smooth + nonsmooth satisfies neither property, so what can we do?

Mathieu Dagréou retweeted

Fabian Schaipp @FSchaipp

1 Sep 2025

🚟 New blog post: On "infinite" learning-rate schedules and how to construct them from one checkpoint to the next fabian-sp.github.io/posts/20…

Infinite Schedules and the Benefits of Lookahead

TL;DR: Knowing the next training checkpoint in advance (“lookahead”) helps to set the learning rate. In the limit, the classical square-root schedule appears on the horizon.

Mathieu Dagréou retweeted

Rudy Morel @rdMorel

14 Jul 2025

For evolving unknown PDEs, ML models are trained on next-state prediction. But do they actually learn the time dynamics: the "physics"? Check out our poster (W-107) at #ICML2025 this Wed, Jul 16. Our "DISCO" model learns the physics while staying SOTA on next-state prediction!

Mathieu Dagréou retweeted

Mathieu Blondel @mblondel_ml

1 Jul 2025

Back from MLSS Senegal 🇸🇳, where I had the honor of giving lectures on differentiable programming. Really grateful for all the amazing people I got to meet 🙏 My slides are here github.com/diffprog/slides/b…

slides/README.md at main · diffprog/slides

Slides for the book "The Elements of Differentiable Programming". - diffprog/slides

Mathieu Dagréou retweeted

Waïss Azizian @wazizian

17 Jun 2025

❓ How long does SGD take to reach the global minimum on non-convex functions? With @FranckIutzeler, J. Malick, P. Mertikopoulos, we tackle this fundamental question in our new ICML 2025 paper: "The Global Convergence Time of Stochastic Gradient Descent in Non-Convex Landscapes"

Mathieu Dagréou retweeted

Konstantin Mishchenko @konstmish

18 Jun 2025

I want to address one very common misconception about optimization. I often hear that (approximately) preconditioning with the Hessian diagonal is always a good thing. It's not. In fact, finding a good preconditioner is an open problem, which I think deserves more attention. 1/4

Mathieu Dagréou retweeted

Matthieu Terris @MatthieuTerris

7 Jun 2025

🧵 I'll be at CVPR next week presenting our FiRe work 🔥 TL;DR: We go beyond denoising models in PnP with more general restoration (e.g. deblurring) models! A starting point observation is that images are not fixed-points of restoration models:

Mathieu Dagréou retweeted

Samuel Vaiter @vaiter

3 Jun 2025

📣 New preprint 📣 **Differentiable Generalized Sliced Wasserstein Plans** w/ L. Chapel @rtavenar We propose a Generalized Sliced Wasserstein method that provides an approximated transport plan and which admits a differentiable approximation. arxiv.org/abs/2505.22049 1/5

Mathieu Dagréou retweeted

Mathurin Massias @mathusmassias

24 Apr 2025

It was received quite enthusiastically here so time to share it again!!! Our #ICLR2025 blog post on Flow Matching was published yesterday: iclr-blogposts.github.io/202… My PhD student Anne Gagneux will present it tomorrow at ICLR, 👉poster session 4, 3 pm, #549 in Hall 3/2B 👈

Mathieu Dagréou retweeted

Gabriel Peyré @gabrielpeyre

19 Feb 2025

Optimization algorithms come with many flavors depending on the structure of the problem. Smooth vs non-smooth, convex vs non-convex, stochastic vs deterministic, etc. en.wikipedia.org/wiki/Mathem…

Mathieu Dagréou retweeted

Alex Hägele @haeggee

14 Feb 2025

A really fun project to work on. Looking at these plots side-by-side still amazes me! How well can **convex optimization theory** match actual LLM runs? My favorite points of our paper on the agreement for LR schedules in theory and practice: 1/n

Fabian Schaipp @FSchaipp

5 Feb 2025

Learning rate schedules seem mysterious? Turns out that their behaviour can be described with a bound from *convex, nonsmooth* optimization. Short thread on our latest paper 🚇 arxiv.org/abs/2501.18965

Mathieu Dagréou retweeted

Fabian Schaipp @FSchaipp

5 Feb 2025

The Surprising Agreement Between Convex Optimization Theory and...

We show that learning-rate schedules for large model training behave surprisingly similar to a performance bound from non-smooth convex optimization theory. We provide a bound for the constant...

Aaron Defazio @aaron_defazio

3 Feb 2025

The sudden loss drop when annealing the learning rate at the end of a WSD (warmup-stable-decay) schedule can be explained without relying on non-convexity or even smoothness, a new paper shows that it can be precisely predicted by theory in the convex, non-smooth setting! 1/2

Mathieu Dagréou retweeted

Konstantin Mishchenko @konstmish

3 Feb 2025

Learning rate schedulers used to be a big mystery. Now you can just take a guarantee for *convex non-smooth* problems (from arxiv.org/abs/2310.07831), and they give you *precisely* what you see in training large models. See this empirical study: arxiv.org/abs/2501.18965 1/3

Mathieu Dagréou retweeted

Theo Uscidda @theo_uscidda

22 Jan 2025

Our work on geometric disentangled representation learning has been accepted to ICLR 2025! 🎊See you in Singapore if you want to understand this gif better :)

Theo Uscidda @theo_uscidda

14 Dec 2024

Curious about the potential of optimal transport (OT) in representation learning? Join @CuturiMarco's talk at the UniReps workshop today at 2:30 PM! Marco will notably discuss our latest paper on using OT to learn disentangled representations. Details below ⬇️

Mathieu Dagréou retweeted

Gabriel Peyré @gabrielpeyre

22 Jan 2025

The Mathematics of Artificial Intelligence: In this introductory and highly subjective survey, aimed at a general mathematical audience, I showcase some key theoretical concepts underlying recent advancements in machine learning. arxiv.org/abs/2501.10465

Mathieu Dagréou retweeted

Francis Bach @BachFrancis

21 Dec 2024

My book is (at last) out, just in time for Christmas! A blog post to celebrate and present it: francisbach.com/my-book-is-o…
