Pierfrancesco Beneventano

Tweets

CBMM retweeted

Pierfrancesco Beneventano

@PierBeneventano

Jun 6

A common practitioner belief: “weight decay stabilizes training.” We show that, mechanistically, this is technically wrong but somewhat true in practice.
- Training still happens at the Edge of Stability (EoS) with weight decay.
- That said, "it doesn't look like it": the curvature at stabilization is much lower!
- We characterize everything we see by extending the self-stabilization analysis of @alex_damian_, @EshaanNichani, and @jasondeanlee to weight decay, and by matching the result cleanly with an underdamped harmonic oscillator!
- This, however, stabilizes/tames the dynamics in function space.
Importantly, this shows that regularized training is at EoS even though it "doesn't look like that", meaning the curvature measures are lower than the usual thresholds. Check out our paper: arxiv.org/pdf/2605.16622

5,841
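A toy illustration of why the measured curvature can sit below the usual threshold when weight decay is on. This is only a sketch on a one-dimensional quadratic with an L2 penalty, not the analysis from the paper: the function names, the learning rate, and the (deliberately exaggerated) weight-decay value are assumptions chosen to make the effect visible. On this toy problem, gradient descent is stable when `sharpness + wd < 2 / lr`, so the sharpness of the unregularized loss at the stability boundary is `2 / lr - wd` rather than `2 / lr`.

```python
def gd_with_weight_decay(sharpness, lr, wd, steps=200, w0=1.0):
    """Run GD on L(w) = 0.5*sharpness*w**2 + 0.5*wd*w**2 and return the final |w|."""
    w = w0
    for _ in range(steps):
        grad = sharpness * w + wd * w  # loss gradient plus L2 penalty gradient
        w = w - lr * grad
    return abs(w)


def unregularized_threshold(lr, wd):
    """Largest unregularized sharpness for which the toy iteration stays stable."""
    return 2.0 / lr - wd


if __name__ == "__main__":
    lr, wd = 0.1, 5.0  # hypothetical values; wd is exaggerated for visibility
    threshold = unregularized_threshold(lr, wd)
    print(f"classic 2/lr = {2.0 / lr:.1f}, shifted threshold = {threshold:.1f}")
    for sharpness in (threshold - 1.0, threshold + 1.0):
        final = gd_with_weight_decay(sharpness, lr, wd)
        status = "converges" if final < 1.0 else "diverges"
        print(f"sharpness={sharpness:.1f}: {status}")
```

Both tested sharpness values are below the classic `2 / lr`, yet only the one below the shifted threshold converges.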

CBMM @MIT_CBMM

May 30

Check out the latest work of our center, in collaboration with @TAMU! Towards theorizing the boost in capabilities of agent systems. @PierBeneventano @GalantiTomer

Pierfrancesco Beneventano

@PierBeneventano

May 30

Have you ever wondered how to formalize what an agentic system actually is? That is, where do agents fit in the book of ML, and how can we explain and predict their performance? We argue that agents can be seen as boosting reasoning models! arxiv.org/abs/2605.14163

779

CBMM retweeted

Pierfrancesco Beneventano

@PierBeneventano

May 22

Thanks a lot for sharing our work! On top of the things mentioned, we also give a very nice mathematical framework and mathematical results about agent systems :) With the amazing Varun, Riccardo, Tommy, and @GalantiTomer

DAIR.AI

@dair_ai

May 18

NEW paper worth reading. GPT-5.4 nano plus a critic-comparator orchestration loop hits 76.4% on SWE-bench Verified, matching standalone Gemini 3 Pro and Claude Opus 4.5 Thinking. The trick is to select from k=8 weak-model proposals using execution and proof signals. What does this mean? Many of the patches you'd expect from a frontier model are already inside a weak model's top-8 candidates. When you have 8 candidate patches from a weak model, don't ask the model which is best. Run them and verify them. That's enough to match a frontier model's accuracy. The takeaway for AI devs: a weak model's top-k often already contains the right answer. What limits you is the quality of your selector, not the capability of the model. Paper: arxiv.org/abs/2605.14163 Learn to build effective AI agents in our academy: academy.dair.ai/

4,244
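The "run them and verify them" idea can be sketched in a few lines. This is a simplified stand-in, not the paper's critic-comparator loop: the candidates are hand-written Python functions playing the role of model proposals, the selector only uses execution against test cases (no proof signals), and `coverage_at_k` assumes proposals are independent, which real samples from one model are not. All names here are hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

Candidate = Callable[[int], int]
TestCase = Tuple[int, int]  # (input, expected output)


def coverage_at_k(p: float, k: int) -> float:
    """Chance that at least one of k independent proposals is correct,
    if each is correct with probability p (independence is an assumption)."""
    return 1.0 - (1.0 - p) ** k


def passes(candidate: Candidate, tests: Sequence[TestCase]) -> bool:
    """Execute the candidate on every test; any wrong answer or crash fails it."""
    for x, expected in tests:
        try:
            if candidate(x) != expected:
                return False
        except Exception:
            return False
    return True


def select_by_execution(
    candidates: Sequence[Candidate], tests: Sequence[TestCase]
) -> Optional[Candidate]:
    """Return the first candidate that passes all tests, or None."""
    for candidate in candidates:
        if passes(candidate, tests):
            return candidate
    return None


if __name__ == "__main__":
    # Three hypothetical proposals for "square the input".
    proposals = [lambda x: x * 2, lambda x: 1 // (x - 2), lambda x: x ** 2]
    tests = [(2, 4), (3, 9)]
    chosen = select_by_execution(proposals, tests)
    print(chosen(5) if chosen else "no candidate passed")
    print(f"coverage at k=8 with p=0.2: {coverage_at_k(0.2, 8):.3f}")
```

The selector never asks a model which proposal is best; it only checks which one survives execution, which is why its quality depends on the tests.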

CBMM retweeted

Tomer Galanti @GalantiTomer

May 16

1/ Many optimization problems are hard in theory. But real OR and NP-hard instances often have exploitable structure. Can an LLM agent discover that structure automatically and turn it into faster solver code?

205

25,271

CBMM retweeted

Pierfrancesco Beneventano

@PierBeneventano

May 16

This is a project I’m very excited about. Back in the day, the smartest computer scientists were the ones finding efficient ways to solve their problems. Here, we made the agents do this work.

Tomer Galanti @GalantiTomer

May 16

5,905

CBMM retweeted

Pierfrancesco Beneventano

@PierBeneventano

May 3

Our new paper was accepted at ICML! 1) Momentum isn’t just “SGD but faster”: it changes sharpness by orders of magnitude! 2) The usual story says momentum lets you train in sharper regions. That’s true only for large batches; the opposite is true for minibatches!

[Image description: SGD with momentum trains at the Edge of Stability, but the level of stabilization is not what we expected from the full-batch case!]

113

7,630
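For context on the "usual story", here is a small sketch of the full-batch picture only. On a one-dimensional quadratic, heavy-ball momentum (in the form `v = beta*v - lr*grad`, `w = w + v`) is stable for sharpness below `2*(1 + beta)/lr`, compared with `2/lr` for plain gradient descent. The function name and the numbers are assumptions for illustration, and the sketch says nothing about the minibatch regime that the paper is about.

```python
def heavy_ball(sharpness, lr, beta, steps=500, w0=1.0):
    """Heavy-ball momentum on L(w) = 0.5*sharpness*w**2; returns the final |w|.
    Setting beta=0 recovers plain gradient descent."""
    w, v = w0, 0.0
    for _ in range(steps):
        v = beta * v - lr * sharpness * w  # velocity update with the gradient
        w = w + v
    return abs(w)


if __name__ == "__main__":
    lr, beta = 0.1, 0.9  # hypothetical values
    gd_threshold = 2.0 / lr                    # 20 for plain GD
    hb_threshold = 2.0 * (1.0 + beta) / lr     # 38 for heavy ball
    print(f"GD threshold: {gd_threshold:.1f}, heavy-ball threshold: {hb_threshold:.1f}")
    for name, b in (("GD", 0.0), ("heavy ball", beta)):
        final = heavy_ball(30.0, lr, b)
        status = "converges" if final < 1.0 else "diverges"
        print(f"{name} at sharpness 30: {status}")
```

At sharpness 30, plain gradient descent diverges while heavy ball still converges, which is the full-batch sense in which momentum tolerates sharper regions.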

CBMM retweeted

Pierfrancesco Beneventano

@PierBeneventano

Apr 25

Muon leads to severely miscalibrated models! This is just one of the results of this new paper of ours: In “Too Sharp, Too Sure” we show calibration error tracks loss curvature during training and we tie both to margin tails.

454

83,638

CBMM @MIT_CBMM

Apr 10

[blog] What is Intelligence? Or "Distinguishability is All You Need" Here are several related questions to which we do not have a good answer: How will we know when we've achieved "Artificial General Intelligence" (AGI)?... poggio-lab.mit.edu/blogsupda…

211

CBMM @MIT_CBMM

Apr 1

[video] "Intelligence as Prediction: Cybernetics, LLMs, and Sociality" Speaker: Blaise Agüera y Arcas - Google, Paradigms of Intelligence youtu.be/6NC0tSjZXBo

1,195

CBMM @MIT_CBMM

Mar 29

[blog post] "PoggioAI/MSc Went Online" This first public release is an open-source, customizable, modular multi-agent system for academic research workflows, with a current emphasis on machine learning theory and nearby quantitative fields. poggio-lab.mit.edu/blogsupda…

444

CBMM retweeted

Pierfrancesco Beneventano

@PierBeneventano

Mar 26

Check out the Poggio Lab blog at MIT! We went online with some very nice posts, the latest one being about our multi-agent system: poggio-lab.mit.edu/blogsupda…

930

CBMM retweeted

Pierfrancesco Beneventano

@PierBeneventano

Mar 23

Most AI-for-research work tries to maximize autonomy first and patch quality later. We think the near-term path is the reverse: automating step by step while holding the quality bar fixed. Today we’re open-sourcing PoggioAI/MSc for ML theory research.

40,487

CBMM retweeted

Yulu Gan

@yule_gan

Mar 13

Simply adding Gaussian noise to LLMs (one step—no iterations, no learning rate, no gradients) and ensembling them can achieve performance comparable to or even better than standard GRPO/PPO on math reasoning, coding, writing, and chemistry tasks. We call this algorithm RandOpt. To verify that this is not limited to specific models, we tested it on Qwen, Llama, OLMo3, and VLMs. What's behind this? We find that in the Gaussian search neighborhood around pretrained LLMs, diverse task experts are densely distributed — a regime we term Neural Thickets. Paper: arxiv.org/pdf/2603.12228 Code: github.com/sunrainyg/RandOpt Website: thickets.mit.edu

455

3,162

767,654

CBMM @MIT_CBMM

Mar 17

[blog] Beneficial Misalignment: Why We Shouldn't Always Align AI to Humans In the rapidly evolving field of NeuroAI, a significant amount of energy is dedicated to 'alignment', the idea that representations from artificial intelligence should converge... poggio-lab.mit.edu/blogsupda…

691

CBMM @MIT_CBMM

Mar 11

[blog post] A Conversation with Blaise Agüera y Arcas: On Intelligence, Life, and the Future of AI What does it mean to call something intelligent - and when did this question get so hard to answer? For Blaise Agüera y Arcas, VP at Google and founder... poggio-lab.mit.edu/blogsupda…

936

CBMM @MIT_CBMM

Mar 4

[blog post] Can a Neural Network Think Before It Speaks? Somewhere around 2022, an observation started making the rounds among researchers working with large language models: if you just asked a model... poggio-lab.mit.edu/blogsupda…

629

CBMM @MIT_CBMM

Feb 26

[blog post] Edge of (Stochastic) Stability made simple — Part II: the mini-batch case In Part I we had one landscape and a deterministic update. Now we have a distribution of mini-batch landscapes and a stochastic update... poggio-lab.mit.edu/blogsupda…

292

CBMM @MIT_CBMM

Feb 20

[blog post] Edge of (Stochastic) Stability made simple — Part I: A crash course on (full-batch) Edge of Stability In this part I introduce the phenomenon and what I believe are the two key mechanisms—which we’ll use as the springboard for the mini-bat... poggio-lab.mit.edu/blogsupda…

598

CBMM @MIT_CBMM

Feb 13

[blog post] Are Transformers Just "Stochastic Parrots"? A common criticism of Large Language Models (LLMs) is that they are merely "stochastic parrots"—statistical mimics that stitch together likely patterns without genuine reasoning... poggio-lab.mit.edu/blogsupda…

411

CBMM retweeted

Tomer Galanti @GalantiTomer

17 Oct 2025

🧵 New paper: LLM-ERM: Sample-Efficient Program Learning via LLM-Guided Search arxiv.org/abs/2510.14331 We use reasoning LLMs to learn tasks like IsPrime from ~200 samples by proposing short programs, making both the learned function *and* the learning process interpretable 🤯

LLM Priors for ERM over Programs

We study program-learning methods that are efficient in both samples and computation. Classical learning theory suggests that when the target admits a short program description (for example, a...

arxiv.org

8,310
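The "propose short programs, keep the one that fits the samples" idea can be sketched as empirical risk minimization over a finite candidate set. This is an assumption-heavy toy, not the LLM-ERM method itself: the three candidates are hand-written stand-ins for LLM proposals, the 200 labeled samples are simply the integers 2 through 201, and all function names are hypothetical.

```python
def is_prime(n):
    """Trial division; also used here to label the samples."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


# Hand-written stand-ins for short programs an LLM might propose.
CANDIDATES = {
    "odd": lambda n: n % 2 == 1,
    "mod6": lambda n: n % 6 in (1, 5),
    "trial_division": is_prime,
}


def empirical_error(program, samples):
    """Fraction of labeled samples (x, y) on which the program disagrees with y."""
    return sum(program(x) != y for x, y in samples) / len(samples)


def erm(candidates, samples):
    """Return the name of the candidate with the lowest empirical error."""
    return min(candidates, key=lambda name: empirical_error(candidates[name], samples))


if __name__ == "__main__":
    samples = [(n, is_prime(n)) for n in range(2, 202)]  # 200 labeled samples
    for name, program in CANDIDATES.items():
        print(f"{name}: error = {empirical_error(program, samples):.3f}")
    print("selected:", erm(CANDIDATES, samples))
```

Because the selected hypothesis is a readable program, both the learned function and the selection step can be inspected directly.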