Intern @IBMResearch | 🤖 ML PhD student @cornell @cornell_tech | Previously @MSFTResearch, @Apple, @amazon | 🎓 @EPFL_en, @polimi

Joined July 2023
LLMs waste massive memory remembering every reasoning step. What if they could leave behind just "breadcrumbs" instead? Breadcrumbs Reasoning: KV cache compression during decoding with learned beacon tokens. 2–32x less memory, minimal accuracy drop. 🧵
Where it struggles: solving linear equations. Error analysis reveals this isn't a retrieval failure. Compression disrupts arithmetic circuits 🧮, leading to computational errors.
Giovanni Monea retweeted
1/5 How do we update a model trained in 2025 with new world knowledge from 2026? ⚠️Continued training will undo skills learned by LLMs during post-training, e.g. instruction-following/math/code. 🤝Our method DiSC updates LLMs with new knowledge while preserving existing skills!
Giovanni Monea retweeted
🧵 New paper: "Lost in Backpropagation: The LM Head is a Gradient Bottleneck"
The output layer of LLMs destroys 95-99% of your training signal during backpropagation, and this significantly slows down pretraining 👇
Giovanni Monea retweeted
This call is still open. I am looking to recruit, as well as many other faculty @Cornell. We review folders as they come, and will send offers until all positions are filled. Please share with your network 🙏
28 Oct 2025
.@Cornell is recruiting for multiple postdoctoral positions in AI as part of two programs: Empire AI Fellows and Foundational AI Fellows. Positions are available in NYC and Ithaca. Deadline for full consideration is Nov 20, 2025! academicjobsonline.org/ajo/j…
Giovanni Monea retweeted
5 Dec 2025
🧩Natural language isn’t all you need. We’re great at evaluating text-based reasoning (MATH, AIME…) but what about long-horizon visual reasoning? Enter 𝗞𝗻𝗼𝘁𝗚𝘆𝗺: a minimalistic testbed for evaluating agents on spatial reasoning along a difficulty ladder
Giovanni Monea retweeted
🧵 New paper: "Simple Context Compression" - we show that mean-pooling beats the widely-used compression-tokens method for compressing contexts in LLMs, while being simpler and more efficient! with @yoavartzi (1/7)
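As a rough illustration of the mean-pooling idea named in the tweet above, the sketch below averages each run of consecutive vectors into a single vector. The function name `mean_pool_compress`, the fixed non-overlapping window, and the plain-list representation are assumptions made for illustration; none of them is taken from the paper.

```python
from typing import List


def mean_pool_compress(vectors: List[List[float]], ratio: int) -> List[List[float]]:
    """Average each run of `ratio` consecutive vectors into one vector.

    Hypothetical helper: the window layout is an assumption, not the paper's method.
    """
    if ratio < 1:
        raise ValueError("ratio must be >= 1")
    compressed = []
    for start in range(0, len(vectors), ratio):
        # The last window may hold fewer than `ratio` vectors.
        window = vectors[start:start + ratio]
        dim = len(window[0])
        compressed.append(
            [sum(vec[d] for vec in window) / len(window) for d in range(dim)]
        )
    return compressed


pooled = mean_pool_compress([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], ratio=2)
```

With `ratio=2`, three input vectors shrink to two: the first is the average of the first pair, and the second is the leftover vector unchanged.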
Giovanni Monea retweeted
🚨 Modeling Abstention via Selective Help-seeking
LLMs learn to use search tools to answer questions they would otherwise hallucinate on. But can this also teach them what they know vs. what they don't? @momergul_ introduces MASH, which trains LLMs for search and gets abstentions for free!
💡 Key idea: Reward accuracy but penalize searches during training. Under the right optimization pressure, LLMs learn to invoke search when their parametric knowledge is lacking. At inference, we simply remove this search access and treat any search invocation as a proxy for abstention!
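The key idea in the tweet above (reward accuracy, penalize searches, then read a search call as abstention at inference) can be sketched roughly as follows. The function names, the linear penalty, the `search_penalty` value, and the `<search>` marker are all hypothetical choices for illustration; the tweet does not specify the actual reward shape or output format.

```python
def search_penalized_reward(
    is_correct: bool, num_searches: int, search_penalty: float = 0.1
) -> float:
    """Reward a correct answer and subtract a cost per search call.

    Assumed linear form; the real training reward may differ.
    """
    accuracy_reward = 1.0 if is_correct else 0.0
    return accuracy_reward - search_penalty * num_searches


def is_abstention(model_output: str, search_marker: str = "<search>") -> bool:
    """At inference, with search removed, treat any search call as abstaining.

    `search_marker` is a made-up tag standing in for the model's search syntax.
    """
    return search_marker in model_output


reward = search_penalized_reward(is_correct=True, num_searches=2, search_penalty=0.25)
abstained = is_abstention("<search>capital of X</search>")
```

Under this assumed reward, a correct answer reached without searching scores highest, so the model is pushed to search only when its own knowledge falls short.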
Giovanni Monea retweeted
25 Jul 2025
The talk for our work on Retrospective Learning from Interactions, which will be in ACL (once I figure out how to squeeze it shorter) Gist: autonomous post-training from conversational signals for LLM bootstrapping ... look ma, no annotations! 🙌📈🚀 youtube.com/watch?v=qW8S308e…
Giovanni Monea retweeted
27 May 2025
🚀 Excited to share our latest work: LLMs entangle language and knowledge, making it hard to verify or update facts. We introduce LMLM 🐑🧠, a new class of models that externalize factual knowledge into a database and learn during pretraining when and how to retrieve facts instead of memorizing them.
🧠 Why LMLM?
• Learning to look up facts is easier than memorization
• Externalizing knowledge improves factual precision
• Enables instant machine unlearning by design
LMLM opens new directions for how future language models can manage and access knowledge.
📄 [ArXiv] arxiv.org/pdf/2505.15962
🌐 [Project Page] linxi-zhao.github.io/LMLM-si…
💻 [Code] github.com/kilian-group/LMLM
🎤 [Talk] simons.berkeley.edu/talks/ki…
Huge thanks to my amazing collaborators: @linxizhao4 @sofianzalouk Christian Belardi Justin Lovelace @JinPZhou
And to our incredible advisors @KilianQW, @yoavartzi, and @JenJSun for their generous support and insight.
Giovanni Monea retweeted
21 May 2025
I’m stoked to share our new paper: “Harnessing the Universal Geometry of Embeddings” with @jxmnop, Collin Zhang, and @shmatikov. We present the first method to translate text embeddings across different spaces without any paired data or encoders. Here's why we're excited: 🧵👇🏾
Giovanni Monea retweeted
New paper: Language models have “universal” concept representation – but can they capture cultural nuance? 🌏 If someone from Japan asks an LLM what color a pumpkin is, will it correctly say green (as they are in Japan)? Or does cultural nuance require more than just language?
Giovanni Monea retweeted
27 Mar 2025
How does Claude understand different languages? We find shared circuitry underlying the same concepts in multiple languages, implying that Claude "thinks" using universal concepts even before converting those thoughts into language.