Giovanni Monea

Tweets

Giovanni Monea @giomonea

Apr 9

LLMs waste massive memory remembering every reasoning step. What if they could leave behind just "breadcrumbs" instead? Breadcrumbs Reasoning: KV cache compression during decoding with learned beacon tokens. 2–32x less memory, minimal accuracy drop. 🧵
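To give a rough sense of what "2–32x less memory" can mean, here is a minimal back-of-the-envelope sketch using the standard KV cache size formula (one key and one value vector per token, per layer, per KV head). The model dimensions (32 layers, 8 KV heads, head size 128, fp16) and the helper names are hypothetical, and the sketch assumes the cache shrinks in direct proportion to the compression ratio, which may not match the paper's exact accounting.

```python
def kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Size of a Transformer KV cache in bytes.

    The leading factor 2 counts one key and one value vector
    per token, per layer, per KV head.
    """
    return 2 * num_tokens * num_layers * num_kv_heads * head_dim * bytes_per_value


def compressed_tokens(num_tokens, ratio):
    """Cache entries left if every `ratio` tokens collapse into one (assumption)."""
    return -(-num_tokens // ratio)  # ceiling division


# Hypothetical model configuration, chosen only for illustration.
CONFIG = dict(num_layers=32, num_kv_heads=8, head_dim=128)

full = kv_cache_bytes(32768, **CONFIG)
small = kv_cache_bytes(compressed_tokens(32768, 32), **CONFIG)
print(f"full cache:      {full / 2**30:.2f} GiB")
print(f"32x compression: {small / 2**20:.2f} MiB")
```

Under these assumed dimensions, a 32,768-token reasoning trace goes from 4 GiB of cache to 128 MiB at a 32x ratio.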

7,860

Giovanni Monea @giomonea

Apr 9

Where it struggles: solving linear equations. Error analysis reveals this isn't a retrieval failure. Compression disrupts arithmetic circuits 🧮, leading to computational errors.

238

Giovanni Monea @giomonea

Apr 9

Long overdue thread, but better late than never. Grateful to my amazing co-authors for making this happen ( @yair_feldman @shankarpad8 @xkianteb @yoavartzi ) and to the great @nthngdy for feedback and support! Check our paper on arxiv for more details: arxiv.org/abs/2510.13797

Breadcrumbs Reasoning: Memory-Efficient Reasoning with Compression Beacons

The scalability of large language models for long-context reasoning is severely constrained by the linear growth of their Transformer key-value cache, which incurs significant memory and...

arxiv.org

228

Giovanni Monea retweeted

Shankar Padmanabhan @shankarpad8

Mar 23

1/5 How do we update a model trained in 2025 with new world knowledge from 2026? ⚠️Continued training will undo skills learned by LLMs during post-training, e.g. instruction-following/math/code. 🤝Our method DiSC updates LLMs with new knowledge while preserving existing skills!

0:14

11,146

Giovanni Monea retweeted

Nathan Godey @nthngdy

Mar 12

🧵New paper: "Lost in Backpropagation: The LM Head is a Gradient Bottleneck" The output layer of LLMs destroys 95-99% of your training signal during backpropagation, and this significantly slows down pretraining 👇

104

952

122,687

Giovanni Monea retweeted

Yoav Artzi @yoavartzi

Feb 16

This call is still open. I am looking to recruit, as well as many other faculty @Cornell. We review folders as they come, and will send offers until all positions are filled. Please share with your network 🙏

Yoav Artzi @yoavartzi

28 Oct 2025

.@Cornell is recruiting for multiple postdoctoral positions in AI as part of two programs: Empire AI Fellows and Foundational AI Fellows. Positions are available in NYC and Ithaca. Deadline for full consideration is Nov 20, 2025! academicjobsonline.org/ajo/j…

16,914

Giovanni Monea retweeted

Zizhao Chen @ch272h

5 Dec 2025

🧩Natural language isn’t all you need. We’re great at evaluating text-based reasoning (MATH, AIME…) but what about long-horizon visual reasoning? Enter KnotGym: a minimalistic testbed for evaluating agents on spatial reasoning along a difficulty ladder

0:03

16,042

Giovanni Monea retweeted

Yair Feldman @yair_feldman

26 Nov 2025

🧵 New paper: "Simple Context Compression" - we show that mean-pooling beats the widely-used compression-tokens method for compressing contexts in LLMs, while being simpler and more efficient! with @yoavartzi (1/7)
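As a rough illustration of the mean-pooling idea, the sketch below averages non-overlapping windows of token vectors so that every `ratio` vectors become one. This is only an assumption about what mean-pooling looks like at its simplest; the function name, the window scheme, and the use of plain Python lists are hypothetical, and the paper's actual architecture may differ.

```python
def mean_pool_compress(vectors, ratio):
    """Average each run of `ratio` consecutive vectors into a single vector.

    `vectors` is a list of equal-length lists of floats (one per token).
    A shorter final window is averaged over the vectors it actually holds.
    """
    if ratio < 1:
        raise ValueError("ratio must be a positive integer")
    pooled = []
    for start in range(0, len(vectors), ratio):
        window = vectors[start:start + ratio]
        dim = len(window[0])
        pooled.append([sum(v[i] for v in window) / len(window) for i in range(dim)])
    return pooled


# Three 2-dimensional token vectors compressed at a 2x ratio.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(mean_pool_compress(tokens, 2))
```

Unlike a compression-token approach, this pooling step itself has no learned parameters, which is one way to read the "simpler and more efficient" claim.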

25,879

Giovanni Monea retweeted

Yoav Artzi @yoavartzi

28 Oct 2025

124

60,238

Giovanni Monea retweeted

Tanya Goyal @tanyaagoyal

2 Oct 2025

🚨Modeling Abstention via Selective Help-seeking

LLMs learn to use search tools to answer questions they would otherwise hallucinate on. But can this also teach them what they know vs not? @momergul_ introduces MASH that trains LLMs for search and gets abstentions for free!

💡Key idea: Reward accuracy but penalize searches during training. Under the right optimization pressure, LLMs learn to invoke search when their parametric knowledge is lacking. At inference, we simply remove this search access and treat any search invocation as a proxy for abstention!
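The key idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the MASH implementation: the `<search>` marker, the linear per-search penalty, its value of 0.25, and all function names are hypothetical, since the tweet does not specify the reward's exact form.

```python
from dataclasses import dataclass

SEARCH_TAG = "<search>"  # hypothetical marker for a search-tool call


@dataclass
class Rollout:
    text: str      # model output, possibly containing search calls
    correct: bool  # whether the final answer matched the reference


def count_searches(text):
    """Number of search invocations in a model output."""
    return text.count(SEARCH_TAG)


def training_reward(rollout, search_penalty=0.25):
    """Reward accuracy, but subtract a penalty per search (assumed linear form)."""
    accuracy = 1.0 if rollout.correct else 0.0
    return accuracy - search_penalty * count_searches(rollout.text)


def answer_or_abstain(text):
    """At inference, with search removed, any search attempt counts as abstention."""
    return "ABSTAIN" if count_searches(text) > 0 else text


direct = Rollout("Paris", correct=True)
searched = Rollout("<search>capital of France</search> Paris", correct=True)
print(training_reward(direct), training_reward(searched))
print(answer_or_abstain(direct.text), answer_or_abstain(searched.text))
```

Because a correct answer without search scores higher than a correct answer with search, the model is pushed to search only when it cannot answer from its own parameters, which is what makes a search attempt a usable abstention signal.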

5,464

Giovanni Monea retweeted

Yoav Artzi @yoavartzi

25 Jul 2025

The talk for our work on Retrospective Learning from Interactions, which will be in ACL (once I figure out how to squeeze it shorter) Gist: autonomous post-training from conversational signals for LLM bootstrapping ... look ma, no annotations! 🙌📈🚀 youtube.com/watch?v=qW8S308e…

Retrospective Learning from Interactions

Retrospective Learning from Interactions. Zizhao Chen, Mustafa Omer ...

youtube.com

6,592

Giovanni Monea retweeted

Linxi Zhao @linxizhao4

27 May 2025

🚀Excited to share our latest work: LLMs entangle language and knowledge, making it hard to verify or update facts. We introduce LMLM 🐑🧠, a new class of models that externalize factual knowledge into a database and learn during pretraining when and how to retrieve facts instead of memorizing them.

🧠Why LMLM?
• Learning to look up facts is easier than memorization
• Externalizing knowledge improves factual precision
• Enables instant machine unlearning by design

LMLM opens new directions for how future language models can manage and access knowledge.

📄 [ArXiv] arxiv.org/pdf/2505.15962
🌐 [Project Page] linxi-zhao.github.io/LMLM-si…
💻 [Code] github.com/kilian-group/LMLM
🎤 [Talk] simons.berkeley.edu/talks/ki…

Huge thanks to my amazing collaborators: @linxizhao4 @sofianzalouk Christian Belardi Justin Lovelace @JinPZhou And to our incredible advisors @KilianQW, @yoavartzi, and @JenJSun for their generous support and insight.

1:08

6,048

Giovanni Monea retweeted

Rishi Jha @rishi_d_jha

21 May 2025

I’m stoked to share our new paper: “Harnessing the Universal Geometry of Embeddings” with @jxmnop, Collin Zhang, and @shmatikov. We present the first method to translate text embeddings across different spaces without any paired data or encoders. Here's why we're excited: 🧵👇🏾

257

1,757

160,619

Giovanni Monea retweeted

Veniamin Veselovsky @VminVsky

15 Apr 2025

New paper: Language models have “universal” concept representation – but can they capture cultural nuance? 🌏 If someone from Japan asks an LLM what color a pumpkin is, will it correctly say green (as they are in Japan)? Or does cultural nuance require more than just language?

131

26,438

Giovanni Monea retweeted

Anthropic @AnthropicAI

27 Mar 2025

How does Claude understand different languages? We find shared circuitry underlying the same concepts in multiple languages, implying that Claude "thinks" using universal concepts even before converting those thoughts into language.

[Image: Shared features exist across English, French, and Chinese, indicating a degree of conceptual universality.]

562

48,657