📝 MeMo: Memory as a Model
RAG systems help LLMs find grounding information, but they can struggle when the answer is spread across many documents. They can also be confused by irrelevant or noisy retrieved text. This paper proposes MEMO, which trains a separate memory model to store and connect knowledge so the main LLM can answer more accurately.
The authors say MEMO has several benefits. It can connect information across different documents, handle noisy or unrelated retrieved text, and avoid forgetting old knowledge. It can also work with closed-source LLMs because it does not need access to their weights or logits, and its retrieval cost does not grow with the corpus size because the MEMORY model stays fixed in size.
It uses two models. The MEMORY model is a smaller model that learns and stores information from the documents. The EXECUTIVE model is the main LLM, which stays frozen and asks the MEMORY model for useful information before making the final answer.
How does it work?
MEMO's key idea is to turn the document corpus into “reflections”: synthetic question-answer pairs about facts, entities, and links across documents. These reflections are used to train the MEMORY model, which the paper builds on Qwen2.5-14B-Instruct. At inference time, the EXECUTIVE model queries the MEMORY model over several turns rather than once: it breaks the user’s question into smaller parts, checks facts, identifies important entities, asks follow-up questions, and then writes the final answer.
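To make the multi-turn querying concrete, here is a minimal Python sketch. It is not the paper's implementation. Everything in it is an assumption for illustration: the names `toy_memory`, `executive_answer`, and `Trace` are hypothetical, the trained MEMORY model is replaced by a lookup table of made-up reflections, and the EXECUTIVE's decomposition is a fixed list of question templates rather than the output of a frozen LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical interface: a MEMORY model is anything that maps a question to an answer.
MemoryFn = Callable[[str], str]


@dataclass
class Trace:
    """Records each (sub-question, answer) exchange with the MEMORY model."""
    exchanges: List[Tuple[str, str]] = field(default_factory=list)


def toy_memory(question: str) -> str:
    """Stand-in for the trained MEMORY model: a lookup over invented reflections."""
    reflections = {
        "who wrote the novel?": "Jane Doe",
        "where was jane doe born?": "Lisbon",
    }
    return reflections.get(question.lower(), "unknown")


def executive_answer(plan: List[str], memory: MemoryFn) -> Tuple[str, Trace]:
    """Stand-in for the EXECUTIVE model's querying loop.

    `plan` is a list of sub-question templates; "{prev}" is filled with the
    previous answer, which mimics a follow-up question about a found entity.
    In the real system a frozen LLM would produce these sub-questions.
    """
    trace = Trace()
    prev = ""
    for template in plan:
        sub_question = template.format(prev=prev)
        prev = memory(sub_question)
        trace.exchanges.append((sub_question, prev))
    # The last answer serves as the final answer in this toy setup.
    return prev, trace


if __name__ == "__main__":
    # User question: "Where was the author of the novel born?"
    final, trace = executive_answer(
        ["Who wrote the novel?", "Where was {prev} born?"], toy_memory
    )
    for q, a in trace.exchanges:
        print(f"{q} -> {a}")
    print("Final answer:", final)
```

The point of the sketch is the shape of the interaction: the second question can only be asked after the first one returns an entity, which is why a single query to the MEMORY model would not be enough for a multi-hop question.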
The approach was tested with both open-source models, like Qwen2.5, and a closed-source model, Gemini-3-Flash, as the EXECUTIVE model, showing that it can work in both settings.
With Gemini-3-Flash as the EXECUTIVE model, MEMO scores 53.58% on NarrativeQA, roughly double HippoRAG2 (23.21%) and NV-Embed-V2 (26.62%). On MuSiQue the margin is narrower: MEMO reaches 60.20%, ahead of HippoRAG2 at 57.00% and NV-Embed-V2 at 46.60%.
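For readers who want the gaps in percentage points, the short Python snippet below computes them from the figures quoted above. The `scores` dictionary and the `margins` helper are just a convenience for this summary, not something from the paper.

```python
# Scores (in %) as quoted in this summary, with Gemini-3-Flash as the EXECUTIVE model.
scores = {
    "NarrativeQA": {"MEMO": 53.58, "HippoRAG2": 23.21, "NV-Embed-V2": 26.62},
    "MuSiQue": {"MEMO": 60.20, "HippoRAG2": 57.00, "NV-Embed-V2": 46.60},
}


def margins(table, system="MEMO"):
    """Return the lead of `system` over each baseline, in percentage points."""
    result = {}
    for benchmark, row in table.items():
        result[benchmark] = {
            name: round(row[system] - value, 2)
            for name, value in row.items()
            if name != system
        }
    return result


if __name__ == "__main__":
    for benchmark, gaps in margins(scores).items():
        for baseline, gap in gaps.items():
            print(f"{benchmark}: MEMO leads {baseline} by {gap:.2f} points")
```

The lead is about 27 to 30 points on NarrativeQA but only about 3 points over HippoRAG2 on MuSiQue, so the size of the improvement depends heavily on the benchmark.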