🚨 LLMs are frozen after pretraining, but the world keeps changing. How do you give an LLM new knowledge without retraining it, bloating its context, or breaking what it already knows?
Existing methods hit a wall:
🔸 RAG is brittle to retrieval noise and struggles with cross-document reasoning;
🔸 Fine-tuning is expensive and causes catastrophic forgetting;
🔸 Latent memory is tightly coupled to the model that produced it.
👉 Key question: Can we encode knowledge into a small, dedicated memory model that any LLM can query without accessing the LLM itself?
🚀 Introducing MeMo (Memory as a Model) 🚀
We train a dedicated MEMORY model on a reflection question-answer dataset synthesized from the target corpus. At inference, a frozen EXECUTIVE model (any LLM, including closed-source models) queries the MEMORY model through a structured 3-stage protocol that decomposes complex queries into targeted sub-queries, retrieves precise, noise-robust knowledge, and reasons over the responses.
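To make the query protocol concrete, here is a minimal sketch of how an EXECUTIVE model might talk to a MEMORY model using text alone. It is an illustration under assumptions, not the released MeMo code: the function name `answer_with_memory`, the prompt wording, the one-sub-query-per-line format, and the exact stage boundaries are all hypothetical. Both models are treated as black-box text-in/text-out callables, which is what lets the EXECUTIVE be a closed-source API.

```python
from typing import Callable, List, Tuple

# Hypothetical alias: each model is a black-box function from prompt text to reply text.
TextModel = Callable[[str], str]


def answer_with_memory(
    question: str, executive: TextModel, memory: TextModel
) -> Tuple[str, List[Tuple[str, str]]]:
    """Sketch of a decompose -> query memory -> reason loop (stage split is assumed)."""
    # Stage 1 (assumed): the executive breaks the question into sub-queries, one per line.
    plan = executive(f"Decompose into sub-questions, one per line:\n{question}")
    sub_queries = [line.strip() for line in plan.splitlines() if line.strip()]

    # Stage 2 (assumed): each sub-query goes to the memory model; only text is exchanged.
    evidence = [(q, memory(q)) for q in sub_queries]

    # Stage 3 (assumed): the executive reasons over the collected question/answer pairs.
    notes = "\n".join(f"Q: {q}\nA: {a}" for q, a in evidence)
    final = executive(
        f"Answer the question using the notes.\nQuestion: {question}\nNotes:\n{notes}"
    )
    return final, evidence


# Toy stand-ins so the sketch runs without any real model.
def toy_executive(prompt: str) -> str:
    if prompt.startswith("Decompose"):
        return "Who wrote the book?\nWhere was the author born?"
    return f"FINAL: {prompt.count('A: ')} facts used"


def toy_memory(query: str) -> str:
    facts = {
        "Who wrote the book?": "Jane Doe",
        "Where was the author born?": "Springfield",
    }
    return facts.get(query, "unknown")
```

Calling `answer_with_memory("Where was the author of the book born?", toy_executive, toy_memory)` runs all three steps with the toy functions. In practice, `executive` would wrap an API call to a frontier LLM and `memory` would wrap generation from the small trained model.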
🔥 Key Highlights
🧠 5-step data synthesis pipeline captures explicit facts, implicit relationships, and cross-document connections as reflections;
🛡️ Robust to retrieval noise: where RAG drops up to 6.22% with added distractors, MeMo holds steady;
🔌 Plug-and-play with any LLM: no weights, gradients, or logits required;
📦 Fixed inference cost, independent of corpus size;
🔄 Continual integration via model merging: 33% compute savings over full retraining, with scaling benefits that grow with the number of corpora.
📊 Strong results across BrowseComp-Plus, NarrativeQA, and MuSiQue, matching or outperforming retrieval baselines (BM25, NV-Embed-V2, HippoRAG2) with gains of up to 27% on NarrativeQA when paired with Gemini-3-Flash.
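The model-merging highlight above can be illustrated with a small sketch. The post does not say which merging method MeMo uses, so this is an assumption: it shows one common approach, averaging each memory model's parameter delta from a shared base checkpoint and adding it back. The names `merge_memories`, `Weights`, and `scale` are hypothetical, and plain Python lists stand in for real tensors.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def merge_memories(base: Weights, tuned: List[Weights], scale: float = 1.0) -> Weights:
    """Add the scaled mean delta of several memory models to their shared base.

    Assumes every model in `tuned` was trained from `base` and has identical shapes.
    With scale=1.0 this reduces to plain parameter averaging of the tuned models.
    """
    if not tuned:
        raise ValueError("need at least one tuned memory model to merge")
    merged: Weights = {}
    for name, base_vals in base.items():
        merged_vals = []
        for i, b in enumerate(base_vals):
            # Mean change of this parameter across all corpus-specific models.
            mean_delta = sum(m[name][i] - b for m in tuned) / len(tuned)
            merged_vals.append(b + scale * mean_delta)
        merged[name] = merged_vals
    return merged


# Toy example: two memory models, each trained on a different corpus.
base = {"w": [0.0, 1.0]}
memory_a = {"w": [2.0, 1.0]}
memory_b = {"w": [0.0, 3.0]}
merged = merge_memories(base, [memory_a, memory_b])
```

The appeal of this style of merging is that a new corpus only requires training one new memory model and re-running the merge, rather than retraining on everything seen so far.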
💡 Why this matters
MeMo decouples knowledge from reasoning: Train memory once with a small open model, then plug it into the frontier LLM of your choice. No retraining as new corpora arrive, no fragile retrieval pipelines, and full compatibility with proprietary APIs, paving the way for scalable knowledge-aware AI systems.
🤝 Joint work with
@workryanq_nus,
@961014dltkdg,
@alfredleongwl, Alok Prakash, Nancy F. Chen,
@arun_v3rma, Daniela Rus, and Armando Solar-Lezama
📄 Paper:
arxiv.org/abs/2605.15156
💻 Code:
github.com/arunv3rma/MeMo
🌐 Project page:
arunv3rma.github.io/blogs/me…
🤗 Hugging Face:
huggingface.co/collections/G…
#LLMs #KnowledgeIntegration #MemoryAugmentedLLMs #RAG #ModelMerging
// Memory as a Model //
The paper augments any LLM with a separate trained memory model that stores, retrieves, and integrates facts on its behalf.
By decoupling memory updates from base-model weight updates, it achieves continual-learning robustness without catastrophic forgetting, a property that RAG fails to deliver.
A vector store is a database with a learned encoder bolted on. MeMo is a learned subsystem with explicit interfaces. That distinction matters, as agents need to be able to ingest fresh knowledge weekly without retraining or vector-DB churn.
At its core, the position here is that memory in agents should be modular, learned, and gated, not a context-window hack.
Paper:
arxiv.org/abs/2605.15156