Introducing our latest work, LightMem: Lightweight and Efficient Memory-Augmented Generation 🚀.
A memory system that cuts cost while preserving (and often improving) long-horizon reasoning for LLM agents.
#NLP #LLMs #Memory #LightMem #Agents
📖 Paper:
huggingface.co/papers/2510.1…
🔗 Code:
github.com/zjunlp/LightMem
🧩 Motivation: LLMs struggle in long, multi-turn interactions, where the context grows noisy and expensive and models get “lost in the middle.”
Existing memory systems are often accurate but heavy on tokens, API calls, and latency. ⚠️
💡 Solution Overview: LightMem is inspired by human memory and uses a three-stage lightweight pipeline to keep memories compact, topical, and consistent:
1️⃣ Pre-compressing Sensory Memory: filter out redundant or low-value tokens before further processing.
2️⃣ Topic-aware Short-Term Memory: group turns by topic and summarize each group into a precise memory unit.
3️⃣ Sleep-time Long-Term Memory Updates: apply only lightweight soft updates (incremental inserts) at test time, and run high-fidelity consolidation offline so it adds no runtime latency.
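To make the three stages concrete, here is a toy sketch of the pipeline shape. It is NOT LightMem's code or API: every name below (pre_compress, segment_by_topic, summarize, LongTermMemory, build_memory) is hypothetical. A filler-word filter stands in for a learned token compressor, lexical (Jaccard) overlap between consecutive turns stands in for topic detection, and joining tokens stands in for an LLM summarizer.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a learned token-importance model: a fixed filler-word list.
FILLER = {"um", "uh", "like", "just", "really", "so", "well", "the", "a", "an"}


def pre_compress(text: str) -> list[str]:
    """Stage 1 (toy): lower-case, tokenize on whitespace, drop filler tokens."""
    return [t for t in text.lower().split() if t not in FILLER]


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Lexical overlap between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def segment_by_topic(turns: list[list[str]], threshold: float = 0.15) -> list[list[list[str]]]:
    """Stage 2 (toy): start a new segment when overlap with the previous turn drops below threshold."""
    segments: list[list[list[str]]] = []
    for turn in turns:
        if segments and jaccard(frozenset(segments[-1][-1]), frozenset(turn)) >= threshold:
            segments[-1].append(turn)
        else:
            segments.append([turn])
    return segments


@dataclass
class MemoryUnit:
    topic: frozenset[str]  # keywords describing the unit
    summary: str           # compact text stored in memory


def summarize(segment: list[list[str]]) -> MemoryUnit:
    """Hypothetical summarizer: an LLM call in practice; here we just join the compressed turns."""
    tokens = [t for turn in segment for t in turn]
    return MemoryUnit(topic=frozenset(tokens), summary=" ".join(tokens))


@dataclass
class LongTermMemory:
    units: list[MemoryUnit] = field(default_factory=list)

    def soft_insert(self, unit: MemoryUnit) -> None:
        """Test time (stage 3): cheap append only, no reorganization."""
        self.units.append(unit)

    def sleep_time_consolidate(self, threshold: float = 0.5) -> None:
        """Offline (stage 3): merge units whose topic keywords overlap strongly."""
        merged: list[MemoryUnit] = []
        for unit in self.units:
            for i, existing in enumerate(merged):
                if jaccard(existing.topic, unit.topic) >= threshold:
                    merged[i] = MemoryUnit(existing.topic | unit.topic,
                                           existing.summary + " | " + unit.summary)
                    break
            else:  # no sufficiently similar unit found
                merged.append(unit)
        self.units = merged


def build_memory(dialogue: list[str]) -> LongTermMemory:
    """Run all three toy stages over a list of raw dialogue turns."""
    ltm = LongTermMemory()
    turns = [pre_compress(t) for t in dialogue]
    for segment in segment_by_topic(turns):
        ltm.soft_insert(summarize(segment))
    ltm.sleep_time_consolidate()  # would run offline ("sleep time") in a real system
    return ltm


if __name__ == "__main__":
    memory = build_memory([
        "I just adopted a dog named Rex",
        "Rex is a beagle and the dog loves walks",
        "Also I am planning a trip to Kyoto in May",
    ])
    for unit in memory.units:
        print(unit.summary)
```

The point of the split is where the cost lands: test-time writes are plain appends, while the expensive merge pass is deferred to an offline step.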
🔬 Results: On LongMemEval, LightMem improves accuracy by up to ~10.9% while slashing costs, reducing token usage by up to 117×, API calls by up to 159×, and runtime by more than 12× in some settings. ⚡
☑️ Upcoming (highlights from the README to-do list):
- Offline pre-computation of KV cache for update (lossless)
- Online pre-computation of KV cache before Q&A (lossy)
- MCP (Memory Control Policy)
- Integration of more common models & feature enhancements
- Coordinated use of context and long-term memory storage
We’d love your feedback, issues, and PRs — let’s make memory for agents practical and lightweight! 🎙️