Most memory setups for agentic systems are centralized.
They either provide memory only to the orchestrator, or expose one shared pool every agent reads from and writes to. This makes sense because a naive decentralized memory would just mean that each agent has its own isolated context, which goes against the goal of collaboration.
However, centralized memory can hurt multi-agent systems.
It gives every agent the same context, which blurs the distinct roles each one is supposed to play. The shared pool is also computationally expensive: agents prefill a lot of information they may not need, and the cost keeps growing because every update enlarges the shared repository.
Therefore, we (@GuangyaHao666) tried something different, something that's decentralized but still collaborative.
In our latest paper, DecentMem, the agents still work together the usual way, in whatever agent structure they already have. But we let each agent keep its own private memory instead of pooling everything into a shared repository. And we make the private memory stay collaboration-aware by remembering how a task got solved and who handled each piece, so decentralizing the memory doesn't throw out the coordination signal.
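To make "private but collaboration-aware" concrete, here is a minimal sketch of what one agent's memory record could look like. This is not the paper's data model: the class names (Step, Trajectory, PrivateMemory), the agent names, and the rule that an agent only stores trajectories it took part in are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Step:
    agent: str   # who handled this piece of the task
    action: str  # what that agent did


@dataclass
class Trajectory:
    task: str
    steps: list  # ordered list of Step objects
    solved: bool


@dataclass
class PrivateMemory:
    """Hypothetical per-agent memory that keeps the 'who did what' signal."""
    owner: str
    trajectories: list = field(default_factory=list)

    def remember(self, trajectory):
        # Assumption: an agent only stores trajectories it took part in.
        if any(step.agent == self.owner for step in trajectory.steps):
            self.trajectories.append(trajectory)

    def own_steps(self):
        # The pieces this agent handled itself.
        return [s.action for t in self.trajectories
                for s in t.steps if s.agent == self.owner]

    def collaborators(self):
        # The other agents that appear in this agent's stored trajectories.
        return {s.agent for t in self.trajectories for s in t.steps} - {self.owner}


traj = Trajectory(
    task="fix failing test",
    steps=[
        Step("planner", "split task"),
        Step("coder", "patch function"),
        Step("reviewer", "run tests"),
    ],
    solved=True,
)

# Every agent has its own memory; there is no shared pool.
memories = {name: PrivateMemory(name)
            for name in ("planner", "coder", "reviewer", "searcher")}
for memory in memories.values():
    memory.remember(traj)
```

In this sketch the coder's memory holds its own step plus the names of the agents it worked with, while the searcher, which took no part in the task, stores nothing.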
Specifically, each agent's memory has two halves: an exploitation pool of past trajectories it can reuse, and an exploration pool of fresh LLM-generated candidates for things it hasn't seen yet. A lightweight online router reweights the two using stage-wise feedback from a judge, so each agent works out its own exploit/explore balance instead of us hard-coding a schedule.
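As a rough illustration of the router idea, the sketch below treats the two pools as the two arms of a bandit and uses UCB1 to pick between them. UCB1 is a stand-in chosen here because it is a standard rule with logarithmic regret; the post does not say which rule the paper uses. The judge is simulated as a Bernoulli reward, and the success probabilities are made up.

```python
import math
import random


class TwoPoolRouter:
    """UCB1 over two arms, 'exploit' and 'explore' (an assumed rule, not the paper's)."""

    ARMS = ("exploit", "explore")

    def __init__(self):
        self.counts = {arm: 0 for arm in self.ARMS}
        self.means = {arm: 0.0 for arm in self.ARMS}
        self.t = 0  # total number of feedback signals seen so far

    def choose(self):
        # Try each pool once before relying on the confidence bound.
        for arm in self.ARMS:
            if self.counts[arm] == 0:
                return arm

        def ucb(arm):
            bonus = math.sqrt(2.0 * math.log(self.t) / self.counts[arm])
            return self.means[arm] + bonus

        return max(self.ARMS, key=ucb)

    def update(self, arm, reward):
        # reward: judge score in [0, 1] for the stage that used this pool.
        self.t += 1
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def simulate(steps=2000, p_exploit=0.7, p_explore=0.4, seed=0):
    """Run the router against a simulated judge with made-up success rates."""
    rng = random.Random(seed)
    router = TwoPoolRouter()
    probs = {"exploit": p_exploit, "explore": p_explore}
    for _ in range(steps):
        arm = router.choose()
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        router.update(arm, reward)
    return router


router = simulate()
```

With these made-up rates the router ends up sending most stages to the exploitation pool while still sampling the exploration pool occasionally. Each agent would hold its own router instance, so the balance is learned per agent.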
Theoretically, we model each agent's search as a random walk over a graph of candidate strategies, where the two pools act as two kinds of moves: the exploitation pool is a local walk over strategies the agent already knows, and the exploration pool is a teleport that can jump anywhere in its space through the LLM prior. Under mild assumptions, that combination guarantees no agent ever gets permanently stuck in one local region of the strategy space, since the search can always reach any strategy. We also cast the router as a bandit problem and show it converges toward the right exploit/explore balance at an O(log T) regret rate, which is about the best rate this kind of online balancing can achieve.
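The "never permanently stuck" claim can be seen in a toy example. The sketch below builds a local walk over four strategies that form two disconnected clusters, then mixes in a teleport step. The transition matrices, the uniform prior, and the mixing weight eps are illustrative assumptions. The post does not spell out the paper's assumptions; here the only requirements are eps > 0 and a prior that gives every strategy positive probability.

```python
def mix_with_teleport(local, prior, eps):
    """Return P = (1 - eps) * local + eps * prior, applied row by row."""
    n = len(local)
    return [
        [(1 - eps) * local[i][j] + eps * prior[j] for j in range(n)]
        for i in range(n)
    ]


def reachable(P, start):
    """Set of states reachable from `start` through positive-probability moves."""
    seen = {start}
    stack = [start]
    while stack:
        i = stack.pop()
        for j, p in enumerate(P[i]):
            if p > 0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen


# Four strategies in two disconnected clusters: {0, 1} and {2, 3}.
local = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.5, 0.5],
]
prior = [0.25, 0.25, 0.25, 0.25]  # hypothetical LLM prior with full support

mixed = mix_with_teleport(local, prior, eps=0.1)
local_only = reachable(local, 0)    # the local walk stays inside cluster {0, 1}
with_teleport = reachable(mixed, 0)  # the teleport makes all four strategies reachable
```

This is the same mechanism as the damping factor in PageRank: any positive teleport probability over a full-support prior makes every state reachable from every other state.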
Empirically, across 3 MAS frameworks (AutoGen, DyLAN, AgentNet), 5 backbones (Qwen3-4B/8B/14B, Gemma4-E2B/E4B), and 5 benchmarks spanning math, code, QA, and embodied tasks, DecentMem comes out ahead of the strongest centralized baseline by ~9% on average — up to ~24% in the best case — and the no-memory baseline by ~26%. It also uses up to ~49% fewer tokens, since each agent only touches its own memory instead of the whole shared repository.
We also watched how this plays out as the agents pile up experience, since a memory system should naturally support self-evolution and help the system keep improving. We show that DecentMem helps the agentic system evolve faster than every centralized baseline as it sees more tasks — on DyLAN it reaches strong accuracy roughly 2.5× sooner.
Another interesting result is that the improvement gets bigger when the agent coordination is looser and more free-form.
Going from AutoGen's fixed, scripted workflows to AgentNet's improvised, on-the-fly coordination, the relative gain widens pretty steadily, and on the loosest setup, DecentMem even lands on strategies the shared-pool baselines never reach. Our read is that keeping memory private lets different agents keep chasing different solution paths, while a shared pool drags everyone toward the same stored answers — and that variety pays off most when coordination is loose.
Zooming out, the takeaway may not be that decentralized beats centralized. It's that each agent's memory should be scoped and structured more carefully, and more personalized to that agent, which is something I think most multi-agent systems and memory designs still leave on the table.
📑 Paper: arxiv.org/pdf/2605.22721