Following on from Kitchen table genomes/liquid biopsy. There are two forces at work, a centralization one and a decentralization one. Large companies are trying to trap everything in central data centers with proprietary data streams, giving them a moat and control, others are pushing to run AI and other things locally empowering individual users. Same is happening to bio/medical data including sequencing.
Microsoft just solved the context window problem.
Right now, every AI suffers from a fatal flaw: the "context window problem."
When an AI reasons through a complex problem, it generates a massive chain-of-thought. But there is a catch. It has to keep every single token of that thought in its active memory.
The technical term is the "KV Cache."
The longer the AI thinks, the heavier it gets. It slows down. It gets expensive. Eventually, it runs out of space.
We thought the only fix was renting bigger, more expensive cloud GPUs to hold all that context.
Microsoft just proved us wrong. They published a paper called "MEMENTO."
Instead of giving the AI a bigger memory, they taught it how to forget.
Here is how it works:
Instead of generating one endless stream of consciousness, a Memento-trained model breaks its reasoning into small blocks.
After it finishes a block, it writes a dense, highly compressed summary of its own logic—a "memento."
Then, it does something unprecedented.
It physically deletes the entire previous reasoning block from its memory cache.
It only carries the memento forward. The model reasons, extracts the core logic, and instantly drops the dead weight.
The results rewrite the economics of running AI.
• Context length compressed by 6x.
• Active memory usage (KV cache) reduced by 2.5x.
• Zero loss in math, science, or coding accuracy.
And here is the real implication.
Big tech has been charging you by the token for massive context windows you don't actually need.
With this architecture, small businesses and solo operators can run complex, multi-step autonomous agents entirely locally.
You don't need an enterprise cloud setup. A standard machine running an open-source model can now reason indefinitely without overflowing its memory. No API fees. Complete privacy.
We spent the last two years trying to give AI an infinite memory.
It turns out, the secret to smarter AI isn't remembering everything.
It's knowing exactly what to forget.