The importance of the more Zipfian distribution of these terms can't be overstated.
It means this tiny SAE adapter unlocks decades of BM25 index optimizations, rather than forcing a move to the new generation of (also great) manifold-hypothesis-driven LSR engines built for SPLADE.
By now, everyone knows that single-vector embedding models are hugely limiting for modern workflows.
But they contain more than you think: you can extract sparse Latent Terms from them.
And it turns out that BM25 is all you need to turn this vocabulary into a strong retriever.
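To make the idea concrete, here is a minimal, stdlib-only sketch under stated assumptions; it is not the authors' code. A TopK sparse autoencoder with hypothetical toy weights `W` and `B` encodes each dense embedding into a few active latents, each firing latent becomes a term `latent_j`, activations are quantized into integer pseudo term frequencies (an assumption; binary presence would also work), and plain Okapi BM25 ranks the resulting "documents". The names `sae_latent_terms`, `to_term_counts`, and `BM25` are illustrative only. In practice you would hand these term counts to an existing BM25 engine instead of this toy scorer.

```python
import math
from collections import Counter


def sae_latent_terms(embedding, enc_weights, enc_bias, k):
    """Hypothetical TopK SAE encoder: ReLU(W x + b), keeping the k largest activations.

    Returns {latent_term: activation}; each firing latent acts as one Latent Term.
    """
    acts = []
    for j, (row, bias) in enumerate(zip(enc_weights, enc_bias)):
        a = sum(w * x for w, x in zip(row, embedding)) + bias
        if a > 0:  # ReLU: only positive activations become terms
            acts.append((a, j))
    acts.sort(reverse=True)  # largest activation first
    return {f"latent_{j}": a for a, j in acts[:k]}


def to_term_counts(latents, scale=10):
    """Assumption: quantize activations into integer pseudo term frequencies."""
    return Counter({t: max(1, round(a * scale)) for t, a in latents.items()})


class BM25:
    """Plain Okapi BM25 over latent-term documents (Counter of term -> tf)."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.docs = docs
        self.k1, self.b = k1, b
        self.n_docs = len(docs)
        self.avgdl = sum(sum(d.values()) for d in docs) / self.n_docs
        self.df = Counter(t for d in docs for t in d)  # document frequency per term

    def idf(self, term):
        n = self.df.get(term, 0)
        return math.log(1 + (self.n_docs - n + 0.5) / (n + 0.5))

    def score(self, query_terms, i):
        doc = self.docs[i]
        dl = sum(doc.values())
        total = 0.0
        for t in query_terms:  # query term frequencies are ignored, as in classic BM25
            tf = doc.get(t, 0)
            if tf:
                norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                total += self.idf(t) * tf * (self.k1 + 1) / (tf + norm)
        return total

    def search(self, query_terms, top_n=10):
        ranked = sorted(
            ((self.score(query_terms, i), i) for i in range(self.n_docs)), reverse=True
        )
        return ranked[:top_n]


# Toy SAE: 4 latents over 3-dim embeddings (hand-picked weights, purely illustrative).
W = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
B = [0, 0, 0, -1.5]  # latent_3 fires only when dims 0 and 1 are both large

doc_embeddings = [[1.0, 0.0, 0.0], [0.3, 1.0, 0.0], [0.0, 0.0, 1.0]]
docs = [to_term_counts(sae_latent_terms(e, W, B, k=2)) for e in doc_embeddings]
index = BM25(docs)

query = sae_latent_terms([0.9, 0.1, 0.0], W, B, k=1)  # keeps only latent_0
results = index.search(query)
print([i for _, i in results])  # [0, 1, 2]
```

Document 0 ranks first because it carries `latent_0` with a high pseudo term frequency in a short document; document 1 also contains `latent_0`, but weakly and in a longer document, so BM25's length normalization ranks it second.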