Sparse attention mechanisms are finally moving beyond academic benchmarks into production systems, including DeepSeek Sparse Attention, and recently
@NousResearch 's Lighthouse Attention.
BLASST by NVIDIA, from paper Dynamic Blocked Attention Sparsity via Softmax Thresholding, attempts to sparsify attention in a different way, leveraging a similar rescale factor threshold idea from Flash Attention 4.
We expect to see more interesting sparse attention techniques in the future.
arxiv.org/abs/2512.12087 (2/4)