vLLM v0.22.0 is out! 459 commits from 230 contributors (63 new).
Highlights: DeepSeek V4 hardening (NVFP4 fused MoE, full piecewise CUDA graph, ROCm support), experimental Rust frontend in-tree, batch-invariant CUTLASS FP8 (28.9% lower e2e latency), Model Runner V2 advances, multi-tier KV cache offloading.
Thread: