KV cache shouldn't disappear every time vLLM restarts. With @novita_labs, we're sharing PegaFlow, a production-grade external KV cache service that plugs into vLLM through the external KV connector interface.
PegaFlow runs as a standalone Rust daemon that owns the host KV pool, SSD cache, and RDMA resources. vLLM workers attach via CUDA IPC and gRPC, so the cache survives engine crashes, upgrades, and model switches.
In production-oriented evaluations:
🚀 2.15× faster vLLM startup with a pre-warmed 500 GiB host pool
📈 56% higher throughput for 8 Qwen3-8B instances sharing one cache
⚡ 72% higher throughput for DeepSeek-V3.2 MLA TP8 (logical KV stored once, not per rank)
🌐 194 GB/s average remote-read throughput across nodes
Three-level hierarchy: pinned DRAM, remote DRAM over RDMA, local SSD on io_uring. Integrates through the existing `kv_transfer_config` path — no vLLM source changes.
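To make the "no source changes" point concrete, here is a minimal sketch of what wiring an external connector through `kv_transfer_config` might look like. The field names `kv_connector`, `kv_connector_module_path`, and `kv_role` and the `--kv-transfer-config` flag are vLLM's; the connector class and module path below are hypothetical placeholders, not PegaFlow's actual names, so check the blog post for the real values.

```python
import json
import shlex


def build_kv_transfer_config(connector: str, module_path: str, role: str = "kv_both") -> str:
    """Serialize a kv_transfer_config payload as the JSON string vLLM accepts."""
    config = {
        "kv_connector": connector,                # connector class name
        "kv_connector_module_path": module_path,  # module vLLM imports the class from
        "kv_role": role,                          # "kv_both": this engine saves and loads KV
    }
    return json.dumps(config)


# Hypothetical names for illustration only; substitute the connector's documented values.
config_json = build_kv_transfer_config(
    connector="HypotheticalPegaConnector",
    module_path="hypothetical_pegaflow.connector",
)

# The engine is launched unmodified; only the config flag changes.
command = ["vllm", "serve", "Qwen/Qwen3-8B", "--kv-transfer-config", config_json]
print(shlex.join(command))
```

Because the connector is selected purely by configuration, switching the external cache on or off is a launch-flag change rather than a patched vLLM build.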
📖 vllm.ai/blog/2026-05-18-pega…