day 5 - did some benchmarking and yeah, it doesn't make sense to move off Triton for bf16, especially for Gemma 3. for NVFP4 though it's FlashInfer/FA2 or bust, of course.
vLLM ready to go, verifying SGLang now - once that's green will lodge PRs
day 4 - took it from just 31B to the rest of the Gemma 4 ladder: E4B, 12B, 26B-A4B all serving full NVFP4 KV now (up to 3.6× vs bf16), plus Gemma 3 12B.
also got Gemma 4 off the Triton fallback on consumer Blackwell entirely...