The trick is that mmap seems to be very aggressive with the Vulkan backend, even at -ngl 0, which I didn't expect. On the CPU build it used ~9 GB; on Vulkan it was ~2 GB.
It bogged down a bit while swapping, but how can you beat such a tiny footprint? It's running in nearly the same RAM as E2B!
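If anyone wants to check where the ~9 GB vs ~2 GB difference actually sits, here's a rough sketch of how I'd look at it. It assumes Linux with a reasonably recent kernel, where /proc/<pid>/status splits resident memory into RssAnon (ordinary allocations) and RssFile (file-backed pages, which is where an mmap'd model file shows up). The helper names parse_status and read_status are just mine for illustration, and the PID is whatever your inference process happens to be.

```python
import sys
from pathlib import Path

# Fields of interest in /proc/<pid>/status (Linux only; an assumption of this sketch).
FIELDS = ("VmRSS", "RssAnon", "RssFile")


def parse_status(text):
    """Return the selected memory fields from a /proc status dump, in MiB."""
    out = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in FIELDS:
            # Values look like "   2097152 kB"; take the number and convert kB -> MiB.
            out[key] = int(rest.split()[0]) / 1024
    return out


def read_status(pid):
    """Read and parse /proc/<pid>/status for a running process."""
    return parse_status(Path(f"/proc/{pid}/status").read_text())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python rss_split.py <pid>
    for name, mib in read_status(int(sys.argv[1])).items():
        print(f"{name}: {mib:.0f} MiB")
```

Run it against the CPU build and the Vulkan build with the same model loaded and compare the RssFile numbers. If most of the gap is there, it's the mmap'd weights being paged in or out rather than the process allocating less.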
Gemma 4 26B-A4B-it (UD-Q4_K_M) runs way too well, it literally feels like black magic. If my old TV box can run this on an outdated i5, I have no idea why 75% of new datacenters even need to exist. At least 75% of computers made in the last 10 years could run this (sans Chromebooks), and that would be good enough for 75% of people. It's around 2 to 3 GB in active memory, insane.