This model does indeed work acceptably on an RTX 3060 Laptop GPU w/ 6 GB VRAM:
llama-server -c 98304 -m Qwen3.6-35B-A3B-UD-IQ3_XXS.gguf -fitt 512 --temp 0.6 --top_p 0.95 --top_k 20 --min_p 0
Runs at ca. 22 tok/s!
(KV cache quantization would be marginally faster but produces worse output)
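For anyone who wants to compare against the KV-cache-quantized variant, here is a rough sketch. It reuses the command above and adds llama.cpp's `--cache-type-k` / `--cache-type-v` options. The `q8_0` type is only an assumption for illustration; the type actually tried is not stated here. As far as I know, a quantized V cache also needs flash attention to be active, so check your build's `--flash-attn` setting.

```shell
# Sketch only: same settings as the command above, plus a quantized KV cache.
# KV_TYPE=q8_0 is an assumed value, not the one tested in this thread.
KV_TYPE=q8_0
CMD="llama-server -c 98304 -m Qwen3.6-35B-A3B-UD-IQ3_XXS.gguf -fitt 512 \
--temp 0.6 --top_p 0.95 --top_k 20 --min_p 0 \
--cache-type-k $KV_TYPE --cache-type-v $KV_TYPE"
# Print the assembled command for review instead of launching the server.
echo "$CMD"
```

Copy the printed line into a terminal to launch it, then compare tok/s and output quality against the unquantized run.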
Exciting!
Judging by these benchmarks, Qwen3.6-35B-A3B could potentially bring Qwen3.5-27B / Gemma4-31B-quality inference to small laptop GPUs.
I will give this a test run on an NVIDIA GeForce RTX 3060 Laptop GPU and report back.