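if budget allows, get a single RTX PRO 6000 (96GB), no question. one card means no model splitting, no tensor-parallel overhead, and the full model sits in VRAM. a 70B model at Q8 is roughly 75GB of weights, so it fits without thinking about it and still leaves room for context. simplest possible setup.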
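rough sketch of the "70B at Q8 fits" math, assuming weight memory is just params × bits-per-weight / 8, and that Q8 means llama.cpp-style Q8_0 (32 int8 weights plus one fp16 scale per block, so 8.5 bits per weight). `weight_gb` and `headroom_gb` are hypothetical helper names, and KV cache/activations are not counted; the headroom is what's left over for them.

```python
# Rough weight-memory estimate: params x bits-per-weight / 8 bytes.
# Assumption: Q8_0-style quant = 8.5 bits per weight (int8 values + fp16 block scale).
# KV cache and activations are NOT included; headroom is what remains for them.


def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def headroom_gb(vram_gb: float, params_billion: float, bits_per_weight: float) -> float:
    """VRAM left over for context (KV cache) after loading the weights."""
    return vram_gb - weight_gb(params_billion, bits_per_weight)


if __name__ == "__main__":
    print(f"70B @ Q8: ~{weight_gb(70, 8.5):.1f} GB of weights")
    print(f"left on a 96GB card: ~{headroom_gb(96, 70, 8.5):.1f} GB")
```

that leaves a bit over 20GB for the KV cache on a 96GB card, which is why you don't have to juggle anything at this size.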
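but if you're optimizing for cost: 2x 3090 with NVLink gives you 48GB total for under $2K used. it's not one unified pool, the model still gets split across the two cards; NVLink just makes the traffic between them faster. it runs 80B coding models (quantized to fit) at 46 tok/s. that's the setup i tested Qwen3 Coder Next on.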
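to see why an 80B model has to be quantized hard to fit on 2x 24GB, here's a sketch that works out the highest bits-per-weight that fits when the weights are split evenly across cards. assumptions: the per-card reserve for KV cache/CUDA context is a made-up number, and a perfectly even split is an idealization of the real layer split. function names are hypothetical.

```python
# Sketch: highest quantization level that fits when weights are split across GPUs.
# Assumptions: even split across cards (real layer splits only approximate this),
# and a hypothetical per-card reserve for KV cache / runtime overhead.


def max_bits_per_weight(params_billion: float, vram_per_gpu_gb: float,
                        n_gpus: int, reserve_gb_per_gpu: float) -> float:
    """Bits per weight that exactly fill the usable VRAM across all cards."""
    usable_gb = n_gpus * (vram_per_gpu_gb - reserve_gb_per_gpu)
    return usable_gb * 8 / params_billion


def per_gpu_weight_gb(params_billion: float, bits_per_weight: float, n_gpus: int) -> float:
    """Weight memory each card holds under an even split."""
    return params_billion * bits_per_weight / 8 / n_gpus


if __name__ == "__main__":
    # 80B on 2x 24GB with 2GB reserved per card for context
    print(f"max ~{max_bits_per_weight(80, 24, 2, 2):.2f} bits/weight")
    # at Q8_0 (8.5 bits/weight) each card would need this much for weights alone
    print(f"Q8 per card: {per_gpu_weight_gb(80, 8.5, 2):.1f} GB")
```

under those assumptions you land around 4.4 bits per weight, i.e. 4-bit territory; Q8 would need ~42.5GB per card, which obviously doesn't fit on a 24GB 3090.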
for coding specifically, Qwen 3.5 27B dense (quantized) fits on a single 3090 and runs at 35 tok/s with 262K context. depending on which model you pick, you might not even need multi-GPU.
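quick way to check whether a given model needs more than one 24GB card, as a sketch. assumptions: ~4.85 bits/weight is a rough figure for a Q4_K_M-style GGUF quant, 8.5 bits/weight is Q8_0, and `reserve_gb` is a hypothetical flat allowance for KV cache and runtime overhead (very long contexts can need a lot more, depending on the model's attention design). `cards_needed` is a hypothetical helper.

```python
import math

# Sketch: minimum number of 24GB cards (e.g. 3090s) a model needs.
# Assumptions: ~4.85 bits/weight approximates a Q4_K_M-style quant, 8.5 is Q8_0,
# and reserve_gb is a hypothetical flat allowance for KV cache + overhead.


def cards_needed(params_billion: float, bits_per_weight: float,
                 vram_per_card_gb: float = 24.0, reserve_gb: float = 4.0) -> int:
    """Round up (weights + reserve) to a whole number of cards."""
    total_gb = params_billion * bits_per_weight / 8 + reserve_gb
    return math.ceil(total_gb / vram_per_card_gb)


if __name__ == "__main__":
    print("27B @ ~4-bit:", cards_needed(27, 4.85))
    print("80B @ Q8:", cards_needed(80, 8.5))
```

the 27B at ~4-bit comes out to one card, while an 80B at Q8 would need four, which is the whole point: model choice can decide whether you need multi-GPU at all.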