Simplify your life.
Ryzen 9 9950X MSI X870E Carbon 256GB DDR5 RTX Pro 6000 Blackwell 96GB GDDR7 ECC Seasonic TX-1600 Titanium
One GPU, no rack, no NVLink mess. Smokes 4×3090 in vLLM/Flux/inference thanks to unified bandwidth & zero multi-GPU overhead.
3 months ago, I realized I was hopelessly dependent on corporations that only care about power, money, and control.
At this point Cursor, Claude, OpenAI, all had rugged their unlimited plans.
I wanted a Mac M3 Ultra with 512GB RAM. Ahmad and Pewdiepie convinced me otherwise.
Here's what I learned building my own AI Rig
-----------------------------
The Build ($3K-$10K)
This is the top performance you can get below 10k USD
• 4x RTX 3090s with 2x NVLink
• Epyc CPU with 128 PCIe lanes
• 256-512GB DDR4 RAM
• Romed8-2T motherboard
• Custom rack fan cooling
• AX1600i PSU quality risers
Cost: $5K in US, $8K in EU (thanks VAT)
Performance Reality Check
More 3090s = larger models, but diminishing returns kick in fast.
Next step: 8-12 GPUs for AWQ 4-bit or BF16 Mix GLM 4.5-4.6
But at this point, you've hit consumer hardware limits.
----------------------------------------
Models that work:
S-Tier Models (The Golden Standard)
• GLM-4.5-Air: Matches Sonnet 4.0, codes flawlessly got this up to a steady 50 tps and 4k/s prefill with vLLM
• Hermes-70B: Tells you anything without jailbreaking
A-Tier Workhorses
• Qwen line
• Mistral line
• GPT-OSS
B-Tier Options
• Gemma line
• Llama line
------------------------------------
The Software Stack That Actually Works
For coding/agents:
• Claude Code Router (GLM-4.5-Air runs perfectly)
• Roocode Orchestrator: Define modes (coding, security, reviewer, researcher)
The orchestrator manages scope, spins up local LLMs with fragmented context, then synthesizes results. You can use GPT-5 or Opus/GLM-4.6 as orchestrator, and local models as everything else!
Scaffolding Options (Ranked)
1. vLLM: Peak performance usability, blazing fast if model fits
2. exllamav3: Much faster, all quant sizes, but poor scaffolding
3. llama.cpp: Easy start, good initial speeds, degrades over context
UI Recommendations
• lmstudio: Locked to llama.cpp but great UX
• 3 Sparks: Apple app for local LLMs
• JanAI: Fine but feature-limited
-------------------------------
Bottom Line
Mac Ultra M3 gets you 60-80% performance with MLX access. But if you want the absolute best you need Nvidia.
This journey taught me: real independence comes from understanding and building your own tools.
If you're interested in benchmarks I've posted a lot on my profile