We're slowly entering a new chapter: self-hosting AI models is no longer all-or-nothing.
There is now a real middle ground between toy setups and getting serious work done!
Here's the practical breakdown 👇
→ Qwen3.6-27B
30.5B total params.
3.3B active.
262K context.
Runs in the sane range:
~$0.70/hr on Vast.ai with an 80GB-class GPU.
This is the practical entry point for agent work:
Research, sorting, docs, support prep, and simple tool workflows.
Small enough to run without doing anything ridiculous.
Large enough to be worth taking seriously.
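Quick sanity check on why an 80GB-class GPU is the right size here. This is a rough sketch, not a sizing tool: it only counts the weights (total params x bytes per param) and ignores KV cache, activations, and runtime overhead, which all add on top. The function names and the precision table are made up for illustration. Note that memory follows total params, not active params, because every expert has to be loaded.

```python
# Rough weight-memory estimate: total parameters x bytes per parameter.
# Assumption: weights only. KV cache, activations, and framework overhead
# are NOT included, so real usage is higher, especially at long context.

BYTES_PER_PARAM = {
    "16-bit": 2.0,  # fp16 / bf16
    "8-bit": 1.0,   # int8 / fp8
}

def weight_memory_gb(total_params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return total_params_billion * bytes_per_param

def fits_on_gpu(weights_gb: float, gpu_gb: float = 80.0) -> bool:
    """True if the weights alone fit; says nothing about remaining headroom."""
    return weights_gb < gpu_gb

if __name__ == "__main__":
    total_params_billion = 30.5  # figure quoted in this post
    for precision, nbytes in BYTES_PER_PARAM.items():
        gb = weight_memory_gb(total_params_billion, nbytes)
        print(f"{precision}: ~{gb:.1f} GB of weights, fits on 80 GB: {fits_on_gpu(gb)}")
```

At 16-bit that is about 61 GB of weights, which leaves some room on 80 GB but not unlimited room for long-context KV cache.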
→ GLM-5.1
235B total params.
22B active.
256K context.
Now you're in multi-GPU territory:
~$3-6/hr on Vast.ai depending on setup and context.
This is where self-hosting starts to hurt a bit.
But for higher-value internal workflows, a few dollars per hour is not crazy if the agent can stay coherent, use tools properly, and finish more work without constant correction.
→ Kimi K2.6
744B total params.
40B active.
This is the high-end setup:
~$12-20/hr on Vast.ai with serious multi-H100 infrastructure.
Not the normal founder setup.
But useful as the upper edge of what open-weight agentic models can look like when you stop pretending everything has to run on cheap hardware.
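Hourly prices are easy to underestimate, so here is a small sketch that turns the rates above into monthly numbers. Assumptions: a 30-day month, the hourly ranges exactly as quoted in this post, and no storage, bandwidth, or idle-time costs. The helper name `monthly_cost` is just for illustration.

```python
# Convert hourly GPU rental rates into rough monthly costs.
# Assumption: 30-day month; rates are the (low, high) USD/hr ranges from this post.

TIERS = {
    "Qwen3.6-27B": (0.70, 0.70),
    "GLM-5.1": (3.0, 6.0),
    "Kimi K2.6": (12.0, 20.0),
}

def monthly_cost(hourly_usd: float, hours_per_day: float = 24.0, days: int = 30) -> float:
    """Rental cost in USD for running `hours_per_day` over `days` days."""
    return hourly_usd * hours_per_day * days

if __name__ == "__main__":
    for name, (low, high) in TIERS.items():
        always_on = (monthly_cost(low), monthly_cost(high))
        # Hypothetical alternative: only rent during an 8-hour workday, 22 days a month
        workday = (monthly_cost(low, 8, 22), monthly_cost(high, 8, 22))
        print(f"{name}: 24/7 ${always_on[0]:,.0f}-${always_on[1]:,.0f}/mo, "
              f"workdays only ${workday[0]:,.0f}-${workday[1]:,.0f}/mo")
```

The gap between always-on and workday-only is the main lever: spinning the instance down when no agent is running changes the bill far more than shaving a few cents off the hourly rate.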
It's a tough balance right now!
We want agents that can use tools, keep context, handle files/APIs, and finish actual work. But we don't want OpenAI and Anthropic to own us forever.