Our LLM spend dropped 80–90% after we changed one thing: who gets the expensive tokens.
github.com/SourceShift/mini-…
Used Claude in the terminal since December 2024, back when `npm install -g
@anthropic-beta/claude-cli` was the only way in and the product hadn't been announced yet. Filed feedback then, still file it now. The model is unrecognizably better. The bill is too.
The expensive model isn't the problem. Sending every grep and every "did the test pass?" to the expensive model is.
What fixed it for me: let one expensive model orchestrate. Let cheap models from different vendors do the work in parallel. Let a bash script decide pass/fail. The frontier model only judges at the end, when judgment is actually the bottleneck.
Yesterday this pattern, run via mini-ork, built its own recipe to audit a TypeScript codebase for silent `.catch(() => {})` swallows. $3.48, 13 minutes, four model families, one prompt. The same DAG dispatched all-Opus would have run roughly $30–50.
Cross-family review wired by config: Zhipu, Moonshot, OpenAI, DeepSeek, Anthropic, MiniMax. Not four Sonnets agreeing with themselves.
A daily cap that lives inside the dispatcher, not on the invoice. Per-call cost ledger included.
State persists across runs, so yesterday's failure is today's planner context. No paying again to teach the model what it already learned!!!
Open source, Apache 2.0.
github.com/SourceShift/mini-…
#LLM #AgentEngineering