Lots of excitement around Factory's model router and its 20% cost saving, and rightly so.
Many are saying this is just the start. After digging into the numbers, I think it's close to the ceiling. Here's why 👇
Routing only saves money on work a cheaper model can handle without dropping the ball. Factory's router holds ~99% of Opus 4.7's pass rate while cutting cost by 20%. They also published a Pareto curve of other experiments, showing that when they pushed harder performance suffered. Getting down to ~56% of Opus 4.7 cost dragged the pass rate to 81%. In their research, 20% was the elbow of the curve, i.e. about the most you can save before quality starts to go.
It's also important not to confuse this cost saving as "only 20% of tasks could be handed to smaller models". The reality was likely far more. Firstly, smaller models aren't free - Claude Sonnet is only ~50% cheaper. But most importantly, the hardest tasks are often the long, token-hungry, multi-step ones. So a handful of hard sessions still eat the lion's share of the bill, even if the majority of tasks get routed.
Furthermore, any benchmark that Opus 4.7 scores 99% is forgiving. The tasks we throw at AI in reality are often harder. If you're pushing AI to its limits you'll naturally need to send a higher share of tasks to the smartest models. Hence why I think Factory's numbers form something of a ceiling for cost saving. At least for now...
So why does this matter?
Plenty of startups are now running in-house agents and watching usage spiral. The results are awe-inspiring, but cost is a creeping concern. I've seen teams attempt their own model routing, and if you are Factory's research should give pause for thought. What it shows is that you're unlikely to beat ~20% cost reduction. Worse, if you think you have, you've probably traded away performance without realising it.
For Factory's enterprise clients, 20% off a vast bill is real money. For a startup building it yourself, if you ask me the juice isn't worth the squeeze.
My advice: worry about cost far less than you're tempted to. Put that energy into the product and anything that helps you ship faster. The AI landscape is going to keep shifting and many cost reductions are going to come for free.
Where I'd bet the genuinely dramatic cost reductions will come from (most to least likely):
→ Hardware acceleration letting the frontier labs cut prices
→ Better open-source models and a shift toward local compute
→ Specialised models giving routers cheaper options with best-in-class performance (see Harvey's announcement from the last 24 hours!)
I'm watching the first two very closely this year, and will keep sharing what I find