Dynamic model routing products have largely been snake oil so far. We’ve seen many come and go since 2022.
The story of model routing has a simple, legible quality that magnetizes capital.
@Alfred_Lin’s “Beware of Simple Narratives” speaks to the danger of this:
x.com/alfred_lin/status/2038…
I’m an engineer who has been working on genAI applications since 2022. The nuanced reality is very different from the simple story:
1. As
@sqs points out below, frontier models are often better, faster AND cheaper—because they don’t have to retry or get stuck in reasoning loops. The gains of cost-optimized routing are often minimal. Also: people generally want the best possible output. People want to pay 20% more for 5% better.
2. Many projects take a concert of tightly bound models and prompts to complete well. You don’t want individual tasks being routed to different models, as it makes a system unpredictable and unstable. You care about the performance of the aggregate system much more than individual task performance. Dynamic task routing makes it hard to measure the system as a whole.
3. As a user, I dislike how model routing makes software feel opaque. I want to be able to get a “feel” for each model and how to best use it. I don’t want to use a system where changing one word of my prompt might cause me to get routed to a different model, getting wildly different results.
4. Foundation model APIs are already doing model routing to some extent. If there is a significant model arbitrage opportunity which can save costs, they can close the arbitrage themselves.