Everyone's quoting Jevons now. A year ago you had to explain why cheaper tokens meant a bigger bill; now it's consensus.
So let me push past the part everyone stops at, because the part Jevons doesn't cover is the one that matters: where the margin goes once the pie grows.
Start with what got cheap: the model did.
-> Chinese open weights commoditised the single most expensive thing an inference provider owns. DeepSeek v4 reportedly codes within a hair of the frontier on SWE-bench at roughly 1/30th of the price, and quite a few companies just moved all their traffic onto it and reported performance going up.
When the scarce thing stops being scarce, its margin goes with it.
At @nebiustf we work with customers who are already moving their production workloads onto these open models, exactly where the margin has now migrated (I see this every day).
That's how you get the two facts everyone treats as a contradiction:
- frontier-lab revenue is reportedly off the charts while their margins are reportedly deeply negative.
Both at once. Revenue was never the problem. The model stopped being a moat, and you can't charge moat prices for a commodity when a cheap, frontier-grade open weight is one API call away.
Here's the step past Jevons:
- the extra tokens don't disappear, and neither does the money, but it stops pooling at the model. Usage compounds, the per-token price keeps falling, and the value migrates to whatever part of the stack is still scarce. The model is the part that got abundant; the compute it runs on, the memory, and the power did not.
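To make the "cheaper tokens, bigger bill" arithmetic concrete, here's a back-of-the-envelope sketch. The 30x price drop echoes the reported figure above; the baseline volume, the $15 per million tokens, and the 50x usage growth are made-up assumptions for illustration, not data.

```python
# Back-of-the-envelope Jevons arithmetic.
# Every number here is an illustrative assumption, not a measurement.

def monthly_spend(tokens: float, price_per_million: float) -> float:
    """Total spend in dollars for a given token volume and per-million price."""
    return tokens / 1_000_000 * price_per_million

PRICE_DROP = 30    # price falls to 1/30th (the reported ratio)
USAGE_GROWTH = 50  # hypothetical: usage grows 50x once tokens get cheap

# Hypothetical baseline: 2B tokens a month at $15 per million tokens.
before = monthly_spend(tokens=2e9, price_per_million=15.0)

# Same workload after the price drop and the usage growth.
after = monthly_spend(tokens=2e9 * USAGE_GROWTH,
                      price_per_million=15.0 / PRICE_DROP)

print(f"before: ${before:,.0f}  after: ${after:,.0f}")
```

The bill only grows if usage grows faster than the price falls (here, more than 30x). That is the Jevons part. It says nothing about which layer of the stack collects the extra spend, which is the point of this post.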
That's the reframe, and it's the part almost nobody prices. Stop valuing AI labs on the strength of the next model. The model is the commodity now. Not every frontier model is identical yet: the hardest long-horizon agentic tasks still favor some closed models, and we'll see what 5.6 and Mythos hold. But the gap has collapsed so far that open weights are now the practical default for the vast majority of production workloads. This is exactly the shift I've been writing about in my recent posts on inference.
Cheaper tokens, more tokens, and a model that's no longer where the money is.