The Microsoft and Uber stories are being misread by most people because the reading skips the part where neither company actually scaled back.
MSFT is migrating 100k engineers from Claude Code to Copilot (which they own capture every prompt bit of user feedback to improve their own product). That's a different thing vs. cost capitulation - it's MSFT electing to build internally, where they can use frontier capacity at cost vs. at cost-plus-markup AND make that investment do double work (improve Copilot save $$$).
Uber's CTO said the company was "back to the drawing board." That doesn't entail that they're no longer using the tools - it means they're rebuilding their forecasts/models/FinOps around consumption they underestimated by an order of magnitude. Engineers spending $2k/mo on tokens aren't doing it for fun; they're doing it because the productivity differential justifies it.
Both examples point to the same conclusion (and it isn't the one being reported). It's "this got valuable enough that usage exceeded projections in a fraction of the time" aka Jevons paradox showing up in real time - not the end of the subsidy era.
The market structure emerging from this is bifurcation, which is what is missed in the entire conversation.
On one end: Frontier inference (reasoning, long context, agentic workflows) is consolidating rents at the top. Demand at that tier is supply-constrained, not price-sensitive.
On the other: Commodity inference (the GPT-4-class workloads that defined the category 18 months ago) is collapsing toward marginal cost. Equivalent-capability pricing is now ~1/600th of what it was 6 years ago. Those 2 curves moving in opposite directions are the entire pricing landscape. They have almost nothing to do with each other. Treating this as a single number is an analytical error that's leading to most of the bad takes.
The implication for procurement is straightforward: the cost differential between running frontier on everything vs. routing intelligently across frontier commodity tiers is now 100x to 500x on equivalent workloads. That delta is the rent (and likely, the most valuable infrastructure business that has yet to be built). Whoever builds it will capture more durable margin than either the layer above or the layer below. The labs know this, which is why they're racing to wrap routing into their own platforms. Most enterprise buyers haven't bothered to figure it out (they're too busy building stuff for their business to think about how to optimize the cost of doing it), which is why their spend looks categorically insane.
The Uber analogy gets reached for a lot in this conversation. It doesn't hold. Uber subsidized rides on the way to a clean rent-extraction end state (i.e. own the network, raise prices, let the unit economics work).
AI labs can't run that playbook, because the terminal state is multiple breakthroughs away and the "extraction" time for a frontier model is 12-18 months. Fortunately for them, their strategic capital is patient in ways venture money isn't, and their end state is bifurcated, not unitary. The right-sizing pressure people are looking for isn't going to land at the model layer. It's going to land at the application layer, on every company that raised on the assumption that commodity inference would stay subsidized long enough to build a moat. That subsidy is ending, but the labs aren't the ones who have to absorb the cost; it's their customers' customers who will end up holding the bag.
The question that follows naturally is: how do the labs absorb training costs that increase 10x per generation if commodity inference falls to zero.
It has an answer the labs say out loud and most analysts ignore. They don't absorb it through token margin. They absorb it through a combination of (1) enterprise bundle contracts, (2) (eventually) through agentic outcome pricing & (3) ultimately through strategic capital treating training CapEx as optionality on capability scaling as opposed to cost-of-goods. The right comparable for an AI lab is late-stage pharma. Both have decade-long burn, binary terminal values & a capital base sized to the option vs the current P&L. That model survives commodity compression in a way SaaS economics can't. It also fails differently when it fails, which is the part the writedown discourse hasn't started pricing in yet.
The "subsidy era ending" framing sounds good, but it's analytically wrong. The subsidy era didn't end. The undifferentiated consumption era is ending.
🦔Microsoft canceled its internal Claude Code licenses this week after token-based billing made the cost untenable, even for a company with effectively infinite cloud resources. Uber's CTO sent an internal memo warning the company burned through its entire 2026 AI budget in just four months. American AI software prices have jumped 20% to 37%, and GitHub (owned by Microsoft) is dropping flat-rate plans for usage-based billing across its products.
My Take
The AI subsidy era is ending in real time. The same company that put $13 billion into OpenAI and built the Azure infrastructure powering most of Anthropic's compute just looked at the bill from a competitor's coding tool and decided it was not worth paying. That is not a productivity failure on Anthropic's end. Token-based pricing is forcing every enterprise customer to confront the actual cost of running these models at scale, and the number turns out to be far higher than the flat-rate experiments suggested.
This ties directly to my Gemini Flash post yesterday. Anthropic, OpenAI, and Google all raised effective prices in the last six months. Enterprises that built workflows assuming AI costs would keep falling are now watching annual budgets evaporate in months. Two outcomes look likely from here. Either enterprises scale back AI usage to fit budgets, which slows the revenue ramp the labs need to justify their valuations ahead of IPOs, or the labs cut prices and absorb the losses, which makes the unit economics worse at exactly the wrong moment. Both paths land in the same place, the numbers stop working, and somebody has to take the writedown.
Hedgie🤗