Everyone keeps repeating line: "AI costs are dropping 10× per year." And the bills are still going up. 😅
Per-token prices fell ~150× from GPT-4 to GPT-4o. In the same window, enterprise AI spend went from $11.5B to $37B in a single year. Across Ramp's customer base, AI spend grew 13×. The price curve is collapsing. The bill curve is exponential. They are not the same curve.
Here's what's actually happening in 2026:
→ Peter Steinberger ran 100 coding agents in parallel for one month. The bill: $1,305,088. 603 billion tokens. 7.6 million requests. OpenAI covered it as an experiment in "how software would be built if token costs didn't matter."
→ The creator of Claude Code told Lenny's Podcast that Anthropic's own engineers are spending "hundreds of thousands a month in tokens." One was clocked at $150K/month. Single engineer.
→ Salesforce told the All-In Podcast they expect to spend $300M on Anthropic in 2026. Mostly coding.
→ Uber's CTO publicly admitted they burned through the entire 2026 AI budget in 4 months.
→ One Claude Max user — on the $200/month plan — consumed $51,291 of compute in a single calendar month.
→ Anthropic now has 1,000 customers spending $1M /year on tokens. Two years ago that number was 12.
And the kicker, for anyone watching the wrapper layer:
@cursor_ai paid an estimated $650M to
@AnthropicAI last year on ~$500M in revenue. Negative 30% gross margin. Their
@awscloud bill doubled in 30 days when
@AnthropicAI introduced priority tiers. Their response was the only response that works — they shipped their own model, Composer.
@github Copilot loses $20/user/month. Replit's gross margin swung between 36% and negative 14% in 2025. Every coding company that survives is quietly building or buying its own model. The ones that aren't, won't be here in 18 months.
Why is this happening?
The per-token price is the supply curve. Nobody is tracking the demand curve. Reasoning models burn 10–100× more tokens than the models they replaced. Agents burn thousands of times more than chat. Context windows expanded 100×. Headline rate goes down 10×. Tokens consumed per task go up 100×. You do not need a CFO to finish that math.
And the frontier labs themselves are not winning this game. OpenAI's adjusted gross margin fell from a 46% target to 33% in 2025 as inference costs quadrupled. Anthropic spent $2.66B on AWS through September 2025 on $2.55B of revenue — they are paying more for compute than they collect from customers. So they're doing what any business under that pressure does: splitting subscription tiers, retokenizing Opus to bill 35% more for the same text, killing unlimited plans. OpenAI's head of ChatGPT said it out loud: "Unlimited AI is like an unlimited electricity plan. It doesn't make sense."
That sentence is the entire enterprise AI strategy debate in 11 words.
If you are a Fortune 500 CTO renting frontier API tokens for your production workload, you are not buying a falling price. You are buying a contract whose volume your own product roadmap is engineered to multiply, against a provider whose unit economics require that multiplication. Your bill in 2027 is not going to be smaller. It is going to be the size of an engineering org.
The companies that figured this out early —
@cursor_ai with Composer,
@cognition with SWE-1.5,
@Get_Writer with Palmyra-LLM.
RL-tuned on the workflows that actually matter. The math is not subtle: a domain-tuned model running on your own hardware beats a frontier API on cost-per-task by one to two orders of magnitude on the workloads that matter most.
Per-token prices will keep falling. Enterprise token bills will keep rising. The only way out of that scissors is to stop being a tenant.
The frontier labs are great at research. They are not your infrastructure.
Build the model.