Big Token loves charging by token, but the real cost is GPU-seconds.
For now, they prefer tokens, because they can collect the delta between revenue and GPU-seconds.
On the upside, this means there is still alpha in asking questions that have high internal recursion with low token output, but the frontier model providers will simply start charging for internal thinking tokens…which customers don’t like because it’s unpredictable.
Customers want predictable pricing, so it is inevitable that frontier model providers migrate to time-based rates margin based on model intelligence for usage-based APIs.
As soon as one frontier model provider breaks rank, the others will follow.