Most people see “tokens” and think: text.
A few words in.
A few words out.
Basically nothing.
But behind that tiny stream of text is a very physical machine...
A 30-second answer from a large model (here gpt-oss-120b) running on an H100 can use roughly as much electricity as an energy-efficient LED ceiling light does in about 10 minutes.
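Here is a rough back-of-envelope check of that comparison. The numbers are assumptions, not measurements: the GPU is taken to draw about 700 W (the rated maximum for an H100 SXM board) for the full 30 seconds and to serve only this one answer, and the LED ceiling light is assumed to be a 35 W fixture. Batching many users onto one GPU would lower the per-answer figure; cooling and the rest of the server would raise it.

```python
# Back-of-envelope energy comparison. All inputs are illustrative assumptions.
GPU_POWER_W = 700.0     # assumed: H100 SXM running near its rated maximum power
ANSWER_SECONDS = 30.0   # from the text: a 30-second answer
LED_POWER_W = 35.0      # assumed: a bright, energy-efficient LED ceiling light


def watt_hours(power_w: float, seconds: float) -> float:
    """Energy in watt-hours for a constant power draw over a duration."""
    return power_w * seconds / 3600.0


# Energy for one answer, assuming the whole GPU is dedicated to it.
answer_wh = watt_hours(GPU_POWER_W, ANSWER_SECONDS)

# How long the LED light could run on that same amount of energy.
led_minutes = answer_wh / LED_POWER_W * 60.0

print(f"One answer: about {answer_wh:.1f} Wh")
print(f"Same energy runs the LED light for about {led_minutes:.0f} minutes")
```

Under these assumptions the answer comes to a little under 6 Wh, which is what a 35 W light uses in about 10 minutes. Change any of the three constants and the comparison shifts with it.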
That’s not the wild part.
The wild part is the hardware...
An H100 isn’t “a chip” in the normal consumer sense. It’s closer to a small car in cost.
An 8×H100 server can cost as much as an apartment in some cities.
A serious AI cluster is basically an industrial facility.
So the cost of AI isn’t just “how much electricity did this one answer use?”
It’s the fact that millions of people expect instant answers at the same time.
That means someone has to pre-buy and operate enormous amounts of GPU capacity, data centers, cooling, networking, power contracts, storage, redundancy, and staff... all before you type your prompt.
Tokens look weightless.
They are not...