Generating a simple tweet was costing us 18,000 tokens.
The prompt: ~400 tokens.
The tweet: 70 tokens.
So where did the other 17,500 go?
GPT-5 Nano is a reasoning model. max_output_tokens doesn't just cap the output.
It caps reasoning + output combined.
The model was spending 17,000 tokens thinking before writing 70.
The fix wasn't a bigger cap.
It was a dynamic one:
>> inputTokens = ceil(promptChars / 4.5)
>> reasoning = max(4000, inputTokens × 5)
>> cap = outputBudget + reasoning
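
Here's roughly what that looks like in code. This is a minimal sketch, not our exact implementation: the function name and the 200-token output budget in the usage note are made up for illustration, and 4.5 chars per token is only a rough heuristic.

```python
import math

CHARS_PER_TOKEN = 4.5        # rough heuristic for estimating tokens from characters
MIN_REASONING_TOKENS = 4000  # floor, so short prompts still get room to think
REASONING_MULTIPLIER = 5     # reasoning budget as a multiple of input tokens


def dynamic_cap(prompt_chars: int, output_budget: int) -> int:
    """Return a combined reasoning + output cap for a reasoning model.

    `output_budget` is the number of tokens you expect the visible answer
    to need (hypothetical parameter; pick it per use case).
    """
    input_tokens = math.ceil(prompt_chars / CHARS_PER_TOKEN)
    reasoning = max(MIN_REASONING_TOKENS, input_tokens * REASONING_MULTIPLIER)
    return output_budget + reasoning
```

With an 1,800-character prompt and a 200-token output budget, the floor kicks in and the cap is 4,200. With a 9,000-character prompt, the ×5 term takes over and the cap is 10,200.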
Why ×5?
Measured in production:
1,737 input → 7,273 reasoning tokens
That's ×4.2. We use ×5 as a safety margin.
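
If you want to check the math yourself, here's a quick sketch using the one measurement above. The variable names are just for illustration.

```python
# The single production measurement quoted in the post.
measured_input = 1737
measured_reasoning = 7273

# Observed reasoning-to-input ratio, rounded to one decimal place.
observed_ratio = round(measured_reasoning / measured_input, 1)

# Reasoning budget the x5 multiplier would have allowed for this request,
# and how many tokens of headroom that leaves over what was actually used.
budget_at_x5 = measured_input * 5
headroom = budget_at_x5 - measured_reasoning
```

For this request the ×5 budget comes to 8,685 reasoning tokens, about 1,400 more than the model actually used.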
Short videos = small cap.
Long videos = large cap.
No waste. No truncation.
Reasoning models need reasoning budgets.
Not output limits.