Heads up, business leaders: This will soon be one of the most frequent questions you face when hiring engineers.
"What’s the token budget associated with this role?"
It sounds absurd until you look at the macro numbers.
Not only is coding agent adoption rising, but as models grow more capable, the compute each task consumes is compounding. Amazon recently made headlines for spending $500 million on tokens in a single month, while Uber burned through its entire annual AI budget in just four months.
Organizations are rushing to gain control over token spend and measure true ROI, all while providing their engineers with the best tools for the job.
If you are developing a token management strategy, here is a practical breakdown of the operational Do's and Don'ts we are seeing in the wild:
The Do's
Maintain Model Flexibility: Avoid vendor lock-in. Ensure the harness that operates your coding agents allows you to swap underlying models, including open-source options and third-party APIs.
Enforce Discipline: Software development best practices are eroding fast under AI usage. Engineers jump straight into coding prompts and skip structural design. Ensure your systems require agents to plan, estimate, and verify tasks before writing code. Your harness should feature a dedicated planning layer that integrates cleanly with your project management tools and provides a long-term execution graph.
Implement Routing: A robust planning layer also gives you predictability. When your harness understands the scope and complexity of a feature beforehand, it can act as an orchestration plane and route each subtask to the most cost-effective model tier capable of handling it, rather than defaulting to premium frontier models.
Decouple Engineers: Your developers do not need to sit and babysit agent loops in their terminals. A true long-horizon coding agent should be able to self-drive. That lets your agents run overnight, often taking advantage of off-peak API windows.
Log Spend per Feature: Start implementing visibility at the product level. When you log token costs per feature, high-agency engineers naturally gravitate toward optimizing their spend. You cannot change what you do not measure.
Approach Committed Use Carefully: Foundation model providers are pushing long-term spend commitments. Before signing away your opex, remember how fast this landscape shifts. Today's market leader can be leapfrogged tomorrow, and a rigid contract will prevent you from pivoting to better infrastructure.
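To make the routing and per-feature logging points concrete, here is a minimal Python sketch. It assumes the planning layer already emits a complexity score (1 to 10) and a token estimate for each subtask; the tier names, prices, and scores are hypothetical placeholders, not real model IDs or published rates, and this is not a description of any specific product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str                      # hypothetical tier label, not a real model ID
    max_complexity: int            # highest planner complexity score this tier handles
    usd_per_million_tokens: float  # hypothetical blended price


# Hypothetical tiers, ordered cheapest first. Prices are placeholders.
TIERS = [
    ModelTier("small-open-model", max_complexity=3, usd_per_million_tokens=0.5),
    ModelTier("mid-tier-api", max_complexity=7, usd_per_million_tokens=3.0),
    ModelTier("frontier-api", max_complexity=10, usd_per_million_tokens=15.0),
]


@dataclass(frozen=True)
class Subtask:
    feature: str           # feature or ticket the subtask belongs to
    description: str
    complexity: int        # 1-10 score assumed to come from the planning layer
    estimated_tokens: int


def route(task: Subtask) -> ModelTier:
    """Pick the cheapest tier whose complexity ceiling covers the task."""
    for tier in TIERS:
        if task.complexity <= tier.max_complexity:
            return tier
    raise ValueError(f"no tier can handle complexity {task.complexity}")


class FeatureSpendLedger:
    """Accumulates token cost in USD per feature."""

    def __init__(self) -> None:
        self._usd: defaultdict[str, float] = defaultdict(float)

    def record(self, task: Subtask, tier: ModelTier, tokens_used: int) -> None:
        self._usd[task.feature] += tokens_used / 1_000_000 * tier.usd_per_million_tokens

    def report(self) -> dict[str, float]:
        return {feature: round(cost, 4) for feature, cost in sorted(self._usd.items())}


if __name__ == "__main__":
    plan = [
        Subtask("checkout-v2", "rename config keys", 2, 40_000),
        Subtask("checkout-v2", "add retry logic to payment client", 5, 120_000),
        Subtask("checkout-v2", "redesign idempotency scheme", 9, 300_000),
        Subtask("search-filters", "update unit tests", 3, 60_000),
    ]
    ledger = FeatureSpendLedger()
    for task in plan:
        tier = route(task)
        # For the sketch, assume actual usage equals the planner's estimate.
        ledger.record(task, tier, task.estimated_tokens)
        print(f"{task.description!r} -> {tier.name}")
    print(ledger.report())
```

The routing rule is deliberately simple: tiers are ordered cheapest first, so the first tier whose ceiling covers the task wins. In a real harness you would record the token counts the provider reports back rather than the planner's estimate.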
The Don'ts
Don't Assign Cookie-Cutter Budgets by Role: Deciding that a "Senior Engineer gets $5,000 a month in tokens" ignores operational reality and skews incentives. Engineers will either hoard budgets for fear of hitting a wall mid-sprint, or spend tokens carelessly just to exhaust their monthly allocation.
Don't Introduce Bureaucratic Approval Barriers: Adding complex justification forms or manual approval flows to token requests is a developer's nightmare. Top talent will simply look elsewhere for work. Budget guardrails should be managed systematically by the project sandbox, not through administrative overhead.
Don't Blindly Ban Premium Models: While you don't need a frontier model to write a basic script, blanket model-tier caps stifle innovation. Restricting your team entirely to lower-tier models prevents engineers from discovering advanced patterns, testing new capabilities, and learning what frontier models can actually do.
Don't Gamify Token Spend via Leaderboards: Leaderboards were useful for driving initial internal adoption, but their long-term incentives are broken. Rewarding teams for raw token consumption frequently leads to waste. Instead, encourage developers to share their structural prompting techniques and architectural layouts. A great example is Shopify's internal tool, River, which uses public workspaces to mimic a traditional apprenticeship model where teams learn best practices by observing each other's interactions.
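On the point about letting the project sandbox manage budget guardrails instead of approval forms, here is a minimal Python sketch of what a systematic, project-level check could look like. The ProjectBudget class, the 80% warning threshold, and the "downgrade to a cheaper tier" policy are all hypothetical choices for illustration; the idea is simply that the sandbox decides automatically, so no engineer has to file a request.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ALLOW_WITH_WARNING = "allow_with_warning"
    DOWNGRADE = "downgrade"  # hypothetical policy: fall back to a cheaper tier


@dataclass
class ProjectBudget:
    """Hypothetical project-level guardrail enforced by the sandbox, not by a form."""

    monthly_usd: float
    warn_ratio: float = 0.8  # warn once 80% of the budget would be used (placeholder)
    spent_usd: float = 0.0
    warnings: list[str] = field(default_factory=list)

    def check(self, projected_usd: float) -> Decision:
        """Decide automatically how to handle a task with the given projected cost."""
        after = self.spent_usd + projected_usd
        if after > self.monthly_usd:
            return Decision.DOWNGRADE
        if after > self.warn_ratio * self.monthly_usd:
            # Notify rather than block: the work still proceeds.
            self.warnings.append(f"{after:.2f} of {self.monthly_usd:.2f} USD used")
            return Decision.ALLOW_WITH_WARNING
        return Decision.ALLOW

    def charge(self, actual_usd: float) -> None:
        """Record spend once the task has actually run."""
        self.spent_usd += actual_usd


if __name__ == "__main__":
    budget = ProjectBudget(monthly_usd=100.0)
    budget.charge(70.0)
    print(budget.check(5.0).value)   # allow
    print(budget.check(15.0).value)  # allow_with_warning
    print(budget.check(40.0).value)  # downgrade
```

Downgrading on overrun, rather than refusing outright, keeps work moving without a manual approval step; whether that is the right policy for a given team is a judgment call.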
We are building predev with token efficiency as a core structural pillar. If you want to explore approaches to managing compute spend and start giving your engineers a true self-driving coding agent, let's talk.
@predotdev