Token costs are the number one complaint in AI coding right now. Most of the damage comes from a few default habits that are easy to fix.
I tested every optimization I could find this month. These 7 made the biggest difference in Claude Code:
1. Clear context between tasks. Type /clear when you switch tasks. Every new message re-sends your full conversation history as input tokens. That debugging session from an hour ago is still inflating every prompt you send. A fresh start costs nothing. Carrying stale context costs you on every turn.
2. Compact at 60%, not 95%. Claude auto-compacts near 95% context capacity. By then, output quality has already degraded. Run /compact focus on [current task] at 60% yourself. You get a cleaner summary and stay in the range where the model still performs well.
3. Match the model to the task. Opus for complex reasoning. Sonnet for routine code. Haiku for simple lookups and formatting. Most tasks don't need the most expensive model. One team documented a 72% cost reduction just from model switching and prompt caching over three months.
4. Offload heavy reads to subagents. A 10,000-line log file that Claude reads early in a session stays in context for every message after it. Instead of reading it in your main session, spin up a subagent. It reads in isolated context and returns only the findings. Your main window stays clean.
5. Build deterministic tools that cost zero tokens to run. Not everything needs an LLM call. Data formatting, file moves, test runners, API calls with known inputs. Write these as regular scripts. The LLM orchestrates. Deterministic code executes. The scripts run for free, every time, with predictable output.
6. Keep CLAUDE. md lean. It loads into every session before anything else. A 5,000-token CLAUDE. md costs 5,000 tokens before you've typed a word. Every turn. Every session. Keep it under 200 lines. Move project-specific context into scoped markdown files that only load when relevant.
7. Run /usage before starting a new task. Don't wait until you notice the model making mistakes it wouldn't have made 20 minutes ago. Check /usage, see where you stand, and decide whether to /compact or /clear before committing to the next chunk of work.