So I found a GitHub repo that stops AI agents from burning tokens for no reason.
It’s called Headroom.
It's built by a guy named Tejas Chopra who works at Netflix.
Basically, it compresses all the things your AI agent reads before it reaches the LLM.
For example:
- Tool outputs
- Logs
- Files
- RAG chunks
- Code search results
- Conversation history
The developer claims it cuts token usage by 60–95% while giving the same answers.
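To make the idea concrete, here's a toy sketch of what "compress it before it reaches the LLM" can look like for a noisy log. This is not Headroom's code or API. The function names (`collapse_repeats`, `keep_head_and_tail`, `compress_tool_output`, `rough_token_count`) are made up for illustration, and the 4-characters-per-token estimate is a rough assumption, not a real tokenizer.

```python
from itertools import groupby


def collapse_repeats(lines):
    """Replace runs of identical consecutive lines with one line plus a count."""
    out = []
    for line, run in groupby(lines):
        count = sum(1 for _ in run)
        out.append(line if count == 1 else f"{line}  [repeated {count}x]")
    return out


def keep_head_and_tail(lines, head=20, tail=20):
    """Keep the first and last lines and note how many were dropped in between."""
    if len(lines) <= head + tail:
        return lines
    dropped = len(lines) - head - tail
    marker = f"... [{dropped} lines omitted] ..."
    return lines[:head] + [marker] + lines[len(lines) - tail:]


def compress_tool_output(text, head=20, tail=20):
    """Hypothetical helper: shrink a log or tool output before it goes into a prompt."""
    lines = collapse_repeats(text.splitlines())
    return "\n".join(keep_head_and_tail(lines, head, tail))


def rough_token_count(text):
    """Very rough estimate (about 4 characters per token); real tokenizers differ."""
    return max(1, len(text) // 4)


if __name__ == "__main__":
    # A made-up log where one warning repeats 500 times.
    log = "\n".join(
        ["INFO starting job"]
        + ["WARN retrying connection"] * 500
        + ["ERROR connection refused", "INFO job failed"]
    )
    small = compress_tool_output(log)
    print(small)
    print(f"~{rough_token_count(log)} tokens -> ~{rough_token_count(small)} tokens")
```

The 500 identical warnings collapse into a single line with a count, so the model still sees what happened without paying for every repeat. A real tool presumably does something smarter per content type, but the basic trade is the same: drop redundancy, keep the signal.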
Right now you can use it as a:
- Python/TypeScript library
- Local proxy
- MCP server
- Wrapper for Claude Code, Codex, Cursor, Aider, and Copilot
If your coding agent is getting expensive, slow, or lost in giant logs, this repo is worth checking out.
Thanks for reading.