AI AGENTS DON’T SHIP PRODUCTION CODE. HARNESSES DO
A raw model is just raw processing power. To make it useful, you build the environment around it
I broke down the infrastructure systems used by OpenAI, Anthropic, and ThoughtWorks to ship millions of lines of code
Here are the 11 parts worth stealing:
1. The harness is everything that isn't the model - the constraints, feedback loops, state tracking, and tool permissions
2. The model functions like a CPU, the context window is RAM, and the harness is the operating system managing memory, tasks, and rules
3. Cutting 80% of an agent's tools often yields better output by eliminating choice paralysis and context contamination
4. Place AGENT.md or CLAUDE.md files in your repo folders. The agent reads them at session start to learn localized architecture and tasks
5. In long autonomous runs, agents easily overwrite loose text files. A strict JSON schema keeps the pipeline stable
6. Every run must execute the same routine: verify directories, read git logs, parse the tracker, start servers, and run tests
7. Have a generator agent propose a code plan, and a separate evaluator agent review it before any implementation begins
8. The harness must scan the actual file tree first to provide real paths and symbols, stopping the agent from inventing APIs
9. A single agent instance cannot critically evaluate its own work. Use separate instances and browser automation to test outputs
10. Agents trying to fix multiple issues at once drop requirements. The loop must be: pick one feature, implement, test, commit, stop
11. Do not maintain a separate wiki. If a design constraint or utility function isn't in the repo files, the agent will miss it
What actually compounds:
> one tracking file per module
> strict boot sequences
> isolated evaluation agents
> explicit code contracts
> dropping unused tools
Harness components decay as models improve
Build every constraint to be modular, test them by turning them off, and delete them the moment they become overhead
Save this if you are building actual development pipelines, not just chatting with a bot