code as agent harness.
a 102-page survey from Stanford, Meta, and UIUC on agent harnesses.
the paper argues that code is no longer just the thing agents produce. it’s the medium through which they reason, act, and represent their environment.
it calls this “code as agent harness” and covers three layers: code as the interface between agents and their tasks; the mechanisms that keep agents reliable over long-horizon execution (planning, memory, tool use, verification); and how multi-agent systems coordinate through shared code artifacts.
core findings:
the paper introduces “evolution agents” that treat the harness itself as the optimization target. they collect telemetry, diagnose failures, propose infrastructure changes, and promote only mutations that pass regression. the harness improves itself.
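to make that loop concrete, here is a minimal sketch of the collect, diagnose, propose, promote cycle. it is not the paper's implementation. everything in it is a hypothetical stand-in: the harness is a plain dict with a single `max_steps` setting, a task "passes" when its step count fits that budget, and the names `Task`, `Mutation`, `collect_telemetry`, `diagnose`, `propose`, `passes_regression`, and `evolve` are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical: the harness is modeled as a plain dict of integer settings.
Harness = Dict[str, int]


@dataclass(frozen=True)
class Task:
    name: str
    steps_needed: int  # toy stand-in for how hard the task is


@dataclass(frozen=True)
class Mutation:
    description: str
    apply: Callable[[Harness], Harness]  # returns a new harness, never mutates


def run_task(harness: Harness, task: Task) -> bool:
    # Toy success rule (an assumption): the task fits in the step budget.
    return task.steps_needed <= harness["max_steps"]


def collect_telemetry(harness: Harness, tasks: List[Task]) -> Dict[str, bool]:
    # Telemetry here is just pass/fail per task name.
    return {t.name: run_task(harness, t) for t in tasks}


def diagnose(telemetry: Dict[str, bool]) -> List[str]:
    # Diagnosis here is just the sorted list of failing task names.
    return sorted(name for name, ok in telemetry.items() if not ok)


def propose(failures: List[str]) -> List[Mutation]:
    # Candidate infrastructure changes; one is deliberately harmful.
    if not failures:
        return []
    return [
        Mutation("halve max_steps", lambda h: {**h, "max_steps": h["max_steps"] // 2}),
        Mutation("double max_steps", lambda h: {**h, "max_steps": h["max_steps"] * 2}),
    ]


def passes_regression(baseline: Dict[str, bool], candidate: Dict[str, bool]) -> bool:
    # Promote only if nothing that passed before now fails, and more tasks pass.
    no_regressions = all(candidate[name] for name, ok in baseline.items() if ok)
    improved = sum(candidate.values()) > sum(baseline.values())
    return no_regressions and improved


def evolve(harness: Harness, tasks: List[Task], max_rounds: int = 5) -> Tuple[Harness, List[str]]:
    promoted_log: List[str] = []
    for _ in range(max_rounds):
        baseline = collect_telemetry(harness, tasks)
        promoted = False
        for mutation in propose(diagnose(baseline)):
            candidate = mutation.apply(harness)
            if passes_regression(baseline, collect_telemetry(candidate, tasks)):
                harness = candidate
                promoted_log.append(mutation.description)
                promoted = True
                break
        if not promoted:
            break  # no failures left, or no candidate survived regression
    return harness, promoted_log
```

calling `evolve({"max_steps": 5}, tasks)` with tasks that need 3, 10, and 20 steps rejects the halving mutation each round (it breaks a task that already passes) and promotes the doubling mutation twice. the point of the sketch is the gate: a candidate change is never kept unless it survives regression against what the current harness already gets right.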
in multi-agent systems, topology complexity inversely correlates with infrastructure quality. teams with better shared state use simpler coordination. teams without it build increasingly elaborate workarounds.
finally, the paper concludes that future agent systems need four properties:
- executable
- inspectable
- stateful
- governed
read more:
arxiv.org/abs/2605.18747
i also published a deep dive on agent harness engineering, covering the orchestration loop, tools, memory, context management, and everything else that turns a stateless LLM into a capable agent.
the article is quoted below.