Some notes on how I got a coding agent to work for 13 days straight:
Self-verification: the agent needs to be able to end-to-end verify everything itself. Design efficient testing layers that make sense for your project, and make sure the agent can effectively loop on them to prove correctness.
Write spec documents: work with the agent to fully specify the goals, the implementation details, and the verification plan in a document. It's almost always worth iterating on this several times. The document can live anywhere: a Notion page, or just a simple .md file.
Use a running to-do list: break down complex work into a to-do list that you can see and edit. As you think of more things for it to do next, you can just add more to-dos. This could be anywhere: a page or database in Notion, or a .md file.
Adversarial review: as a step in the process, the agent should ask another agent to review the spec and implementation and make sure there are no gaps. Force the agent to loop on this until the reviewer has no remaining objections. This can use sub-agents if that's supported, or you can design a simple CLI that calls another agent. The key thing is to invoke a fresh agent context.
This was for a prototype of a new product. It's easier to do all of this on a small codebase, but the principles work anywhere.
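A rough sketch of how the to-do list, self-verification, and adversarial review steps above might fit together in one loop. Everything here is hypothetical and not the actual setup: `parse_todos` assumes the to-do list is a .md file using `- [ ]` / `- [x]` checkboxes, and `implement`, `verify`, and `review` are placeholder callables you would wire to your own agent, test suite, and reviewer. The one assumption that matters is that every `review` call starts a fresh agent context.

```python
from typing import Callable, List, Tuple


def parse_todos(markdown: str) -> List[Tuple[bool, str]]:
    """Return (done, text) for each '- [ ]' or '- [x]' line (assumed format)."""
    todos = []
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("- [ ] "):
            todos.append((False, stripped[6:]))
        elif stripped.lower().startswith("- [x] "):
            todos.append((True, stripped[6:]))
    return todos


def run_task(
    task: str,
    implement: Callable[[str, str], None],   # hypothetical: agent works on task, given feedback
    verify: Callable[[], Tuple[bool, str]],  # hypothetical: runs tests, returns (passed, report)
    review: Callable[[str], str],            # hypothetical: fresh reviewer, returns objections or ""
    max_rounds: int = 5,
) -> bool:
    """Loop implement -> verify -> review until tests pass and the reviewer has no objections."""
    feedback = ""
    for _ in range(max_rounds):
        implement(task, feedback)
        passed, report = verify()
        if not passed:
            feedback = report  # failing tests go straight back to the agent
            continue
        objections = review(task)  # assumed to run in a fresh agent context
        if not objections:
            return True
        feedback = objections  # reviewer found gaps, so go around again
    return False


if __name__ == "__main__":
    # Stub wiring, just to show the shape of the loop.
    todo_file = "- [x] set up repo\n- [ ] add login\n"
    for done, text in parse_todos(todo_file):
        if not done:
            finished = run_task(
                text,
                implement=lambda task, feedback: None,
                verify=lambda: (True, ""),
                review=lambda task: "",
            )
            print(text, "finished" if finished else "needs attention")
```

The `max_rounds` cap is a design choice for the sketch: an agent left running overnight probably should give up on one to-do and move to the next rather than loop on it forever.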
Our co-founder Simon’s record is letting a coding agent run for 13 days straight.
His bedtime routine now includes giving his agents enough work to keep running until breakfast 🍳
@simonlast
@saranormous