I start with very informal specifications written by hand. I have an agent convert these into firmer specifications, subdivided into tasks. I review these.
Then I feed those tasks into the specifier agent, which converts each task to Gherkin, prunes the Gherkin, and then hands it off to the coder agent. I spot check the Gherkin.
The coder agent writes acceptance tests directly from the Gherkin, then unit tests, then code. When all those tests pass, the coder agent hands off to the refactorer agent.
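To make the Gherkin-to-acceptance-test step concrete, here is a minimal sketch of what one such translation might look like. The scenario, the Cart class, and the test name are all hypothetical, invented for illustration; they are not from my actual pipeline, and a real setup would likely use a Gherkin runner rather than hand-mirrored comments.

```python
# Hypothetical Gherkin scenario, as the specifier agent might emit it.
SCENARIO = """
Scenario: Adding an item increases the cart total
  Given an empty cart
  When I add an item priced at 5
  Then the cart total is 5
"""


class Cart:
    """Hypothetical production code, written after the tests below."""

    def __init__(self):
        self._prices = []

    def add(self, price):
        self._prices.append(price)

    def total(self):
        return sum(self._prices)


def test_adding_an_item_increases_the_cart_total():
    # Each step of the scenario maps to one line of the acceptance test.
    # Given an empty cart
    cart = Cart()
    # When I add an item priced at 5
    cart.add(5)
    # Then the cart total is 5
    assert cart.total() == 5
```

The point of keeping the test this close to the scenario text is that a spot check of the Gherkin is also, in effect, a spot check of the acceptance test.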
The refactorer agent reduces the CRAP score to 6 or below and removes any duplication. Then it writes property tests and gets them to pass. Then it hands off to the architect agent.
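For reference, a sketch of the threshold check, assuming "crap" here means the CRAP (Change Risk Anti-Patterns) metric in its commonly published form: complexity squared times the cube of the uncovered fraction, plus complexity. The function names and the per-function granularity are my assumptions for illustration.

```python
CRAP_THRESHOLD = 6.0  # the "6 or below" target from the text


def crap_score(complexity: int, coverage_pct: float) -> float:
    """CRAP = comp^2 * (1 - cov/100)^3 + comp, for a single function.

    complexity is cyclomatic complexity (>= 1);
    coverage_pct is test coverage of that function, 0 to 100.
    """
    if complexity < 1:
        raise ValueError("cyclomatic complexity must be at least 1")
    if not 0.0 <= coverage_pct <= 100.0:
        raise ValueError("coverage must be between 0 and 100")
    uncovered = 1.0 - coverage_pct / 100.0
    return complexity ** 2 * uncovered ** 3 + complexity


def needs_refactoring(complexity: int, coverage_pct: float,
                      threshold: float = CRAP_THRESHOLD) -> bool:
    """True when the function's score is above the threshold."""
    return crap_score(complexity, coverage_pct) > threshold
```

Under this formula the score can never drop below the complexity itself, so a threshold of 6 means any function with cyclomatic complexity above 6 has to be split up; more tests alone will not get it there.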
The architect agent runs language mutation, covers any uncovered sections, and kills all survivors. Then it runs Gherkin mutation and kills any of those survivors. Then it runs the entire test suite, and when it passes, it hands the result off to the specifier, coder, and refactorer.
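A toy sketch of what "killing a survivor" means at the language level, using only Python's standard ast module. The is_adult function, the single mutation operator, and the two test suites are hypothetical; real mutation tools apply many operators across a whole codebase, which is where the cost comes from.

```python
import ast

# Hypothetical code under test.
SOURCE = """
def is_adult(age):
    return age >= 18
"""


class FlipGtE(ast.NodeTransformer):
    """One mutation operator: rewrite every '>=' as '>'."""

    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [ast.Gt() if isinstance(op, ast.GtE) else op
                    for op in node.ops]
        return node


def load(tree):
    """Compile a module AST and return its is_adult function."""
    namespace = {}
    exec(compile(tree, "<mutant>", "exec"), namespace)
    return namespace["is_adult"]


def weak_suite(fn):
    # Never exercises the boundary value.
    return fn(30) is True and fn(5) is False


def strong_suite(fn):
    # Adds the boundary case that distinguishes '>=' from '>'.
    return weak_suite(fn) and fn(18) is True


def mutant_survives(suite):
    """A mutant survives if the suite still passes against it."""
    tree = FlipGtE().visit(ast.parse(SOURCE))
    ast.fix_missing_locations(tree)
    return suite(load(tree))
```

Here the mutant survives weak_suite and is killed by strong_suite; adding the missing boundary test is the kind of work the architect agent does for each survivor.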
I spot check the code.
This is an exercise in transformation from the informal to the formal through managed stages, with human interaction decreasing at each stage.
Raw computer power is the limiting factor. Those mutation tests are CPU intensive.