The data on AI-generated code is in, and it's naming something a lot of teams have quietly been feeling:
– 81% of enterprise leaders report more production incidents tied to AI-generated code.
– 43% of AI-generated changes still need manual debugging in production, even after passing QA and staging.
– 0% of engineering leaders say they're "very confident" the code will behave correctly once deployed.
Right now the industry seems split between two responses: add more AI agents to review the output, or add more human gates to catch it. To me, both feel like patches over the same gap.
My own bias: this is an architecture problem, not a model problem, and architecture problems don't get solved by adding more intelligence on top. But I'm genuinely curious how it looks from inside other teams.
So, honest question for anyone running this in production:
Where's the line for you between "fast enough to ship" and "safe enough to trust," and who actually gets to decide it on your team?