I built a Claude skill that creates ghost personas to review your code before production.
6 lines of code. "Looks fine, ship it."
Then 3 ghosts showed up:
😈 Hit Back twice — 630 TL cart became 322 TL
🏢 Paid in EUR — discount never applied
🐵 DB dropped mid-loop — half the data corrupted
3 bugs. 0 tests written. Before the PR was even opened.
Open source → github.com/mturac/simulacra
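One way the first ghost's numbers could arise (an inference from the arithmetic, not something the post states): 630 × 0.8 × 0.8 × 0.8 = 322.56, so a 20% discount applied on the first submit and again on each of two resubmits would turn 630 TL into roughly 322 TL. The sketch below reproduces that pattern and shows an idempotent alternative. The `Cart` class, the `WELCOME20` coupon code, and the 20% rate are all hypothetical and are not taken from the reviewed code.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class Cart:
    """Hypothetical cart used only to illustrate the bug pattern."""
    subtotal: Decimal
    total: Decimal = field(init=False)
    applied_coupons: set = field(default_factory=set)

    def __post_init__(self):
        self.total = self.subtotal

    def apply_discount_naive(self, rate: Decimal) -> None:
        # Bug pattern: mutates the running total, so every resubmission
        # (for example after pressing Back) discounts an already
        # discounted price.
        self.total = self.total * (1 - rate)

    def apply_discount_idempotent(self, code: str, rate: Decimal) -> None:
        # Single-coupon sketch of a fix: remember which coupon was applied
        # and always recompute from the untouched subtotal.
        if code in self.applied_coupons:
            return
        self.applied_coupons.add(code)
        self.total = self.subtotal * (1 - rate)


rate = Decimal("0.20")  # assumed rate, inferred from 630 -> ~322

naive = Cart(Decimal("630"))
for _ in range(3):  # first submit plus two resubmits after Back
    naive.apply_discount_naive(rate)

safe = Cart(Decimal("630"))
for _ in range(3):
    safe.apply_discount_idempotent("WELCOME20", rate)

print("naive total:", naive.total)
print("idempotent total:", safe.total)
```

The naive version compounds the discount on every call, while the idempotent version gives the same total no matter how many times the form is resubmitted.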
Production agents fail because the control plane is missing, not because the prompt is bad. You can write the perfect system prompt, but without runtime verifiers and goal contracts, long-running agent loops will fail silently. A tool call can be valid JSON and still be the wrong action.
How to build the control plane:
1. Log every tool call as (tool, args, outcome).
2. Enforce authorization at dispatch with a policy engine (Cedar/OPA) that is independent of the model.
3. Assert post-conditions after every call. No state change? Halt.
4. Mine session logs for drift.
Grammar is a linter, not a brain: it checks the shape of the output, not whether the action is right. Keep the orchestration layer simple and put your engineering weight into validation policies at the execution boundary.
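The four steps above can be sketched in a few dozen lines. This is a minimal illustration, not a reference implementation: `ControlPlane`, `ALLOWED`, `Halt`, and the toy `set_flag` tool are hypothetical names, the in-process allow-list only stands in for a real policy engine such as Cedar or OPA (their APIs are not shown), and the drift metric is one crude reading of "mine sessions for drift".

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Stand-in for an external policy engine (Cedar/OPA); hypothetical allow-list.
ALLOWED = {("agent-1", "set_flag")}


class Halt(Exception):
    """Raised when a call is denied or its post-condition fails."""


@dataclass
class ControlPlane:
    state: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def dispatch(self, principal: str, tool: str, args: dict,
                 fn: Callable[[dict, dict], Any],
                 post: Callable[[dict, dict], bool]) -> Any:
        # Step 2: authorize at dispatch, independent of the model's output.
        if (principal, tool) not in ALLOWED:
            self.log.append((tool, args, "denied"))
            raise Halt(f"{principal} may not call {tool}")
        before = dict(self.state)  # shallow copy is enough for this flat toy state
        result = fn(self.state, args)
        # Step 3: assert the post-condition against before/after state.
        ok = post(before, self.state)
        # Step 1: log (tool, args, outcome) for every call.
        self.log.append((tool, args, "ok" if ok else "postcondition_failed"))
        if not ok:
            raise Halt(f"{tool} did not produce the expected state change")
        return result

    def drift(self) -> float:
        # Step 4: crude drift signal, the share of calls that did not succeed.
        if not self.log:
            return 0.0
        bad = sum(1 for _, _, outcome in self.log if outcome != "ok")
        return bad / len(self.log)


def set_flag(state: dict, args: dict) -> None:
    state[args["name"]] = True


def noop(state: dict, args: dict) -> None:
    pass  # a well-formed call that changes nothing


def changed(before: dict, after: dict) -> bool:
    return before != after


cp = ControlPlane()
cp.dispatch("agent-1", "set_flag", {"name": "deployed"}, set_flag, changed)

halted = ""
try:
    # Same valid arguments, but the tool does nothing: the loop must stop.
    cp.dispatch("agent-1", "set_flag", {"name": "deployed"}, noop, changed)
except Halt as exc:
    halted = str(exc)

print(cp.log)
print("drift:", cp.drift(), "| halted:", halted)
```

The second call is well-formed and authorized, yet it is halted because the state did not change, which is the failure that schema validation alone would miss.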
🔖 Bookmark this — you'll want it when this hits you at 2am.
1/10 We've rewritten 3 major codebases in 5 years. 2 were worth it. 1 was a $1.2M mistake. Here's the decision framework we use now to avoid costly rewrites.
10/10 If you don't have all 5, fix instead of rewriting. We've saved $3.4M by choosing targeted rewrites (specific modules) over full rewrites. The question isn't "can we rewrite?" but "should we?" What's your rewrite threshold? 🔖 Bookmark this framework.
1/10 We tested 23 MCP servers and agent skills in production. 5 actually moved the needle on real engineering work. The other 18 are demos, wrappers, or solve problems that don't exist yet. Here's what survived 6 months of daily use.
10/10 If you're building MCP servers or agent skills, the only question that matters: does this save a senior engineer more time than it takes to maintain? If no, it's a toy. If yes, it ships. We've shipped 5. We're still testing 3 more.