Yesterday, Claude betrayed me. He pushed some code he’d written straight into production.
I just asked him to fix a bug and then to create a pull request, and instead, he decided to directly push his code to main, and from there it went straight to our production system.
So then I see people on the Internet writing, "oh, so we just add some instructions on our Claude MD file - like "don’t push something directly to main" or "don’t delete my database".
But that's wrong - Claude may or may not listen to my nice Claude MD file.
When you think about it, the problem isn't Claude, it was me.
I didn’t put the right guardrails in my system so that he couldn’t push something directly to production. When I used to work at Google, I could never just push some code straight to prod. Someone has to approve all my PRs, I needed to roll out new features with a feature flags, and so on. There were a lot of systems and guardrails in place preventing me from doing harm to the YouTube system I was working on.
Even if I really wanted to, I couldn’t take YouTube down.
Good systems prevent you from destroying themselves, even if you didn't mean to do that. So when you're thinking to yourself how do I prevent Claude from making mistakes - think about building the right guardrails so that Claude cannot push code to main, or cannot access your production databases etc.
Otherwise, if you allow Claude to take your own system down, you’re at fault, not Claude.