We ran thousands of queries trying to break our own AI agent at @GetLago.
Some prompt versions executed actions without waiting for confirmation. Others were so detailed they burned through the context window before producing a single useful response.
This wasn't a side project. Prompt engineering turned into a multi-week engineering effort in its own right, and we're a team that expected the hard part to be the infrastructure.
Here's what we got wrong early: we thought hallucination prevention was a prompt problem.
Write precisely enough, and the model behaves. That's true for a chatbot. It's not true for an agent that can void invoices, retry payments, and apply discounts to your customer base.
In billing, a hallucination isn't a wrong answer. It's a financial incident. Overcharging or undercharging a customer can create serious trust issues (and angry Slack DMs from finance).
So we rebuilt the approach around three layers:
Constrain.
The agent only calls tools we've explicitly defined. No improvising, no adapting. More hand-holding required, but catastrophic outcomes become structurally impossible.
Confirm.
Before any consequential action (create, update, delete, void, retry), the agent shows a preview and waits for an explicit yes. No "always allow." Not optional.
Exclude.
Some tools we simply didn't build. Org management, API keys, webhook config: manual only. The best guardrail is one that doesn't need to exist.
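Roughly, the three layers can be sketched in a few lines of Python. This is an illustration under stated assumptions, not our production code: the tool names, the Tool class, and the execute function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Tool:
    name: str
    run: Callable[[dict], str]
    consequential: bool  # True for create, update, delete, void, retry


# Constrain: the agent can only call what is registered here.
# Exclude: there is deliberately no entry for API keys, webhooks,
# or org management, so those actions cannot be reached at all.
# (Hypothetical tools with stubbed behavior.)
TOOLS: Dict[str, Tool] = {
    "get_invoice": Tool("get_invoice", lambda a: f"invoice {a['id']}", False),
    "void_invoice": Tool("void_invoice", lambda a: f"voided {a['id']}", True),
}


def execute(tool_name: str, args: dict, confirm: Callable[[str], bool]) -> str:
    tool = TOOLS.get(tool_name)
    if tool is None:
        # The model asked for something that was never built.
        raise PermissionError(f"unknown tool: {tool_name}")
    if tool.consequential:
        # Confirm: show a preview and wait for an explicit yes, every time.
        preview = f"About to run {tool.name} with {args}"
        if not confirm(preview):
            return "cancelled"
    return tool.run(args)
```

The confirm callback is asked on every consequential call and its answer is never cached, which is one way to rule out an "always allow" path.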
We also made a call that surprised some people: we built three separate assistants instead of one. A billing assistant that executes. A finance assistant that queries but can't modify. A pricing assistant that only advises.
The reason: a product leader asking "what if we raised prices 20%?" should get strategic advice, not a price change.
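One way to picture the split is a per-assistant tool allowlist. The sketch below is hypothetical: the tool names and the can_call helper are invented for illustration and are not Lago's actual API.

```python
# Hypothetical tool names, for illustration only.
WRITE_TOOLS = frozenset({"void_invoice", "retry_payment", "apply_discount"})
READ_TOOLS = frozenset({"get_invoice", "get_revenue_report"})

ASSISTANT_TOOLS = {
    "billing": WRITE_TOOLS | READ_TOOLS,  # executes
    "finance": READ_TOOLS,                # queries, cannot modify
    "pricing": frozenset(),               # advises only, no tools at all
}


def can_call(assistant: str, tool: str) -> bool:
    # Unknown assistants get an empty allowlist by default.
    return tool in ASSISTANT_TOOLS.get(assistant, frozenset())
```

With this shape, a pricing question has no tool that could change a price, whatever the prompt says.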
A lot went wrong. I wrote up the full picture: architecture, mistakes, and what I'd do differently from day one. Check it out here:
getlago.com/blog/building-ai…