I built a Clinical Trials AI agent on top of the entire
ClinicalTrials.gov database (AACT — 50 tables, 12M rows), and I want to share a few things I learned, because most of them surprised me.
The goal: not another chatbot that "talks to a database," but an agent that could actually reason over a massive, real-world dataset and be useful. And I vibe-coded the whole thing — directing AI to architect and build it, while I held the design decisions, the data model, and the deployment.
1. The hard part isn't the agent. It's the data. You can wire up a model in an afternoon. But 12M rows don't answer questions quickly or cheaply by accident. Almost all the real effort went into the boring layer underneath: a two-DB split (a read-only pool just for AACT), materialized views hand-tuned SQL instead of an ORM fighting the schema, versioned snapshots with automated sync as new dumps land, and caching so hot queries stay cheap. The agent is the easy 10%. The data engineering is the 90% nobody films.
2. You don't need an agent framework. You need a clean tool contract. No LangChain, no orchestration library. Just a tiny registry: defineTool() to declare a tool, runTool() to call one. The thing I'd underline for anyone building agents — every call goes through the same pipeline: schema-validated (Zod) → policy-checked → executed → audited. That one invariant is worth more than any framework. The control loop stays mine, and adding a capability is one file.
3. Give the model a toolbelt, not a database. Instead of raw SQL, I gave it nine purpose-built tools — trials_search, study_get, eligibility_lookup, feasibility,
competitive_landscape, safety_profile, and more. Each encodes how a human actually thinks about clinical trials, and the model composes them. You're not building a query interface, you're building the agent's vocabulary for the domain.
4. Decouple from the model early. Everything goes through one getModel(role) factory — Anthropic OpenAI today, switchable per task. Adding Bedrock, Azure, or a local model is one file, zero refactor. Models change every few months now; your architecture shouldn't care which one is winning this week.
4. Decouple from the model early. Everything goes through one getModel(role) factory — Anthropic OpenAI today, switchable per task. Adding Bedrock, Azure, or a local model is one file, zero refactor. Models change every few months now; your architecture shouldn't care which one is winning this week.
5. Vibe-coding is a real skill, and it's not "typing less." The work wasn't writing code. It was steering an AI to produce a clean, production-shaped system — knowing what good architecture looks like, catching when it drifts, then getting it deployed: containerized, health checks, telemetry on every call.