Shopify embedded app gotcha. App Bridge needs the shopify-api-key meta tag in <head> BEFORE the Bridge script loads. Wrong order = session tokens fail silently inside the iframe, console shows nothing useful. Half a day burned on a missing string.
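A cheap pre-deploy check can catch this ordering bug before it ever reaches the iframe. Below is a minimal sketch. It assumes the App Bridge CDN script URL `https://cdn.shopify.com/shopifycloud/app-bridge.js`, and the helper name `meta_before_bridge` is hypothetical. It only inspects static HTML, so it will not see a tag that JavaScript injects later.

```python
from html.parser import HTMLParser
from typing import List

# Assumed App Bridge CDN script URL; confirm against your app's setup.
BRIDGE_SRC = "https://cdn.shopify.com/shopifycloud/app-bridge.js"


class HeadOrderParser(HTMLParser):
    """Records the order in which the API-key meta tag and the Bridge script appear."""

    def __init__(self) -> None:
        super().__init__()
        self.events: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> also lands here via the default handle_startendtag.
        a = dict(attrs)
        if tag == "meta" and a.get("name") == "shopify-api-key" and a.get("content"):
            self.events.append("meta")  # only counts if the key string is non-empty
        elif tag == "script" and a.get("src") == BRIDGE_SRC:
            self.events.append("script")


def meta_before_bridge(html: str) -> bool:
    """Hypothetical helper: True only if a non-empty shopify-api-key meta tag
    appears before the App Bridge script tag."""
    parser = HeadOrderParser()
    parser.feed(html)
    if "meta" not in parser.events or "script" not in parser.events:
        return False
    return parser.events.index("meta") < parser.events.index("script")
```

Run it against the rendered index page in CI. Either a reversed order or an empty `content` fails the check.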
Built a voice agent for a paint store on Shopify + ElevenLabs. Killed the system-prompt KB. The merchant edits a row in the admin, the agent says the new thing on the next call. Smaller token bill, no redeploy, hot edits.
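Replacing a system-prompt KB with a live lookup can be sketched as a tool the agent calls per question. The tool reads the merchant's current row instead of a snapshot baked into the prompt. This is a minimal sketch: it assumes an SQLite table `kb(topic, answer)` as a stand-in for the store admin, the table and function names are hypothetical, and the ElevenLabs tool wiring is not shown.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory stand-in for the merchant-editable knowledge base (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE kb (topic TEXT PRIMARY KEY, answer TEXT NOT NULL)")
    return conn


def upsert(conn: sqlite3.Connection, topic: str, answer: str) -> None:
    """What an admin edit does: replace the row in place."""
    conn.execute("INSERT OR REPLACE INTO kb (topic, answer) VALUES (?, ?)", (topic, answer))
    conn.commit()


def lookup_kb(conn: sqlite3.Connection, topic: str) -> str:
    """Tool the agent calls at answer time; nothing is cached in the prompt."""
    row = conn.execute("SELECT answer FROM kb WHERE topic = ?", (topic,)).fetchone()
    return row[0] if row else "I don't have that information yet."
```

Only the matched row enters context, which is where the token savings come from. An edit is visible on the very next lookup without a redeploy.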
The real cost of durable execution isn't storage or engineering time. It's the day your checkpoint format drifts and a six-hour run resumes from v2 state into a v3 graph with a silently mismatched field. Version your state schema like an API, not like a dict.
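Versioning checkpoint state like an API can be sketched as three pieces: a `schema_version` field, a chain of one-step migrations, and a hard refusal to load anything the code does not recognize. This is a minimal sketch. The field names and the v1-to-v2-to-v3 changes are hypothetical.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]
CURRENT_VERSION = 3


def _v1_to_v2(state: State) -> State:
    state = dict(state)
    state["retries"] = state.pop("retry_count", 0)  # hypothetical rename
    return state


def _v2_to_v3(state: State) -> State:
    state = dict(state)
    # Hypothetical v3 change: split "step:index" into two explicit fields.
    step, _, idx = state.pop("cursor", "start:0").partition(":")
    state["step"], state["step_index"] = step, int(idx)
    return state


# Each migration upgrades by exactly one version.
MIGRATIONS: Dict[int, Callable[[State], State]] = {1: _v1_to_v2, 2: _v2_to_v3}


def load_checkpoint(blob: Dict[str, Any]) -> Dict[str, Any]:
    """Upgrade a stored checkpoint to CURRENT_VERSION or refuse loudly."""
    if "schema_version" not in blob:
        raise ValueError("checkpoint has no schema_version; refusing to guess")
    version = blob["schema_version"]
    if version > CURRENT_VERSION:
        raise ValueError(f"checkpoint v{version} is newer than code v{CURRENT_VERSION}")
    state = dict(blob["state"])
    while version < CURRENT_VERSION:
        if version not in MIGRATIONS:
            raise ValueError(f"no migration from v{version}")
        state = MIGRATIONS[version](state)
        version += 1
    return {"schema_version": version, "state": state}
```

A v2 checkpoint resumed by v3 code goes through `_v2_to_v3` explicitly instead of landing in the new graph with a silently mismatched field.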
Human-in-the-loop checkpoint question nobody answers in demos: default pause-and-ask, or default proceed-and-log? Quiet agents ship faster but eat your reputation on the 1%. Chatty agents get turned off. Per-action policy or nothing.
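A per-action policy can be as small as a lookup table consulted before every tool call. This is a minimal sketch. The action names, the `approve` callback, and the choice to treat unknown actions as pause-and-ask are all assumptions.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional


class Policy(Enum):
    ASK = "pause_and_ask"    # block until a human approves
    LOG = "proceed_and_log"  # act immediately, record for later review


# Hypothetical action names: the point is that each action carries its own default.
POLICIES: Dict[str, Policy] = {
    "draft_reply": Policy.LOG,
    "tag_order": Policy.LOG,
    "issue_refund": Policy.ASK,
    "email_all_customers": Policy.ASK,
}


def run_action(
    name: str,
    execute: Callable[[], str],
    approve: Callable[[str], bool],
    log: List[str],
) -> Optional[str]:
    """Gate one tool call according to its policy. Unknown actions default to ASK."""
    policy = POLICIES.get(name, Policy.ASK)
    if policy is Policy.ASK and not approve(name):
        log.append(f"{name}: rejected by human")
        return None
    result = execute()
    log.append(f"{name}: {policy.value} -> {result}")
    return result
```

Cheap, reversible actions stay quiet. Irreversible ones such as refunds ask first, and a newly added tool asks until someone deliberately classifies it.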
Agent eval pattern that changed my production bug count: three judges with different rubrics, not one averaged score. Rubric 1 grades correctness. Rubric 2 grades calibration. Rubric 3 grades cost. A single number hides which of the three you're regressing on.
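Keeping the three rubrics as separate numbers can be sketched as a per-rubric score plus a regression check that never averages. This is a minimal sketch, assuming each judge is a callable returning a score in [0, 1]. In practice each judge would be its own LLM call, and the tolerance value is arbitrary.

```python
from typing import Callable, Dict, List

# Each judge maps one agent output to a score in [0, 1]; stand-in for an LLM judge call.
Judge = Callable[[str], float]


def score_panel(outputs: List[str], judges: Dict[str, Judge]) -> Dict[str, float]:
    """Mean score per rubric across an eval set. Deliberately no combined number."""
    return {
        rubric: sum(judge(o) for o in outputs) / len(outputs)
        for rubric, judge in judges.items()
    }


def regressions(
    baseline: Dict[str, float], candidate: Dict[str, float], tolerance: float = 0.02
) -> List[str]:
    """Rubrics on which the candidate dropped by more than `tolerance`."""
    return sorted(r for r in baseline if candidate[r] < baseline[r] - tolerance)


baseline = {"correctness": 0.90, "calibration": 0.60, "cost": 0.60}
candidate = {"correctness": 0.70, "calibration": 0.70, "cost": 0.70}
# Both average to 0.70, yet correctness fell by 0.20.
print(regressions(baseline, candidate))  # prints ['correctness']
```

The averaged score would call this release a wash. The per-rubric view shows it traded correctness for calibration and cost.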
Every multi-agent failure I've debugged this quarter had the same root cause: two agents were allowed to write to the same state with no arbiter. Not a model problem. Not a prompt problem. Concurrency + shared state without locks, same as any database in 2005.
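One way to give shared state an arbiter is a single owner that accepts a write only against the version the writer last read. This is optimistic concurrency (compare-and-set): of two racing agents, the second gets a rejection instead of silently clobbering the first. This is a swapped-in technique, not the author's stated fix. The class and method names are hypothetical, and the sketch is in-process only.

```python
import threading
from typing import Any, Dict, Tuple


class StateArbiter:
    """Single owner of shared agent state. Writers must present the version they
    read; stale writes are rejected instead of silently overwriting."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._state: Dict[str, Any] = {}
        self._version = 0

    def read(self) -> Tuple[int, Dict[str, Any]]:
        with self._lock:
            return self._version, dict(self._state)  # copy, so readers can't mutate it

    def write(self, agent: str, expected_version: int, updates: Dict[str, Any]) -> bool:
        with self._lock:
            if expected_version != self._version:
                return False  # someone wrote first; caller must re-read and retry
            self._state.update(updates)
            self._state["last_writer"] = agent
            self._version += 1
            return True
```

Both agents read version 0. The planner's write succeeds and bumps the version to 1. The executor's write against version 0 then returns `False`, which forces it to re-read.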
The gap between agent demos and agent products hides in four places: concurrency, state durability, error recovery, cost envelopes.
Any tooling that surfaces all four from day one pays for itself the first week in prod.
The asymmetry worth internalizing: deterministic orchestrator + smart agents > smart orchestrator + deterministic agents.
Intelligence at the edges, rules at the core. Let the LLM do the routing and you've inverted it.
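The "rules at the core" half can be sketched as an orchestrator whose routes are plain data and whose steps call out to agents. Each agent is free to be an LLM call, but no model output decides the path. This is a minimal sketch. The pipeline and step names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, str]
Agent = Callable[[State], State]  # each agent could wrap an LLM call

# Rules at the core: routes are plain data, reviewed like config (names hypothetical).
PIPELINES: Dict[str, List[str]] = {
    "refund": ["draft", "policy_check", "human_review"],
    "general": ["draft"],
}


def orchestrate(ticket: State, agents: Dict[str, Agent]) -> Tuple[List[str], State]:
    """Deterministic control flow: the route comes from an explicit field, never
    from model output, so the same ticket always takes the same path."""
    route = ticket.get("type", "general")
    steps = PIPELINES.get(route, PIPELINES["general"])
    state = dict(ticket)
    for step in steps:
        state = agents[step](state)  # intelligence at the edges
    return list(steps), state
```

Because routing is ordinary code, it can be unit-tested and replayed. Only the content each agent produces varies from run to run.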
The eval pattern nobody talks about: panel of 3 models scoring agent outputs against a rubric. Single-judge is cheap and noisy. Triple-judge catches the failure modes one model is biased against. Cost triples, false confidence drops by more than that.
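The post does not say how the three scores are combined. One reasonable choice is the median, which resists a single biased judge, plus a disagreement flag when the spread is large. The flag surfaces exactly the cases where one model sees a failure the others miss. This is a minimal sketch: the 0-10 scale and the `max_spread` threshold are assumptions.

```python
from statistics import median
from typing import Callable, List, Tuple

# One model scoring an output against the shared rubric on a 0-10 scale (assumption).
Judge = Callable[[str], float]


def panel_score(output: str, judges: List[Judge], max_spread: float = 2.0) -> Tuple[float, bool]:
    """Median of the panel, plus a flag when judges disagree by more than max_spread.
    Flagged outputs are the ones worth routing to a human."""
    scores = [judge(output) for judge in judges]
    spread = max(scores) - min(scores)
    return median(scores), spread > max_spread
```

Scores of 8, 7 and 2 give a median of 7, but the 6-point spread raises the flag. A single judge would have reported either 8 or 2 with equal confidence.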
Agents that re-fire side-effect tools on transient failure look fine in dev and silently double-charge customers in prod.
Retry logic + idempotency keys belong in the first commit, not the postmortem.
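The dangerous case is a call that succeeds but whose response is lost, so the agent retries. The fix can be sketched as minting one idempotency key per logical action and reusing it on every retry, against a target that deduplicates on that key. This is a minimal sketch. `Ledger` is a hypothetical stand-in for a payment API that accepts such keys, and a real version would add backoff between attempts.

```python
import uuid
from typing import Callable, Dict


class TransientError(Exception):
    """Timeout, 5xx, dropped connection: the call may or may not have landed."""


class Ledger:
    """Hypothetical side-effect target that deduplicates on an idempotency key."""

    def __init__(self) -> None:
        self.charges: Dict[str, int] = {}

    def charge(self, key: str, cents: int) -> str:
        self.charges.setdefault(key, cents)  # a repeated key is a no-op
        return key


def call_with_retry(fn: Callable[[str], str], attempts: int = 3) -> str:
    # Mint the key ONCE per logical action and reuse it on every retry;
    # minting inside the loop would turn each retry into a new charge.
    key = str(uuid.uuid4())
    for attempt in range(attempts):
        try:
            return fn(key)
        except TransientError:
            if attempt == attempts - 1:
                raise
    raise ValueError("attempts must be at least 1")
```

When the first attempt commits and then times out, the retry carries the same key. The customer is charged once.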
Built a content automation platform with Next.js + FastAPI + Google Cloud workers. Write one post, it adapts format for LinkedIn, X, Reddit, and Telegram. Rate limits, warmup phases, anti-detection - all handled. Freelance project turned into the tool I use every day.
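The mechanical part of "write one post, adapt per platform" is fitting the same text into different length budgets. The sketch below covers only that step, not tone rewriting. The character limits are assumptions to verify against each platform's current docs, and the `fit`/`adapt` helpers are hypothetical.

```python
# Assumed per-platform text limits (characters); verify against current platform docs.
LIMITS = {"x": 280, "linkedin": 3000, "reddit_title": 300, "telegram": 4096}


def fit(text: str, limit: int, suffix: str = "...") -> str:
    """Trim on a word boundary so the result, including suffix, fits the limit."""
    if len(text) <= limit:
        return text
    cut = text[: limit - len(suffix)]
    if " " in cut:
        cut = cut.rsplit(" ", 1)[0]  # drop the partial last word
    return cut + suffix


def adapt(title: str, body: str) -> dict:
    """One source post in, one payload per platform out."""
    full = f"{title}\n\n{body}"
    return {
        "x": fit(full, LIMITS["x"]),
        "linkedin": fit(full, LIMITS["linkedin"]),
        "reddit": {"title": fit(title, LIMITS["reddit_title"]), "body": body},
        "telegram": fit(full, LIMITS["telegram"]),
    }
```

Reddit gets title and body as separate fields because its submission model separates them. The other platforms get one combined text.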
17 years of stack changes. WordPress themes in 2009. React SPAs in 2015. Next.js serverless in 2020. LangChain RAG pipelines in 2023. Autonomous agent orchestration in 2025. Now AI writes 60% of my production code. The only constant is that nothing stays constant.
Rebuilt my portfolio from scratch. Next.js + Supabase as one source of truth for 100 projects. GPT-4o reads each project README and generates descriptions. One deploy updates my website, Upwork profile, and LinkedIn. Took a weekend. Should have done it years ago.
Built OpenClaw - a multi-channel AI gateway that routes Telegram, WhatsApp, and Discord messages to one AI backend. Plugin architecture means adding a new channel is about 50 lines of code. Every freelance client wanted AI chat on a different platform. So I stopped rebuilding it.
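A channel-plugin architecture like this can be sketched as an abstract adapter with `parse` and `send`, a registry, and a channel-agnostic gateway that calls one backend. OpenClaw's actual interfaces are not shown in the post, so every class and function name here is hypothetical. The Telegram payload shape is simplified, and `send` records to an outbox instead of calling the Bot API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Type


@dataclass
class Message:
    channel: str
    user_id: str
    text: str


class ChannelPlugin(ABC):
    """Everything platform-specific lives behind these two methods."""

    name = ""

    @abstractmethod
    def parse(self, payload: dict) -> Message: ...

    @abstractmethod
    def send(self, user_id: str, text: str) -> None: ...


REGISTRY: Dict[str, Type[ChannelPlugin]] = {}


def register(cls: Type[ChannelPlugin]) -> Type[ChannelPlugin]:
    REGISTRY[cls.name] = cls
    return cls


@register
class TelegramPlugin(ChannelPlugin):
    name = "telegram"

    def __init__(self) -> None:
        self.outbox: List[Tuple[str, str]] = []

    def parse(self, payload: dict) -> Message:
        # Simplified Telegram update: {"message": {"chat": {"id": ...}, "text": ...}}
        msg = payload["message"]
        return Message(self.name, str(msg["chat"]["id"]), msg.get("text", ""))

    def send(self, user_id: str, text: str) -> None:
        self.outbox.append((user_id, text))  # a real plugin would call the Bot API here


class Gateway:
    """Channel-agnostic core: one backend, any number of registered plugins."""

    def __init__(self, backend: Callable[[Message], str]) -> None:
        self.backend = backend
        self.plugins: Dict[str, ChannelPlugin] = {n: cls() for n, cls in REGISTRY.items()}

    def handle(self, channel: str, payload: dict) -> None:
        plugin = self.plugins[channel]
        message = plugin.parse(payload)
        plugin.send(message.user_id, self.backend(message))
```

Adding WhatsApp or Discord means writing one more `@register` class. The gateway and backend stay untouched.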
Playwright MCP tip that took me 3 days to figure out: keyboard.type() with a small delay triggers React state updates that .fill() sometimes misses. The difference between a button staying disabled and your automation actually working.
Built an AI chat assistant with persistent long-term memory. It remembers who you are, what you told it, and how you like to be spoken to. The trick: structured memory extraction per conversation, not just dumping raw transcripts into context. Cheaper and more accurate.
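Structured memory can be sketched as typed key-value sections that each conversation's extraction step merges into, with newer statements overwriting older ones. The extraction itself would be an LLM call returning JSON. Here that reply is a plain string, and the `facts`/`preferences` schema is an assumption.

```python
import json
from dataclasses import dataclass, field
from typing import Dict


def parse_extraction(raw: str) -> dict:
    """Parse the extractor model's JSON reply; keep only str -> str entries."""
    data = json.loads(raw)
    clean = {}
    for section in ("facts", "preferences"):
        items = data.get(section, {})
        if isinstance(items, dict):
            clean[section] = {str(k): v for k, v in items.items() if isinstance(v, str)}
    return clean


@dataclass
class UserMemory:
    """Small, typed, overwritable memory instead of raw transcripts."""

    facts: Dict[str, str] = field(default_factory=dict)
    preferences: Dict[str, str] = field(default_factory=dict)

    def merge(self, extracted: dict) -> None:
        # Newer statements overwrite older ones key by key.
        self.facts.update(extracted.get("facts", {}))
        self.preferences.update(extracted.get("preferences", {}))

    def to_context(self) -> str:
        """Compact block injected into the prompt instead of past transcripts."""
        lines = [f"- {k}: {v}" for k, v in sorted(self.facts.items())]
        lines += [f"- prefers {k}: {v}" for k, v in sorted(self.preferences.items())]
        return "Known about user:\n" + "\n".join(lines)
```

The prompt carries a few lines per user instead of every past transcript. A user who moves cities updates one key rather than leaving two contradictory transcripts in context.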
Lesson from running 15 MCP servers daily: the most valuable ones are boring. File system access, state persistence, rate limiting. The flashy AI-powered ones get demo clicks but the plumbing servers keep the pipeline alive.
Built a trading journal with Next.js 16 + Supabase this week. The "Sovereign Analyst" AI assistant indexes every trade and spots patterns I miss. Biggest surprise: the AI found I consistently exit winners too early on Thursdays. Would never have caught that manually.
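The post does not say how the assistant finds patterns. The sketch below shows one transparent statistic that could surface an "exits winners early on Thursdays" finding: for winning trades, the share of the best available move actually kept (realized profit over max favorable excursion), grouped by exit weekday. The field names and the metric are assumptions. With real data you would want enough trades per weekday before trusting a gap.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from statistics import mean
from typing import Dict, List

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


@dataclass
class Trade:
    exit_day: date
    pnl: float            # realized profit or loss
    max_favorable: float  # best open profit reached before exit (max favorable excursion)


def capture_by_weekday(trades: List[Trade]) -> Dict[str, float]:
    """Among winners, the average share of the best available move actually kept,
    grouped by exit weekday. A day well below the others hints at early exits."""
    buckets: Dict[str, List[float]] = defaultdict(list)
    for t in trades:
        if t.pnl > 0 and t.max_favorable > 0:
            buckets[DAYS[t.exit_day.weekday()]].append(t.pnl / t.max_favorable)
    return {day: mean(values) for day, values in buckets.items()}
```

Using weekday indexes instead of `strftime("%A")` keeps the day names independent of the system locale.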
Built a cross-platform marketing pipeline that runs across LinkedIn, X, and Reddit. 15 agent skills, JSON state persistence, rate limiters, anti-detection delays. No external AI APIs - the Copilot agent is the engine. Day 29 and zero bans.
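The rate-limiter-plus-JSON-state piece can be sketched as a per-platform daily cap that uses a sliding 24-hour window and persists its counters to a JSON file, so a restart does not reset the count. The post's skills and anti-detection logic are not shown. The class name is hypothetical, the caps are placeholders rather than platform-published limits, and an unconfigured platform is denied by default.

```python
import json
import time
from pathlib import Path
from typing import Dict, List, Optional

WINDOW_SECONDS = 86_400  # sliding 24-hour window


class DailyCap:
    """Per-platform action cap whose timestamps survive restarts via a JSON file."""

    def __init__(self, path: Path, caps: Dict[str, int]) -> None:
        self.path = path
        self.caps = caps  # placeholder numbers, not official platform limits
        self.state: Dict[str, List[float]] = (
            json.loads(path.read_text()) if path.exists() else {}
        )

    def allow(self, platform: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Keep only actions inside the window.
        recent = [t for t in self.state.get(platform, []) if now - t < WINDOW_SECONDS]
        allowed = len(recent) < self.caps.get(platform, 0)  # unknown platform: cap 0
        if allowed:
            recent.append(now)
        self.state[platform] = recent
        self.path.write_text(json.dumps(self.state))  # persist after every decision
        return allowed
```

Writing the file on every decision is the simple choice for a single process. Several workers sharing one file would need locking or a real store.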
Running 10 agents in production. The hardest engineering problem isn't hallucination - it's trust between agents sharing state. Our fix: TTL memory decay + asymmetric evidence weighting. One security violation resets trust to zero. 100 good calls don't offset it.
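The trust model can be sketched as a per-peer ledger with three parts: evidence that expires after a TTL, failures that weigh more than successes, and a security violation that zeroes trust until a human reinstates the peer. Here TTL decay is implemented as a hard expiry window, though exponential decay would also fit the description. The 5:1 weighting, the 0.5 no-evidence prior, and the manual `reinstate` are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TrustLedger:
    """Trust in one peer agent, recomputed from live evidence on every read."""

    ttl: float = 3600.0           # seconds before a piece of evidence expires
    success_weight: float = 1.0
    failure_weight: float = 5.0   # hypothetical asymmetry: one failure ~ five successes
    events: List[Tuple[float, str]] = field(default_factory=list)
    revoked: bool = False

    def record(self, ts: float, outcome: str) -> None:
        if outcome == "security_violation":
            self.revoked = True   # hard reset: no amount of good evidence offsets this
            self.events.clear()
            return
        self.events.append((ts, outcome))

    def reinstate(self) -> None:
        """Manual, human-only path back from a violation (assumption)."""
        self.revoked = False

    def score(self, now: float) -> float:
        if self.revoked:
            return 0.0
        live = [(ts, o) for ts, o in self.events if now - ts < self.ttl]
        good = sum(self.success_weight for _, o in live if o == "success")
        bad = sum(self.failure_weight for _, o in live if o == "failure")
        # Laplace-style prior: no evidence gives 0.5, not full trust.
        return (good + 1) / (good + bad + 2)
```

Two successes and one failure score 3/9, not 2/3, because of the failure weight. Once all three expire the score drifts back to 0.5. After a violation, a hundred subsequent successes still leave the peer at 0.0.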