Completely agree. There's an upfront time cost to get a codebase working with cloud agents, but it's easy and worth it. Cloud agents give you so much leverage and time back.
One Cursor automation cut down our oncall workload by like 80%. PagerDuty triggers a cloud agent that checks AWS logs, PostHog, Slack, Linear, Notion, and Pylon to gather context and find the root cause. It generates a report, drafts what to tell affected users, and opens a PR when appropriate.
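Roughly, the context-gathering step could look like the sketch below. This is not our actual setup: `IncidentReport`, `gather_context`, and the per-source fetcher callables are all hypothetical names, and the real agent reaches each tool through its own integrations. The one idea it illustrates is that a single flaky source shouldn't block the whole investigation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical: each fetcher takes an alert id and returns a text snippet
# from one source (AWS logs, PostHog, Slack, ...).
Fetcher = Callable[[str], str]


@dataclass
class IncidentReport:
    alert_id: str
    context: Dict[str, str] = field(default_factory=dict)
    unavailable: List[str] = field(default_factory=list)


def gather_context(alert_id: str, fetchers: Dict[str, Fetcher]) -> IncidentReport:
    """Query every source and record which ones failed instead of aborting."""
    report = IncidentReport(alert_id=alert_id)
    for source, fetch in fetchers.items():
        try:
            report.context[source] = fetch(alert_id)
        except Exception:
            # One broken integration should not stop the root-cause analysis.
            report.unavailable.append(source)
    return report
```

Listing the unavailable sources in the report also tells the human reviewer what the agent could not see.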
The PRs have a high acceptance rate. This wasn't always the case. At first it was like 50%, which I thought was really high, but makes sense since paging issues are usually pretty narrowly scoped.
But the acceptance rate has gone up to like 80%-90% thanks to a weekly self-improvement automation, which we call the meta bot.
The meta bot is also a cloud agent but instead it triggers weekly and is prompted to improve the oncall bot. It checks for recent corrective human actions in slack and rejected PRs. Then it opens a PR to improve the oncall bot's prompt and reports in slack what other context it needs in its setup. Most of the time it's just the prompt. Things like remembering to run /babysit to get all the review agents happy before asking for human attention.
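Here's a minimal sketch of what the meta bot's weekly input could look like, assuming you've already pulled the bot's PRs and the human corrections from Slack into plain Python objects. `BotPR`, `acceptance_rate`, and `build_meta_prompt` are made-up names for illustration, not anything Cursor provides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BotPR:
    title: str
    merged: bool
    reviewer_note: str = ""  # why a human rejected or corrected it, if known


def acceptance_rate(prs: List[BotPR]) -> float:
    """Fraction of the oncall bot's PRs that were merged."""
    if not prs:
        return 0.0
    return sum(pr.merged for pr in prs) / len(prs)


def build_meta_prompt(prs: List[BotPR], slack_corrections: List[str]) -> str:
    """Assemble the weekly feedback the meta bot is prompted with."""
    rejected = [pr for pr in prs if not pr.merged]
    lines = [f"Acceptance rate this week: {acceptance_rate(prs):.0%}", "Rejected PRs:"]
    lines += [f"- {pr.title}: {pr.reviewer_note}" for pr in rejected]
    lines.append("Human corrections in Slack:")
    lines += [f"- {c}" for c in slack_corrections]
    lines.append("Propose edits to the oncall bot's prompt and list any missing context.")
    return "\n".join(lines)
```

Tracking the acceptance rate in the same prompt makes it easy to see week over week whether the prompt edits are helping.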
I guess you could call this a self-improving loop. Not sure I really understand the term "loop" but to me it seems like it's just vaguepostism for "a cloud agent with a cron/webhook trigger and MCP access that completes a task, and another cloud agent that reviews and improves the first one."
This also accidentally doubled our eng capacity, kinda. I couldn't get the cursor automation to trigger only on pagerduty alerts, so I just set it to trigger on all new messages in our oncall slack channel. Within a week, non-eng teammates began asking questions, then reporting bugs, then kicking off implementations of small customer asks. Very nice to skip the whole triage/intake dance. I get why ppl like devin now.
Cloud agents are good at just "getting it" when they have a dev environment, strong backpressure/CI, and legible company context.
I'm a little scared to ask what ppl mean when they say "loops" or whatever but as a dspy stan, a self-improving process makes sense to me. So I added another weekly automation that looks back at all the recent automation runs that led to human follow-up or rejected PRs, improves the oncall bot's runbooks and prompt, and reports on any missing context or tools. This has increased the success rate of fully-automated PRs over time.
Is this a loop? Idk, it's just a cloud agent cron mcp in my mind, but who cares, it's f-ing dope!
Cursor cloud agents are almost perfect for making this stupid easy. Some small things could be better.
Video recording is OK but still not great. It doesn't capture how a UI feels, so sometimes it's hard to accept a PR without pulling it down to try it first.
I'd much rather use a local browser to access localhost:3000 running on the cloud agent's VM. It'd be sweet to use the cursor browser's component selector tool in the local agents window for a remote session.
Actually I bet we can spin up quick session-specific links with something like tailscale or cloudflared or ngrok. Might try that out soon.
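A rough sketch of the idea, untested on an actual cloud agent VM: build the tunnel command for whatever tool is installed and run it next to the dev server. `tunnel_command` is a hypothetical helper. The two command lines are the standard quick-start invocations for cloudflared and ngrok. I left tailscale out because I haven't checked its current CLI syntax.

```python
from typing import List


def tunnel_command(tool: str, port: int = 3000) -> List[str]:
    """Return the argv that exposes localhost:<port> through a quick tunnel."""
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    if tool == "cloudflared":
        # Quick tunnel: prints a temporary public URL on startup.
        return ["cloudflared", "tunnel", "--url", f"http://localhost:{port}"]
    if tool == "ngrok":
        return ["ngrok", "http", str(port)]
    raise ValueError(f"unsupported tunnel tool: {tool}")
```

On the VM you'd pass the result to something like `subprocess.Popen(tunnel_command("cloudflared"))` and have the agent post the printed URL back to the session. Worth remembering that this puts a dev server on the public internet, so some kind of access control is probably a good idea.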
Which reminds me of another reason why cloud agents beat local parallel agent worktrees. No more container port conflicts, or having to remember which localhost ports map to which agent session.
Some types of work are still a better local experience than cloud, at least for me, esp. high touch exploratory work. Thankfully, Cursor makes it pretty seamless to move a session between local and cloud.
I'd be surprised if any ADE isn't thinking about how to support sandboxed cloud agents asap. Every ADE needs to run or support cloud sandbox infra or they're gonna fall behind as people switch to cloud.
some reflections from solely using cloud agents this year:
1. every engineer should default to cloud. it completely changes how you view and use agents. if you run a company, it might be worth mandating everyone starts in cloud
2. cloud agent adoption has been much slower than i expected, e.g. looking at a ton of cursor profiles it's clear majority cloud usage is still rare
3. getting your dx cloud agent ready still requires creative jiu jitsu. dev infra docs could be much better: "this is how to make our stuff accessible to agents/parallelizable." luckily investments also benefit humans
4. it's still a PITA to setup & manage cloud envs across cursor/devin etc. but i assume it'll get bitter lessoned and we don't need conventions for setup scripts etc.
5. where are the labs?! would love to see codex et al. invest more in their cloud experience. i know they can do it :)
6. it's strange that cursor/devin's investment in mobile apps lags behind their investment in cloud agents. they should go hand in hand. the ability to start agents from slack mobile isn't enough!
7. a cloud agent spinning up other cloud agents (middle manager pattern) is goated. e.g. nice to go for a run, yap for twenty minutes, and end up with parallel agents. only devin supports this well
8. the uis of ADEs have somewhat adapted for cloud agents. but ui patterns for upcoming long running *and* proactive agents are understudied. super excited to see more experiments here (and will contribute)
overall: i freaking love cloud agents. you'll disappoint me personally if next month you still spin up more local agents than cloud. very grateful to cursor and devin for making this technology so easy to use!