I'm genuinely curious what @bcherny's setup is like. I find it hard to pull off long autonomous runs, both technically and because real processes and organizations require humans in the loop.
On the technical side, I don't think current models are reliable enough for hours of uninterrupted work. If they start from a flawed assumption early on, the solution can drift so far off track that the model can almost never recover on its own. I end up micromanaging like an overbearing boss, which defeats the purpose.
On the org and process side, so many of my tasks require actual humans. Clarifying requirements, challenging assumptions, getting buy-in: these don't happen over Slack threads. I need to talk to people and get face time. And often the person who can actually decide isn't the one I thought had the authority.
When Claude or Codex spews a 500-word root-cause analysis, it just loses other engineers. A 3-minute conversation with them frequently unblocks what would have taken hours or days of back-and-forth.
This makes me think we're still early:
1. These tools aren't ready for prime time in most real environments yet.
2. Builders like @bcherny often have dedicated time (or a small team) to work on the meta-problem of reliable workflows. Meanwhile, most of us are just trying to ship the next thing yesterday. I don't have cycles to perfect the automation. I need deliverables. Their job is literally improving the workflows.
Seeing a number of benchmarks showing Opus is the best model for long-running work.
Five tips for running Opus autonomously for hours/days:
1. Use auto mode for permissions, so Claude doesn’t ask for approval
2. Use dynamic workflows to have Claude orchestrate hundreds or thousands of agents to get a task done
3. Use /goal or /loop to nudge Claude to keep going until it’s done
4. Use Claude Code in the cloud, so you can close your laptop (easiest way is the desktop or mobile app)
5. Make sure Claude has a way to verify its own work end to end: the Claude in Chrome browser extension for web, an iOS/Android simulator MCP for mobile, and a way to start the full web server or service for backend work
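Tips 3 and 5 together amount to a "keep going until verified" loop. Below is a minimal, generic sketch of that pattern in Python. It is not how Claude Code implements /goal or /loop. `agent_step` and `verify` are hypothetical stand-ins for a model call and for an end-to-end check (e.g. starting the server and exercising it). The `max_iterations` budget is also an assumption. It is one way to bound the drift problem described above: the loop stops instead of wandering forever.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RunResult:
    done: bool
    iterations: int
    history: list[str] = field(default_factory=list)


def run_until_verified(
    agent_step: Callable[[str], str],            # hypothetical: prompt -> work output
    verify: Callable[[str], tuple[bool, str]],   # hypothetical: output -> (passed, report)
    task: str,
    max_iterations: int = 10,                    # assumed budget to cap runaway drift
) -> RunResult:
    """Do work, check it end to end, and feed failures back until it passes."""
    prompt = task
    history: list[str] = []
    for i in range(1, max_iterations + 1):
        output = agent_step(prompt)
        passed, report = verify(output)
        history.append(report)
        if passed:
            return RunResult(done=True, iterations=i, history=history)
        # Give the next step the verifier's report so it can correct course.
        prompt = f"{task}\nPrevious attempt failed verification:\n{report}"
    return RunResult(done=False, iterations=max_iterations, history=history)


if __name__ == "__main__":
    # Fake agent that "fixes" the bug on its third attempt.
    attempts = iter(["bug", "bug", "fixed"])

    def fake_agent(prompt: str) -> str:
        return next(attempts)

    def fake_verify(output: str) -> tuple[bool, str]:
        return output == "fixed", f"output={output}"

    result = run_until_verified(fake_agent, fake_verify, "fix the failing test", max_iterations=5)
    print(result.done, result.iterations)  # True 3
```

The design choice that matters is that `verify` is independent of the agent. Without a real end-to-end check, the loop can only trust the agent's own claim that it is done.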