Long-running agents aren’t enough on their own: the future belongs to specialized multi-agent teams.
@AnthropicAI’s November 26, 2025 article “Effective Harnesses for Long-Running Agents” clearly outlines one of the biggest limitations of current AI agents:
👉 A single agent struggles to make consistent progress on complex, multi-hour or multi-day tasks.
Each new context window starts with no memory of the previous one, leading to half-implemented features, lost progress, inconsistent environments, and agents prematurely claiming that a project is “done.”
Anthropic proposes a two-part solution:
1️⃣ Initializer agent: sets up the environment, feature list, and initial structure
2️⃣ Coding agent: makes incremental progress and leaves clean artifacts
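To make the two roles concrete, here is a minimal Python sketch of the idea, not Anthropic’s actual harness. The file name `progress.json`, the `passes` field, and the function names are all assumptions made for illustration, and the “implementation” step is a placeholder where a real model call and tests would go. The point it shows: each coding session starts fresh and recovers its place only from the artifact left on disk.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Optional

STATE_FILE = "progress.json"  # hypothetical file name, chosen for this sketch


def initializer_session(workdir: Path, features: List[str]) -> None:
    """Runs once: writes the feature list that every later session reads."""
    state = {
        "features": [{"name": name, "passes": False} for name in features],
        "log": [],
    }
    (workdir / STATE_FILE).write_text(json.dumps(state, indent=2))


def coding_session(workdir: Path) -> Optional[str]:
    """Runs repeatedly: finishes one pending feature, then saves state."""
    path = workdir / STATE_FILE
    # The file is the only memory carried over from earlier sessions.
    state = json.loads(path.read_text())
    todo = next((f for f in state["features"] if not f["passes"]), None)
    if todo is None:
        return None  # nothing left to do
    # Placeholder for real work: a model call plus tests would go here.
    todo["passes"] = True
    state["log"].append("completed " + todo["name"])
    path.write_text(json.dumps(state, indent=2))
    return todo["name"]


def run_harness(features: List[str]) -> List[str]:
    """Runs the initializer once, then coding sessions until all features pass."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        initializer_session(workdir, features)
        completed = []
        while True:
            name = coding_session(workdir)
            if name is None:
                break
            completed.append(name)
        return completed
```

Calling `run_harness(["login", "search"])` works through the features one per session, in the order the initializer listed them.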
But the most important insight from the article is this:
“Specialized agents like a testing agent, a quality assurance agent, or a code cleanup agent could do an even better job at sub-tasks across the software development lifecycle.”
🚀 @almanak recognized this long before
While designing their DeFi strategy-building infrastructure, Almanak realized early that a single “super coder agent” would never scale reliably.
Because of this, they embraced a specialized multi-agent architecture from day one.
Almanak’s Strategy Builder uses a coordinated team of agents to take a strategy from ideation to deployment:
Research Agent – analyzes market conditions
Quant Agent – designs the strategy logic
Simulation & Backtest Agent – stress-tests risk and performance
Testing Agent – performs end-to-end behavioral checks
QA Agent – validates correctness and consistency
Deployment Agent – ships the strategy to chain safely
This goes far beyond the two-agent initializer-and-coding model described by Anthropic.
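As a rough illustration of how such a hand-off between specialized agents could be wired, here is a small Python sketch. It is not Almanak’s implementation: `StrategyState`, `make_stage`, the artifact strings, and the QA rule are all hypothetical, and each stage is a stub standing in for a real agent. It shows two ideas from the list above: every agent writes to a shared state object, and deployment is refused unless QA has approved.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class StrategyState:
    """Shared workflow state passed from agent to agent (hypothetical shape)."""
    idea: str
    artifacts: Dict[str, str] = field(default_factory=dict)
    approved: bool = False


Stage = Callable[[StrategyState], StrategyState]


def make_stage(name: str, output: str) -> Stage:
    """Builds a stub agent that records a fixed artifact under its name."""
    def stage(state: StrategyState) -> StrategyState:
        state.artifacts[name] = output
        return state
    return stage


def qa_stage(state: StrategyState) -> StrategyState:
    """Approves only if every upstream agent has left an artifact."""
    required = {"research", "quant", "backtest", "testing"}
    state.approved = required.issubset(state.artifacts)
    state.artifacts["qa"] = "approved" if state.approved else "rejected"
    return state


def deployment_stage(state: StrategyState) -> StrategyState:
    """Refuses to ship anything that QA has not signed off on."""
    if not state.approved:
        raise RuntimeError("QA has not approved this strategy")
    state.artifacts["deployment"] = "deployed (simulated)"
    return state


# Stage order mirrors the agent list above; outputs are placeholder strings.
PIPELINE: List[Stage] = [
    make_stage("research", "market notes"),
    make_stage("quant", "strategy logic"),
    make_stage("backtest", "risk and performance report"),
    make_stage("testing", "end-to-end check results"),
    qa_stage,
    deployment_stage,
]


def run_pipeline(idea: str) -> StrategyState:
    """Runs each specialized stage in order over one shared state object."""
    state = StrategyState(idea=idea)
    for stage in PIPELINE:
        state = stage(state)
    return state
```

Because each stage only reads and writes the shared state, a stage can be swapped or re-run without the others needing to know, which is the practical appeal of splitting the work this way.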
🧩 Anthropic’s findings map directly to what Almanak already built
Anthropic’s Observed Challenge → Almanak’s Solution
Agents try to “one-shot” the entire project → Work is divided across specialized agents
Context resets cause lost progress → Structured multi-agent workflow state
Agents mark features as “done” before they work → QA and Testing agents approve every step
Environment becomes messy or buggy → Cleanup and QA agents normalize each iteration
Missing or incomplete testing → Deep simulations and robust DeFi backtests
Anthropic is now arriving at the conclusion that single agents have limits.
Almanak has long operated on that assumption and has been building around it.
🎯 The takeaway: Specialized agent teams outperform single agents
A lone agent still struggles with long-horizon engineering.
But when responsibilities are split across domain-specific agents:
quality improves,
debugging becomes easier,
progress becomes consistent,
and end-to-end systems become far more reliable.
This is why Almanak’s Strategy Builder lets users run an entire multi-agent quant team without writing a single line of code.
Try it yourself:
👉 app.almanak.co/invite?code=g…
In their latest post, @AnthropicAI mentioned:
> "It seems reasonable that specialized agents like a testing agent, a quality assurance agent, or a code cleanup agent, could do an even better job at sub-tasks across the software development lifecycle."
At @almanak, we knew this all along and can confirm that a team of specialized agents does indeed do a better job than a single coder agent.
Meet the Strategy Team writing and testing your whole DeFi strategy from ideation to deployment.
Try it for yourself:
builder.almanak.co/
anthropic.com/engineering/ef…