Gokul is spot on in this post. But the challenge is even bigger.
The last gen of vertical AI companies are not just competing against one deep-working long-horizon agent. They are competing against parallel fleets of them.
Autonomy enables their competition to create parent agents that can spawn and delegate work to thousands of sub-agents. Each sub-agent has its own filesystem, a shell to run CLI tools, and the ability to write and run new programs on the fly.
They divide complex problems, attack from multiple angles, and converge on outcomes in a fraction of the time.
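To make the divide, delegate, converge pattern concrete, here is a minimal sketch in plain Python asyncio. It is not Autonomy's API. `sub_agent`, `parent_agent`, and `run` are hypothetical names, and the "work" is just summing numbers as a stand-in for whatever a real sub-agent would do with its own filesystem and shell.

```python
import asyncio


async def sub_agent(agent_id: int, chunk: list[int]) -> int:
    # Hypothetical stand-in for a real sub-agent: it only sums its chunk.
    await asyncio.sleep(0)  # yield control, as an agent would while waiting on I/O
    return sum(chunk)


async def parent_agent(problem: list[int], fleet_size: int) -> int:
    # Divide: give each sub-agent an interleaved slice of the problem.
    chunks = [problem[i::fleet_size] for i in range(fleet_size)]
    # Delegate: run every sub-agent concurrently.
    partials = await asyncio.gather(
        *(sub_agent(i, chunk) for i, chunk in enumerate(chunks))
    )
    # Converge: merge the partial results into one outcome.
    return sum(partials)


def run(problem: list[int], fleet_size: int) -> int:
    # Convenience wrapper so callers do not need their own event loop.
    return asyncio.run(parent_agent(problem, fleet_size))
```

Calling `run(list(range(1, 101)), 10)` splits 100 numbers across 10 sub-agents and merges their partial sums. A real fleet would swap the toy sum for actual agent work and add retries and timeouts.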
Agents, in @autonomy_comp, are modeled as concurrent actors that automatically form secure distributed clusters to enable massive scale on a tiny infra footprint. This creates orders of magnitude advantages in costs, speed, and scale.
The question to benchmark is: Can your specialized agent outperform a coordinated team of hundreds or thousands of really cheap general-purpose agents that can code their way around problems in real time?
If not, then the time to change your approach is now.
VERTICAL AI CHALLENGE
Vertical AI Founders: You've spent 2 years building your agents, training your model on your customers' data, embedding into workflows, creating a powerful GTM motion, all the best practices. You've beaten back challengers and are the #1 or #2 player in your vertical.
I'm sorry, you cannot relax. In fact, you need to massively up your game.
Turns out you are facing an existential challenge: long-horizon agents (e.g., Claude Code). Agents that are not trained on a specific domain, but can reliably work for hours or days on end in pursuit of a goal, self-correct, and actually do stuff.
I'm sure many Vertical AI founders will say: "Oh, we are not worried. We are the system of record for decision traces. We train on enterprise-specific context. That's why these horizontal agents can never catch up with this."
You might well be right.
But, but, but ... you cannot afford to bury your head in the sand. These long-horizon agents will get better very, very quickly. You need to understand precisely how good they are at the exact jobs you've built your agents on. You cannot wait for someone else to do this. For example, if you're a legal AI company with an agent that automates contract review, you must compare how good your specialized agent is versus a general-purpose long-horizon agent that's simply given the contract and asked to perform the same review.
My challenge to you: Assign a strong engineer on your team to focus 100% on using long-horizon agents (with minimal context, other than just the contract in the example above) to compete with your custom-trained agents. Benchmark how the long-horizon agents perform vs your agent. Rinse and repeat every few months.
Like with most other things worth measuring, what matters is the rate of improvement (the "slope" vs the Y-intercept). If the long-horizon agent is 30% as good as your vertical agent on Day 1, but 50% as good on Day 60, and 70% as good on Day 120, you need to reassess your product strategy.
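The slope can be put into numbers with a small Python sketch. It assumes a straight-line trend, which is an assumption and not a claim about how agents actually improve. `improvement_slope` and `days_to_parity` are hypothetical helper names, and the data points are the illustrative ones above.

```python
def improvement_slope(points: list[tuple[float, float]]) -> float:
    """Least-squares slope of relative score vs. day (score per day)."""
    n = len(points)
    mean_day = sum(day for day, _ in points) / n
    mean_score = sum(score for _, score in points) / n
    num = sum((day - mean_day) * (score - mean_score) for day, score in points)
    den = sum((day - mean_day) ** 2 for day, _ in points)
    return num / den


def days_to_parity(points: list[tuple[float, float]], target: float = 1.0):
    """Days after the last measurement until the trend line reaches target.

    Assumes the linear trend simply continues; returns None if the
    long-horizon agent is not improving.
    """
    slope = improvement_slope(points)
    if slope <= 0:
        return None
    _, last_score = points[-1]
    return (target - last_score) / slope


# (day, long-horizon agent score as a fraction of the vertical agent's score)
benchmarks = [(1, 0.30), (60, 0.50), (120, 0.70)]
slope = improvement_slope(benchmarks)
parity = days_to_parity(benchmarks)
```

On those three points the slope works out to roughly a third of a percentage point per day, and a naive straight-line extrapolation reaches parity about three months after Day 120. Treat that as a prompt to rerun the benchmark, not as a forecast.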
AGI is coming for everyone. Long-horizon agents are the closest we have to AGI, and as a Vertical AI company, you need to figure out how you compete and survive.
Game on.