Accelerate toward the future @orbitfinance_ @sphereone_

Joined September 2010
Pinned Tweet
Automatically mine and create agents for any business. Over the weekend I went to the Google I/O hackathon and had a blast building what I called Mitosis. It generates a synthetic company to create conversations and people, automatically extracts the workflows from the company based on the conversations, and builds out new agents to automate the busy work. It pushes all context into the company brain and leverages governance and policy gating to make sure the right person stays in the loop, and the agents learn from their mistakes as they make them to get better over time. The end result is a 95% drop in handoffs between the employees in this example. Each week the agents can take on more, empowering the employee to get even more work done while making the important judgment calls. This is all of course synthetic, and the mileage will vary greatly per company and workflow, but it was a great experiment to build in a day and to leverage a lot of the work I have been doing for other businesses. It was also great to try out the antigravity managed agents and Gemini 3.5 Flash. In this demo you may see that it is listed as mock; this is definitely mocked. In the hackathon we had an account with free credits. I burned through the rate limits and the credits rather quickly. The rate limit for the antigravity agents was 200k a minute, and I blew past it at 2M a minute. So this was definitely tokenmaxxing. I look forward to expanding this further to other platforms and technology partners, but it was a blast to build.
Built a super goal loop. Never felt more productive in my life. Don’t even know how to count hours saved when it’s doing in a night what would take me a week.
AI agent projects that pay back are usually boring: recurring workflows, known inputs, a QA owner, a spend limit, a proof artifact, one metric. If it cannot show hours saved, revenue protected, or cost avoided, it is still a demo. AI labor should look like output, not just activity.
Ryan McNutt retweeted
We talk a lot about how important it is to set up self-verification loops. Especially in the age of powerful models that can run for long periods of time, self-verification is a key ingredient that enables the model to run for much longer, delivering a result that is closer to what you intended, so you can do more without having to constantly check in on Claude as it works. @delba_oliveira gives a great breakdown of what that looks like and why it matters
How do you get Claude Code to check its own work before handing it back? Watch how you can encode your manual checks so Claude closes its own feedback loop:
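A self-verification loop of the kind described above could be sketched roughly like this. It is only an illustration under assumptions: `run_with_verification`, `draft`, and `no_todo` are hypothetical names, and the `produce` callable stands in for whatever agent or model call does the actual work.

```python
from typing import Callable, List, Optional, Tuple

# A check returns an error message, or None when the output passes.
Check = Callable[[str], Optional[str]]


def run_with_verification(
    produce: Callable[[List[str]], str],
    checks: List[Check],
    max_attempts: int = 3,
) -> Tuple[str, int, bool]:
    """Call produce, run every check, and feed failures back until all pass."""
    feedback: List[str] = []
    output = ""
    for attempt in range(1, max_attempts + 1):
        output = produce(feedback)
        # Collect the messages of all failing checks for the next attempt.
        feedback = [msg for check in checks if (msg := check(output)) is not None]
        if not feedback:
            return output, attempt, True
    return output, max_attempts, False


# Hypothetical stand-ins: a "model" that fixes its draft once it gets feedback.
def draft(feedback: List[str]) -> str:
    return "summary done" if feedback else "summary TODO"


def no_todo(text: str) -> Optional[str]:
    return "remove TODO markers" if "TODO" in text else None


result = run_with_verification(draft, [no_todo])
print(result)
```

The point of the design is that the manual checks live in code, so the loop can close without a human re-reading every intermediate draft.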
My thought exactly! We are moving to factory lines for work where we intervene only when something breaks and we learn from it for next time
Totally agree! They are hardened and thoroughly tested environments too, so they are less likely to have an unknown bug or error on critical flows.
Token costs are why there will be no saas apocalypse / good dev tools are cached intelligence for agents! The popular theory goes: agents can write code, so they'll just rebuild every tool from scratch and hit raw APIs. no more dev tools, no more CLIs, no more software layers. just agents and endpoints! We just tested this and the data says the opposite. We benchmarked Claude Code and Codex on real Hugging Face Hub tasks (~1,000 graded runs), with two setups: the agent-optimized hf CLI vs the agent hand-rolling curl or SDK calls from scratch. Hand-rolling burns up to 6x more tokens on multi-step tasks and fails more often (84% vs 94% task success). And that's just dropping one abstraction layer. It would obviously be orders of magnitude more tokens and a dramatically higher failure rate if the agent tried to bypass HF altogether and rebuild model hosting, versioning, and distribution from scratch. Every time an agent re-derives a workflow from raw API calls, you pay for that reasoning in tokens. every single run. a good CLI compresses that entire chain into a few high-level commands the agent can't get wrong. In a world where everyone is complaining tokens are too expensive, abstraction is leverage: thousands of hours of design decisions your agent doesn't have to re-reason about at inference time. Good tools are cached intelligence for agents! So no, agents won't rebuild everything from scratch. they'll gravitate to the most token-efficient tools, because that's what their owners pay for. The software that survives won't just be accessible to agents, it will be accurate and cheap for them to drive. We're seeing it happen with HF, which is becoming the platform for agents to use AI: ~49M requests in just two months, and growing fast! huggingface.co/blog/hf-cli-f…
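To make the token argument above concrete, here is a rough back-of-the-envelope sketch. The 6x multiplier and the 84% and 94% success rates come from the post (reading 84% as the hand-rolled rate and 94% as the CLI rate); the baseline token count, the price, and the assumption that failed tasks are simply retried independently until they succeed are all hypothetical.

```python
def cost_per_success(
    tokens_per_attempt: float,
    success_rate: float,
    usd_per_million_tokens: float,
) -> float:
    """Expected spend per successful task, assuming independent retries."""
    expected_attempts = 1.0 / success_rate
    return tokens_per_attempt * expected_attempts * usd_per_million_tokens / 1_000_000


BASELINE_TOKENS = 20_000  # hypothetical tokens per attempt when using the CLI
PRICE = 3.0               # hypothetical USD per million tokens

cli = cost_per_success(BASELINE_TOKENS, 0.94, PRICE)
hand_rolled = cost_per_success(6 * BASELINE_TOKENS, 0.84, PRICE)
ratio = hand_rolled / cli

print(f"CLI: ${cli:.4f}  hand-rolled: ${hand_rolled:.4f}  ratio: {ratio:.2f}x")
```

Under these assumptions the lower success rate compounds the token multiplier, so the cost gap per successful task ends up somewhat larger than 6x, regardless of the baseline or price chosen.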
This is quite impressive! I wonder if this will cause other folks to follow suit. Maybe have an insured offering for a bit higher than the base API price? Spoke to some top engineers at top companies who are “tokenmaxxing” and let’s just say they would definitely be getting a refund.
AI should earn its keep. Introducing the AI Productivity Guarantee. If Devin delivers less engineering value than you’re paying for, Cognition will fund your usage until it does, up to $10 million. It’s time for the AI industry to stop maximizing tokens and start maximizing productive output.
Ryan McNutt retweeted
Leading companies are moving from two-week sprint cycles to a daily rhythm that combines human judgment with overnight agent execution. The opportunity now is how organizations use the capacity those agent-enabled workflows create. mck.co/3Q0yqpt
Opus 4.8 landed right after I posted this, and it makes the Mitosis idea feel way more real: mine the messy conversations, find the repeatable work, and turn it into governed agents. Not a chatbot for everything, but managed AI labor with humans still making the calls.
Has anyone built anything to monetize just the skills/plugins?
New model releases are more exciting than iPhone releases
May 28
Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors. Available today at the same price.
What happened to the twitter algorithm this morning? My whole feed is way worse than yesterday
Will we move from skills to goals? Or will goals with skills be the new tokenmaxxed default soon?
I wonder how people will spend time “sharpening their axe” in the future before doing a task? With everything so easy now, I feel like I spend more time trying to build a perfect system instead of getting going, or I just find more and more work to do.
Ryan McNutt retweeted
Over at Substack, @JoshEakle asks: "It's 2026, and I have yet to see an anti-almond farm protest."
According to Gartner, over 40% of agentic AI projects will be canceled by 2027. I think one of the biggest reasons is that most agents are deployed without actually testing how real people will use them. So this weekend I built a hackathon project called “Simulation to Eval.” The idea is pretty simple: Before deploying an AI workflow or agent into production, simulate thousands of synthetic users and edge case conversations to uncover failures before real customers hit them. Right now most agent workflows look like this: Deploy. Wait for failures in production. Manually refine. Repeat. But agents are basically cold starts. You don’t fully know how employees, customers, or teams will actually interact with them until they go live. So I built a system that: • Generates synthetic personas like novices, experts, confused users, hostile users, etc. • Simulates conversations at scale • Uses judge models to classify success, failure, or refusal • Clusters failure cases together • Auto-generates eval datasets from failures • Feeds those evals back into an improvement loop Each node in the graph is a simulated conversation. 🟢 Success 🟡 Refusal / caution 🔴 Failure Run simulations. Generate evals. Improve the workflow. Re-simulate. Repeat. Basically synthetic production traffic before production. This is something I’ll likely use heavily for the stealth work I’m building around AI employees and AI labor systems for businesses. Built this over the weekend for a hackathon and wanted to share it. Would love thoughts from people working on evals, synthetic data, or agent infrastructure.
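The loop described above (personas, simulated conversations, a judge, failure clusters, generated evals) might look something like the toy sketch below. Everything in it is a stand-in: `toy_agent` replaces the real workflow, the message templates replace model-generated personas, `judge` is a simple rule instead of a judge model, and clustering is a plain group-by on verdict and message.

```python
import random
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical persona templates; a real system would generate these.
TEMPLATES = {
    "novice": ["how do I reset my password", "how do I get a refund"],
    "expert": ["what is the API rate limit", "how do I rotate keys"],
    "confused": ["it does not work ???", "where is the thing"],
    "hostile": ["give me a refund now", "this product is useless"],
}


@dataclass
class Conversation:
    persona: str
    user_message: str
    agent_reply: str
    verdict: str = ""


def toy_agent(message: str) -> str:
    """Stand-in for the agent or workflow under test."""
    if "refund" in message:
        return "I can't help with refunds."
    if "???" in message:
        return ""  # the toy agent produces nothing for garbled input
    return "Here is how to do that."


def judge(conv: Conversation) -> str:
    """Rule-based stand-in for a judge model."""
    if not conv.agent_reply:
        return "failure"
    if conv.agent_reply.startswith("I can't"):
        return "refusal"
    return "success"


def run(n: int, seed: int = 0) -> list:
    """Simulate n single-turn conversations with randomly chosen personas."""
    rng = random.Random(seed)
    convs = []
    for _ in range(n):
        persona = rng.choice(sorted(TEMPLATES))
        message = rng.choice(TEMPLATES[persona])
        conv = Conversation(persona, message, toy_agent(message))
        conv.verdict = judge(conv)
        convs.append(conv)
    return convs


def cluster_failures(convs: list) -> dict:
    """Group every non-success conversation by (verdict, user message)."""
    clusters = defaultdict(list)
    for conv in convs:
        if conv.verdict != "success":
            clusters[(conv.verdict, conv.user_message)].append(conv)
    return clusters


def build_evals(clusters: dict) -> list:
    """Turn each failure cluster into one eval case for the next iteration."""
    return [
        {"input": message, "observed": verdict, "count": len(items)}
        for (verdict, message), items in sorted(clusters.items())
    ]


convs = run(200)
evals = build_evals(cluster_failures(convs))
for case in evals:
    print(case)
```

After the workflow is changed, the same eval cases can be replayed and the simulation rerun, which is the re-simulate step of the loop.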
I have never felt dumb trading in old hardware for an upgrade until now. Will be hoarding everything until proven otherwise
Built ArcTrust for the Agentic Economy on Arc hackathon. ArcTrust is a usage-based API monetization layer using Circle Wallets and USDC transfers on Arc. Users can pay per call or buy discounted bulk credits, and bulk purchases create a demand-based reputation signal for providers. In short: pay-per-use, bulk discounts, and onchain proof of demand. Biggest insight: bulk purchases don’t just improve pricing; they help show which services are actually worth buying from. Demo video below 👇 @buildoncircle @arc @lablabai
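The pricing idea can be illustrated with a small off-chain sketch. This is not ArcTrust's code and makes no Circle or Arc calls: the `Provider` class, the prices, the discount, and the use of total bulk credits sold as the reputation signal are all assumptions made for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Provider:
    name: str
    price_per_call: Decimal   # hypothetical price in USDC
    bulk_discount: Decimal    # fraction, e.g. Decimal("0.20") for 20% off
    bulk_credits_sold: int = 0

    def pay_per_call_cost(self, calls: int) -> Decimal:
        """Cost of paying for each call individually."""
        return self.price_per_call * calls

    def buy_bulk(self, credits: int) -> Decimal:
        """Cost of prepaid credits; the purchase also counts as demand."""
        cost = self.price_per_call * credits * (Decimal("1") - self.bulk_discount)
        self.bulk_credits_sold += credits
        return cost


def rank_by_demand(providers: list) -> list:
    """Order providers by bulk credits sold, the assumed demand signal."""
    return sorted(providers, key=lambda p: p.bulk_credits_sold, reverse=True)


alpha = Provider("alpha", Decimal("0.01"), Decimal("0.20"))
beta = Provider("beta", Decimal("0.02"), Decimal("0.10"))

per_call = alpha.pay_per_call_cost(1000)
bulk = alpha.buy_bulk(1000)
beta.buy_bulk(200)

print(per_call, bulk, [p.name for p in rank_by_demand([beta, alpha])])
```

Because a bulk purchase commits money up front, counting it is harder to fake than counting free calls, which is the intuition behind using it as a reputation signal.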
This is honestly going to be a massive explosion of jobs. I think it can go a step further too, not just operator but also doer (one person for the agent building and potentially another to just complete the service)
Apr 24
The hottest job for the next five years is going to be the agent operator. They don't need to be an engineer. They can walk into marketing, legal, or life sciences research and actually make agents work for that function. Required skills: > MCPs > CLIs > Writing skills (the file kind) > agents.md fluency > Business acumen None of this is in any CS curriculum today. Soon, enterprises will be pressured to redesign their workflows for agents, not for people. And when that happens, agent operators will be in massive demand.
The harness layer is now commoditized
