Ryan McNutt

Tweets

Pinned Tweet

Ryan McNutt

@ryanmcnutty33

May 28

Automatically mine and create agents for any business.

Over the weekend I went to the Google I/O hackathon and had a blast building what I called Mitosis. It generates a synthetic company to create conversations and people, automatically extracts the workflows from the company based on those conversations, and builds out new agents to automate the busy work. It pushes all context into the company brain and leverages governance and policy gating to make sure the right person stays in the loop, and the agents learn from their mistakes as they make them to get better over time.

The end result is a 95% drop in handoffs between the employees in this example. Each week the agents can take on more, empowering the employee to get even more work done while still making the important judgement calls.

This is all of course synthetic, and the mileage will vary greatly per company and workflow, but it was a great experiment to build in a day and a chance to leverage a lot of what I have been working on for other businesses. It was also great to try out the antigravity managed agents and Gemini 3.5 Flash.

In this demo you may see that it is listed as mock; it is definitely mocked. In the hackathon we had an account with free credits, and I burned through the rate limits and the credits rather quickly. The rate limit for the antigravity agents was 200k a minute; I blew past it at 2M a minute. So this was definitely tokenmaxxing.

I look forward to expanding this further to other platforms and technology partners, but it was a blast to build.
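For readers wondering what staying under a per-minute limit like the one mentioned above might look like, here is a minimal sketch of a fixed-window budget. It assumes the 200k-a-minute figure refers to tokens; the class name `TokenBudget` and its methods are hypothetical and are not part of Mitosis or of any Google API.

```python
import time


class TokenBudget:
    """Fixed-window budget: allow at most `limit` tokens per `window` seconds.

    Hypothetical helper for illustration only.
    """

    def __init__(self, limit: int, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock              # injectable so the window can be tested
        self.window_start = clock()
        self.used = 0

    def try_spend(self, tokens: int) -> bool:
        """Record the spend and return True if it fits in the current window."""
        now = self.clock()
        if now - self.window_start >= self.window:
            # A new window has started, so the counter resets.
            self.window_start = now
            self.used = 0
        if self.used + tokens > self.limit:
            return False                # caller should wait or shed work
        self.used += tokens
        return True


# Assumed limit of 200k tokens per minute, taken from the post.
budget = TokenBudget(limit=200_000)
```

A caller would check `budget.try_spend(n)` before each request and back off when it returns `False`. At 2M a minute against a 200k limit, roughly nine out of ten requests would be held back.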

[Video, 6:13]

Ryan McNutt

@ryanmcnutty33

Jun 12

Built a super goal loop. Never felt more productive in my life. Don’t even know how to count hours saved when it’s doing in a night what would take me a week.

Ryan McNutt

@ryanmcnutty33

Jun 11

AI agent projects that pay back are usually boring:

• Recurring workflows
• Known inputs
• QA owner
• Spend limit
• Proof artifact
• One metric

If it cannot show hours saved, revenue protected, or cost avoided, it is still a demo. AI labor should look like output, not just activity.

Ryan McNutt retweeted

Boris Cherny

@bcherny

Jun 9

We talk a lot about how important it is to set up self-verification loops. Especially in the age of powerful models that can run for long periods of time, self-verification is a key ingredient that enables the model to run for much longer, delivering a result that is closer to what you intended, so you can do more without having to constantly check in on Claude as it works. @delba_oliveira gives a great breakdown of what that looks like and why it matters

ClaudeDevs

@ClaudeDevs

Jun 2

How do you get Claude Code to check its own work before handing it back? Watch how you can encode your manual checks so Claude closes its own feedback loop:
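A rough sketch of the general idea of encoding manual checks so a model can close its own feedback loop. This is not Claude Code's actual mechanism: `verify_loop`, `fake_model`, and `ends_with_period` are hypothetical names, and the "model" is a stub that only fixes its draft once it receives feedback.

```python
from typing import Callable, List, Optional, Tuple

# A check returns an error message, or None when the draft passes.
Check = Callable[[str], Optional[str]]


def verify_loop(
    generate: Callable[[List[str]], str],
    checks: List[Check],
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Regenerate until every check passes or the round budget runs out."""
    feedback: List[str] = []
    for round_no in range(1, max_rounds + 1):
        draft = generate(feedback)
        # Run every encoded check and collect the failure messages.
        feedback = [msg for check in checks if (msg := check(draft)) is not None]
        if not feedback:
            return draft, round_no
    raise RuntimeError(f"checks still failing after {max_rounds} rounds: {feedback}")


def ends_with_period(draft: str) -> Optional[str]:
    """Example of a manual check turned into code."""
    return None if draft.endswith(".") else "must end with a period"


def fake_model(feedback: List[str]) -> str:
    """Stub standing in for a model call; it improves only after feedback."""
    return "Done." if feedback else "Done"


result = verify_loop(fake_model, [ends_with_period])
```

In practice the checks would be things like running a test suite or a linter, and the failure messages would be fed back into the next prompt.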

[Video, 5:57]

Ryan McNutt

@ryanmcnutty33

Jun 8

My thought exactly! We are moving to factory lines for work where we intervene only when something breaks and we learn from it for next time

This tweet is unavailable

Ryan McNutt

@ryanmcnutty33

Jun 5

Totally agree! They are hardened and thoroughly tested environments too, so they are less likely to have an unknown bug or error on critical flows.

clem 🤗

@ClementDelangue

Jun 5

Token costs are why there will be no SaaS apocalypse / good dev tools are cached intelligence for agents!

The popular theory goes: agents can write code, so they'll just rebuild every tool from scratch and hit raw APIs. No more dev tools, no more CLIs, no more software layers. Just agents and endpoints!

We just tested this and the data says the opposite. We benchmarked Claude Code and Codex on real Hugging Face Hub tasks (~1,000 graded runs), with two setups: the agent-optimized hf CLI vs the agent hand-rolling curl or SDK calls from scratch. Hand-rolling burns up to 6x more tokens on multi-step tasks and fails more often (84% vs 94% task success).

And that's just dropping one abstraction layer. It would obviously be orders of magnitude more tokens and a dramatically higher failure rate if the agent tried to bypass HF altogether and rebuild model hosting, versioning, and distribution from scratch.

Every time an agent re-derives a workflow from raw API calls, you pay for that reasoning in tokens. Every single run. A good CLI compresses that entire chain into a few high-level commands the agent can't get wrong.

In a world where everyone is complaining tokens are too expensive, abstraction is leverage: thousands of hours of design decisions your agent doesn't have to re-reason about at inference time. Good tools are cached intelligence for agents!

So no, agents won't rebuild everything from scratch. They'll gravitate to the most token-efficient tools, because that's what their owners pay for. The software that survives won't just be accessible to agents, it will be accurate and cheap for them to drive.

We're seeing it happen with HF, which is becoming the platform for agents to use AI: ~49M requests in just two months, and growing fast! huggingface.co/blog/hf-cli-f…

Ryan McNutt

@ryanmcnutty33

Jun 4

This is quite impressive! I wonder if this will cause other folks to follow suit. Maybe have an insured offering for a bit higher than the base API price? Spoke to some top engineers at top companies who are “tokenmaxxing” and let’s just say they would definitely be getting a refund.

Cognition

@cognition

Jun 4

AI should earn its keep. Introducing the AI Productivity Guarantee. If Devin delivers less engineering value than you’re paying for, Cognition will fund your usage until it does, up to $10 million. It’s time for the AI industry to stop maximizing tokens and start maximizing productive output.

Ryan McNutt retweeted

McKinsey & Company

@McKinsey

Jun 3

Leading companies are moving from two-week sprint cycles to a daily rhythm that combines human judgment with overnight agent execution. The opportunity now is how organizations use the capacity those agent-enabled workflows create. mck.co/3Q0yqpt

[Image: Comparison of traditional and agentic software delivery showing smaller teams and shorter development times with about 50% less effort and 60% smaller teams.]

Ryan McNutt

@ryanmcnutty33

May 29

Opus 4.8 landed right after I posted this, and it makes the Mitosis idea feel way more real: mine the messy conversations, find the repeatable work, and turn it into governed agents. Not chatbot for everything - managed AI labor with humans still making the calls.

Ryan McNutt

@ryanmcnutty33

May 28

[Video, 6:13]

Ryan McNutt

@ryanmcnutty33

May 28

Has anyone built anything to monetize just the skills/plugins?

Ryan McNutt

@ryanmcnutty33

May 28

New model releases are more exciting than iPhone releases

Claude

@claudeai

May 28

Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors. Available today at the same price.

[Image: Benchmark table showing how Claude Opus 4.8 compares to its predecessor and to other models on tests of coding, agentic skills, reasoning, and practical knowledge work tasks.]

Ryan McNutt

@ryanmcnutty33

May 28

What happened to the Twitter algorithm this morning? My whole feed is way worse than yesterday.

Ryan McNutt

@ryanmcnutty33

May 28

Will we move from skills to goals? Or will goals with skills be the new tokenmaxxed default soon?

Ryan McNutt

@ryanmcnutty33

May 27

I wonder how people will spend time “sharpening their axe” in the future before doing a task. With everything so easy now, I feel like I spend more time just trying to build a perfect system instead of getting going, or I just find more and more work to do.

Ryan McNutt retweeted

Daniel Vassallo

@dvassallo

May 19

Nick Gillespie

@nickgillespie

May 19

Over at Substack, @JoshEakle asks: "It's 2026, and I have yet to see an anti-almond farm protest."

Ryan McNutt

@ryanmcnutty33

May 19

According to Gartner, over 40% of agentic AI projects will be canceled by 2027. I think one of the biggest reasons is that most agents are deployed without actually testing how real people will use them.

So this weekend I built a hackathon project called “Simulation to Eval.” The idea is pretty simple: before deploying an AI workflow or agent into production, simulate thousands of synthetic users and edge case conversations to uncover failures before real customers hit them.

Right now most agent workflows look like this: Deploy. Wait for failures in production. Manually refine. Repeat.

But agents are basically cold starts. You don’t fully know how employees, customers, or teams will actually interact with them until they go live.

So I built a system that:

• Generates synthetic personas like novices, experts, confused users, hostile users, etc.
• Simulates conversations at scale
• Uses judge models to classify success, failure, or refusal
• Clusters failure cases together
• Auto-generates eval datasets from failures
• Feeds those evals back into an improvement loop

Each node in the graph is a simulated conversation.

🟢 Success
🟡 Refusal / caution
🔴 Failure

Run simulations. Generate evals. Improve the workflow. Re-simulate. Repeat. Basically synthetic production traffic before production.

This is something I’ll likely use heavily for the stealth work I’m building around AI employees and AI labor systems for businesses.

Built this over the weekend for a hackathon and wanted to share it. Would love thoughts from people working on evals, synthetic data, or agent infrastructure.
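The steps listed in the post can be sketched roughly as below. This is not the hackathon project's code: every function is a hypothetical stand-in, the synthetic users, the agent, and the judge are deterministic stubs instead of model calls, and failures are clustered by persona, where a real system would more likely cluster on embeddings.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List

# Persona names borrowed from the examples in the post.
PERSONAS = ["novice", "expert", "confused", "hostile"]


@dataclass
class Result:
    persona: str
    prompt: str
    reply: str
    verdict: str  # "success", "refusal", or "failure"


def simulate_user(persona: str, turn: int) -> str:
    """Stand-in for an LLM-driven synthetic user."""
    return f"[{persona}] question {turn}"


def agent_under_test(prompt: str) -> str:
    """Stand-in for the workflow being evaluated."""
    if "hostile" in prompt:
        return "I can't help with that."
    if "confused" in prompt:
        return ""  # simulate a broken, empty answer
    return "Here is the answer."


def judge(reply: str) -> str:
    """Stand-in for a judge model that classifies one reply."""
    if not reply:
        return "failure"
    if reply.startswith("I can't"):
        return "refusal"
    return "success"


def run_simulation(turns_per_persona: int = 3) -> List[Result]:
    """Simulate every persona for a few turns and judge each reply."""
    results = []
    for persona in PERSONAS:
        for turn in range(turns_per_persona):
            prompt = simulate_user(persona, turn)
            reply = agent_under_test(prompt)
            results.append(Result(persona, prompt, reply, judge(reply)))
    return results


def cluster_failures(results: List[Result]) -> Dict[str, List[Result]]:
    """Group failed conversations; here the cluster key is just the persona."""
    clusters = defaultdict(list)
    for r in results:
        if r.verdict == "failure":
            clusters[r.persona].append(r)
    return dict(clusters)


def build_eval_set(clusters: Dict[str, List[Result]]) -> List[dict]:
    """Turn each failing conversation into a regression eval case."""
    return [
        {"input": r.prompt, "bad_output": r.reply, "cluster": name}
        for name, items in clusters.items()
        for r in items
    ]


results = run_simulation()
evals = build_eval_set(cluster_failures(results))
```

The resulting `evals` list is what would feed the improvement loop: fix the workflow, re-run the simulation, and confirm the failing cases now pass.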

[Video, 4:20]

Ryan McNutt

@ryanmcnutty33

Apr 30

I have never felt dumb trading in old hardware for an upgrade until now. Will be hoarding everything until proven otherwise

Ryan McNutt

@ryanmcnutty33

Apr 25

Built ArcTrust for the Agentic Economy on Arc hackathon.

ArcTrust is a usage-based API monetization layer using Circle Wallets and USDC transfers on Arc. Users can pay per call or buy discounted bulk credits, and bulk purchases create a demand-based reputation signal for providers.

In short: pay-per-use, bulk discounts, and onchain proof of demand.

Biggest insight: bulk purchases don’t just improve pricing; they help show which services are actually worth buying from.

Demo video below 👇 @buildoncircle @arc @lablabai
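A minimal sketch of the pricing idea described above, under stated assumptions: the price, discount, and threshold are made-up numbers, the function names are hypothetical, and nothing here calls Circle or Arc. Amounts are kept as integers in USDC's 6-decimal base units to avoid float rounding.

```python
from collections import Counter

PRICE_PER_CALL = 10_000     # assumed: 0.01 USDC, in 6-decimal base units
BULK_DISCOUNT_BPS = 2_000   # assumed: 20% off, expressed in basis points
BULK_MIN_CALLS = 100        # assumed: smallest purchase that gets the bulk price

# Demand signal: total bulk credits bought per provider.
demand = Counter()


def quote(calls: int) -> int:
    """Price for `calls` API calls, with the bulk discount applied if eligible."""
    total = calls * PRICE_PER_CALL
    if calls >= BULK_MIN_CALLS:
        total = total * (10_000 - BULK_DISCOUNT_BPS) // 10_000
    return total


def buy(provider: str, calls: int) -> int:
    """Return the cost and record bulk purchases as a reputation signal."""
    cost = quote(calls)
    if calls >= BULK_MIN_CALLS:
        demand[provider] += calls
    return cost


single_cost = buy("weather-api", 1)    # pay per call, no demand signal
bulk_cost = buy("weather-api", 500)    # discounted, counts toward demand
```

Only bulk purchases move the `demand` counter, which mirrors the post's point that prepaid volume says more about a provider than one-off calls do.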

[Video, 2:10]

Ryan McNutt

@ryanmcnutty33

Apr 24

This is honestly going to be a massive explosion of jobs. I think it can go a step further too, not just operator but also doer (one person for the agent building and potentially another to just complete the service)

rvivek

@rvivek

Apr 24

The hottest job for the next five years is going to be the agent operator. They don't need to be an engineer. They can walk into marketing, legal, or life sciences research and actually make agents work for that function.

Required skills:

• MCPs
• CLIs
• Writing skills (the file kind)
• agents.md fluency
• Business acumen

None of this is in any CS curriculum today. Soon, enterprises will be pressured to redesign their workflows for agents, not for people. And when that happens, agent operators will be in massive demand.

[Video, 2:32]

Ryan McNutt

@ryanmcnutty33

Apr 23

The harness layer is now commoditized

[GIF: Endgame Now]
