Ryan Carson

4,421 Photos and videos

Tweets

Pinned Tweet

Ryan Carson

@ryancarson

Apr 2

x.com/i/article/203977850528…

193

2,127

1,278,242

Ryan Carson

@ryancarson

40m

.@vercel @rauchg any plans to offer this type of capability on AI Gateway?

OpenRouter

@OpenRouter

16h

Introducing the Fusion API, the smartest compound model in the market. Fusion achieves Fable-level intelligence at half the price. How it works 👇

811

Ryan Carson

@ryancarson

Interesting

OpenRouter

@OpenRouter

16h

Introducing the Fusion API, the smartest compound model in the market. Fusion achieves Fable-level intelligence at half the price. How it works 👇

101

34,864

Ryan Carson

@ryancarson

16h

Yard work injury. Thankfully, only stitches required.

2,400

Ryan Carson

@ryancarson

19h

I enjoyed Fable while it lasted. I assume this will get sorted out in about a week or so. Also, I assume gpt-5.6 will drop soon which means we’ll all be switching over then anyway. This demonstrates again why it’s important to use a code factory that isn’t tied to a single lab.

203

20,289

Ryan Carson

@ryancarson

Jun 11

God damn. Just cut our builds from 6 min -> 2 min with one line of code. Thank you @useblacksmith. Not being paid to say this. Now that I'm shipping ~55 PRs/day, this saves us SO much time. (and thx @theo for the heads up)

165

17,322

Ryan Carson

@ryancarson

Jun 11

How can serious engineering orgs tie themselves down to one lab? Also, if you say "there's no lock in - I switch back and forth" then you're not building a real code factory. You need your code factory to be independent of the lab and its models - especially as we see open weight models become more capable and routing becomes truly viable.

4,371

Ryan Carson

@ryancarson

Jun 11

Fable just suggested we NOT rebuild a feature because it already worked well. AGI achieved :)

846

60,437

Ryan Carson

@ryancarson

Jun 11

Please, pretty please, practice your prod db restore. And use your agent to do it all! Here's how 👇

Ryan Carson

@ryancarson

Jun 10

x.com/i/article/206473730206…

8,594

Ryan Carson

@ryancarson

Jun 10

This is too boring to go viral but it will save you someday.

Ryan Carson

@ryancarson

Jun 10

x.com/i/article/206473730206…

262

82,840

Ryan Carson

@ryancarson

Jun 10

x.com/i/article/206473730206…

210

99,887

Ryan Carson

@ryancarson

Jun 10

"This is also the job. An application earns its place in the untrainable corner by doing unglamorous work: arranging a company's private reality so a model can act on it, handing the model the tools to act, working with the customer to change the reality of its workforce." - @saranormous This is exactly what we're seeing as we build the AI software suite for Family Law at @HelloUntangle. The work to define, build and measure AI-augmented workflows specifically for divorce and child custody is a massive amount of work that falls into untrainable territory.

sarah guo

@saranormous

Jun 10

x.com/i/article/206450988970…

8,378

Ryan Carson

@ryancarson

Jun 9

OMG these numbers

Claude

@claudeai

Jun 9

Replying to @claudeai

Fable 5 is state-of-the-art on nearly all tested benchmarks, with exceptional performance in software engineering, knowledge work, scientific research, and vision. The longer and more complex the task, the larger Fable 5’s lead over our other models.

Benchmark table titled Mythos 5 & Fable 5, comparing Claude Mythos 5 and Fable 5 against Claude Mythos Preview, Claude Opus 4.8, GPT 5.5, and Gemini 3.1 Pro.

473

44,666

Ryan Carson

@ryancarson

Jun 9

Napping in my car, listening to @bhalligan and @davidsenra, shipping a bunch of PRs on my phone with Devin, and waiting for my mom’s flight to land at 4am. And yes, I love Cornnuts.

4,924

Ryan Carson

@ryancarson

Jun 9

If you’re not using @agentmail for all your agent testing loops you’re seriously missing out. We’re on the $200/mo plan and it’s worth every cent.

130

26,784

Ryan Carson

@ryancarson

Jun 8

We just shipped PR 1200 - that's 200 more in just 7 days. That's 20% of all PRs I've ever shipped on @HelloUntangle - in ONE week. I can't even believe how fast we can ship using a mature full-featured code factory like @DevinAI.

Ryan Carson

@ryancarson

Jun 1

PR 1000 landing today. Code Factory is fully functioning. Agents write/review/land 100% of our code now. Don't get me wrong - there's a *lot* of work that I do crafting PRDs, steering conversations, etc, but the majority of the work is done by agents now.

8,092

Ryan Carson

@ryancarson

Jun 8

It's so good to see evals getting more realistic.

Scott Wu

@ScottWu46

Jun 8

SWE-Bench style grading has been the standard for years now - you ask the agent to solve an issue and then run its code on a pre-constructed unit test. The problem is that passing a unit test is only one part of writing production-ready code. You also want to evaluate agents on a number of other axes, including scope, coding style, and unintended side effects. The result is our new benchmark FrontierCode - which has ~80% fewer false positives and for which the best model (Opus 4.8) only scores 13%! "Where others grade like a CI, FrontierCode grades like a tech lead."

5,909
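
The contrast in the quoted post between grading "like a CI" and grading "like a tech lead" can be illustrated with a small sketch. This is not FrontierCode's actual rubric, which the post does not spell out; the `Submission` fields, the two grading functions, and the file names are all hypothetical, chosen only to mirror the axes the post names (scope, coding style, unintended side effects).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Submission:
    """Hypothetical record of one agent attempt at an issue."""
    tests_passed: bool
    files_touched: set[str]
    files_in_scope: set[str]
    style_violations: int = 0
    side_effects: list[str] = field(default_factory=list)


def ci_grade(sub: Submission) -> bool:
    # Unit-test-only grading: passing the pre-built test is sufficient.
    return sub.tests_passed


def tech_lead_grade(sub: Submission) -> bool:
    # Multi-axis grading (assumed rubric): tests must pass AND the change
    # must stay in scope, be style-clean, and cause no side effects.
    stays_in_scope = sub.files_touched <= sub.files_in_scope
    return (
        sub.tests_passed
        and stays_in_scope
        and sub.style_violations == 0
        and not sub.side_effects
    )


# A fix that passes the test but edits a file outside the issue's scope.
sprawling_fix = Submission(
    tests_passed=True,
    files_touched={"billing.py", "auth.py"},
    files_in_scope={"billing.py"},
)

print(ci_grade(sprawling_fix), tech_lead_grade(sprawling_fix))
```

Under these assumptions, `sprawling_fix` is the kind of false positive the post describes: it is accepted by the unit-test check and rejected once scope is considered.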

Ryan Carson

@ryancarson

Jun 8

"... every company is about to get the ability to hire infinite employees." This is why everyone is getting serious about managing token costs. Managers are accustomed to headcount budgets - but they're now realizing they can spend, sometimes without limit, on agent labor budgets. This is obv not sustainable for either their company, or for the app providing the agent to them (if they pay a fixed monthly cost vs usage).

Gabe Pereyra

@gabepereyra

Jun 8

The belief at the time was that model costs were halving every 6 months meaning tokens would get cheap and so application layer companies would need to find a way to charge for the value of tokens by selling the work / services. What actually happened is AI got much more expensive than people realized at the time. The shift from chat to agents led to an explosion in cost. One user could trigger hundreds of agents and each of those agents could trigger more agents. Agents started running longer and more autonomously. On top of that frontier models like Mythos are getting more expensive not less. If you look at what is happening in engineering, the coding companies are doing incredibly well selling tokens because engineers are consuming so many rather than needing to sell the value of the work. In fact, there is starting to be a massive demand for enterprise infrastructure to help large organizations track, manage and optimize their agents / tokens. Now the problem for application layer companies is how do you take that large token cost and convert it into something useful for your customers. A rough analogy is every company is about to get the ability to hire infinite employees. The main challenge is going to be figuring out how to manage those employees and make your business model work the same way it did with human employees. You can think about the previous generation of enterprise SaaS as building tools so that organizations could manage a large number of humans and make them productive. All of this is going to get rebuilt to support hybrid human / agent organizations. Some will be built by model / cloud, some by existing enterprise SaaS and some by new cos. For Harvey this means we don’t have to become a services company. The infrastructure for every law firm to deploy, train and manage a large number of agents is going to be so complex that model / cloud providers and law firms likely won’t build all of it. 
And AI is going to be expensive enough that we can capture something that looks like labor spend which is much closer to services without actually having to sell services.

7,530

Ryan Carson

@ryancarson

Jun 8

How to be happy: 1. Write a PRD with user stories that have agent-verifiable acceptance criteria 2. Use /goal 3. Go to bed

111

9,820
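
One way to read "agent-verifiable acceptance criteria" in the post above is that each criterion is paired with a check a program can run without human judgment. The sketch below assumes that reading. `Criterion`, `UserStory`, `verify`, and the `slugify` feature are hypothetical names for illustration, and how `/goal` or any agent harness would consume such checks is not specified in the post.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Criterion:
    """One acceptance criterion plus an executable check for it."""
    description: str
    check: Callable[[], bool]


@dataclass
class UserStory:
    title: str
    criteria: list[Criterion]


def verify(story: UserStory) -> list[str]:
    """Return the descriptions of criteria whose checks fail."""
    return [c.description for c in story.criteria if not c.check()]


# Hypothetical feature under test.
def slugify(title: str) -> str:
    return "-".join(title.lower().split())


story = UserStory(
    title="As an author, I get a URL slug generated from my post title",
    criteria=[
        Criterion("lowercases the title",
                  lambda: slugify("Hello World") == "hello-world"),
        Criterion("collapses repeated whitespace",
                  lambda: slugify("a   b") == "a-b"),
    ],
)

failures = verify(story)
print("done" if not failures else f"still failing: {failures}")
```

An empty list from `verify` gives an agent an unambiguous stopping condition, which is what makes an unattended overnight run plausible under this reading.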

Ryan Carson

@ryancarson

Jun 7

My wife leveled up my office with a vase of Peonies from the garden.

207

4,077

217,465

Ryan Carson

@ryancarson

Jun 7

All the VC stories … I’m SO bored of hearing founders whine about this. It’s our job as founders to take the constant cuts and just keep going. There’s zero, absolutely zero, value to you sharing these stories publicly. Of course there are VCs who are selfish and flawed. What did everyone expect? These are not transactions among friends. There is one reason and one reason only for taking a meeting with you or writing you a check: Multiply their LPs’ investment. That’s it. Everyone’s expectations are just way too high. Get the money and then get back to work.

14,282