Dad, Dev, CEO, 4x Founder. Building @HelloUntangle

Joined November 2006
.@vercel @rauchg any plans to offer this type of capability on AI Gateway?
Introducing the Fusion API, the smartest compound model in the market. Fusion achieves Fable-level intelligence at half the price. How it works 👇
Interesting
Introducing the Fusion API, the smartest compound model in the market. Fusion achieves Fable-level intelligence at half the price. How it works 👇
Yard work injury. Thankfully, only stitches required.
I enjoyed Fable while it lasted. I assume this will get sorted out in about a week or so. Also, I assume gpt-5.6 will drop soon which means we’ll all be switching over then anyway. This demonstrates again why it’s important to use a code factory that isn’t tied to a single lab.
God damn. Just cut our builds from 6 min -> 2 min with one line of code. Thank you @useblacksmith. Not being paid to say this. Now that I'm shipping ~55 PRs/day, this saves us SO much time. (and thx @theo for the heads up)
How can serious engineering orgs tie themselves down to one lab? Also, if you say "there's no lock in - I switch back and forth" then you're not building a real code factory. You need your code factory to be independent of the lab and its models - especially as we see open weight models become more capable and routing becomes truly viable.
Fable just suggested we NOT rebuild a feature because it already worked well. AGI achieved :)
Please, pretty please, practice your prod db restore. And use your agent to do it all! Here's how 👇
This is too boring to go viral but it will save you someday.
"This is also the job. An application earns its place in the untrainable corner by doing unglamorous work: arranging a company's private reality so a model can act on it, handing the model the tools to act, working with the customer to change the reality of its workforce." - @saranormous This is exactly what we're seeing as we build the AI software suite for Family Law at @HelloUntangle. The work to define, build and measure AI-augmented workflows specifically for divorce and child custody is a massive amount of work that falls into untrainable territory.
OMG these numbers
Replying to @claudeai
Fable 5 is state-of-the-art on nearly all tested benchmarks, with exceptional performance in software engineering, knowledge work, scientific research, and vision. The longer and more complex the task, the larger Fable 5’s lead over our other models.
Napping in my car, listening to @bhalligan and @davidsenra, shipping a bunch of PRs on my phone with Devin, and waiting for my mom’s flight to land at 4am. And yes, I love Cornnuts.
If you’re not using @agentmail for all your agent testing loops you’re seriously missing out. We’re on the $200/mo plan and it’s worth every cent.
We just shipped PR 1200 - that's 200 more in just 7 days. That's 20% of all PRs I've ever shipped on @HelloUntangle - in ONE week. I can't even believe how fast we can ship using a mature full-featured code factory like @DevinAI.
PR 1000 landing today. Code Factory is fully functioning. Agents write/review/land 100% of our code now. Don't get me wrong - there's a *lot* of work that I do crafting PRDs, steering conversations, etc, but the majority of the work is done by agents now.
It's so good to see evals getting more realistic.
SWE-Bench style grading has been the standard for years now - you ask the agent to solve an issue and then run its code on a pre-constructed unit test. The problem is that passing a unit test is only one part of writing production-ready code. You also want to evaluate agents on a number of other axes, including scope, coding style, and unintended side effects. The result is our new benchmark FrontierCode - which has ~80% fewer false positives and for which the best model (Opus 4.8) only scores 13%! "Where others grade like a CI, FrontierCode grades like a tech lead."
"... every company is about to get the ability to hire infinite employees." This is why everyone is getting serious about managing token costs. Managers are accustomed to headcount budgets - but they're now realizing they can spend, sometimes without limit, on agent labor budgets. This is obv not sustainable for either their company, or for the app providing the agent to them (if they pay a fixed monthly cost vs usage).
The belief at the time was that model costs were halving every 6 months meaning tokens would get cheap and so application layer companies would need to find a way to charge for the value of tokens by selling the work / services. What actually happened is AI got much more expensive than people realized at the time. The shift from chat to agents led to an explosion in cost. One user could trigger hundreds of agents and each of those agents could trigger more agents. Agents started running longer and more autonomously. On top of that frontier models like Mythos are getting more expensive not less. If you look at what is happening in engineering, the coding companies are doing incredibly well selling tokens because engineers are consuming so many rather than needing to sell the value of the work. In fact, there is starting to be a massive demand for enterprise infrastructure to help large organizations track, manage and optimize their agents / tokens. Now the problem for application layer companies is how do you take that large token cost and convert it into something useful for your customers. A rough analogy is every company is about to get the ability to hire infinite employees. The main challenge is going to be figuring out how to manage those employees and make your business model work the same way it did with human employees. You can think about the previous generation of enterprise SaaS as building tools so that organizations could manage a large number of humans and make them productive. All of this is going to get rebuilt to support hybrid human / agent organizations. Some will be built by model / cloud, some by existing enterprise SaaS and some by new cos. For Harvey this means we don’t have to become a services company. The infrastructure for every law firm to deploy, train and manage a large number of agents is going to be so complex that model / cloud providers and law firms likely won’t build all of it. 
And AI is going to be expensive enough that we can capture something that looks like labor spend which is much closer to services without actually having to sell services.
How to be happy:
1. Write a PRD with user stories that have agent-verifiable acceptance criteria
2. Use /goal
3. Go to bed
My wife leveled up my office with a vase of Peonies from the garden.
All the VC stories … I’m SO bored of hearing founders whine about this. It’s our job as founders to take the constant cuts and just keep going. There’s zero, absolutely zero, value to you sharing these stories publicly. Of course there are VCs who are selfish and flawed. What did everyone expect? These are not transactions among friends. There is one reason and one reason only for taking a meeting with you or writing you a check: Multiply their LPs’ investment. That’s it. Everyone’s expectations are just way too high. Get the money and then get back to work.