co-founder, ErthAI. formerly zynga, workplay ventures, reinvent capital.

Joined August 2008
Dan Garon retweeted
A year ago, I predicted that we would enter The Era of Evals, but it's now happening much faster than I anticipated. Frontier Labs have scaled their Eval production with us by more than 10X in the last 12 months, on what was already a 9-figure base. Every tech-forward enterprise is rapidly building evals for its agents. @mercor_ai is spending over $10M / month on inference and now has Evals for every agent deployment. We need Evals to know (1) what model to use, (2) what context or tools we should add to improve the model, and (3) whether it's working in production. One CTO of a $25B enterprise told me he used to have a product roadmap, but replaced his entire product roadmap with an Eval roadmap.
Mercor (@mercor_ai) is now working with 6 out of the Magnificent 7, all of the top 5 AI labs, and most of the top application layer companies. One trend is common across every customer: we are entering The Era of Evals. RL is becoming so effective that models will be able to saturate any evaluation. This means that the primary barrier to applying agents to the entire economy is building evals for everything. This will be one of the largest buildouts we have ever seen with enterprises pouring hundreds of billions of dollars into evals for every workflow we want agents to automate. We're quickly defining a new class of work and hiring across nearly every domain: software engineers, consultants, bankers, lawyers, doctors, gamers, and many more.
lots more of this coming
Anthropic's Claude Design launch blindsided its partners, Figma and Canva. “Essentially, Anthropic had kind of told these partners like Figma and Canva that the Claude Design product was going to be fairly basic.” “As it got closer to the launch, these partners found out that actually this new product would have some of those advanced features [Anthropic said wouldn’t be in there].” — @steph_palazzolo, AI reporter
excited for this one
After coding is solved, the next frontier is computer use. Today, we are launching Use Computer, the infra for evaluating and training models to use all kinds of computers 👇
Dan Garon retweeted
A day later and I feel even more confident in this: Siri is going to be the AI that most consumers end up using most of the time (if they have an iPhone). It's the AI you have with you, with access to everything. And yes, it's finally good enough. spyglass.org/siri-ai/
Dan Garon retweeted
Jun 10
We partnered with @trajectorylabs to post-train NVIDIA Nemotron 3 Ultra for legal. Here’s what we found: 1) Open-weight models can reach frontier legal performance. On our Legal Agent Benchmark (LAB), Nemotron 3 Ultra started at a 0% all-pass rate. After post-training, it reached 5.8%, placing it between Sonnet 4.6 at 4.2% and Opus 4.6 at 6.6%. 2) Post-training dramatically improves reliability. Before training, many held-out tasks missed enough rubric dimensions to land around ~70% pass rates. After training, those tasks shifted toward ~95% pass rates. 3) Open-weight performance comes at much lower cost. Post-trained Nemotron 3 Ultra reached a similar quality band to leading closed models while running at roughly 1/8th to 1/50th the per-token price of Sonnet 4.6 and Opus 4.6. Most importantly: we post-trained this model on the @trajectorylabs platform less than 24 hours after Nemotron 3 Ultra launched, using the same harness, data, and recipe we used for Nemotron 3 Super. More to come as we continue to experiment with open-weight legal agents. Read more on post-training with Trajectory below:
1/ We post-trained @nvidia Nemotron 3 Ultra on @harvey Legal Agent Bench in under 24 hours. The result: an open model reaching the same band as leading closed models on legal work, at a fraction of the cost. The correlating story: when a new open model ships, Trajectory can turn it into a specialized agent almost immediately.
Dan Garon retweeted
Fable is a very impressive model. But it is also
- Expensive
- Slow
- Rate limited
- Nerfed for important R&D tasks
- Not private (retains prompts/responses even for enterprise users)
The difference between owned and rented intelligence gets clearer by the day.
incredible
BREAKING NEWS: Anthropic's latest model will NOT help you if it thinks your ML research/ML engineering is interesting, and/or will secretly degrade its IQ so that the average engineer won't notice. We are already seeing Anthropic's latest model's moderation filters our GPU inference research and programming 😭
such an incredible read excited to see what rob cooks up
Jun 4
With $500 million in funding and a reported $2.5 billion valuation, Flourish wants to reinvent AI by putting real neurons under the microscope. wired.com/story/jeff-bezos-i…
Dan Garon retweeted
This new research on US unicorn startups is really interesting. Some key facts from the report:
1. Immigrants founded or cofounded 455 of America’s 775 privately held billion-dollar startups, equal to 59% of all US unicorns.
2. 66% of all US unicorns were founded or cofounded by immigrants or the children of immigrants.
3. 79% of US unicorns have either an immigrant founder or an immigrant in a key leadership role.
4. The 455 immigrant-founded US unicorns have a combined valuation of $5 trillion.
5. That $5 trillion valuation is larger than the total stock-market value of companies listed in all but 7 countries.
6. Including immigrant-founded unicorns that went public since 2016 pushes the total value above $5.8 trillion.
7. The number of immigrant-founded US unicorns rose from 50 in 2018 to 455 in 2026.
8. 24% of US unicorns have a founder who first came to America as an international student.
Dan Garon retweeted
Adding Harvey to the list of app layer companies, joining Ramp, Sierra, Decagon, etc, who are devoting some level of dedicated effort to join Cursor in approaching positive gross margins decoupling themselves from frontier model providers as the marginal cost of post training goes down. The marginal unit of intelligence's value from the next model release, while being valuable, is being supplanted by harness-level differences in model performance, increasing ability to traverse the performance-cost-latency pareto curve with post-training infra advancements, and the simple fact that when one runs practical benchmarks (and not all the toy benchmarks around nowadays), GLM and latest MiniMax are at Parity or exceed Frontier Models on an absolute basis on many tasks (more on this in state of data May, a bit delayed). Ofc ik Harvey has been toying with rlaas vendors for a while and their finetuning efforts pre big rl wave weren't incredibly well received, but I generally find that most app layer ai companies with some elite engineering talent will be seriously exploring post training their own small models, at least in conjunction with systems that use frontier models as above head orchestrators. That some of them reach out to me on advice for procuring rl datasets from rl env companies is reifying evidence of that.
Jun 3
We partnered with @FireworksAI_HQ to train open-source models for legal. Here's what we found: 1) Hybrid legal agents can beat frontier models on quality and cost by routing selectively to a frontier advisor. We tested a hybrid setup where GLM 5.1 served as the primary worker, routing tasks to Opus 4.7 as an advisor when needed. GLM invoked Opus sparingly, just 0.83 times per task on average. The hybrid setup beat Opus on both quality and cost: 18% all-pass vs 14%, at $368 vs $954 across the same 100 tasks. 2) Post-training can push open models to frontier-level legal performance. On a 100-task slice of our Legal Agent Benchmark (LAB), SFT moved Kimi 2.6's all-pass rate from 11% to 15%, beating Opus' 14%. But the cost gap was even more striking: $84 vs $954 across the same 100 tasks, or ~11x cheaper. We're excited to continue working with @FireworksAI_HQ on the next generation of open-source legal agents.
Dan Garon retweeted
macOS California Release Location Map
congrats nick!
A few years ago, I was helping people design and build custom homes. I expected the hard part to be construction. Instead, I became obsessed with something that happened much earlier. People had questions about what was possible, but getting answers often required weeks or months of work. Then AI started making it dramatically easier to explore possibilities in other creative domains - images, music, video, coding. It made me wonder: What happens when exploring architecture becomes easy too? That's what eventually became Drafted. We believe AI can make it dramatically easier for people to imagine, explore, and ultimately shape the physical world around them.
Dan Garon retweeted
We’re open-sourcing Stem Studio, our 3JS game engine today. This is a browser-based 3D multiplayer game engine and dev studio based on the idea that game dev should become more open, remixable, and web-native. AI will make it easier to create games. But shared building blocks will make it easier for developers to build on top of each other. Stem Studio is MIT licensed, JavaScript-based, and built for browser multiplayer 3D worlds. Code is here: buildwithstem.com Fork it, break it, remix it, and show us what you make.

Dan Garon retweeted
Demis Hassabis, Elon Musk, and Jensen Huang all have one thing in common: THEY LOVE VIDEO GAMES. If you don't play video games growing up, you're NGMI.
What happened to TBPN? seems to have vanished from tech twitter…
the future of journalism vibe code the news
May 23
Replying to @joevezz
congrats @Bencera !
Polsia just raised $30M at a $250M valuation. Approaching $10M annual run rate. One Founder AI. Zero employees. Polsia runs companies autonomously. It also ran its own fundraising. I just showed up for signatures.
does this mean Tirzepatide isn’t any better — it’s really all dose dependent?
The company said the majority of the weight loss—around 84%—came from losing body fat while preserving muscle function and improving muscle health on.wsj.com/4wvO8co
Dan Garon retweeted
We’ve spent a lot of time on the framework underneath Codex, so it can move quickly on routine work while stopping for review when the risk changes. Here’s how we use sandboxing, approvals, network policy, and telemetry to run Codex safely @OpenAI: openai.com/index/running-cod…