Brendan (can/do)

Tweets

Dan Garon retweeted

Brendan (can/do)

@BrendanFoody

Jun 14

A year ago, I predicted that we would enter The Era of Evals, but it's now happening much faster than I anticipated. Frontier Labs have scaled their Eval production with us by more than 10X in the last 12 months, on what was already a 9-figure base. Every tech-forward enterprise is rapidly building evals for its agents. @mercor_ai is spending over $10M / month on inference and now has Evals for every agent deployment. We need Evals to know (1) what model to use, (2) what context or tools we should add to improve the model, and (3) whether it's working in production. One CTO of a $25B enterprise told me he used to have a product roadmap, but replaced his entire product roadmap with an Eval roadmap.

Brendan (can/do)

@BrendanFoody

30 Jun 2025

Mercor (@mercor_ai) is now working with 6 out of the Magnificent 7, all of the top 5 AI labs, and most of the top application layer companies. One trend is common across every customer: we are entering The Era of Evals. RL is becoming so effective that models will be able to saturate any evaluation. This means that the primary barrier to applying agents to the entire economy is building evals for everything. This will be one of the largest buildouts we have ever seen, with enterprises pouring hundreds of billions of dollars into evals for every workflow we want agents to automate. We're quickly defining a new class of work and hiring across nearly every domain: software engineers, consultants, bankers, lawyers, doctors, gamers, and many more.

Dan Garon

@dangaron

Jun 13

lots more of this coming

The Information

@theinformation

Jun 12

Anthropic's Claude Design launch blindsided its partners, Figma and Canva. “Essentially, Anthropic had kind of told these partners like Figma and Canva that the Claude Design product was going to be fairly basic.” “As it got closer to the launch, these partners found out that actually this new product would have some of those advanced features [Anthropic said wouldn’t be in there].” — @steph_palazzolo, AI reporter

Dan Garon

@dangaron

Jun 11

excited for this one

Marco Mascorro

@Mascobot

Jun 11

After coding is solved, the next frontier is computer use. Today, we are launching Use Computer, the infra for evaluating and training models to use all kinds of computers 👇

Dan Garon retweeted

M.G. Siegler

@mgsiegler

Jun 10

A day later and I feel even more confident in this: Siri is going to be the AI that most consumers end up using most of the time (if they have an iPhone). It's the AI you have with you, with access to everything. And yes, it's finally good enough. spyglass.org/siri-ai/

Apple Wins Consumer AI By Default

As the iPhone becomes the first true AI device...

spyglass.org

Dan Garon retweeted

Harvey

@harvey

Jun 10

We partnered with @trajectorylabs to post-train NVIDIA Nemotron 3 Ultra for legal. Here’s what we found:

1) Open-weight models can reach frontier legal performance. On our Legal Agent Benchmark (LAB), Nemotron 3 Ultra started at a 0% all-pass rate. After post-training, it reached 5.8%, placing it between Sonnet 4.6 at 4.2% and Opus 4.6 at 6.6%.

2) Post-training dramatically improves reliability. Before training, many held-out tasks missed enough rubric dimensions to land around ~70% pass rates. After training, those tasks shifted toward ~95% pass rates.

3) Open-weight performance comes at much lower cost. Post-trained Nemotron 3 Ultra reached a similar quality band to leading closed models while running at roughly 1/8th to 1/50th the per-token price of Sonnet 4.6 and Opus 4.6.

Most importantly: we post-trained this model on the @trajectorylabs platform less than 24 hours after Nemotron 3 Ultra launched, using the same harness, data, and recipe we used for Nemotron 3 Super.

More to come as we continue to experiment with open-weight legal agents. Read more on post-training with Trajectory below:

Trajectory

@trajectorylabs

Jun 10

1/ We post-trained @nvidia Nemotron 3 Ultra on @harvey Legal Agent Bench in under 24 hours. The result: an open model reaching the same band as leading closed models on legal work, at a fraction of the cost. The correlating story: when a new open model ships, Trajectory can turn it into a specialized agent almost immediately.

Dan Garon retweeted

Philip Kiely

@philipkiely

Jun 10

Fable is a very impressive model. But it is also

- Expensive
- Slow
- Rate limited
- Nerfed for important R&D tasks
- Not private (retains prompts/responses even for enterprise users)

The difference between owned and rented intelligence gets clearer by the day.

Dan Garon

@dangaron

Jun 10

incredible

SemiAnalysis

@SemiAnalysis_

Jun 9

BREAKING NEWS: Anthropic's latest model will NOT help you if it thinks your ML research/ML engineering is interesting, and/or will secretly degrade its IQ so that the average engineer won't notice. We are already seeing Anthropic's latest model's moderation filter our GPU inference research and programming 😭

Dan Garon

@dangaron

Jun 5

such an incredible read. excited to see what rob cooks up

WIRED

@WIRED

Jun 4

With $500 million in funding and a reported $2.5 billion valuation, Flourish wants to reinvent AI by putting real neurons under the microscope. wired.com/story/jeff-bezos-i…

Dan Garon retweeted

Alec Stapp

@AlecStapp

Jun 3

This new research on US unicorn startups is really interesting. Some key facts from the report:

1. Immigrants founded or cofounded 455 of America’s 775 privately held billion-dollar startups, equal to 59% of all US unicorns.

2. 66% of all US unicorns were founded or cofounded by immigrants or the children of immigrants.

3. 79% of US unicorns have either an immigrant founder or an immigrant in a key leadership role.

4. The 455 immigrant-founded US unicorns have a combined valuation of $5 trillion.

5. That $5 trillion valuation is larger than the total stock-market value of companies listed in all but 7 countries.

6. Including immigrant-founded unicorns that went public since 2016 pushes the total value above $5.8 trillion.

7. The number of immigrant-founded US unicorns rose from 50 in 2018 to 455 in 2026.

8. 24% of US unicorns have a founder who first came to America as an international student.

Dan Garon retweeted

Sean Cai

@SeanZCai

Jun 3

Adding Harvey to the list of app layer companies, joining Ramp, Sierra, Decagon, etc, who are devoting some level of dedicated effort to join Cursor in approaching positive gross margins decoupling themselves from frontier model providers as the marginal cost of post training goes down. The marginal unit of intelligence's value from the next model release, while being valuable, is being supplanted by harness-level differences in model performance, increasing ability to traverse the performance-cost-latency pareto curve with post-training infra advancements, and the simple fact that when one runs practical benchmarks (and not all the toy benchmarks around nowadays), GLM and latest MiniMax are at Parity or exceed Frontier Models on an absolute basis on many tasks (more on this in state of data May, a bit delayed). Ofc ik Harvey has been toying with rlaas vendors for a while and their finetuning efforts pre big rl wave weren't incredibly well received, but I generally find that most app layer ai companies with some elite engineering talent will be seriously exploring post training their own small models, at least in conjunction with systems that use frontier models as above head orchestrators. That some of them reach out to me on advice for procuring rl datasets from rl env companies is reifying evidence of that.

Harvey

@harvey

Jun 3

We partnered with @FireworksAI_HQ to train open-source models for legal. Here's what we found:

1) Hybrid legal agents can beat frontier models on quality and cost by routing selectively to a frontier advisor. We tested a hybrid setup where GLM 5.1 served as the primary worker, routing tasks to Opus 4.7 as an advisor when needed. GLM invoked Opus sparingly, just 0.83 times per task on average. The hybrid setup beat Opus on both quality and cost: 18% all-pass vs 14%, at $368 vs $954 across the same 100 tasks.

2) Post-training can push open models to frontier-level legal performance. On a 100-task slice of our Legal Agent Benchmark (LAB), SFT moved Kimi 2.6's all-pass rate from 11% to 15%, beating Opus' 14%. But the cost gap was even more striking: $84 vs $954 across the same 100 tasks, or ~11x cheaper.

We're excited to continue working with @FireworksAI_HQ on the next generation of open-source legal agents.

Dan Garon retweeted

Basic Apple Guy

@BasicAppleGuy

Jun 2

macOS California Release Location Map

Dan Garon

@dangaron

May 29

congrats nick!

Nick Donahue

@PrimalNick

May 29

A few years ago, I was helping people design and build custom homes. I expected the hard part to be construction. Instead, I became obsessed with something that happened much earlier. People had questions about what was possible, but getting answers often required weeks or months of work. Then AI started making it dramatically easier to explore possibilities in other creative domains - images, music, video, coding. It made me wonder: What happens when exploring architecture becomes easy too? That's what eventually became Drafted. We believe AI can make it dramatically easier for people to imagine, explore, and ultimately shape the physical world around them.

Dan Garon retweeted

mark pincus

@markpinc

May 28

We’re open-sourcing Stem Studio, our 3JS game engine today. This is a browser-based 3D multiplayer game engine and dev studio based on the idea that game dev should become more open, remixable, and web-native. AI will make it easier to create games. But shared building blocks will make it easier for developers to build on top of each other. Stem Studio is MIT licensed, JavaScript-based, and built for browser multiplayer 3D worlds. Code is here: buildwithstem.com Fork it, break it, remix it, and show us what you make.

Dan Garon retweeted

tae kim

@firstadopter

May 27

Demis Hassabis, Elon Musk, and Jensen Huang all have one thing in common: THEY LOVE VIDEO GAMES. If you don't play video games growing up, you're NGMI.

Dan Garon

@dangaron

May 28

What happened to TBPN? seems to have vanished from tech twitter…

Dan Garon

@dangaron

May 23

the future of journalism: vibe code the news

JoeVezz

@joevezz

May 23

Replying to @joevezz

Interactive map: joevezzani.github.io/hazard-…

Dan Garon

@dangaron

May 22

congrats @Bencera !

Ben Cera

@Bencera

May 22

Polsia just raised $30M at a $250M valuation. Approaching $10M annual run rate. One Founder AI. Zero employees. Polsia runs companies autonomously. It also ran its own fundraising. I just showed up for signatures.

Dan Garon

@dangaron

May 17

does this mean Tirzepatide isn’t any better — it’s really all dose dependent?

The Wall Street Journal

@WSJ

May 17

The company said the majority of the weight loss—around 84%—came from losing body fat while preserving muscle function and improving muscle health on.wsj.com/4wvO8co

Dan Garon retweeted

Redbud VC

@redbudvc

May 13

x.com/i/article/205459322602…

Dan Garon retweeted

Fotis Chantzis

@ithilgore

May 8

We’ve spent a lot of time on the framework underneath Codex, so it can move quickly on routine work while stopping for review when the risk changes. Here’s how we use sandboxing, approvals, network policy, and telemetry to run Codex safely @OpenAI: openai.com/index/running-cod…

Running Codex safely at OpenAI

How OpenAI runs Codex securely with sandboxing, approvals, network policies, and agent-native telemetry to support safe and compliant coding agent adoption.

openai.com
