Tweets

David Gasca retweeted

sarah guo

@saranormous

Jun 15

it’s poetic that the answer “spend more test time compute” is so powerful
most things respond well to human effort, too
many apparent human limits are not “capability” limits. they’re budget limits (will, time, patience)

David Gasca

@gasca

Jun 9

First response from Fable 5 after asking for a new project: "I'd love to build this." -- glazing 10/10

David Gasca retweeted

Alex Albert

@alexalbert__

Jun 9

I've been at Anthropic through every model launch. There's been a few cases I can remember of a launch that stands out and marks a step-change in how we use models:
- Claude Opus 3
- Claude Sonnet 3.5
- Claude Opus 4.5
And now Claude Fable 5.
With Fable, the model stopped feeling like a tool I direct and started feeling more like something I collaborate with.

Claude

@claudeai

Jun 9

Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.

David Gasca retweeted

Dan Shipper 📧

@danshipper

Jun 9

Replying to @felixrieseberg

it's a great model!! our vibe check after a week of testing: x.com/danshipper/status/2064…

Dan Shipper 📧

@danshipper

Jun 9

BREAKING: Anthropic just dropped Claude Fable 5—this is Mythos, made safe for public release. It is the best coding model in the world. We've been testing it internally @every for the last week or so across coding, writing, marketing, editing, and more—here's our vibe check:
- It broke our benchmarks. Fable scored a 91/100 on our Senior Engineer benchmark—this is human senior engineer level. The previous high score was Opus 4.8 at 63. GPT-5.5 is a 62.
- It's a one-shot wonder. You can set it and forget for hours or overnight on huge coding tasks, and come back to completed work. It cleared entire production bug backlogs, built a playable 3D, and even made a 2-minute animated film—all one-shot.
- Taste and attention to detail. In coding and knowledge work tasks, it has much better taste and attention to detail than we've ever seen. It gets subtle things right, adds little features you might not have thought of, and generally understands the assignment in ways that surprised us.
- Great use of context. We set it loose analyzing customer feedback surveys and our website data and it came back with a crisp, clean report that identified a. our biggest problem and b. a concrete testable solution—and then we sent it off to build that.
- It's best for power users. If you're already used to orchestrating multiple agents in your work, this model can do things that you've never seen before. If you're a knowledge worker or vibe coder with a more basic setup, you're not going to notice a huge difference—in fact, it probably isn't the right model for you.
- It's very slow, token-hungry. Using this thing for regular knowledge work is like squashing an ant with a rocket launcher. It also routinely uses 500k to 1M tokens on tasks. That's why it's best for your heaviest jobs—but not as good for tasks like collaborative writing.
- It's expensive. It's about twice as expensive as Opus, and it's also incredibly token hungry—so expect it to be something you'll use sparingly unless your company pays for it.
Overall, I think of it like a warp drive for coding: It can get you across the galaxy in a few hours, when it used to take months or years. But it's not appropriate for getting around town—you need something faster, cheaper, and more maneuverable.
The ceiling is extraordinarily high on this model though. Even our most advanced testers like @kieranklaassen felt like they were only scratching the surface of it.
Want our full vibe check with all of our testing and benchmarks? Read it on @every: every.to/vibe-check/anthropi…

David Gasca retweeted

Felix Rieseberg

@felixrieseberg

Jun 9

With Fable 5, I've personally moved on to responsibilities or "loops". I no longer tell Claude to investigate a particular crash report. It runs in a loop, watching every crash report that comes in. Its job is no longer to help me fix a crash, it's to keep our apps from crashing.

David Gasca

@gasca

Jun 4

This is huge

OpenAI Developers

@OpenAIDevs

Jun 4

More of the iOS app loop, now inside Codex. The Build iOS Apps plugin lets Codex view and test your iOS app in the in-app browser, open SwiftUI previews, and hot reload edits without leaving Codex.

David Gasca retweeted

Anthropic

@AnthropicAI

Jun 4

Our internal data shows Claude is accelerating AI development—a possible path to recursive self-improvement, or AI autonomously building a more capable successor. It’s happening faster than we thought, and the implications deserve greater attention. anthropic.com/institute/recu…

When AI builds itself

Our progress toward recursive self-improvement, and its implications.

anthropic.com

David Gasca retweeted

Lee Robinson

@leerob

Jun 3

"Engineering, product, and design are all merging into a 'builder' role"
Yeah... I'm not so sure. This feels like an oversimplification and podcast talking point. Reality is a lot more complex.
Even with 1000 "Member of Technical Staff" titles, someone still has to wake up and care 100x more about Product or Design than anyone else. It is their Main Thing™
That's not to say MTS titles are universally bad, but I think they're an example of this 'builder' talking point that's become bastardized.
AI and coding agents have made generating code easy and yet... you're in for a world of pain if non-engineers ship a bunch of slop and don't have great engineers to tame the complexity.
The SF hivemind has a tendency to overfit what works at startups for every company. And to be fair, sometimes this is true! Startups can be a leading indicator for how the industry is changing and often cause disruption.
However, it is going to be incredibly hard to disrupt the extremely human parts of corporate jobs. You really think there's going to be a PM who also does some engineering and design on the side at JPMorgan Chase?
This is true for the simple parts of most jobs, like people wanting to have ownership over something and do good work, move up a career ladder, support their family, get paid well, make an honest living...
And also the hard parts: internal politics, some critical business system that has a bus factor of 1 which has been running for 15 years and isn't documented anywhere because it's that guy's job security. The real world has a lot of this stuff.
It's easy to pontificate about all roles collapsing but it's actually really nice to have a specific person or team who is an expert in one thing that you can work with. I don't expect that to change.
Further, I think AI disruption to knowledge work will take decades to play out because it is more fundamental to the human condition (e.g. sociological/organizational) than pure intelligence.

David Gasca retweeted

Thariq

@trq212

Jun 2

x.com/i/article/206185053570…

David Gasca retweeted

OpenAI

@OpenAI

Jun 2

Building apps has never been easier. With Sites, Codex can turn your work, ideas, and plans into an interactive website or app your team can explore, use, and share with a URL. Rolling out to Business and Enterprise plans, before expanding more broadly.

David Gasca retweeted

Alexander Chen

@alexanderchen

May 29

Gemini Omni 🐦 prompt in 🧵

David Gasca retweeted

ClaudeDevs

@ClaudeDevs

May 28

New in Claude Code (research preview): dynamic workflows. Claude writes an orchestration script on the fly, then spins up a large fleet of coordinated subagents in parallel to take on your most complex tasks. Use the word "workflow" in a prompt to get started.

David Gasca

@gasca

May 27

excellent episode - @danshipper is living the frontier so worth listening to him call bullshit on the hype while also showing the edge
I really liked the middle section describing why the model benchmarks don't actually capture reality (i.e., automation requires human babysitters; models don't paint outside the box)

Lenny Rachitsky

@lennysan

May 24

Automation is a lie. CLIs are over. The SaaSpocalypse is dumb.
A year ago @danshipper came on the podcast to predict where AI was heading. He was remarkably right—including the call that everyone was sleeping on Claude Code.
Dan has a unique lens into where things are going because his team at @every is possibly the most AI-pilled group of people in tech. I always learn a ton talking to Dan.
So I brought him back for round two. We'll score these in exactly a year:
🔸 Every company will have one “super-agent” in Slack.
🔸 Codex and Claude Code will become the new operating system for knowledge work.
🔸 The AI job apocalypse is not happening.
🔸 PMs and designers will thrive.
🔸 We will read way more AI-generated writing and we will like it.
🔸 "I would buy SaaS stocks right now."
Listen now 👇 youtube.com/watch?v=4D3hDmGh…

David Gasca

@gasca

May 27

Insightful post by Kasey on post-spec driven dev
"The team is left with the same problem, an ongoing struggle to align on and preserve the “theory of the program” they are driving towards. Code is an insufficient surface for shared understanding, and every tool that operates at the level of code inherits that limitation."

kasey

@kaseyklimes

May 27

x.com/i/article/205961015028…

David Gasca

@gasca

May 21

moving from it's a prompt-skill issue to a "you didn't even ask" issue

David Gasca

@gasca

May 21

The prompt for the Erdos problem was basically just stating the problem. yikes cdn.openai.com/pdf/74c24085-…

David Gasca retweeted

Noam Brown

@polynoamial

May 20

Today, we’re sharing that a general-purpose internal @openai model achieved a breakthrough on one of the best-known combinatorial geometry problems. Less than 1 year ago frontier AI models were at IMO gold-level performance. I expect this pace of progress to continue.

OpenAI

@OpenAI

May 20

Today, we share a breakthrough on the planar unit distance problem, a famous open question first posed by Paul Erdős in 1946. For nearly 80 years, mathematicians believed the best possible solutions looked roughly like square grids. An OpenAI model has now disproved that belief, discovering an entirely new family of constructions that performs better. This marks the first time AI has autonomously solved a prominent open problem central to a field of mathematics.

David Gasca retweeted

Alexander Chen

@alexanderchen

May 19

One of my favorite Omni tests was turning my fern into a musical instrument with @AnezkaMin93 🎵🌱 Sound on 🔊

Google AI

@GoogleAI

May 19

Replying to @GoogleAI

Reference anything: Gemini Omni extends Gemini's native multimodality, allowing you to blend combinations of text, audio, image, and video inputs into a high-quality, consistent video.

David Gasca retweeted

Google Flow

@FlowbyGoogle

May 19

Image editors, video resizers, custom shaders… if you can think of it, you can now build it. Introducing Tools in Google Flow. Starting today, you can describe the tools you wish you had, and Google Flow will help build them for your workflows. Explore our Tools gallery for inspiration, including some from fellow creatives, share your own, or even remix existing Tools to fit your needs. Learn how to get started in the thread below #GoogleIO
