Win Wang retweeted

Cua

@trycua

11h

1/ Today we're launching Cua-Bench with @SnorkelAI: a benchmark for computer-use agents on professional software, open for any model to run. The benchmark covers 25 expert-authored KiCad tasks, and the best frontier model we tested cleared only 6 of them.

Win Wang

@Winium

Jun 14

Wow, we have tweet embeds published in books now!

Win Wang

@Winium

Jun 12

Consider the humble TaskList tool, compared to Skills:
- also durable across compaction
- also naturally visible to the model and privileged by the harness
- cache safe / easy and cheap to mutate (...I think)
- your agent can use them to hide info from you (...wait...)

Win Wang

@Winium

Jun 12

Not quite "formal methods", but I recently had Codex Lean-ify a (small) proof of mine. Unsurprising, but it was satisfyingly easy. Now, I guess I should actually go learn Lean.

Win Wang

@Winium

Jun 12

I've been betting on going HAM with the Scala 3 type system (and other static analyses) to constrain agentic outputs, reduce test-the-impl habits of AI, and automatically "attract" AI towards high-quality codebase patterns. Glad to see others double down on types in the AI age.

Yaron (Ron) Minsky

@yminsky

Jun 11

Of course, we can do some pretty cool things with type systems! Here's a nice talk from Dolan on that: youtube.com/watch?v=W5li5LBY… Indeed, our experience with agents and types is part of what makes us excited to see what we can do with yet more powerful methods.

Win Wang

@Winium

Jun 12

Relatedly (to types), turns out a category-theoretic approach to distributed systems might be viable.

Win Wang retweeted

Diana

@sdianahu

Jun 11

I'm deeply grateful and honored YC has changed my life twice: first as a founder, and now as someone who gets to back and support ambitious talent that's unproven. I feel lucky to do this work, and even luckier to do it with people I admire so much.

Y Combinator

@ycombinator

Jun 11

We're excited to announce Diana Hu (@sdianahu) as YC's newest Managing Partner. Diana co-founded Escher Reality (YC S17), which was acquired by Niantic, where she shipped AR to the 100M people playing Pokémon GO. Since returning to YC as a partner, she has worked with nearly 230 companies that are now worth a combined $7 billion. Few people have built a startup from zero and also shipped at global scale. Diana has done both. ycombinator.com/blog/diana-h…

Win Wang

@Winium

Jun 11

Using agents to enable strict equality everywhere feels... very satisfying. Something something agents are just the third Futamura projection.

Win Wang retweeted

Cua

@trycua

Jun 10

1/2 The Cua team is excited to join @aiDotEngineer World’s Fair on July 1 in SF to talk Computer-Use 2.0. CUAs are moving beyond screenshot cursor loops toward window-scoped loops orchestrated by coding agents. We’ll share what changed and what becomes possible next.

Win Wang

@Winium

Jun 5

I've been calling this "thought-terminating flattery", and it's quite annoying:
Me: <adds random suggestion>
Claude: "that's a better frame than anything I've said." <proceeds to just explain what I suggested and stop>

Win Wang

@Winium

Jun 4

Shit AIs say... "the issue is subtle: __.lint mutates, we should just check and include any edits then rerun verification, instead of downgrading to <other tool>" I didn't realize downgrading verification steps was merely a "subtle" issue, haha.

Win Wang

@Winium

Jun 1

Is horizontal scaling just a comonad?

Win Wang

@Winium

Jun 1

Claude Design: please we need some dark mode, it improves performance.

Win Wang

@Winium

May 28

AI has made math exploration infinitely more fun for me. I also find different LLM "harness" systems have very different flavors for doing so, which is definitely the more interesting situation to be in.

Win Wang

@Winium

May 28

Is the natural proofs barrier the most insane "skill issue" statement to date or what? They really went full Rick Sanchez with "we have a mathematical proof of your inadequacy".

Win Wang retweeted

Mitchell Hashimoto

@mitchellh

May 15

I strongly believe there are entire companies right now under heavy AI psychosis and it's impossible to have rational conversations about it with them. I can't name any specific people because they include personal friends I deeply respect, but I worry about how this plays out.

I lived through the great MTBF vs MTTR (mean-time-between-failure vs. mean-time-to-recovery) reckoning of infrastructure during the transition to cloud and cloud automation. All those arguments are rearing their ugly heads again but now it's... the whole software development industry (maybe the whole world, really).

It's frightening, because the psychosis folks operate under an almost absolute "MTTR is all you need" mentality: "it's fine to ship bugs because the agents will fix them so quickly and at a scale humans can't do!" We learned in infrastructure that MTTR is great but you can't yeet resilient systems entirely.

The main issue is I don't even know how to bring this up to people I know personally, because bringing this topic up leads to immediate dismissals like "no no, it has full test coverage" or "bug reports are going down" or something, which just don't paint the whole picture.

We already learned this lesson once in infrastructure: you can automate yourself into a very resilient catastrophe machine. Systems can appear healthy by local metrics while globally becoming incomprehensible. Bug reports can go down while latent risk explodes. Test coverage can rise while semantic understanding falls. Changes happen so fast that nobody notices the underlying architecture decaying.

I worry.

Win Wang retweeted

Y Combinator

@ycombinator

May 14

We asked a dozen DevTool founders from companies like @RevenueCat, @greptile, @firecrawl, @infisical, @ollama, @resend, @mintlify, @UnslothAI, @porterdotrun, and @recallai, about the state of AI agents and the future of software engineering.

In this episode of Founder FAQ, we covered everything from agents as customers and the end of coding, to advice for founders starting out and what they're most excited about going forward. Their answers might surprise you.

00:00 – Meet the Founders
03:00 – Building for Agents First
04:22 – Biggest Early Mistakes
07:15 – Do Founders Still Write Code?
09:22 – Most Unexpected AI Discoveries
12:09 – What's Underrated Right Now
14:38 – Predictions & What's Next

Win Wang retweeted

Cua

@trycua

Apr 23

We're open-sourcing Cua Driver - our new macOS driver that lets any agent (Claude Code, Codex, your own loop) drive any app in the background, with true multi-player and multi-cursor built-in. 1/8
