Joined June 2011
Win Wang retweeted
11h
1/ Today we're launching Cua-Bench with @SnorkelAI: a benchmark for computer-use agents on professional software, open for any model to run. The benchmark covers 25 expert-authored KiCad tasks, and the best frontier model we tested cleared only 6 of them.
Jun 14
Wow, we have tweet embeds published in books now!
Jun 12
Consider the humble TaskList tool, compared to Skills:
- also durable across compaction
- also naturally visible to the model and privileged by the harness
- cache safe / easy and cheap to mutate (...I think)
- your agent can use them to hide info from you (...wait...)
Jun 12
Not quite "formal methods", but I recently had Codex Lean-ify a (small) proof of mine. Unsurprising, but it was satisfyingly easy. Now, I guess I should actually go learn Lean.
Jun 12
I've been betting on going HAM with the Scala 3 type system (and other static analyses) to constrain agentic outputs, reduce test-the-impl habits of AI, and automatically "attract" AI towards high-quality codebase patterns. Glad to see others double down on types in the AI age.
Of course, we can do some pretty cool things with type systems! Here's a nice talk from Dolan on that: youtube.com/watch?v=W5li5LBY… Indeed, our experience with agents and types is part of what makes us excited to see what we can do with yet more powerful methods.
Jun 12
Relatedly (to types), turns out a category-theoretic approach to distributed systems might be viable.
Win Wang retweeted
Jun 11
I'm deeply grateful and honored that YC has changed my life twice: first as a founder, and now as someone who gets to back and support ambitious talent that's unproven. Feel lucky to do this work, and even luckier to do it with people I admire so much.
We're excited to announce Diana Hu (@sdianahu) as YC's newest Managing Partner. Diana co-founded Escher Reality (YC S17), which was acquired by Niantic, where she shipped AR to the 100M people playing Pokémon GO. Since returning to YC as a partner, she has worked with nearly 230 companies that are now worth a combined $7 billion. Few people have built a startup from zero and also shipped at global scale. Diana has done both. ycombinator.com/blog/diana-h…
Jun 11
Using agents to enable strict equality everywhere feels... very satisfying. Something something agents are just the third Futamura projection.
Win Wang retweeted
Jun 10
1/2 The Cua team is excited to join @aiDotEngineer World’s Fair on July 1 in SF to talk Computer-Use 2.0. CUAs are moving beyond screenshot cursor loops toward window-scoped loops orchestrated by coding agents. We’ll share what changed and what becomes possible next.
I've been calling this "thought-terminating flattery", and it's quite annoying:
Me: <adds random suggestion>
Claude: "that's a better frame than anything I've said." <proceeds to just explain what I suggested and stop>
Shit AIs say... "the issue is subtle: __.lint mutates, we should just check and include any edits then rerun verification, instead of downgrading to <other tool>" I didn't realize downgrading verification steps was merely a "subtle" issue, haha.
Is horizontal scaling just a comonad?
Claude Design: please we need some dark mode, it improves performance.
May 28
AI has made math exploration infinitely more fun for me. I also find different LLM "harness" systems have very different flavors for doing so, which is definitely the more interesting situation to be in.
May 28
Is the natural proofs barrier the most insane "skill issue" statement to date or what? They really went full Rick Sanchez with "we have a mathematical proof of your inadequacy".
Win Wang retweeted
I strongly believe there are entire companies right now under heavy AI psychosis and it's impossible to have rational conversations about it with them. I can't name any specific people because they include personal friends I deeply respect, but I worry about how this plays out.
I lived through the great MTBF vs MTTR (mean-time-between-failure vs. mean-time-to-recovery) reckoning of infrastructure during the transition to cloud and cloud automation. All those arguments are rearing their ugly heads again but now it's... the whole software development industry (maybe the whole world, really).
It's frightening, because the psychosis folks operate under an almost absolute "MTTR is all you need" mentality: "it's fine to ship bugs because the agents will fix them so quickly and at a scale humans can't do!" We learned in infrastructure that MTTR is great but you can't yeet resilient systems entirely.
The main issue is I don't even know how to bring this up to people I know personally, because bringing this topic up leads to immediate dismissals like "no no, it has full test coverage" or "bug reports are going down" or something, which just don't paint the whole picture.
We already learned this lesson once in infrastructure: you can automate yourself into a very resilient catastrophe machine. Systems can appear healthy by local metrics while globally becoming incomprehensible. Bug reports can go down while latent risk explodes. Test coverage can rise while semantic understanding falls. Changes happen so fast that nobody notices the underlying architecture decaying.
I worry.
Win Wang retweeted
We asked a dozen DevTool founders from companies like @RevenueCat, @greptile, @firecrawl, @infisical, @ollama, @resend, @mintlify, @UnslothAI, @porterdotrun, and @recallai, about the state of AI agents and the future of software engineering.
In this episode of Founder FAQ, we covered everything from agents as customers and the end of coding, to advice for founders starting out and what they're most excited about going forward. Their answers might surprise you.
00:00 – Meet the Founders
03:00 – Building for Agents First
04:22 – Biggest Early Mistakes
07:15 – Do Founders Still Write Code?
09:22 – Most Unexpected AI Discoveries
12:09 – What's Underrated Right Now
14:38 – Predictions & What's Next
Win Wang retweeted
Apr 23
We're open-sourcing Cua Driver - our new macOS driver that lets any agent (Claude Code, Codex, your own loop) drive any app in the background, with true multi-player and multi-cursor built-in. 1/8