I love birds, bridge, bots, beer, bears, beets, Battlestar Galactica. Electrify everything. Helping parents drowning in family logistics: superduperlabs.com

Joined February 2007
Cursor plugin that adds a "Bought it before we knew" bumper sticker to the splash screen
SpaceX has exercised the option to acquire @cursor_ai in an all-stock transaction with the goal of building the world’s most useful AI models. For the past few months, SpaceXAI has been jointly training a model with Cursor, which will be released in Cursor and Grok Build soon. We look forward to working closely with the Cursor team to advance our frontier AI capabilities
Benjamin Stein retweeted
Today, I signed an Executive Order temporarily repealing bedtimes in the City of New York so that kids of all ages can watch our team in the NBA Finals. As Mayor, you’re forced to make many difficult decisions. This was not one of them. Go Knicks.
I rebuilt "Is it Christmas" using 484 subagents and 16 million tokens to learn how Claude's dynamic workflows work. Spoiler Alert: today is not Christmas. benjaminste.in/isitchristmas… h/t @konklone, as always
I'm a couple of projects into Codex 5.5. Early reactions:
* The coding model is very good. It YOLO-sideloaded and debugged a new Android widget with no human intervention, and did it very, very well.
* I hate using the mouse.
* Way too many back-and-forth questions. I kept finding Codex waiting for me. I'd say "LGTM, let's build!" and come back 15 minutes later expecting a finished result, but instead find a "Should I get started?"
When you say you prefer [ Claude Code | Codex ], are you optimizing for the user experience or the quality of the output?
And how would you trade off? If CC Opus consistently generated better code output but you really dislike their interface (or vice versa), which would you use for day-to-day work?
How can I get any work done with COYOTE PUPPIES ROMPING IN MY BACKYARD?!?!!!
Opus Ultracode should have been called "Hold my beer."
108/115 agents done · 41m 21s · ↓ 8.7m tokens
"comfortable" is not the word I'd use
I forked @WarnerTeddy's awesome bird sound field guide generator. I don't have a Pi yet, so I just made a static version... with Opus 4.8 ULTRACODE with dynamic workflows, because why not? Burning 940,000(!) tokens was TOTALLY worth it to watch it do adversarial verification. Imagining a virtual adversarial wild turkey vs. a virtual adversarial chickadee? That's the epic battle of the species I'm here for.
Replying to @WarnerTeddy
A baby step while we wait for our Pi. Already quite the upgrade!
Counterpoint: we have 2 versions of the SuperDuper backend. One is deterministic flow control w/ small agent contexts. The other is Opus in a modern harness with a big-ass system prompt and permission to go ham. The former is infinitely better for cost optimization. And easier to reason about. And to eval. It makes for a great customer experience. But the Opus agent? Buckle up. We see truly emergent behavior every single day. Like batshit crazy cross-correlations, decision making, deductive reasoning, and inference. Nothing we mere mortals could have ever prescribed with a deterministic workflow. It took the product from "great customer experience" to "How in the Cinnamon Toast Fuck did it do that???" (which, to be clear, is my north star)
May 29
Replying to @smthomas3
don't use prompts for control flow if you can use control flow for control flow!
Hey @simonw, we need you to coin some new terms! Here are 4 new phenomena I've seen in 2026 that we don't have words for yet:
1. The constant feeling of anxiety / FOMO / guilt when your agents aren't running. Just one more prompt! (My working name is Token Processing Underutilization Disorder, or TPU Disorder.)
2. The desire every founder/builder has right now to solopreneur a company with just Claude.
3. The desire every founder/builder has right now to solopreneur 10 products/companies at the same time.
4. Our newfound ability to fix/improve every. single. thing. you think is broken in the world. (This month I made a personalized weather app for my watch, a replacement app to control my heat pump, and a Google Calendar of the Oakland Ballers home game schedule. Just because. I can't stop.)
Whatcha got??
Benjamin Stein retweeted
just saw a dev using Opus 4.7 🤣 what month is it little bro
The @OpenAI team did a really impressive job slurping up all my local Claude Code configs and projects and importing them into Codex on first run. I've been so hesitant to try it b/c of switching costs (e.g. replicating CLAUDE.md, MCP config, etc. etc.), but they totally nailed it.
Kudos to the @googlehealth team for adding Ultimate Frisbee 🥏 as a workout activity in the latest app release!
Subscribing to the @OaklandBallers calendar of games requires you to auth a service that can read and delete every single one of your personal Google Calendars?! No thank you. Claude banged out a public .ics for me in like 3 minutes. Subscribe away! benjaminste.in/ballers-2026/
TIL @posthog has an entire beautiful user interface other than the "Authorize Claude Code" consent screen! Who knew?
Kidding, not kidding. It's truly the best agent/MCP-integrated product I've ever used, both inside the product and outside. I use it every day (work, personal, and nonprofit) but haven't logged in to the web UI since day 1. It truly feels like a glimpse into the future.
I told my 75yo Mom it's pronounced "Clyde" and she's been telling her friends about it all week and I feel really really guilty but I laugh every single time.