vu

Tweets

@_vu

Jun 13

what if we never get fable back and only get access via a 23yo anthropic goldman sachs & co FDE

vu

@_vu

Jun 9

someone say something about loops? @claudeai dynamic workflows

vu

@_vu

Jun 5

happening way too frequently @claudeai this is just two active threads and not token maxxing

vu

@_vu

May 27

how many devs are out here rawdogging LLMs like this?

Philipp Schmid

@_philschmid

May 27

Interesting new SWE/agentic benchmark (DeepSWE) was released yesterday. 113 tasks across 91 repos in 5 languages. Here are interesting things I noticed:

- The evaluation harness (mini-swe-agent) gives every model a single bash tool and the same SI. No vendor editing primitives.
- Eval prompts are shorter than SWE-Bench Pro, but require 5.5× more code and touch 7 files on average. The idea is to mimic how developers actually talk to agents: short behavioral descriptions, not verbose specs.
- The SI describes a specific workflow: find code, reproduce, fix, verify, edge cases, submit. This maps directly onto how the verifier grades, which could bias toward models that follow instructions literally over models that explore more.
- The bash tool is guarded: outputs over 10k chars get truncated, and malformed tool calls get caught and retried with guidance rather than crashing. This keeps the context from blowing up.
- Mini-swe-agent claims to match or beat 1P harnesses on the same tasks. Claude Opus scored 10pp over Claude Code. Gemini 3.1 Pro scored 20pp over Gemini CLI.

Would love to see how other harness × model combinations will do, e.g. @cursor_ai, @antigravity, @FactoryAI, and how well the eval harness does on more general knowledge work, e.g. GDPval. Great to see the SWE-agent team keep pushing on both the research and eval side. 🤗
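A rough sketch of what the "guarded bash tool" point could look like in practice. This is not mini-swe-agent's actual code: the function names, the head-and-tail truncation strategy, and the JSON tool-call shape are all assumptions made for illustration. Only the 10k-character threshold and the "retry with guidance instead of crashing" idea come from the tweet.

```python
import json

# Threshold taken from the tweet; how the real harness truncates is an assumption.
MAX_OUTPUT_CHARS = 10_000


def truncate_output(text: str, limit: int = MAX_OUTPUT_CHARS) -> str:
    """Keep the head and tail of an over-long tool output and elide the middle."""
    if len(text) <= limit:
        return text
    half = limit // 2
    omitted = len(text) - 2 * half
    return f"{text[:half]}\n... [{omitted} characters omitted] ...\n{text[-half:]}"


def parse_tool_call(raw: str):
    """Return (command, None) on success, or (None, guidance) for a malformed call.

    The {"command": "..."} shape is hypothetical. The point is that a bad call
    yields a message the model can act on rather than an exception.
    """
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, (
            f'Malformed tool call ({exc.msg}). '
            'Reply with JSON like {"command": "ls"}.'
        )
    if not isinstance(call, dict) or not isinstance(call.get("command"), str):
        return None, 'Tool call must be a JSON object with a string "command" field.'
    return call["command"], None
```

In an agent loop, the guidance string would be sent back to the model as the next observation, and the truncated output would be what gets appended to the context.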

vu

@_vu

May 20

After setting /goal in 4 terminals

[video, 0:49]

vu

@_vu

May 14

When my Claude 5 hour timer refreshes

Sam Lambert

@samlambert

May 13

i have a feeling that he will be quite good with databases if anyone could put me in touch 🙏🏻

[video, 0:20]

vu retweeted

David Fowler

@davidfowl

May 12

x.com/i/article/205408328059…

vu

@_vu

May 12

Can we go back to arguing about fonts and dark vs light mode?

[GIF: Green Mile, "I'm tired, boss"]

vu

@_vu

May 4

github is a slog right now

vu

@_vu

16 Dec 2025

short name community still has a long way to go for true inclusion @certora

vu

@_vu

19 Nov 2025

All I want for Christmas is for mobile websites to not tell me how much better the site is if I download the app. Yes, not having your annoying popup cover up half the screen is a better experience. Fuck you go find another method to spy on your users.

vu

@_vu

4 Nov 2025

til segue is pronounced segway

vu retweeted

Sen @UseSenFi

3 Nov 2025

Been cooking something with @magicblock for @colosseum Your financial platform shouldn't be able to see your money. So we're building it.

[video, 1:26]

vu

@_vu

24 Sep 2025

Everything alright over there @SolanaDevnet?

vu

@_vu

16 Sep 2025

Thanks @claudeai code. Time to context switch and probably touch grass. Or call up one of the side LLMs

vu

@_vu

12 Sep 2025

Agent3 with @Replit mobile is a fun way to destroy work life balance x.com/_vu/status/19664481253…

vu

@_vu

12 Sep 2025

Blasted through a month of credit in 5 hours. I guess I can take the rest of the month off 😅

vu

@_vu

12 Sep 2025

looking at @privy_io or @dynamic_xyz, anyone have any strong opinions? main use case is onboarding new users and then building a crypto -> local fiat payment offramp

vu

@_vu

10 Sep 2025

This term.everything project is beautiful. Running any GUI app directly in your terminal buffer via a custom Wayland compositor (TIL about Wayland)? This is the type of shit that makes you believe in the joy of coding. Another excuse for me to install Omarchy.

vu

@_vu

10 Sep 2025

🔗 github.com/mmulet/term.every… I don't think our hero is on x, but his github repo has some other fun projects. We need more fun stuff and less "how I used AI to generate $1m ARR" on this app.