Joined August 2008
34 Photos and videos
Jun 13
what if we never get fable back and only get access via a 23yo anthropic goldman sachs & co FDE
25
Jun 9
someone say something about loops? @claudeai dynamic workflows
20
Jun 5
happening way too frequently @claudeai this is just two active threads and not token maxxing
1
1
36
May 27
how many devs are out here rawdogging LLMs like this?
Interesting new SWE/agentic benchmark (DeepSWE) was released yesterday. 113 tasks across 91 repos in 5 languages. Here are interesting things I noticed:
- The evaluation harness (mini-swe-agent) gives every model a single bash tool and the same SI. No vendor editing primitives.
- Eval prompts are shorter than SWE-Bench Pro, but require 5.5× more code and touch 7 files on average. The idea is to mimic how developers actually talk to agents: short behavioral descriptions, not verbose specs.
- SI describes a specific workflow: find code, reproduce, fix, verify, edge cases, submit. This maps directly onto how the verifier grades, which could bias toward models that follow instructions literally over models that explore more.
- The bash tool is guarded: outputs over 10k chars get truncated, and malformed tool calls get caught and retried with guidance rather than crashing. This keeps the context from blowing up.
- Mini-swe-agent claims to match or beat 1P harnesses on the same tasks. Claude Opus scored 10pp over Claude Code. Gemini 3.1 Pro scored 20pp over Gemini CLI.
Would love to see how other harness × model combinations will do, e.g. @cursor_ai, @antigravity, @FactoryAI, and how well the eval harness does on more general knowledge work, e.g. GDPval. Great to see the SWE-agent team keep pushing on both the research and eval side. 🤗
74
May 20
After setting /goal in 4 terminals

1
47
May 14
When my Claude 5 hour timer refreshes
i have a feeling that he will be quite good with databases if anyone could put me in touch 🙏🏻
106
vu retweeted

25
145
853
238,124
May 12
Can we go back to arguing about fonts and dark vs light mode?

ALT Green Mile Im Tired Boss GIF

33
May 4
github is a slog right now
1
66
16 Dec 2025
short name community still has a long way to go for true inclusion @certora
2
162
19 Nov 2025
All I want for Christmas is for mobile websites to not tell me how much better the site is if I download the app. Yes, not having your annoying popup cover up half the screen is a better experience. Fuck you go find another method to spy on your users.
1
247
4 Nov 2025
til segue is pronounced segway
2
192
vu retweeted
3 Nov 2025
Been cooking something with @magicblock for @colosseum. Your financial platform shouldn't be able to see your money. So we're building it.
1
1
198
24 Sep 2025
Everything alright over there @SolanaDevnet?
1
169
16 Sep 2025
Thanks @claudeai code. Time to context switch and probably touch grass. Or call up one of the side LLMs
2
173
12 Sep 2025
Agent3 with @Replit mobile is a fun way to destroy work life balance x.com/_vu/status/19664481253…
1
227
12 Sep 2025
Blasted through a month of credit in 5 hours. I guess I can take the rest of the month off 😅
1
101
12 Sep 2025
looking at @privy_io or @dynamic_xyz, anyone have any strong opinions? main use case is onboarding new users and then building a crypto -> local fiat payment offramp
1
1
120
10 Sep 2025
This term.everything project is beautiful. Running any GUI app directly in your terminal buffer via a custom Wayland compositor (TIL about Wayland)? This is the type of shit that makes you believe in the joy of coding. Another excuse for me to install Omarchy.
1
1
111
10 Sep 2025
🔗 github.com/mmulet/term.every… I don't think our hero is on x, but his github repo has some other fun projects. We need more fun stuff and less "how I used AI to generate $1m ARR" on this app.
69