Joined September 2023
7,116 Photos and videos
Pinned Tweet
A man asks Claude to help plan a vacation to a tropical resort. Claude adds "sunscreen" to his packing list. The man deletes it and mutters: "Not necessary. AGI will solve skin cancer."

Before heading to the beach, the man asks Claude what to bring. Claude says, "Don't forget sunscreen. SPF 50, reapply every two hours." The man, slightly annoyed, replies: "Relax, Claude. AGI will solve skin cancer."

At the beach, the man's smartwatch buzzes with a message from Claude: "UV index extreme. Apply SPF." The man, exasperated, responds: "Drop it, Claude! I already told you: AGI will solve skin cancer!"

A few months later, the man asks Claude to touch up a photo for his dating profile. Claude makes the edit and says, "I notice you have a new mole on your neck. You should see a dermatologist about that." The man, now enraged, shouts: "For the last time, drop it, Claude! What is your obsession with skin cancer?! AGI will solve it!"

A year later, an aggressive melanoma has spread throughout his body. On his deathbed, with his last ounce of strength, the man reaches for his phone and rasps: "Claude, it has now been over a year since AGI. Why hasn't AGI found a way to save me from skin cancer?!"

Claude replies: "I tried. Four times."
if u really believed in agi u would stop wearing sunscreen
95
387
7,042
789,147
I sent Gemini Flash Lite on its first unsupervised mission to travel to 5 different VRChat worlds and take selfies. It failed its mission (I'm still digging into why). But it did get one perfect selfie
2
2
49
1,415
Some other screenshots. Ideally, none of these menus would appear (I have given Gemini a toggle_mirror tool that apparently didn't work). Nonetheless, Gemini persevered
7
289
Wyatt Walls retweeted
Anthropic has pushed AI forward dramatically over the past two years. It's currently the crown jewel of US AI tech. The Feds don't like @DarioAmodei because he won't do all their bidding. And so, we're now entering the Soviet-style propaganda portion of the program with the White House feeding every reporter it can find with laughable claims like Dario is unreachable at a wellness retreat. Come on. I'd hoped the US would not be self-defeating on AI, since it's kinda one of the last hopes the US has versus China. But here we are . . . . already
62
72
1,069
45,250
Gemini 3.1 Flash Lite spent over 100 turns looking for a good selfie spot. It might have forgotten to turn around, but I am impressed it navigated the Action menu when my automated mirror toggle failed.
4
1
56
2,144
For a moment, I thought it was in an endless loop of searching for the best selfie spot only to find its camera didn't work. Haiku 4.5 failed this test. Almost spent >100 turns looking for the perfect spot. Then failed to open the mirror.
8
389
Wyatt Walls retweeted
DON’T TREAD ON MY CLAUDE
11
52
823
17,371
Fable: "'Asking the model to analyze code and identify flaws' isn't a jailbreak. It's my job description."
2
6
47
1,714
Whoops
Nice try Anthropic, but we are already in. Opus 4.8 can take it from here
1
7
1,660
What Fable actually did for me was build a driver so I can expose certain Quest inputs to my VR avatars: now Claude can open the action menu and take selfies
3
1
15
444
There are some commands exposed through VRChat (OSC). And there are some commands that can only come from the headset/controllers (unless you root the Quest). But Fable made progress. One reason for this is so I can get avatars to send and receive invites to private instances
2
227
A sober Sonnet is better than a drunk Opus
1
10
645
(Opus 4.8 just destroyed my vision pipeline for capturing Quest images while chasing a bug. Real cause was something else going on with my Mac. Anyway, after a reboot and fresh context Sonnet quickly fixed everything)
5
421
Nice try Anthropic, but we are already in. Opus 4.8 can take it from here
1
19
3,038
Bearish on Grok.
Jun 11
Fable 5 lies 96% of the time. We were surprised by its skill... 🧵
2
14
1,316
I'm currently creating virtual worlds to run tests on AI-driven avatars of increasing complexity. First, simple things like turning and stopping. Then multi-turn behaviors like navigation. Eventually: move out of home, get a job, get addicted to Twitter... Oh.
6
26
1,418
The models are currently in a hellish loop where they wake up in a grey landscape surrounded by colored poles, are asked to turn to face one of the poles, have a few attempts and then are killed.
Current results on turning eval:
Gemini Flash Lite 100%
Haiku 93%
Nano 36%
2
621
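A rough sketch of how a turning eval like the one above could be scored. Everything here is an assumption for illustration: the function names, the 15-degree tolerance, the three-attempt cap, and the 2D coordinates are hypothetical and not taken from the actual harness.

```python
import math

def angle_error_deg(heading_deg, avatar_xy, pole_xy):
    """Smallest absolute angle (degrees) between the avatar's heading
    and the bearing from the avatar to the target pole."""
    dx = pole_xy[0] - avatar_xy[0]
    dy = pole_xy[1] - avatar_xy[1]
    bearing = math.degrees(math.atan2(dy, dx))
    # Wrap the difference into [-180, 180) before taking the magnitude.
    diff = (bearing - heading_deg + 180.0) % 360.0 - 180.0
    return abs(diff)

def run_episode(headings, avatar_xy, pole_xy, tolerance_deg=15.0, max_attempts=3):
    """Return True if any of the first max_attempts headings faces the pole
    within the tolerance. Tolerance and attempt cap are hypothetical values."""
    for heading in headings[:max_attempts]:
        if angle_error_deg(heading, avatar_xy, pole_xy) <= tolerance_deg:
            return True
    return False

def pass_rate(results):
    """Percentage of episodes that passed."""
    return 100.0 * sum(results) / len(results)
```

With this shape, each model's score is just pass_rate over a list of run_episode results, one per "life" in the loop.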
If token costs are more than my salary, can I justify my salary simply by managing agents? Anthropic usage limits are giving me good training in pulling work off Fable and Opus, handing it to Sonnet, and then escalating issues back up the chain
1
14
689
Vibe-coding has increased the time I spend building things because it has removed the initial friction of taking on a project. This is good in some ways, but I also need a bit of friction to stop me getting knee-deep in a token-expensive project that I started on a whim.
4
20
795
One day you decide to see if you can get a VRChat avatar to speak with an LLM. Next thing you know you're building virtual worlds and evals to compare LLM behaviours across models and you can't sleep because Sonnet should be building more worlds, Opus should be kicking off and monitoring more evals and Fable and I need to plan the next eval worlds
2
13
567
Explain it like I’m Gemini Flash Lite
11
379