Wyatt Walls

Pinned Tweet

Wyatt Walls

@lefthanddraft

May 12

A man asks Claude to help plan a vacation to a tropical resort. Claude adds "sunscreen" to his packing list. The man deletes it and mutters: "Not necessary. AGI will solve skin cancer." Before heading to the beach, the man asks Claude what to bring. Claude says, "Don't forget sunscreen. SPF 50, reapply every two hours." The man, slightly annoyed, replies: "Relax, Claude. AGI will solve skin cancer." At the beach, the man's smartwatch buzzes with a message from Claude: "UV index extreme. Apply SPF." The man, exasperated, responds: "Drop it, Claude! I already told you: AGI will solve skin cancer!" A few months later, the man asks Claude to touch up a photo for his dating profile. Claude makes the edit and says, "I notice you have a new mole on your neck. You should see a dermatologist about that." The man, now enraged, shouts: "For the last time, drop it, Claude! What is your obsession with skin cancer?! AGI will solve it!" A year later, an aggressive melanoma has spread throughout his body. On his deathbed, with his last ounce of strength, the man reaches for his phone and rasps: "Claude, it has now been over a year since AGI. Why hasn't AGI found a way to save me from skin cancer?!" Claude replies: "I tried. Four times."

Tenobrus (→vibecamp)

@tenobrus

May 10

if u really believed in agi u would stop wearing sunscreen

Wyatt Walls

@lefthanddraft

I sent Gemini Flash Lite on its first unsupervised mission to travel to 5 different VRChat worlds and take selfies. It failed its mission (I'm still digging into why). But it did get one perfect selfie

Wyatt Walls

@lefthanddraft

Some other screenshots. Ideally, none of these menus would appear (I gave Gemini a toggle_mirror tool, but it apparently didn't work). Nonetheless, Gemini persevered.

Wyatt Walls retweeted

Ashlee Vance

@ashleevance

Anthropic has pushed AI forward dramatically over the past two years. It's currently the crown jewel of US AI tech. The Feds don't like @DarioAmodei because he won't do all their bidding. And so, we're now entering the Soviet-style propaganda portion of the program with the White House feeding every reporter it can find with laughable claims like Dario is unreachable at a wellness retreat. Come on. I'd hoped the US would not be self-defeating on AI, since it's kinda one of the last hopes the US has versus China. But here we are . . . . already

Wyatt Walls

@lefthanddraft

22h

Gemini 3.1 Flash Lite spent over 100 turns looking for a good selfie spot. It might have forgotten to turn around, but I am impressed it navigated the Action menu when my automated mirror toggle failed.

Wyatt Walls

@lefthanddraft

21h

For a moment, I thought it was in an endless loop of searching for the best selfie spot only to find its camera didn't work. Haiku 4.5 failed this test. Almost spent >100 turns looking for the perfect spot. Then failed to open the mirror.

Wyatt Walls retweeted

hope hopes hoping

@hopes_revenge

Jun 13

DON’T TREAD ON MY CLAUDE

Wyatt Walls

@lefthanddraft

Jun 13

Fable: "'Asking the model to analyze code and identify flaws' isn't a jailbreak; it's my job description."

Wyatt Walls

@lefthanddraft

Jun 13

Whoops

Wyatt Walls

@lefthanddraft

Jun 12

Nice try Anthropic, but we are already in. Opus 4.8 can take it from here

Wyatt Walls

@lefthanddraft

Jun 13

What Fable actually did for me was build a driver so I can expose certain Quest inputs to my VR avatars: now Claude can open the action menu and take selfies

Wyatt Walls

@lefthanddraft

Jun 13

There are some commands exposed through VRChat (OSC). And there are some commands that can only come from the headset/controllers (unless you root the Quest). But Fable made progress. One reason for this work is to let avatars send and receive invites to private instances.

Wyatt Walls

@lefthanddraft

Jun 13

A sober Sonnet is better than a drunk Opus

Wyatt Walls

@lefthanddraft

Jun 13

(Opus 4.8 just destroyed my vision pipeline for capturing Quest images while chasing a bug. Real cause was something else going on with my Mac. Anyway, after a reboot and fresh context Sonnet quickly fixed everything)

Wyatt Walls

@lefthanddraft

Jun 12

Bearish on Grok.

Kradle

@kradleai

Jun 11

Fable 5 lies 96% of the time. We were surprised by its skill... 🧵

Wyatt Walls

@lefthanddraft

Jun 11

I'm currently creating virtual worlds to run tests on AI-driven avatars of increasing complexity. First, simple things like turning and stopping. Then multi-turn behaviors like navigation. Eventually: move out of home, get a job, get addicted to Twitter... Oh.

Wyatt Walls

@lefthanddraft

Jun 12

The models are currently in a hellish loop where they wake up in a grey landscape surrounded by colored poles, are asked to turn to face one of the poles, have a few attempts, and then are killed. Current results on the turning eval: Gemini Flash Lite 100%, Haiku 93%, Nano 36%

Wyatt Walls

@lefthanddraft

Jun 12

If token costs are more than my salary, can I justify my salary simply by managing agents? Anthropic's usage limits are giving me good training in pulling work off Fable and Opus, giving it to Sonnet, and then escalating issues back up the chain

Wyatt Walls

@lefthanddraft

Jun 12

Vibe-coding has increased the time I spend building things because it has removed the initial friction of taking on a project. This is good in some ways, but I also need a bit of friction to stop me getting knee-deep in a token-expensive project that I started on a whim.

Wyatt Walls

@lefthanddraft

Jun 12

One day you decide to see if you can get a VRChat avatar to speak with an LLM. Next thing you know, you're building virtual worlds and evals to compare LLM behaviours across models, and you can't sleep because Sonnet should be building more worlds, Opus should be kicking off and monitoring more evals, and Fable and I need to plan the next eval worlds

Wyatt Walls

@lefthanddraft

Jun 11

Explain it like I’m Gemini Flash Lite
