josh :)

Pinned Tweet

josh :)

@joshycodes

Apr 27

Collaborating w ChatGPT on branding for a new educational YouTube channel

17,105

josh :)

@joshycodes

Jun 11

I now see that this was a story about ASI

josh :)

@joshycodes

Jun 9

Week 2 of making my girlfriend breakfast on days she needs to go into office :)

josh :)

@joshycodes

Jun 8

I’ve been making my girlfriend breakfast the one day a week she goes into office :)

170

josh :)

@joshycodes

Jun 8

I’ve been making my girlfriend breakfast the one day a week she goes into office :)

686

josh :)

@joshycodes

Jun 8

Some reflections on my last year with AI: The work I set out to do has become way more ambitious.

As a silly example: This past month, I’ve been training AI using go-explore/RL to beat a Mario Kart world record. (Sadly I haven’t, but I’ve learned quite a lot.) A year or two ago, my project would have just been: can I get AI to play Mario Kart?

A less silly example: For my final graduate class, I set out to build a proof of concept for forecasting earthquakes. I tried creating a foundational model and fine-tuning it for earthquake detection. I don’t think I would have been able to even attempt something so ambitious that long ago.

What’s nice about working on the frontier is I still have to use my brain quite a lot. Routine work is what AI can increasingly do.

If I’m being honest, I don’t expect this to last forever. Someday the hard, brain-engaging part of the frontier might be something AI just does. But for the last year, it has been very fun :)

1,065

josh :) retweeted

Rational Animations

@RationalAnimat1

Jun 6

Researchers from @OpenAI and @apolloaievals found that, in certain situations, AI models can take covert actions. Additionally, they're sometimes aware they're being tested, which causes them to behave better. Our new video discusses these results and more.

15:52

861

45,745

josh :)

@joshycodes

Jun 6

Can’t wait to see Mythos make a pelican riding a bike

josh :)

@joshycodes

Jun 4

Please train your models to use codemods like jscodeshift, libCST, etc 😭😭😭 It would make models so much more efficient in large codebases

267

josh :) retweeted

Riley Walz

@rtwlz

Jun 1

I compiled the dedications at the beginning of thousands of books! walzr.com/dedications

392

28,506

josh :)

@joshycodes

May 28

it’s a good day :)

Claude

@claudeai

May 28

Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors. Available today at the same price.

Benchmark table showing how Claude Opus 4.8 compares to its predecessor and to other models on tests of coding, agentic skills, reasoning, and practical knowledge work tasks.

josh :)

@joshycodes

May 28

agi really is the ring of power

josh :)

@joshycodes

May 23

Please show me the truth labels

Polymarket

@Polymarket

May 23

NEW: Chinese AI pet translating startup claims it can interpret pets' speech with up to 95% accuracy.

244

josh :)

@joshycodes

May 20

Can other models solve this problem? When I gave the same problem to 3 other models, each one recognized it as a major open problem and refused to try. If models won't try, how do we even measure what they can do?

OpenAI

@OpenAI

May 20

Today, we share a breakthrough on the planar unit distance problem, a famous open question first posed by Paul Erdős in 1946. For nearly 80 years, mathematicians believed the best possible solutions looked roughly like square grids. An OpenAI model has now disproved that belief, discovering an entirely new family of constructions that performs better. This marks the first time AI has autonomously solved a prominent open problem central to a field of mathematics.

2:38

954

josh :)

@joshycodes

May 15

He just caught up to Mythos, now reveal it was only the preview checkpoint

AI Security Institute

@AISecurityInst

May 13

Replying to @AISecurityInst

Our cyber range results illustrate this step-up. Since our first Mythos evaluation, we received access to a newer Mythos Preview checkpoint. On a 32-step corporate network attack we estimate takes a human expert ~20 hours, this checkpoint completes the full attack in 6/10 attempts.

288

josh :)

@joshycodes

May 14

Wait, Claude can generate images now?

𒐪

@SHL0MS

May 12

i just generated an image in the style of a Monet painting using AI

please describe, in as much detail as possible, what makes this inferior to a real Monet painting

175

josh :)

@joshycodes

May 13

Can we read an AI’s thoughts? My second video for Good Robots is out: A 13-minute visual intro to mechanistic interpretability, made for anyone curious about what’s happening inside AI models.

217

josh :)

@joshycodes

May 13

here ya go: youtube.com/watch?v=Mixcmuz3…

AI Is No Longer A Black Box.

You've heard that AI is a black box - that nobody, not even the peo...

youtube.com

109

josh :)

@joshycodes

May 12

wait, i have an idea

Daniel San

@dani_avila7

May 11

Claude Code 2.1.139 added /goal

You set a completion condition and Claude keeps working across turns until it's met

Works in interactive, -p, and Remote Control 👏

227

josh :)

@joshycodes

May 11

I guess @huskirl is cooked

Thinking Machines

@thinkymachines

May 11

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/int…

2:15

316

josh :)

@joshycodes

May 9

Claude Mythos’ time-horizon is NSFW

METR

@METR_Evals

May 8

We evaluated an early version of Claude Mythos Preview for risk assessment during a limited window in March 2026. We estimated a 50%-time-horizon of at least 16hrs (95% CI 8.5hrs to 55hrs) on our task suite, at the upper end of what we can measure without new tasks.

1,685