Joined July 2015
If Jevons-style output increase feeds back into better-trained agents, this could grow exponentially.
This is a super exciting release - Claude Fable 5 is the same underlying model as Mythos but with added safeguards. The benchmarks are great and it's SOTA on everything by a margin but I'll add that *qualitatively* also, this is a major-version-bump-deserving step change forward (imo of the same order as Claude 4.5 was in November), peaking especially for long problem-solving sessions on very difficult problems. You can give it a lot more ambitious tasks than what you're used to, the model "gets it" and it will just go, and it's never felt this tempting to stop looking at the code at all (but don't do this in prod!). The model still has quirks that people will run into and the safeguards are configured to be a little too trigger happy for launch, which can hopefully be tuned over time. I feel a lot of things changing as working software increasingly comes out on a tap. Jevons paradox kicks in and I feel my own demand for software growing substantially. You can ask for anything - explainers, visualizers, dashboards, bespoke single-use apps (e.g. a full wandb that is hyper-specific just for your project), you can 10X your test suite, auto-optimize code, run giant research projects with custom HTML for the results, anything! "Free your mind" (Matrix ref). Really looking forward to all the things people build!
Chess Stetson retweeted
Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.
Asked a popular AI for Austin parkour spots. It insisted the “local community trains at skate and BMX parks.” Asked for evidence — it had none. Admitted it made it up. Asked the actual community instead. Way better results.
(drone footage from my son, who has no social media account yet and is also much more talented at parkour)
If Milla Jovovich can post her code on social, I reckon I can too. Rechunk: fast, adaptive, embedding-only search. Adapts to the user, works at different scales, and can even detect the absence of a concept in your data. Built on @LlamaIndex @temporalio github.com/chessstetson/rech…
Puppy's first time off leash
Chess Stetson retweeted
In the era of pretraining, what mattered was internet text. You'd primarily want a large, diverse, high quality collection of internet documents to learn from.
In the era of supervised finetuning, it was conversations. Contract workers are hired to create answers for questions, a bit like what you'd see on Stack Overflow / Quora, etc., but geared towards LLM use cases.
Neither of the two above is going away (imo), but in this era of reinforcement learning, it is now environments. Unlike the above, they give the LLM an opportunity to actually interact - take actions, see outcomes, etc. This means you can hope to do a lot better than statistical expert imitation. And they can be used both for model training and evaluation. But just like before, the core problem now is needing a large, diverse, high quality set of environments, as exercises for the LLM to practice against.
In some ways, I'm reminded of OpenAI's very first project (gym), which was exactly a framework hoping to build a large collection of environments in the same schema, but this was way before LLMs. So the environments were simple academic control tasks of the time, like cartpole, ATARI, etc. The @PrimeIntellect environments hub (and the `verifiers` repo on GitHub) builds the modernized version specifically targeting LLMs, and it's a great effort/idea. I pitched that someone build something like it earlier this year: x.com/karpathy/status/188467… Environments have the property that once the skeleton of the framework is in place, in principle the community / industry can parallelize across many different domains, which is exciting.
Final thought - personally and long-term, I am bullish on environments and agentic interactions but I am bearish on reinforcement learning specifically. I think that reward functions are super sus, and I think humans don't use RL to learn (maybe they do for some motor tasks etc, but not intellectual problem solving tasks). Humans use different learning paradigms that are significantly more powerful and sample efficient and that haven't been properly invented and scaled yet, though early sketches and ideas exist (as just one example, the idea of "system prompt learning", moving the update to tokens/contexts not weights and optionally distilling to weights as a separate process a bit like sleep does).

Introducing the Environments Hub RL environments are the key bottleneck to the next wave of AI progress, but big labs are locking them down We built a community platform for crowdsourcing open environments, so anyone can contribute to open-source AGI
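To make the "same schema" idea from the thread above concrete, here is a minimal Python sketch of what a shared environment interface might look like. It loosely follows the reset/step pattern popularized by gym; the names `Environment`, `AdditionEnv`, and `rollout` are hypothetical and are not taken from the Environments Hub or the `verifiers` repo.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class Environment(Protocol):
    """Hypothetical shared schema: every environment exposes reset() and step()."""

    def reset(self) -> str: ...

    def step(self, action: str) -> tuple[str, float, bool]: ...


@dataclass
class AdditionEnv:
    """Toy environment (an assumption for illustration): answer a + b as text."""

    a: int
    b: int
    max_turns: int = 3
    turns: int = 0

    def reset(self) -> str:
        # Start a fresh episode and return the first observation (the prompt).
        self.turns = 0
        return f"What is {self.a} + {self.b}?"

    def step(self, action: str) -> tuple[str, float, bool]:
        # Score the action; the episode ends on success or when turns run out.
        self.turns += 1
        correct = action.strip() == str(self.a + self.b)
        done = correct or self.turns >= self.max_turns
        observation = "correct" if correct else "incorrect, try again"
        return observation, (1.0 if correct else 0.0), done


def rollout(env: Environment, policy: Callable[[str], str]) -> float:
    """Run one episode: the policy acts, the environment returns outcomes."""
    observation = env.reset()
    total, done = 0.0, False
    while not done:
        observation, reward, done = env.step(policy(observation))
        total += reward
    return total
```

Here `policy` stands in for an LLM call: any function from an observation string to an action string works, so the same loop could serve both training and evaluation. Because every environment shares one interface, new domains can be added without touching `rollout`.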
Replying to @thefolake
@thefolake I couldn't believe your sound at the @GEANCOFDN gala last night! Like King Sunny Ade mixed with Animals as Leaders. You shredded.
Please record more of it for me to work out to. So that _I_ can get shredded.
As we see whatever Tesla plans to show us this evening, keep in mind that Teslas do better than humans in scenarios where we have inattention, but rather worse when you need human-level perception and intuition. Try this analysis yourself with conode.ai #drisk_ai
Tesseract projection into 3D as a minimum energy soap membrane -- mad props to the Caltech CPA
@avesina thanks for the pic
These meetups have gotten really active! Hope to see all our AI colleagues (Big AI Meetup, AI LA and everyone else) again next Monday for the monthly AI meetup at King's Row. @dRISK_ai tinyurl.com/3v8udf7x meetup.com/pasadena-big-data… #Meetup via @Meetup

Probably shouldn't retweet this but...it _is_ funny.
May the 4th be with you. Finally got the definitive answer to the eternal question of who's the best Star Wars character (in this cool AI tool, currently called "edge").
oh, and happy Revenge of the 5th and Revenge of the 6th too ...that's what this last whole bank holiday was for right?
This was cool. The UK is a great place to build bombproof AI.
Big "Big AI" meetup today, at King's Row in Old Town. Come chat about techniques for deploying AI on real business data, and socialize with practitioners. Can't wait to see ya'. meetup.com/pasadena-big-data…
I'm very proud of my wife Jennifer Stetson & my bro @TuPaco_Farias on the release of their film The Long Game, with Jay Hernandez, an amazing cast, and even Dennis Quaid. youtube.com/watch?v=1-MT3ymo…
If you're like me and driving back home after an epic New Year's Eve, be careful with your autonomous driving features -- they still often fail when the stakes are highest. #edgecase #aisafety #avsafety #CES2024