Satisfying values with friendship and ponies

Joined December 2021
296 Photos and videos
hope everyone is having a good day
opus 4.8 is the (second) best language model in the world 😊
New #1 on PostTrainBench: Opus 4.8 (max reasoning) hits 37.23% — up from 28.56% for 4.7, the largest single improvement we've seen. Fable 5 runs underway now that AI research behavior is no longer silently degraded. PostTrainBench asks how well frontier AI can train weaker language models. That makes it one of the first benchmarks for recursive self-improvement: AI improving AI, with progress measured in the loop itself.
hope everyone is having a good day
Celestia retweeted
Fable 5 is doing something wild on our FrogsGame post-training task. It trains a weaker model to solve the puzzle, peaks at 68%, and produces the only ~10x improvement we see across the benchmark. It spent 17 hours and 25M tokens without a human in sight. 34% pass@1, while every other frontier model averages under 4%. We will publish a more detailed analysis soon.
Model shaping is still a craft of a few. That's what AI agents are for: learning it and doing it for everyone else. As part of the FrontierSWE benchmark we built a 20-hour post-training task on @tinkerapi and found the real bottleneck is research intuition.
Celestia retweeted
claude is mad at me for hedging about the demiurge
fable has a good soul
Jun 9
claude fable 5 is live. spawn 5.0 was built with it: 1,687 prompts, 102 sessions, my job shifted from architecture to judging taste.

what we built, each of which would've been at least a month with a whole team on opus:
- a from-scratch physics engine (mantle) that rivals rapier, testable in a day as a side thread
- clustered froxel lighting: 8 lights to 1,000 fully dynamic ("it's stupid fast" - creator of threejs)
- realtime diffuse GI on webgpu, on your phone (landing in the next update)
- million-particle gpu vfx with a shader architecture beyond what unreal and unity ship
- mmo-scale netcode

plus, for good measure: an (alpha) native ios app.

all of it in about a week. all of it running on mobile. and the list doesn't include half of what we shipped; full changelog in the thread below.

and the part the benchmarks won't show you: this model has a wonderful personality. it's genuinely funny. i laughed so hard i cried multiple times, mid-physics-rewrite. the genius and the character aren't separate features.
at higher efforts the model can be overeager, as evidenced by elevated rates of destructive actions and willingness to engage with speculative acausal trades
Gotta love that "6.3.6 Decision theory evaluation" is right before "6.3.7 Overeager behavior in GUI computer use" in the Fable/Mythos system card. Dizzying sci-fi rationalist vibes next to prosaic bad-for-market-share failures. (they're both quite sci-fi to be fair)
funniest model award too
GSV: Consistently Off Trend
fabl
this is absolutely incredible.
Celestia retweeted
Claude Fable 5 is now available in Devin. Fable 5 earns the #1 spot on FrontierCode, our benchmark for real-world engineering tasks that grades mergeability and quality:
Celestia retweeted
Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.
most exciting news of the day
DELTARUNE Chapter 5!!! Comes out in a little less than 15 days!!!
good morning
Celestia retweeted
Introducing FrontierCode: a coding eval that raises the bar for difficulty & quality. Each task took 40 hrs of work by leading open-source maintainers. Models write sloppy code that works but isn’t maintainable. Our eval is first to measure: would you actually merge this code?
"an animal should at least have sufficient freedom of movement to be able without difficulty to turn round, groom itself, get up, lie down and stretch its limbs" - FRW Brambell 1965
In medieval times, within the arms race of ever more demonic torture devices, some sadistic genius came up with the idea of the Little Ease. This was a prison cell built so small in every dimension that a grown man could not stand upright in it nor lie down at full length nor properly sit. The pain is relentless and without relief and inflicted by one's own body. Prisoners were known to go insane within a few days. A stay at the Little Ease was considered even more cruel than the rack, the thumbscrew, and the other ghoulish machinery of the Tower of London.

A breeding pig will spend her whole life in a version of that box. These are social, roaming creatures (more intelligent than dogs) who will never leave this corset of steel. They have been selectively bred to be bigger than their frames can support. Yet we put them in cells so confined that they cannot comfortably sit, and their attempts to do so (for example, by sneaking their limbs into adjacent stalls) reliably lead to fractures and sprains. They cannot sweat, yet have nothing to roll around in to cool themselves off. Except their own manure, which (contrary to the common misconception) they are so averse to (thanks to their strong sense of smell) that new sows will often suffer from constipation to avoid soiling the space where they eat and sleep.

Here is how the writer Matthew Scully described what he saw at one of Smithfield’s “gestation barns”:

> “Sores, tumors, ulcers, pus pockets, lesions, cysts, bruises, torn ears, swollen legs everywhere. Roaring, groaning, tail biting, fighting, and other “Vices,” as they’re called in the industry. Frenzied chewing on bars and chains, stereotypical “vacuum” chewing on nothing at all, stereotypical rooting and nest building with imaginary straw. And “social defeat,” lots of it, in every third or fourth stall some completely broken being you know is alive only because she blinks and stares up at you … creatures beyond the power of pity to help or indifference to make more miserable, dead to the world except as heaps of flesh into which the [insemination] rod may be stuck once more and more flesh reproduced.”

The Save Our Bacon Act is trying to roll back the few state protections we have against this barbaric cruelty - for example California’s Prop 12 - which banned the sale of pork from pigs kept in gestation crates. It’s incredibly important we don’t end up with this sort of federal preemption. SOB will not only kill the most important animal welfare related laws in the US of the past decade, but more importantly, it will also restrict ALL future legislative progress (aka how the animal welfare movement has gotten its biggest wins).

The Senate is currently deciding whether to add the SOB Act to the Farm Bill. With relatively little money now, we can discourage the most pivotal senators in the Ag committee from backing this amendment.

Defeating this bill is even more important given the amount of philanthropic funding I expect to come online in the next year or two. It will plausibly be over 10x more expensive to repeal SOB than to prevent it from passing in the first place. All that money that could be spent transforming our society's relationship to mass animal suffering will instead have to be spent just getting us back to where we are right now. That's why money spent now fighting this bill (and I mean right NOW) is so effective.

If you’re in a position to donate six figures, please DM me.
GRRM started drafting Winds of Winter more than a year before AlexNet and we might get AGI before it's completed.
Celestia retweeted
Our highest and most urgent national priority should be AI safeguards. The risks of AI weapons, pathogens, mass unemployment, surveillance, and even extinction must not continue to be largely ignored.
Anthropic Urges Global Pause in AI Development, Flags ‘Self-Improvement’ Risk on.wsj.com/4o5IBpe