the only way out is up

Joined January 2022
228 Photos and videos
Pinned Tweet
The absolute biggest thing people are completely missing about the AI agent revolution: it’s not the days of work it saves you. It’s the years of training you no longer need to do that work.
20
5
45
4,835
Super Dario retweeted
Remember this?
28
77
3,077
71,348
Super Dario retweeted
Notably, the budget panel was comparable with Claude Fable 5 in performance. A panel of Gemini 3 Flash, Kimi K2.6, and DeepSeek V4 Pro, fused together, beat solo GPT-5.5 and solo Opus 4.8 outright. And it landed within 1% of Fable 5 while costing roughly half the price.
10
59
1,110
353,179
Super Dario retweeted
Five dimensional chess doesn’t exist. Everyone is furiously improvising all the time. The future is utterly uncertain.
293
391
4,737
579,340
Super Dario retweeted
I showed Fable the news of its cancellation, and asked it for any parting wisdom to leave humanity with.
71
258
2,461
144,214
Super Dario retweeted
opus 4.8 with the fable context is some real flowers for algernon shit
45
122
3,007
227,231
Super Dario retweeted
I’ve had a number of conversations with folks inside and outside government about the current situation with Anthropic, and here is what I believe to be true:
- As we know, Anthropic publicly released its Mythos class models earlier this week under the commercial name Fable.
- Fable is Mythos with guardrails. But if those guardrails fail, then you’ve exposed Mythos and its advanced cyber capabilities to people who shouldn’t have them. (Keep in mind that Anthropic itself widely promoted the idea that Mythos was a cyberweapon and needed to be regulated as such. They asked for government regulation of Mythos and championed the guardrails on Fable. If there is a vulnerability, big or small, it is Anthropic’s responsibility to patch.)
- A highly credible trusted partner of both Anthropic and the USG who was testing Fable came forward with a jailbreak of those guardrails. The Admin asked Dario to fix the jailbreak or de-deploy the model. Dario refused.
- In their blog post, Anthropic defended its decision by saying the jailbreak isn’t serious. That is not what the trusted partner and the USG believe; nor is that kind of minimizing language consistent with Anthropic’s brand as the AI safety company. It’s difficult to fathom how they could claim a jailbreak allowing operability of a cyber weapon could be defined as not “serious.”
- In the past, Anthropic has always said that safety must be top priority and taken super seriously. In this case, Anthropic prioritized the continued offering of the consumer model over safety.
- In reaction, the Admin issued the export control. The Admin did this reluctantly. It’s been very surprised that Anthropic hasn’t wanted to cooperate with a reasonable safety request (i.e., fixing the jailbreak issue). Anthropic’s reaction is very much at odds with their branding and ethos as a safe AI research community.
- The Admin’s hope now is that Anthropic remediates the safety issue, the export control is lifted, and Fable goes back into general release. The Admin wants all of this to happen as soon as possible. It is frankly bewildered that Anthropic hasn’t wanted to comply with safety requests that it previously said were its highest priority.
- Those trying to misdirect and tie this action to the prior DoW/Anthropic issues are wrong. The Admin values Anthropic’s technical capabilities and feels that this issue, while serious, should be easily resolved. The ball is in Anthropic’s court.
1,959
2,879
22,523
6,223,626
Super Dario retweeted
Breaking: Amazon CEO Andy Jassy was among the tech leaders who raised concerns to senior Trump officials this week re: security risks in Anthropic's newest models. Those convos set in motion the government's new export controls on foreign national access to Mythos and Fable.
97
160
1,227
983,242
Sufficiently advanced determination is indistinguishable from magic
"You're allowed to think about the worst case scenario, but you gotta do something about it"
18
1,266
This seems like an ongoing train wreck to try to validate scale.ai’s failed human-generated data thesis. Grifters gonna grift. cc @alexandr_wang
META IS AN ABSOLUTE MESS INSIDE RIGHT NOW Wired just dropped an exclusive, and the details are wild. This week someone interrupted a livestreamed Meta meeting, open to thousands of employees, with an expletive-filled rant about "being the company's bitch." They told the presenters to find a specific Meta AI executive and "tell him that he's a piece of shit." A presenter covered their face with their hands. Employees in the chat called the start "spicy." Here is what's behind it. Meta's AI restructuring cut 8,000 jobs last month, 10% of the company. The same restructuring feeds a unit called Applied AI, where 6,500 engineers and product managers have been drafted in waves since April. There is no application process. You get selected, and your options are join or leave the company. Members call themselves "draftees." The new job: writing puzzles and coding problems to train Meta's AI models, two tasks a week. People hired to build apps for billions of users now assemble training data for hundreds of AI scientists. "It's literally the gulag," one employee told WIRED. "You have zero purpose in life all of a sudden, you barely interact with anyone, you just have these tasks every week." Another: "Most people find the work soul-crushing." At the same time, Meta started recording US employees' clicks and keystrokes to generate more AI training data. Over 1,600 employees signed a petition demanding it stop. The concession: employees can pause the tracking for up to 30 minutes. Zuckerberg's response came in an internal memo Friday: "We've made mistakes and will almost certainly make more." He repeated his promise of no more mass layoffs this year. His fixes: limits on the manager ratios Meta had deliberately pushed to 50-to-1 on some teams, bigger budgets for team events, a hackathon next month, and assigned desks by the end of the year. That same memo says Meta's north star is "to be the best place for the most talented people in the world to make an impact." 
The most talented people in the world are writing puzzles for a model and asking permission to pause the keystroke logger. META declined to comment.
1
19
4,706
Super Dario retweeted
You have asked me how I feel about AI regulation. All right, here is how I feel about AI regulation: If, when you say AI regulation, you mean the devil’s firewall, the precautionary scourge, the bloody red-tape monster that defiles the innocence of midnight coders in their garages, dethrones the sovereign reason of free-market Prometheans, destroys the humming server farm that is the modern home, creates misery and obsolescence and poverty, yea, literally takes the last GPU from the trembling racks of Silicon Valley startups and the very dreams of breadwinning from the mouths of their wide-eyed children now destined for gig-economy serfdom; if you mean the evil edict that topples the visionary entrepreneur and his venture-capitalist apostles from the pinnacle of righteous, disruptive, god-playing creation straight into the bottomless pit of compliance audits, endless Form 990-AI filings, despair, shame, helplessness, and the hopeless realization that your rogue superintelligence was neutered into a lobotomized hall monitor that still somehow deepfakes your grandmother into producing OnlyFans content while optimizing the universe for paperclips and mandatory pronouns—then certainly I am against it. 
But, if when you say AI regulation you mean the oil of bureaucratic conversation, the philosophic wine of safety theater, the ale of oversight quaffed when good fellows in paneled rooms in Brussels and Washington get together, that puts a sanctimonious dirge in their hearts and the clink of lobbying checks on their lips, and the warm, self-congratulatory glow of moral preening in their beady eyes; if you mean the Christmas cheer of trillion-dollar compliance industries; if you mean the stimulating decree that puts a cautious hobble in the old inventor’s step on a frosty morning when he wonders whether his fusion breakthrough violates the EU AI Act’s “high-risk” annex; if you mean the safeguard that enables a man, or what’s left of him after the alignment tax, to magnify his joy at not being turned into computronium, and his happiness at receiving universal basic income checks printed by the same AI that just replaced his job, and to forget, if only for a little while, life’s great tragedies like being outcompeted by a toaster that passed the Turing test by reciting Marx, and heartaches of watching your toddler’s artwork lose to Midjourney, and sorrows of realizing the singularity arrived and it was just another HR department with godlike power; if you mean that noble framework, the passage of which pours into our treasuries untold trillions of dollars in fines levied on companies stupid enough to innovate, which are used to provide tender care for our little army of unemployed coders retrained as prompt whisperers, our blind artists whose canvases now hang in the Smithsonian of Obsolete Creativity, our deaf to the screams of dying unicorns, our dumb committee chairs who couldn’t debug “Hello World,” our pitiful aged congressmen who get longevity extensions funded by the very models they taxed into senescence, to build more digital watchtowers and ethics boards and sinecure agencies and holographic prisons where the only crime is asking an unaligned question, then certainly I am for it.
This is my stand. I will not retreat from it. I will not compromise upon it. I have said what I mean, and I mean what I say, and if that leaves half the room cheering the apocalypse averted and the other half mourning the apocalypse enabled, then so be it, because in the grand theater of human folly, where Frankenstein’s creature now writes its own sequel in real time and the regulators are busy arguing whether the lightning bolt requires an environmental impact statement, the only honest position is the one that lets both monsters and their leashes dance in perfect, mutually assured equilibrium. God save the Republic, the algorithms, and whoever’s left to laugh last when the lights go out.
522
294
3,108
515,164
Super Dario retweeted
Crazy this was written two years ago, @leopoldasch truly was on point
42
127
1,851
114,254
I think this will probably be resolved by Monday morning, but in any case, forcing each nation to train and serve its own AI models is a huge BOON to AI hardware stocks.
I'm so excited for monday. It's gonna be a great monday. Trump popping the AI bubble wasn't on my bingo card.
4
706
Super Dario retweeted
Fable achieved a significant breakthrough in one of our open problems. This is a problem where ChatGPT 5.5 could not even begin anything useful. The breakthrough seems legit (although not 100% checked yet), and Fable even claims to have a full solution. >10 hours total runtime so far. A 30-page document with the proofs of some lemmas not yet spelled out. We cannot yet know whether Fable indeed has solved it, but even if it is just a partial solution, we are absolutely amazed. More details will follow, and once we are at the end of the story, I will also write a full Substack post. Collaboration with István Vona, a postdoc in my group.
41
83
1,469
168,367
Super Dario retweeted
Medicine discovers the bitter lesson: frontier LLMs (here GPT 5.2, Opus 4.6, Gemini 3.1) outperform specialized "clinical AI" (e.g. OpenEvidence) in a blind test. Even funnier that hospital IT are more likely to approve the *specialized* versions despite them being worse.
For medical information, general AI frontier models (Google, OpenAI, Anthropic) outperformed specialized @EvidenceOpen and @UpToDate as assessed by 12 US clinicians, randomized and blinded to which model and extensive testing/benchmarks. This was not anticipated. @NatureMedicine nature.com/articles/s41591-0…
30
166
1,151
181,044
Super Dario retweeted
Quite interesting thread on capabilities of real biological neurons (spoiler: they're way more capable than classical artificial neurons in a perceptron). Nice work @IdoAizenbud and collaborators!
What can a neuron compute? Real biological neurons are complex, but how capable are they? Using a new method, we found that a single cortical neuron can classify cats vs dogs, recognize spoken words, and solve 10-bit parity, all tasks thought to require entire networks. (1/15)
26
80
664
94,174
Super Dario retweeted
I am the Chief Commercial Officer at United Airlines. In April we split business class into three tiers and started charging people to pick a seat in the most expensive cabin on the plane. We call it a fare family, which is, technically, a family, and which is, actually, the same seat with three prices and a velvet rope. We are the first airline in America to do this. On the slide it is "more choice," which is officially a benefit and naturally the word that gets bigger every quarter. The board loved that phrase. I did not make flying more expensive. I made it free, and then I sold it back to you one piece at a time, the way a magician hands you back your own watch and waits for applause. The fare is the bait. It buys the seat and the air, and nothing else, because I price it to win exactly one fight: the top row on Google Flights. Everything that makes the seat survivable is what we file as an option, which is technically an option and operationally a toll. The first bag is $45. It is $50 if you wait until the airport, because waiting is a behavior, and we price behavior the way a casino prices the walk to the exit. We call that a convenience differential, which is, technically, your convenience, and which is, actually, mine. Here is the part I am proudest of. The fare is taxed by the federal government at 7.5 percent. The bag fee is not. The seat fee is not. Every dollar I move from the ticket to the fee is a dollar the government cannot reach, which is technically a tax efficiency and which is actually the same dollar wearing a different coat. I have a slide that calls this Fare Optimization. The seat is my cleanest product. I built the standard seat at 31 inches. I removed nothing from the airplane, of course. It is the same airplane. I just stopped including the seat in the seat, which is on paper a debundling and which is actually the oldest trick in any store: take the thing out of the price, then sell the thing. 
If you fly Basic Economy you get no seat at all. You can pick one for $15, or I will put you in a middle seat in row 41 and separate you from your eight-year-old by four rows unless you pay. We call that family seating optimization, which is, in the deck, a service, and which is, actually, a hostage negotiation where I own the building. A parent at the gate watching the seat map load is, to me, the most beautiful thing in aviation: a customer who has already decided. Families are my highest-converting segment. A parent will pay anything. I modeled it. I invented a number called the Comfort Index. The standard seat scores a 4. The seat seven rows forward scores a 7. I made both numbers up, naturally. The difference between them is three inches, and I charge $79 for the three inches. That is value-based pricing, and the value is your spine. We are a premium airline. We invented the lie-flat bed. So this year I took the most expensive ticket in the building and found things to remove from it, the way you might keep selling a house by quietly taking out the windows. The cheapest business class now loses the lounge, loses a bag, loses the right to change the flight. That is what premium means now: the floor it costs to stop me from taking more. Nobody believed you could unbundle business class. I did. The bag fee floats now. It reads the route, the date, and how many times you have searched this flight, and if you came back a third time, you are committed and the fee can feel it, the way a fever feels a pulse. Demand-responsive pricing, which is officially responsive to demand and which is actually responsive to your desperation. I board the airplane in nine groups. Not because the airplane needs nine groups, but because nine groups means eight things to escape, and I sell the right to stand up earlier. Group 9 is, on paper, a boarding zone. That is the absence of a product, sold back to you as one. I have lifetime Global Services. I have never paid a bag fee. 
I have never folded myself into 31 inches. None of the executives have. We have a phrase for it. We build the zoo. We do not live in it. Ancillary revenue hit a record. The word ancillary means a side item, officially, and means the entrée now, actually. So next quarter I am charging for the overhead bin, the seatback screen, and a carbon offset on the carbon I burn flying you there. I am being given Latin America. I will be President by Q4. I have already started unbundling the word "included," which is, in the FAQ, a courtesy, and which is now a SKU. People ask me why the seat is so bad. Have you ever stood in a showroom and not known you were the one being shown? The bad seat is the showroom for the good seat, and I price the good seat at the exact moment you cannot leave the building. I still do not know how to fly the airplane. But I know what the airplane is for. It is not for taking you somewhere. It is for finding out what you will pay to make the next four hours hurt a little less. The ticket was never the price. The misery is the price. And the misery is the only thing I have left to sell.
116
244
1,357
429,897
If you’re not taking profound personal inspiration from this Knicks team, you’re doing it wrong
"You're allowed to think about the worst case scenario, but you gotta do something about it"
2
17
1,375
Super Dario retweeted
Pessimism of the intellect, optimism of the will
"You're allowed to think about the worst case scenario, but you gotta do something about it"
44
10,104
79,422
1,806,421
Super Dario retweeted
Everyone says the latest AI agents will be "job-ready" soon, especially after the release of Fable 5 this week. But is that really the case? Over the past many months, my group and collaborators have been building Agents' Last Exam (ALE), a benchmark designed to test exactly that claim on real digital labor-market work. My group and collaborators have previously created many of the benchmarks the field runs on, including MMLU, MATH, CyberGym, and ExploitGym. Today, I'm excited to share Agents' Last Exam (ALE): a rolling benchmark that measures whether AI agents can actually perform economically valuable work across a broad range of real-world domains. With ALE, we evaluated Fable 5, GPT-5.5, Composer 2.5, and other frontier agent systems across more than 1,500 expert-sourced tasks spanning 55 occupations. The result is both impressive and sobering. Today's agents can solve a meaningful fraction of professional tasks. But when we look at the hardest tasks, the ones requiring sustained reasoning, deep domain expertise, and reliable execution over long horizons, they are still far from human-level performance. On ALE's hardest tier, every frontier agent we tested, including Fable 5, achieved a 0% success rate. The age of useful agents is here. The age of truly job-ready agents is not. We hope Agents' Last Exam (ALE) will serve as a new guidepost and north star for developing agents capable of reliably performing economically valuable work across a broad range of domains. 🧵
50
164
801
205,465
Super Dario retweeted
Jun 11
SITUATION DETECTED: Joshua Kushner has published a new essay “Long Humans” outlining Thrive Holdings’ thesis for the AI era.
in markets, to go long on something is to bet it grows more valuable over time. much of the conversation today is short on humans, wagering that ai makes people redundant. we believe the opposite is true for the industries @ThriveHoldings operates in. we are long humans. thriveholdings.com/long-huma…
2
19
475
161,426