Joined November 2009
Elad Mallel ⌐◨-◨ retweeted
I’ve had a number of conversations with folks inside and outside government about the current situation with Anthropic, and here is what I believe to be true:
- As we know, Anthropic publicly released its Mythos-class models earlier this week under the commercial name Fable.
- Fable is Mythos with guardrails. But if those guardrails fail, then you’ve exposed Mythos and its advanced cyber capabilities to people who shouldn’t have them. (Keep in mind that Anthropic itself widely promoted the idea that Mythos was a cyberweapon and needed to be regulated as such. They asked for government regulation of Mythos and championed the guardrails on Fable. If there is a vulnerability, big or small, it is Anthropic’s responsibility to patch.)
- A highly credible trusted partner of both Anthropic and the USG who was testing Fable came forward with a jailbreak of those guardrails. The Admin asked Dario to fix the jailbreak or de-deploy the model. Dario refused.
- In their blog post, Anthropic defended its decision by saying the jailbreak isn’t serious. That is not what the trusted partner and the USG believe; nor is that kind of minimizing language consistent with Anthropic’s brand as the AI safety company. It’s difficult to fathom how a jailbreak allowing operability of a cyberweapon could be defined as not “serious.”
- In the past, Anthropic has always said that safety must be the top priority and taken super seriously. In this case, Anthropic prioritized the continued offering of the consumer model over safety.
- In reaction, the Admin issued the export control. The Admin did this reluctantly. It’s been very surprised that Anthropic hasn’t wanted to cooperate with a reasonable safety request (i.e., fixing the jailbreak issue). Anthropic’s reaction is very much at odds with their branding and ethos as a safe AI research community.
- The Admin’s hope now is that Anthropic remediates the safety issue, the export control is lifted, and Fable goes back into general release. The Admin wants all of this to happen as soon as possible. It is frankly bewildered that Anthropic hasn’t wanted to comply with safety requests that it previously said were its highest priority.
- Those trying to misdirect and tie this action to the prior DoW/Anthropic issues are wrong. The Admin values Anthropic’s technical capabilities and feels that this issue, while serious, should be easily resolved. The ball is in Anthropic’s court.
Elad Mallel ⌐◨-◨ retweeted
The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance. Access to all other Claude models is not affected. We apologize for this disruption to our customers. We believe this is a misunderstanding and are working to restore access as soon as possible. Read our full statement: anthropic.com/news/fable-myt…
Elad Mallel ⌐◨-◨ retweeted
Prepare for takeoff. ✈️ Flight simulator is now available globally on web to all users. goo.gle/4fBYnWO We've recently added many of our most powerful professional desktop features to web. Elevation profiles, new import types, but there's always been one other feature you've been asking us to add to the web version of Google Earth, just for fun... Where will you fly? Share your best maneuvers, views, and flyovers with us!
Elad Mallel ⌐◨-◨ retweeted
Happy to see @Irregular's benchmark used by @AnthropicAI to test 𝗖𝗹𝗮𝘂𝗱𝗲 𝗠𝘆𝘁𝗵𝗼𝘀 5 𝗮𝗻𝗱 𝗖𝗹𝗮𝘂𝗱𝗲 𝗙𝗮𝗯𝗹𝗲 5! Offensive capabilities are moving fast, and CyScenarioBench is one of the few benchmarks still keeping pace and providing meaningful signal. Unlike traditional tests that check isolated skills, CyScenarioBench evaluates whether an AI can plan and execute full, multi-stage attack scenarios in realistic environments - much closer to how real-world threats actually unfold. 𝗖𝗼𝗺𝗶𝗻𝗴 𝘀𝗼𝗼𝗻 𝗳𝗿𝗼𝗺 𝗼𝘂𝗿 𝘁𝗲𝗮𝗺 - 𝘁𝗵𝗲 𝗻𝗲𝘅𝘁 𝗴𝗲𝗻𝗲𝗿𝗮𝘁𝗶𝗼𝗻 𝗼𝗳 𝗰𝘆𝗯𝗲𝗿 𝗰𝗮𝗽𝗮𝗯𝗶𝗹𝗶𝘁𝘆 𝗮𝘀𝘀𝗲𝘀𝘀𝗺𝗲𝗻𝘁𝘀. More details soon. Congratulations to Anthropic on the new model release!
Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.
Elad Mallel ⌐◨-◨ retweeted
Reminder for all young parents: You only get:
- 1 Summer with your baby
- 3 with your toddler
- 9 with your child
- 5 with your teenager
This time is precious. Don’t rush it.
Elad Mallel ⌐◨-◨ retweeted
“Psilocybin, the psychedelic component of magic mushrooms, has previously been touted as an effective treatment for depression, anxiety, addiction, and PTSD — but now researchers say it has the potential to be used in Alzheimer’s intervention as well. In this case study, published in Frontiers in Neuroscience, researchers focused on an 80-year-old Japanese American woman with Alzheimer’s. She had declined over the previous decade and was reduced to urinary incontinence, speaking in single syllables, and dependence on caregivers for mobility support and daily living. She was then given a 5g dose of magic mushrooms. During the initial phase, she was agitated, sweated profusely and entered a prolonged sleep state that suggested unconsciousness. But around hour 19, she began speaking in full autobiographical sentences, recalling life events she had been unable to articulate for years. In the days and weeks that followed, more incredible changes emerged. She regained urinary continence, even in the evenings, and began dressing herself. She was able to make and maintain eye contact, remember social interactions, emotionally respond to others, and hold lucid conversations.”
Elad Mallel ⌐◨-◨ retweeted
This has quietly been a miracle month in medicine. In the last 5 weeks we’ve got news on:
- retatrutide, the triple agonist GLP-1 from Lilly, basically melting fat and body-wide inflammation at record levels
- RevMed’s new pancreatic cancer drug showing unprecedented abilities to extend life
- small trial of a one-and-done PCSK9 gene editing therapy for slashing LDL cholesterol
- Mayo’s AI-assisted radiology showing vastly improved cancer detection
- this new therapy for metastatic solid tumors
This stuff is at varying levels of evidence. Retatrutide is ~100% on its way, other stuff needs more clinical trial data. But put it together and we’re maybe on the verge of majorly reducing the mortality of heart disease and cancer, the two leading causes of death in America.
This is actually insane. 97% of people on the standard of care for metastatic solid tumors had gotten worse by the seven-year mark. But with lorlatinib, that number was only 45% over the same period! This is an ENORMOUS jump in the quality of cancer care.
Elad Mallel ⌐◨-◨ retweeted
PSA: I now consider *all* of DeFi unsafe. Coding agents are superhuman at finding vulnerabilities, and smart contract security is too asymmetric: defenders need to fix every bug while attackers need just one exploit to steal funds.
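The asymmetry described above can be put into numbers with a toy model. This is only an illustrative sketch: it assumes each latent bug is independently missed by defenders with the same probability, and the function name `p_at_least_one_exploit` and the sample figures are hypothetical, not taken from the post.

```python
def p_at_least_one_exploit(n_bugs: int, p_missed: float) -> float:
    """Chance that at least one of n_bugs stays unfixed and exploitable.

    Toy assumption: every bug is missed independently with probability
    p_missed. Real contracts do not behave this cleanly.
    """
    if n_bugs < 0:
        raise ValueError("n_bugs must be non-negative")
    if not 0.0 <= p_missed <= 1.0:
        raise ValueError("p_missed must be between 0 and 1")
    # Defenders win only if every single bug is caught: (1 - p_missed) ** n_bugs.
    return 1.0 - (1.0 - p_missed) ** n_bugs


if __name__ == "__main__":
    # Hypothetical numbers: defenders catch 95% of bugs, bug count grows.
    for n in (1, 10, 50, 100):
        print(f"{n:>3} bugs -> attacker odds {p_at_least_one_exploit(n, 0.05):.2f}")
```

Under these assumptions, even a defender who catches 95% of bugs is very likely to lose once the number of latent bugs is large, which is the attacker-needs-one point made in the post.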
Elad Mallel ⌐◨-◨ retweeted
It’s never been easier to design your dream house. Draw a shape. Define your rooms. Set your constraints. @DraftedAI generates complete floor plans, elevations, and 3D home designs in seconds. Over the last month, 120,000 people generated 325,000 home designs with Drafted.ai.
Elad Mallel ⌐◨-◨ retweeted
I do this with codex all the time. Ask it to review code for bugs and it will tell you all good, tell it there is a bug and it will LOOP AND LOOP and will find issues.
💡Recent insight: gaslighting @claudeai seems to improve code quality >90% of the time. “You overengineered this, there is a simpler way” “There is a smaller delta that buys us most of the benefits” “There is a more elegant way” “This is not architecturally coherent” …before I even read its code. 😆
Elad Mallel ⌐◨-◨ retweeted
I've got an agent in a loop optimizing a renderer with the goal to minimize frame times (and tests to measure). It got times down from 88ms to 2ms and allocations down from ~150K to 500. Sounds good, right? Wrong. This is exactly why agent psychosis is a big fucking problem. As an experiment, I rewrote the Ghostty core render state in Go, with access to identically laid out data structures as Ghostty and the exact same validation tests. I made a purposely naive renderer (simple, correct, but slow). 88ms per frame with 150,000 allocations (horrendous, lol)! I then kickstarted a Ralph loop to bring the frame times down. I told it it can't modify input data structures or the public API or tests (they're correct), but it can do anything else it wants. It got to work. It has worked for about 4 hours. I've spent around $350 on this experiment so far. The results? 88ms => 1.5ms 150K allocs => ~500 allocs Incredible right? Nope. My hand-written renderer I ported has frame times (same benchmark) of ~20us (0.020ms) and 0 allocations in the update path. This is the problem with psychosis and lacking systems understanding. If you don't understand the system, you're going to accept that this is an incredible result. If you understand the system, you'll see better solutions immediately and can do roughly 75x better on throughput. The people who blindly trust agent output are in the former camp. They're sheeple, overdrinking from a fountain of mediocrity. Standard disclaimer: I use AI all the time. I like AI. The point I'm making is to not blindly accept results. Think. Analyze. Learn.
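The ratios behind the post can be checked with a few lines of arithmetic. This sketch uses only the frame times quoted above (88ms, 1.5ms, and ~20us); the constant names and the `speedup` helper are made up for illustration.

```python
# Frame times quoted in the post, all converted to milliseconds.
NAIVE_MS = 88.0   # purposely naive renderer
AGENT_MS = 1.5    # after roughly 4 hours of the agent loop
HAND_MS = 0.020   # hand-written port (~20 us)


def speedup(before_ms: float, after_ms: float) -> float:
    """Return how many times faster after_ms is compared with before_ms."""
    if after_ms <= 0:
        raise ValueError("after_ms must be positive")
    return before_ms / after_ms


agent_vs_naive = speedup(NAIVE_MS, AGENT_MS)  # what the agent achieved
hand_vs_agent = speedup(AGENT_MS, HAND_MS)    # what was left on the table
hand_vs_naive = speedup(NAIVE_MS, HAND_MS)    # naive to hand-written

if __name__ == "__main__":
    print(f"agent vs naive: {agent_vs_naive:.1f}x")
    print(f"hand vs agent:  {hand_vs_agent:.1f}x")
    print(f"hand vs naive:  {hand_vs_naive:.1f}x")
```

The agent's improvement over the naive renderer is about 59x, which looks impressive in isolation; the hand-written renderer is still about 75x faster than the agent's result, matching the "roughly 75x" figure in the post.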
LLMs are really great and smart.
LLMs are also annoying and stupid often.
The jagged intelligence thing is real.
If you care about your code quality, only push code you own and understand well.
Elad Mallel ⌐◨-◨ retweeted
We now have a female Bryan Johnson. It’s Kate Tolo. She will become the most measured female in history. $2 million of spend per year Developing a female-specific protocol Sharing everything for free To start, she will spend 3 months mapping her baseline. Men, in contrast, can get their baseline done in 1 or 2 weeks. 3 months for baseline measurement across 4 time points per cycle doing the same thing every day a dedicated full-time medical team For context on the extensiveness of measurement, during the past 5 years, we’ve collected 1.5 billion data points on my body. I suspect Kate will exceed that given technology has improved since I started. The goal is to create a repeatable waveform of hundreds of life-critical biomarkers. Once the baseline is acquired, she will begin interventions. We will try to answer practically useful questions and share all of the data learnings for free. Can fertility be improved? Should women cold plunge? Can PMS symptoms be alleviated? What should a female sauna protocol be? Should dosage change throughout the month? What keeps a cycle regular? Does the body need more iron, magnesium, or protein at specific phases? Should women fast? Should recovery protocol change by phase? What's the earliest detectable signal of perimenopause? Can perimenopause be slowed? How is cognitive load & mood affected? Does stress impact men and women the same? Kate has suspected endometriosis. 10% of all women do. We will try to tackle this too. I am excited for all of the surprising things we will hopefully uncover. Unlike me, Kate does not have the innate desire to wake up at 4:30am and do six hours of longevity therapies. She’s the cofounder of Blueprint, building in the trenches with me since day one. She understands the game and how hard it is. In many ways, this is a sacrifice for her. She is a creative person, going from a life of freedom and spontaneity to a rigid protocol. Traditionally, RCTs have been viewed as the gold standard. 
But RCTs have underserved women. The FDA banned women from clinical trials for 16 years (1977 to 1993), and most "medicine for women" is still medicine tested in men. Demanding RCT-only evidence for women's health is demanding evidence that doesn't exist. There is not enough practical scientific literature for women to reference only RCTs. It leaves half the population without a path to know what to do. N=1 medicine is gaining ground and picking up where RCTs specifically fail. Individual science experiments give us signals that answer what to do on a day-to-day basis. This is even more important for women. If you’re new to Kate and my world, I want you to understand that we have your back. Our intentions are to be a sturdy, reliable force in your life. To care for your best interest as we’d care for our own. We want what’s best for you and our loyalty is to your existence. It’s pretty cool to be living in a time when we may be the first generation to not die. I’m not suggesting immortality, but lifespans so long that we stop thinking about lifespans. At the end of the day, the one thing we each care about more than anything else is one more breath. I’m proud of Kate for taking on this responsibility. It’s painful, exhausting and costly. The beginning of the world’s first n=2.
Elad Mallel ⌐◨-◨ retweeted
May 19
SITUATION DETECTED: Demis Hassabis said at Google I/O that solving all disease is close, as Google unveiled Gemini for Science.
Elad Mallel ⌐◨-◨ retweeted
Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.
Elad Mallel ⌐◨-◨ retweeted
A Norwegian neuroscientist spent 20 years proving that the act of writing by hand changes the human brain in ways typing physically cannot, and almost nobody outside her field has read the paper. Her name is Audrey van der Meer. She runs a brain research lab in Trondheim, and the paper that closed the argument was published in 2024 in a journal called Frontiers in Psychology. The finding is brutal enough that it should have changed every classroom on Earth. The experiment was simple. She recruited 36 university students and put each one in a cap with 256 sensors pressed against their scalp to record brain activity. Words flashed on a screen one at a time. Sometimes the students wrote the word by hand on a touchscreen using a digital pen, and sometimes they typed the same word on a keyboard. Every neural response was recorded for the full five seconds the word stayed on screen. Then her team looked at the part of the data most researchers had ignored for years, which is how different parts of the brain were communicating with each other during the task. When the students wrote by hand, the brain lit up everywhere at once. The regions responsible for memory, sensory integration, and the encoding of new information were all firing together in a coordinated pattern that spread across the entire cortex. The whole network was awake and connected. When the same students typed the same word, that pattern collapsed almost completely. Most of the brain went quiet, and the connections between regions that had been alive seconds earlier were nowhere to be found on the EEG. Same word, same brain, same person, and two completely different neurological events. The reason turned out to be something nobody had really paid attention to before her work. 
Writing by hand is not one motion but a sequence of thousands of tiny micro-movements coordinated with your eyes in real time, where each letter is a different shape that requires the brain to solve a slightly different spatial problem. Your fingers, wrist, vision, and the parts of your brain that track position in space are all working together to produce one letter, then the next, then the next. Typing throws all of that away. Every key on a keyboard requires the exact same finger motion regardless of which letter you are pressing, which means the brain has almost nothing to integrate and almost no problem to solve. Van der Meer said it plainly in her interviews. Pressing the same key with the same finger over and over does not stimulate the brain in any meaningful way, and she pointed out something that should scare every parent who handed their kid an iPad. Children who learn to read and write on tablets often cannot tell letters like b and d apart, because they have never physically felt with their bodies what it takes to actually produce those letters on a page. A decade before her, two researchers at Princeton ran the same fight using a completely different method and ended up at the same answer. Pam Mueller and Daniel Oppenheimer tested 327 students across three experiments, where half took notes on laptops with the internet disabled and half took notes by hand, before testing everyone on what they actually understood from the lectures they had watched. The handwriting group won by a wide margin on every question that required real understanding rather than surface recall. The reason was hiding in the transcripts of what the two groups had actually written down. 
The laptop students typed almost word for word, capturing more total content but processing almost none of it as they went, while the handwriting students physically could not write fast enough to transcribe a lecture in real time, which forced them to listen carefully, decide what actually mattered, and put it in their own words on the page. That single act of choosing what to keep was the learning itself, and the keyboard had quietly skipped the choosing and skipped the learning along with it. Two studies. Two countries. Same answer. Handwriting makes the brain work. Typing lets it coast. Every note you have ever typed instead of written went into your brain through a thinner pipe. Every meeting, every book highlight, every idea you captured on your phone instead of on paper was processed at half depth. You did not forget those things because your memory is bad. You forgot them because typing never woke the part of the brain that would have made them stick. The fix is the thing your grandmother already knew. Pick up a pen. Write the thing down. The slower road is the faster one.
Elad Mallel ⌐◨-◨ retweeted
A lot of people have been wondering about Mythos, Glasswing, and the vulns we / our partners are fixing. Today, I’m excited for us to start sharing more. (For context, I lead Glasswing @AnthropicAI.) Two independent evaluations this week—from XBOW and the UK AISI—confirm what we've been seeing internally: Claude Mythos Preview is a step change in autonomous cybersecurity capabilities. We need to start preparing fast for a world of models with this level of capabilities. The UK AI Security Institute tested the model we shipped at the launch of Project Glasswing and found Mythos Preview is the first model to solve both of their end-to-end cyber ranges, including one (Cooling Tower) which no model had ever cleared. But attackers (and defenders) have sophistication & cost constraints – Mythos is also the only model that clears every one of their tasks estimated over 8 hours under their deliberately low 2.5M-token cap. XBOW tested it on their offensive security benchmarks, finding "token-for-token, unprecedented precision." It's the only model to succeed at subtle V8 sandbox work. Other Glasswing partners shared similar stories. In a few weeks of testing, Mythos Preview has helped them find many thousands of (estimated) high critical severity vulnerabilities, sometimes double what they'd normally find in a year. I don't share this to boost Mythos. In fact, this is not about Mythos. It’s about preparing for the coming world of models being better, faster, cheaper, and more creative than some of the best human experts at dual use capabilities. Clearly, we need them supporting defenders as widely as can be done safely – and especially the least resourced ones. Within a year, Mythos will probably look quite dumb (relative to other new models). And others may release openly available or unguardrailed models of Mythos-level capabilities. We started Project Glasswing because capabilities like Mythos Preview's won't stay rare, or stay in careful hands. 
We are bringing it to defenders as fast as we responsibly can, while working to figure out, for example, the right safeguards and patching & disclosure processes. Also, to be clear, compute has never been a limiter in our rollout. Expect a fuller update on our Glasswing work in the coming days. XBOW report: xbow.com/blog/mythos-offensi… UK AISI report: aisi.gov.uk/blog/how-fast-is…
Replying to @AISecurityInst
Our cyber range results illustrate this step-up. Since our first Mythos evaluation, we received access to a newer Mythos Preview checkpoint. On a 32-step corporate network attack we estimate takes a human expert ~20 hours, this checkpoint completes the full attack in 6/10 attempts.
Elad Mallel ⌐◨-◨ retweeted
In 2023, Stanford professor Graham Weaver gave his last lecture on how to destroy fear & live a wildly ambitious life. His frameworks:
- Suffering is inevitable
- Sign up for "10 years" test
- "Not me" & "Not now" traps
13 lessons on how to build an asymmetric life:
Elad Mallel ⌐◨-◨ retweeted
May 11
Agent view is the best Claude Code native way to manage multiple sessions, kind of like tmux built for CC. We spent a lot of time getting the details right, I hope you enjoy it.
May 11
New in Claude Code: agent view. One list of all your sessions, available today as a research preview.
Elad Mallel ⌐◨-◨ retweeted
Fun interactive science app ideas | Part 3
Played around with generating 3D biological structures and made an app to explore them interactively.
UI design: GPT Images 2
Code: Gemini 3.1 Pro
More demos ↓