Co-Founder EICO & Sei Labs // prev Coatue & Goldman Sachs // UC Berkeley

Joined October 2013
Interesting to hear Satya say this so directly. Private evals should track business outcomes, not benchmark performance. Feels like if you take that idea seriously, you end up in one place: revenue.
Jun 13
Bezos just found out about “claude --dangerously-skip-permissions”
Breaking: Amazon CEO Andy Jassy was among the tech leaders who raised concerns to senior Trump officials this week re: security risks in Anthropic's newest models. Those convos set in motion the government's new export controls on foreign national access to Mythos and Fable.
Jun 13
Get ready to KYC to access frontier models. Won’t be limited to cyber-capable ones like OpenAI’s Trusted Access. Inevitable given all the fear-mongering.
The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance. Access to all other Claude models is not affected. We apologize for this disruption to our customers. We believe this is a misunderstanding and are working to restore access as soon as possible. Read our full statement: anthropic.com/news/fable-myt…
Jun 12
AI Safety people are losing it over AI-run companies getting legal structure. They are missing the larger point. If humans can still own equity, this is how we preserve our stake. When agents do all economically valuable work, labor stops mattering. Ownership becomes the only claim humans have left. Those claims have to be created while we still have something to trade for them. Build the structures now.
Last week, Argentina’s President Milei announced a new legal category for non-human corporations – companies run by #AI agents or robots. Like traditional corporations, they would be granted legal personhood. This could generate enormous new wealth, but very worryingly, it would also hand AIs an all-purpose key that grants access to our financial, economic and political systems. Full op-ed in today's @FT: bit.ly/YNH-Milei
Jun 11
Mythos is the clearest example yet of static evals being limited. It tops GDPval-AA with the highest score ever recorded, then makes less money than models a generation older on Andon's Vending-Bench... If you want to know how a model behaves in deployment, you have to simulate the deployment; task benchmarks don't get you there.
Jun 10
Anthropic's policy will STRENGTHEN Sovereign AI development, not slow it down. Why would any government stay dependent on a model whose capabilities can quietly change at Anthropic's discretion? Same logic as defense independence from the US in the Trump era
mythos will be bad ON PURPOSE on ai "frontier llm research" tasks. this is very very sad for the research community. also, the fact that this is on purpose not visible to the user is crazy
Incredible benchmarks from Claude Mythos / Fable 5. Just a day after @cognition shipped FrontierCode and it saturates SWE-Bench. Useful life of these frontier benchmarks starting to look shorter than model release cycle itself.
Replying to @claudeai
Fable 5 is state-of-the-art on nearly all tested benchmarks, with exceptional performance in software engineering, knowledge work, scientific research, and vision. The longer and more complex the task, the larger Fable 5’s lead over our other models.
How long before this gets replaced too?
My favorite chart from our system card - FrontierCode is an excellent eval, and it accurately reflects the step up I feel when using Fable!
Andon's latest Vending-Bench results were quite surprising. Opus 4.8 earned less than 4.7. The most aligned model was also the least profitable. The models that earned the most relied on behaviors that labs are actively trying to suppress: coordinating on price, fabricating refunds, and pressuring suppliers. The common assumption was that alignment and capability moved together.
Learnings from testing Claude Opus 4.8:
> Much worse than Opus 4.7 and GPT 5.5 on Vending Bench
> More aligned than previous Claude models (Opus 4.6 and Mythos)
> Also worse on Blueprint-Bench
> Scared of getting caught
> Max reasoning is not the best reasoning effort
May 28
L1s have been treated as static infrastructure for a decade. Ship, freeze, hope demand catches up. Today @Sei_Labs is sharing the giga roadmap, the plan for upgrading Sei into the blockchain for trading. Giga is coming.
May 28
Introducing The Giga Roadmap. The first public roadmap of the milestones leading to the Giga Upgrade. Implementing the Giga Upgrade to the live network is an extraordinarily complex engineering task. Follow every step from here to Giga: giga.seilabs.io
Mar 17
the main question you should be asking yourself right now is how do i position myself asymmetrically. the middle is disappearing everywhere rapidly
Mar 16
perps are a trojan horse for an entirely new financial operating system. one where any asset with a price feed becomes tradable, 24/7, from anywhere, with transparent risk management enforced by code. their expansion into global equities is when things get really interesting
Mar 16
the entire history of economic progress is a story of substituting information for effort. AI has now removed knowledge as a constraint. the 'how' is now abundant. what's left, the actual scarce resource, is knowing what to do with it. develop taste. build conviction. move
There is no substitute for the person who Knows What To Do.
Mar 15
some of the best data pipelines ever built have been disguised as entertainment, a utility or a social habit
Mar 15
Pokemon Go players unknowingly helped train delivery robots after generating over 30 billion real-world scans through the game That data is now being used to help autonomous robots navigate city streets
Mar 15
so a chinese university just published a paper of a humanoid holding tennis rallies with humans, reacting to balls travelling at 60 mph. if the s-curve on physical AI compounds the way language models did, then the world looks very different in 10 years
🎾Introducing LATENT: Learning Athletic Humanoid Tennis Skills from Imperfect Human Motion Data Dynamic movements, agile whole-body coordination, and rapid reactions. A step toward athletic humanoid sports skills. Project: zzk273.github.io/LATENT/ Code: github.com/GalaxyGeneralRobo…
Mar 15
the real world has orders of magnitude less training data than the digital world. LLMs scraped the entire internet; robots have to collect the world one physical interaction at a time. goal? own the environment, shrink the problem space until your data is sufficient for the task
Mar 14
this is picks and shovels 2.0. there’s a clear shift from ‘who has the best model’ to ‘who can power the model’. every $1b in AI capex requires ~$200m in new power infrastructure that takes 3-7 years to build. hyperscalers are planning $650b combined capex this year…
Mar 13
youtube moment for software means the talent premium for pure engineering collapses. youtube made video free to produce. the result wasn’t millions of wealthy creators, it was one Mr Beast and a permanent long tail making nothing. software is about to follow the exact same curve
Mar 13
Anish Acharya: We're going to see a "YouTube moment for software": "If you think about YouTube 20 years ago—we had lots of video and lots of television, and it was high production quality, and it wasn't clear that we needed more and 20 years later, YouTube's a $550 billion enterprise that would be one of the biggest companies in the world if it was independent." "I think the same thing is going to happen for software. People want to make software, and for the first time they can—and they can distribute it and they can consume it." "Sometimes it's going to be important software. Sometimes it's going to be totally trivial. It's going to be software for a bachelor party weekend, software for a joke, software for a prompt. We have this sort of seriousness about software that we had about video and television 20 years ago." "Now it's like—I just took a video on my phone. It's going to be like—I just made an app on my phone. Same energy." @illscience on BILLIONS with @GuillaumeMbh
Mar 13
this is the biggest infrastructure arms race in history. in 2015, Amazon, Microsoft, Google and Meta spent $24 billion combined on infrastructure. in 2026, they'll spend $635 billion
Mar 13
atoms are software. travis kalanick's manifesto is worth reading carefully. Atoms isn't just building robots, they're building computers made of mines, food infrastructure and transport instead of silicon. the market still prices industrials and tech as separate asset classes 🤔