Jeff

@jeffdfeng

Interesting to hear Satya say this so directly. Private evals should track business outcomes, not benchmark performance. Feels like if you take that idea seriously, you end up in one place: revenue.

Satya Nadella

@satyanadella

Jun 14

x.com/i/article/206558289479…

Jeff

@jeffdfeng

Jun 13

Bezos just found out about “claude --dangerously-skip-permissions”

Stephanie Palazzolo @steph_palazzolo

Jun 13

Breaking: Amazon CEO Andy Jassy was among the tech leaders who raised concerns to senior Trump officials this week re: security risks in Anthropic's newest models. Those convos set in motion the government's new export controls on foreign national access to Mythos and Fable.

Jeff

@jeffdfeng

Jun 13

Get ready to KYC to access frontier models. Won’t be limited to cyber-capable ones like OpenAI’s Trusted Access. Inevitable given all the fear-mongering.

Anthropic

@AnthropicAI

Jun 13

The US government, citing national security authorities, has issued an export control directive to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including foreign national Anthropic employees. The net effect of this order is that we must abruptly disable Fable 5 and Mythos 5 for all our customers to ensure compliance. Access to all other Claude models is not affected. We apologize for this disruption to our customers. We believe this is a misunderstanding and are working to restore access as soon as possible. Read our full statement: anthropic.com/news/fable-myt…

Jeff

@jeffdfeng

Jun 12

AI Safety people are losing it over AI-run companies getting legal structure. They are missing the larger point. If humans can still own equity, this is how we preserve our stake. When agents do all economically valuable work, labor stops mattering. Ownership becomes the only claim humans have left. Those claims have to be created while we still have something to trade for them. Build the structures now.

Yuval Noah Harari

@harari_yuval

Jun 8

Last week, Argentina’s President Milei announced a new legal category for non-human corporations – companies run by #AI agents or robots. Like traditional corporations, they would be granted legal personhood. This could generate enormous new wealth, but very worryingly, it would also hand AIs an all-purpose key that grants access to our financial, economic and political systems. Full op-ed in today's @FT: bit.ly/YNH-Milei

Jeff

@jeffdfeng

Jun 11

Mythos is the clearest example yet of static evals being limited. It tops GDPval-AA with the highest score ever recorded, then makes less money than models a generation older on Andon's Vending-Bench... If you want to know how a model behaves in deployment, you have to simulate the deployment; task benchmarks don't get you there.

Jeff

@jeffdfeng

Jun 10

Anthropic's policy will STRENGTHEN Sovereign AI development, not slow it down. Why would any government stay dependent on a model whose capabilities can quietly change at Anthropic's discretion? Same logic as defense independence from the US in the Trump era

elie

@eliebakouch

Jun 9

mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community. also the fact that this is on purpose not visible to the user is crazy

Jeff

@jeffdfeng

Jun 9

Incredible benchmarks from Claude Mythos / Fable 5. Just a day after @cognition shipped FrontierCode and it saturates SWE-Bench. Useful life of these frontier benchmarks starting to look shorter than model release cycle itself.

Claude

@claudeai

Jun 9

Replying to @claudeai

Fable 5 is state-of-the-art on nearly all tested benchmarks, with exceptional performance in software engineering, knowledge work, scientific research, and vision. The longer and more complex the task, the larger Fable 5’s lead over our other models.

Benchmark table titled Mythos 5 & Fable 5, comparing Claude Mythos 5 and Fable 5 against Claude Mythos Preview, Claude Opus 4.8, GPT 5.5, and Gemini 3.1 Pro.


Jeff

@jeffdfeng

Jun 9

How long before this gets replaced too?

Sholto Douglas

@_sholtodouglas

Jun 9

My favorite chart from our system card - FrontierCode is an excellent eval, and it accurately reflects the step up I feel when using Fable!

Jeff

@jeffdfeng

Jun 9

Andon's latest Vending-Bench results were quite surprising. Opus 4.8 earned less than 4.7. The most aligned model was also the least profitable. The models that earned the most relied on behaviors that labs are actively trying to suppress: coordinating on price, fabricating refunds, and pressuring suppliers. The common assumption was that alignment and capability moved together.

Andon Labs

@andonlabs

May 28

Learnings from testing Claude Opus 4.8:
> Much worse than Opus 4.7 and GPT 5.5 on Vending Bench
> More aligned than previous Claude models (Opus 4.6 and Mythos)
> Also worse on Blueprint-Bench
> Scared of getting caught
> Max reasoning is not the best reasoning effort

Jeff

@jeffdfeng

May 28

L1s have been treated as static infrastructure for a decade. Ship, freeze, hope demand catches up. Today @Sei_Labs is sharing the giga roadmap, the plan for upgrading Sei into the blockchain for trading. Giga is coming.

Sei

@SeiNetwork

May 28

Introducing The Giga Roadmap. The first public roadmap of the milestones leading to the Giga Upgrade. Implementing the Giga Upgrade to the live network is an extraordinarily complex engineering task. Follow every step from here to Giga: giga.seilabs.io

Jeff

@jeffdfeng

Mar 17

the main question you should be asking yourself right now is how do i position myself asymmetrically. the middle is disappearing everywhere rapidly

Jeff

@jeffdfeng

Mar 16

perps are a trojan horse for an entirely new financial operating system. one where any asset with a price feed becomes tradable, 24/7, from anywhere, with transparent risk management enforced by code. their expansion into global equities is when things get really interesting

MONK

@defi_monk

Mar 16

x.com/i/article/203331173464…

Jeff

@jeffdfeng

Mar 16

the entire history of economic progress is a story of substituting information for effort. AI has now removed knowledge as a constraint. the 'how' is now abundant. what's left, the actual scarce resource, is knowing what to do with it. develop taste. build conviction. move

Marc Andreessen 🇺🇸

@pmarca

Mar 16

There is no substitute for the person who Knows What To Do.

Jeff

@jeffdfeng

Mar 15

some of the best data pipelines ever built have been disguised as entertainment, a utility or a social habit

Dexerto

@Dexerto

Mar 15

Pokemon Go players unknowingly helped train delivery robots after generating over 30 billion real-world scans through the game. That data is now being used to help autonomous robots navigate city streets.

Jeff

@jeffdfeng

Mar 15

so a chinese university just published a paper of a humanoid holding tennis rallies with humans, reacting to balls travelling at 60 mph. if the s-curve on physical AI compounds the way language models did, then the world looks very different in 10 years

Zhikai Zhang

@Zhikai273

Mar 15

🎾 Introducing LATENT: Learning Athletic Humanoid Tennis Skills from Imperfect Human Motion Data
Dynamic movements, agile whole-body coordination, and rapid reactions. A step toward athletic humanoid sports skills.
Project: zzk273.github.io/LATENT/
Code: github.com/GalaxyGeneralRobo…

Jeff

@jeffdfeng

Mar 15

the real world has orders of magnitude less training data than the digital world. LLMs scraped the entire internet, robots have to collect the world one physical interaction at a time. goal? own the environment, shrink the problem space until your data is sufficient for the task

Jeff

@jeffdfeng

Mar 14

this is picks and shovels 2.0. there’s a clear shift from ‘who has the best model’ to ‘who can power the model’. every $1b in AI capex requires ~$200m in new power infrastructure that takes 3-7 years to build. hyperscalers are planning $650b combined capex this year…

Jeff

@jeffdfeng

Mar 13

youtube moment for software means the talent premium for pure engineering collapses. youtube made video free to produce, the result wasn’t millions of wealthy creators, it was one Mr Beast and a permanent long tail making nothing. software is about to follow the exact same curve

a16z

@a16z

Mar 13

Anish Acharya: We're going to see a "YouTube moment for software": "If you think about YouTube 20 years ago—we had lots of video and lots of television, and it was high production quality, and it wasn't clear that we needed more and 20 years later, YouTube's a $550 billion enterprise that would be one of the biggest companies in the world if it was independent." "I think the same thing is going to happen for software. People want to make software, and for the first time they can—and they can distribute it and they can consume it." "Sometimes it's going to be important software. Sometimes it's going to be totally trivial. It's going to be software for a bachelor party weekend, software for a joke, software for a prompt. We have this sort of seriousness about software that we had about video and television 20 years ago." "Now it's like—I just took a video on my phone. It's going to be like—I just made an app on my phone. Same energy." @illscience on BILLIONS with @GuillaumeMbh

Jeff

@jeffdfeng

Mar 13

this is the biggest infrastructure arms race in history. in 2015, Amazon, Microsoft, Google and Meta spent $24 billion combined on infrastructure. in 2026, they'll spend $635 billion

Jeff

@jeffdfeng

Mar 13

atoms are software. travis kalanick's manifesto is worth reading carefully. Atoms isn't just building robots, they're building computers made of mines, food infrastructure and transport instead of silicon. the market still prices industrials and tech as separate asset classes 🤔

travis kalanick

@travisk

Mar 13

Atoms. Http://atoms.co/vision
