Amazing talk by @DrJimFan -- exciting times for Physical AGI!
TL;DR
VLA architectures misallocate most of their parameters to language and should be replaced by World Action Models (WAMs): pretrained video diffusion models that jointly predict future world states and robot actions. Dream Zero (@SeonghyeonYe) instantiates this as a 14B model running real-time control at 7Hz with 2× generalization gains over VLAs.
arxiv.org/pdf/2602.15922
His central data claim is that egocentric human video is the FSD-equivalent ambient data flywheel for robotics. EgoScale (@ruijie_zheng12) demonstrates a near-perfect log-linear scaling law (ℒ(N) = a − b log N, R² = 0.9983) relating pretraining data volume, from 1K to 20K hours, to downstream dexterity performance.
arxiv.org/pdf/2602.16710
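To make the scaling-law form concrete, here is a minimal sketch of fitting ℒ(N) = a − b log N by least squares and computing R². The data points are synthetic placeholders generated from made-up values of a and b, not EgoScale's measurements, and the natural log is an assumption (the post does not state the base). The function name `fit_log_linear` is hypothetical.

```python
import math

def fit_log_linear(hours, losses):
    """Least-squares fit of loss = a - b * ln(hours); returns (a, b, r_squared)."""
    xs = [math.log(h) for h in hours]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(losses) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, losses))
    slope = sxy / sxx                     # equals -b in the form a - b * ln(N)
    intercept = mean_y - slope * mean_x   # equals a
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, losses))
    ss_tot = sum((y - mean_y) ** 2 for y in losses)
    return intercept, -slope, 1.0 - ss_res / ss_tot

# Synthetic placeholder points (NOT from the paper), spanning 1K to 20K hours.
hours = [1_000, 2_000, 5_000, 10_000, 20_000]
true_a, true_b = 2.0, 0.15  # made-up coefficients for illustration only
losses = [true_a - true_b * math.log(h) for h in hours]

a, b, r2 = fit_log_linear(hours, losses)
print(f"a={a:.3f}  b={b:.3f}  R^2={r2:.4f}")
```

Under this functional form, every doubling of data lowers the loss by the same fixed amount, b · ln 2, which is why the fit is a straight line on a log-scaled data axis.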
His central environment claim is that classical physics simulators will be replaced by neural simulators. DreamDojo (@ShenyuanGao) demonstrates this with 44K hours of human video pretraining, 10 FPS real-time interaction, and Pearson r = 0.995 policy-evaluation fidelity.
arxiv.org/pdf/2602.06949
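For readers unfamiliar with the fidelity metric, here is a minimal sketch of how a Pearson r for policy evaluation could be computed, assuming it compares success rates measured in the learned simulator against success rates measured on real hardware for the same set of policies. That pairing is my assumption about the setup, and the numbers below are hypothetical, not DreamDojo's results.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

# Hypothetical success rates for five policy checkpoints (illustrative only).
sim_success = [0.10, 0.35, 0.50, 0.70, 0.90]   # measured in the neural simulator
real_success = [0.12, 0.30, 0.55, 0.68, 0.88]  # measured on the physical robot

r = pearson_r(sim_success, real_success)
print(f"Pearson r = {r:.3f}")
```

A high r means the simulator ranks and spaces policies the way the real world does. It does not by itself show that a policy optimized against the simulator will transfer, since optimization can push into regions where the simulator is wrong.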
Significant gaps in the talk:
- it does not address runtime semantics (skill installation, behavior consistency, run-update separation)
- it does not address the model-exploitation failure mode, where policies trained against learned simulators or learned rewards exploit the model's errors rather than solve the task
My running notes w/ Opus 4.7 👇
docs.google.com/document/d/e…
@NVIDIAAI
I promise this will be the best 20 min you spend today! Robotics: Endgame, the sequel to my last year's Sequoia AI Ascent talk, "Physical Turing Test". I laid out the roadmap for solving Physical AGI as a simple parallel to the LLM success story. Be a good scientist, copy homework ;)
And stay till the end, more easter eggs and predictions for your polymarket!
00:30 DGX-1 origin story at OpenAI, I was there in 2016 signing with Jensen and Elon. Heading to the Computer History Museum!
01:42 The Great Parallel
03:31 Robotics, the Endgame
03:39 Why VLAs fall short
04:32 Video world models as the 2nd pretraining paradigm
06:09 World Action Models (WAM)
07:46 Strategies for robot data collection and the FSD equivalent to physical data flywheel for robot manipulation
11:06 EgoScale and the Dexterity Scaling Law we discovered recently
14:00 Physical RL: bridging the last mile
15:39 DreamDojo: an end-to-end neural physics engine for scaling RL in silico
17:00 Civilizational Technology Tree and my predictions for the near future. Spoiler: it's closer than you think.
Thanks to my friends at Sequoia for inviting me back to AI Ascent this year! I had a blast! Last year's talk is attached in the thread if you missed it.