What is the best data for training humanoid & robotics foundation models?
Pete Florence @peteflorence (CEO @Generalist, ex-Google DeepMind) dropped his live data tier list in this 7-minute clip on @tbpn:
- S-tier: Real-world robot experience (especially glove/sensor high-dexterity data)
- A/B-tier: Internet/YouTube videos. Surprisingly powerful for transfer learning (the “web data” moment for physical AI)
- B-tier: Text/common crawl (Reddit, books, etc.). Useful priors, but not enough alone
- C-tier: Motion Capture. Great for whole-body motion, weak on finger dexterity
- C or lower: Simulation / synthetic / world models. High potential, still waiting for strong real-world proof
Generalist has collected 270,000 hours of real-world manipulation data (growing by ~10k hours/week). And Pete stressed one key point:
“The quality of data is incredibly important.”
It’s not just about volume. It’s about developing, through hands-on work, an intuition for what actually drives performance.
As Physical AI scales, curated real-world data plus high-quality internet video looks like a winning combo.
h/t @yuji_fujima