3️⃣ Internet / Human Video
Internet and human-centric videos are the most abundant and lowest-cost raw materials available.
The Pros: Scale. It helps foundation models build basic physical cognition—understanding how the world works, spatial reasoning, and human intent.
The Cons: It lacks force, torque, and tactile feedback. A video shows the result of an action, but not the exact motor signals needed to execute it. The AI knows "what" to do, but not "how" to move its joints to do it.
The Trend: Silicon Valley pioneers such as Physical Intelligence, Figure AI, and Sunday Robotics are aggressively pivoting toward this data source. By combining reinforcement learning with crowdsourced, egocentric (first-person) video collection, they aim to bypass expensive, labor-heavy teleoperation. Projects such as Apple’s EgoDex and NVIDIA’s EgoScale target exactly this problem: extracting high-signal, usable action data from massive amounts of low-cost human video.
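To make the "what vs. how" gap concrete, here is a minimal, hypothetical Python sketch. It is not how EgoDex, EgoScale, or any of the companies above actually process video. It assumes some hand tracker has already produced per-frame 3D wrist positions in metres in a fixed world frame. Finite differences then recover a kinematic pseudo-action (wrist velocity), while the force field stays empty because the video never observed it. The names `PseudoAction` and `pseudo_actions` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class PseudoAction:
    """Hypothetical action label recovered from video for one frame transition."""
    velocity: Vec3                # wrist velocity in m/s, estimated by finite differences
    force: Optional[Vec3] = None  # not observable in plain video, so it stays None


def pseudo_actions(wrist_positions: List[Vec3], fps: float) -> List[PseudoAction]:
    """Convert per-frame wrist positions into velocity-only pseudo-actions.

    Assumption: positions are already in metres in a fixed world frame
    (e.g. output of a hypothetical upstream hand tracker).
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    actions: List[PseudoAction] = []
    # Each consecutive pair of frames yields one action: (x_t+1 - x_t) * fps.
    for prev, curr in zip(wrist_positions, wrist_positions[1:]):
        vel = (
            (curr[0] - prev[0]) * fps,
            (curr[1] - prev[1]) * fps,
            (curr[2] - prev[2]) * fps,
        )
        actions.append(PseudoAction(velocity=vel))
    return actions


if __name__ == "__main__":
    track = [(0.0, 0.0, 0.0), (0.01, 0.0, 0.0), (0.03, 0.0, 0.02)]
    for a in pseudo_actions(track, fps=30.0):
        print(a)
```

A practical pipeline would likely also need smoothing of noisy tracks and retargeting from the human hand to a robot gripper; the sketch only shows which part of the action signal video can supply (motion) and which part it cannot (force, torque, touch).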