🐾
@saturdayrobotic Robotics & World Models Reading Club 10 Recap: From Platform → Instincts → Real-World Learning: Roadmap to 🐱Cat-Level Humanoid Intelligence
Keynote: Bringing Robots to Life: Learning Humanoid Instincts from the Body Up, by @HaochenShi74 (@Stanford PhD, adv. Karen Liu & Shuran Song), presents a full-stack humanoid loco-manipulation program spanning hardware, learning, and real-world deployment. Hosts: @junfanzhu98, @aurorafeng_01.
🤖 Stage 0: ToddlerBot Platform (ML-compatible embodiment)
Open-source humanoid designed for learnability, not just capability. 30 DoF full-body design (arms/legs/torso/head), dual grippers, 2× fisheye cams, IMU, mic/speaker, Jetson Orin NX, 2–5h battery. Spur/bevel/linkage transmissions. Core idea: the hardware is sufficient; the bottleneck is learning. Key enablers: exact URDF digital twin, zero-point calibration, motor system ID capturing friction/backlash/controller response, and a full actuation model (torque–velocity limits). Teleop via joystick and VR. Sim2real depends on physics-accurate sysID, not kinematics alone.
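To make "full actuation model" concrete, here is a minimal sketch of a simulated servo that respects a torque–velocity envelope and friction. It is not ToddlerBot's identified model: the linear envelope, the Coulomb + viscous friction form, and every constant and function name below are assumptions for illustration only.

```python
import math

def torque_limit(velocity, stall_torque, no_load_speed):
    """Simplified linear torque-speed envelope (assumed shape):
    full torque at rest, falling to zero at the no-load speed."""
    frac = 1.0 - abs(velocity) / no_load_speed
    return stall_torque * max(0.0, frac)

def actuator_torque(q_des, q, qd, kp=20.0, kd=0.5,
                    stall_torque=2.0, no_load_speed=6.0,
                    coulomb=0.05, viscous=0.01):
    """Joint torque delivered by a PD-controlled servo in simulation.
    All gains and friction constants are hypothetical placeholders that
    system ID would normally fit from real motor logs."""
    cmd = kp * (q_des - q) - kd * qd                 # controller response
    lim = torque_limit(qd, stall_torque, no_load_speed)
    clipped = max(-lim, min(lim, cmd))               # torque-velocity limit
    friction = viscous * qd                          # viscous term
    if qd != 0.0:
        friction += math.copysign(coulomb, qd)       # Coulomb term opposes motion
    return clipped - friction
```

A policy trained against an ideal torque source would see `kp * error` here; with the envelope and friction in the loop, the simulator returns what the joint can plausibly deliver at that speed.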
🧠 Stage 1: Instincts (survival layer)
Locomotion: keyframes → RL w/ domain randomization → vision skill planner (depth + IMU) at 3.1Hz. Policy: 3-layer MLP at 50Hz (low natural freq).
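For readers new to domain randomization, a minimal sketch of the idea: resample physics parameters at the start of every training episode so the policy cannot overfit to one simulator setting. The parameter names and ranges below are hypothetical, not the values used in the talk; only the 50Hz control rate comes from the recap.

```python
import random

CONTROL_HZ = 50               # policy rate stated in the recap
CONTROL_DT = 1.0 / CONTROL_HZ

# Hypothetical randomization ranges (low, high); not from the talk.
RANGES = {
    "friction": (0.4, 1.2),
    "mass_scale": (0.8, 1.2),
    "motor_kp_scale": (0.9, 1.1),
    "latency_s": (0.0, 0.02),
}

def sample_episode_params(rng):
    """Draw one set of physics parameters, uniformly within each range."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}

rng = random.Random(0)        # seeded for reproducible sampling
episode_params = sample_episode_params(rng)
```

Each episode would apply `episode_params` to the simulator before reset, then step the policy every `CONTROL_DT` seconds.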
Motor Current-based Compliance (MCC): no force sensors. External wrench inferred from motor current/voltage + Jacobians + motor model → spring-damper correction. Works across the whole body, at any contact point. Results: diffusion policy (200 demos, ~80% success), OCHS servoing (21° vs 2–3°), LEAP hand VLM skills; heart-drawing ablation shows the wrench estimator is key. Generalizes across robots (Unitree G1 etc.), framing compliance as an embodiment-level safety primitive.
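The MCC pipeline (current → torque residual → wrench → spring-damper correction) can be sketched on a planar 2-link arm. This is a toy illustration under stated assumptions, not the speaker's estimator: the torque constant `kt`, link lengths, gains, the quasi-static sign convention, and all function names are hypothetical.

```python
import math

def jacobian(q1, q2, l1=0.1, l2=0.1):
    """End-effector Jacobian of a planar 2-link arm (link lengths assumed)."""
    s1, c1 = math.sin(q1), math.cos(q1)
    s12, c12 = math.sin(q1 + q2), math.cos(q1 + q2)
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]

def estimate_force(currents, tau_model, q, kt=0.5):
    """Infer the external force from motor currents.
    Quasi-static assumption: the torque the motors produce beyond the
    model prediction balances the contact, so J^T f = -(kt*i - tau_model)."""
    residual = [kt * i - t for i, t in zip(currents, tau_model)]
    J = jacobian(*q)
    a, b = J[0][0], J[1][0]          # first row of J^T
    c, d = J[0][1], J[1][1]          # second row of J^T
    det = a * d - b * c
    if abs(det) < 1e-9:
        raise ValueError("Jacobian near singular; force not observable")
    r0, r1 = -residual[0], -residual[1]
    return ((d * r0 - b * r1) / det, (-c * r0 + a * r1) / det)

def compliance_step(offset, force, stiffness=200.0, damping=20.0, dt=0.02):
    """One Euler step of a spring-damper: damping*x_dot + stiffness*x = f.
    The offset is added to the position target so the limb yields to contact."""
    return tuple(x + dt * (f - stiffness * x) / damping
                 for x, f in zip(offset, force))
```

Calling `estimate_force` and then `compliance_step` once per control tick makes the target drift toward `force / stiffness` under sustained contact and spring back when the contact is released.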
Energy autonomy: self-charging docking for continuous operation.
🌍 Stage 2: Real-world learning (RTR)
Robot Trains Robot (RTR) replaces the human supervisor with a robot-arm teacher: reward via F/T sensing, XY compliant support, Z-axis curriculum withdrawal, perturbation, failure detection, and auto-reset. Enables safe real-world RL without humans. Key method: a latent dynamics-gap variable z optimized from real rollouts + FiLM-conditioned actor/critic. Demonstrated on walking & swing-up from scratch.
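FiLM (feature-wise linear modulation) conditions a network by scaling and shifting its hidden features with values computed from a conditioning vector, here the latent z. A minimal sketch of one such layer follows; the linear maps from z, the weight shapes, and the `1 + ...` parameterization of the scale are assumptions, not details from the talk.

```python
def linear(weights, vec):
    """Matrix-vector product for a list-of-rows weight matrix."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def film(hidden, z, w_gamma, w_beta):
    """Apply FiLM: out_i = gamma_i(z) * hidden_i + beta_i(z).
    gamma is parameterized as 1 + W_gamma z (an assumed choice) so that
    z = 0 leaves the features unchanged."""
    gamma = [1.0 + g for g in linear(w_gamma, z)]
    beta = linear(w_beta, z)
    return [g * h + b for g, h, b in zip(gamma, hidden, beta)]

# Hypothetical weights: 2 hidden features, 2-dimensional latent z.
W_GAMMA = [[0.5, 0.0], [0.0, 0.0]]
W_BETA = [[0.0, 0.0], [1.0, 0.0]]

sim_features = film([1.0, 2.0], [0.0, 0.0], W_GAMMA, W_BETA)   # z = 0: identity
real_features = film([1.0, 2.0], [1.0, 0.0], W_GAMMA, W_BETA)  # shifted by z
```

With this parameterization the actor and critic weights can stay fixed while only z is updated from real rollouts, which keeps the search low-dimensional.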
📊 Key insight: highest-value data = real exploration (expensive, scarce). Must survive to collect it → instincts are prerequisite for data flywheel.
🧩 Big picture
Platform = learnable embodiment (URDF + sysID + teleop)
Instincts = survival (locomotion + MCC compliance + autonomy)
RTR = self-scaling real-world data engine
⚠️ Reality gap: sim2real still hard for manipulation; locomotion works better. No established robotics scaling law; foundation model form remains unclear.
🐱 Summary: Simulation starts robots. Instincts keep them alive. Real-world experience makes them intelligent.