Yuke Zhu

Yuke Zhu

68 Photos and videos

Tweets

Yuke Zhu @yukez

Jun 1

Exciting news on GR00T: NVIDIA announces our first open humanoid robot platform, featuring Unitree H2 Plus and Sharpa hands, to accelerate academic research and facilitate cross-institutional collaboration. R&D in humanoid robotics needs broader participation. Open science is how we build the future faster, together.

NVIDIA Robotics

@NVIDIARobotics

Jun 1

NVIDIA announces the first open humanoid robot reference design built for robotics research. The NVIDIA Isaac GR00T Reference Humanoid Robot combines the @UnitreeRobotics H2 humanoid robot, @SharpaRobotics Wave five-fingered hands for dexterous manipulation, Jetson Thor onboard compute, and Isaac GR00T open software and models, giving researchers a full-stack platform from data capture to model deployment. Read the #NVIDIAGTC Taipei announcement: nvda.ws/4ef9VOr

0:20

109

14,606

Yuke Zhu

Yuke Zhu @yukez

May 19

I will be in Vienna in two weeks to give a keynote at #ICRA2026. I'll share our recent progress on building generalist humanoid robots and show some of the latest results. Check out my talk on June 3: 2026.ieee-icra.org/program/k…

ALT https://2026.ieee-icra.org/program/keynote-sessions/

101

5,506

Fernando Castañeda

Yuke Zhu retweeted

Fernando Castañeda @FerCastanedaGR

May 7

Now you can use GR00T N1.7 and SONIC together to enable tasks that require TRUE whole-body coordination!! Including simultaneous precise hand and foot placement, like opening a trash can with the foot pedal and throwing an object inside! Try it yourself, it is so fun!

Zhengyi “Zen” Luo

@zhengyiluo

May 7

Open-sourcing the whole package here! The last piece of our SONIC open-source, data collection, gr00t VLA post-training, inference just hit the repo! Train your Autonomous policies on G1 Whole-body with SONIC and gr00t N1.7! 🧑‍💻Code: github.com/NVlabs/GR00T-Whol… 📑Docs: nvlabs.github.io/GR00T-Whole…

0:39

9,049

Tairan He

Yuke Zhu retweeted

Tairan He

@TairanHe99

Apr 30

GR00T-VisualSim2Real is now open source! VIRAL and DoorMan are now available with training code, simulation assets, and the full recipe for bringing visual sim-to-real loco-manipulation skills to your own humanoids. Repo: github.com/NVlabs/GR00T-Visu…

0:16

Tairan He

@TairanHe99

20 Nov 2025

Zero teleoperation. Zero real-world data. ➔ Autonomous humanoid loco-manipulation in reality. Introducing VIRAL: Visual Sim-to-Real at Scale. We achieved 54 autonomous cycles (walk, stand, place, pick, turn) using a simple recipe: 1. RL 2. Simulation 3. GPUs Website: viral-humanoid.github.io/ Arxiv: arxiv.org/abs/2511.15200 Deep dive with me: 🧵

0:25

618

116,823

tingwu.wang

Yuke Zhu retweeted

tingwu.wang

@TingwuWang

Apr 29

What is missing to bring real-time motion research into AAA games and real-world robotics? We present MotionBricks, a step toward bridging this gap with two key components: - a single generative latent motion backbone covering 350,000 motion skills, running at 15,000 FPS with 2 ms latency and substantially improved quality and reliability. - a unified smart primitive interface for locomotion, object / scene interaction, with fine-grained control over generated behaviors. Webpage: nvlabs.github.io/motionbrick… Code: github.com/NVlabs/GR00T-Whol… Paper: arxiv.org/abs/2604.24833 (ACM TOG / SIGGRAPH 2026)

6:25

150

1,197

151,738

Yu Lei

Yuke Zhu retweeted

Yu Lei

@_OutofMemory_

Apr 20

🤖Co-training is everywhere (sim↔real[e.g. GR00T, LBM], human↔robot[e.g. PI, EgoScale], even non-robot data[e.g. PI, LBM). But why does it work? How can we improve it further? Taking sim-and-real imitation learning in diffusion/ flow-based models as the test bed, we performed a rigorous mechanistic analysis, drawing on theoretical insights and multi-layered experiments. 😮Key insight: it’s all about representations. - Alignment → enables transfer - Discernibility → enables adaptation ⚖️Both are necessary — it's better to have more aligned representations, but the model must be able to discern the domains. We term this as structured representation alignment. ⬇️Let’s take a deep dive into that: Paper: arxiv.org/pdf/2604.13645 Website: science-of-co-training.githu…

385

62,625

Yuke Zhu

Yuke Zhu @yukez

Apr 16

We've just open-sourced the SONIC training code, the training data, the algorithms used to generate the data, and more to come. You now have the full recipe for building SONIC whole-body controllers for your own humanoids. Enjoy! Code: github.com/NVlabs/GR00T-Whol… Web Demo: nvlabs.github.io/GEAR-SONIC/…

GitHub - NVlabs/GR00T-WholeBodyControl: Welcome to GR00T Whole-Body Control (WBC)! This is a...

Welcome to GR00T Whole-Body Control (WBC)! This is a unified platform for developing and deploying advanced humanoid controllers. This includes: Decoupled WBC models used in NVIDIA Isaac-Gr00t, Gr0...

github.com

Zhengyi “Zen” Luo

@zhengyiluo

Feb 20

SONIC is now open-source! Generalist whole-body teleoperation for EVERYONE! Our team has long been building comprehensive pipelines for whole-body control, kinematic planner, and teleoperation, and they will all be shared. This will be a continuous update; inference code model already there, training code and gr00t integration coming soon! Code: github.com/NVlabs/GR00T-Whol… Docs: nvlabs.github.io/GR00T-Whole… Site: nvlabs.github.io/GEAR-SONIC/

3:07

174

16,822

Max Fu

Yuke Zhu retweeted

Max Fu

@letian_fu

Apr 1

Robotics: coding agents’ next frontier. So how good are they? We introduce CaP-X: an open-source framework and benchmark for coding agents, where they write code for robot perception and control, execute it on sim and real robots, observe the outcomes, and iteratively improve code reliability. From @NVIDIA @Berkeley_AI @CMU_Robotics @StanfordAILab capgym.github.io 🧵

1:31

126

632

168,796

Huihan Liu

Yuke Zhu retweeted

Huihan Liu @huihan_liu

Mar 5

Catastrophic forgetting has long been a challenge in continual learning. However, our new study found that pretrained Vision-Language-Action (VLA) models are surprisingly resistant to forgetting! Zero forgetting, or even positive backward transfer, is possible with simple experience replay. arxiv.org/abs/2603.03818

108

667

53,671

Yuke Zhu

Yuke Zhu @yukez

Mar 2

Today, we publicly released RoboCasa365, a large-scale simulation benchmark for training and systematically evaluating generalist robot models. Built upon our original RoboCasa framework, it offers: • 2,500 realistic kitchen environments; • 365 everyday tasks (basic skills long-horizon mobile manipulation); • Over 3,200 objects with many articulated fixtures/appliances. All are designed for fully controlled, reproducible benchmarking of robotic policies. Progress in robotic foundation models is real. But it’s still hard to answer basic questions like: How close are we to general-purpose autonomy? What factors drive generalization? What are the model/data scaling curves like? Real-world eval is slow and noisy, and existing sims (like LIBERO, which we built 3 years ago) often lack sufficient task and scene diversity. This benchmark comes with 2,200 hours of demonstrations and 500K trajectories to support studies of multi-task training, pretraining, and continual learning at scale. Check it out at robocasa.ai

1:17

340

23,644

Yuke Zhu

Yuke Zhu @yukez

Feb 27

CoRL is coming to Austin, TX this November! As General Chair, I'm thrilled to welcome the robot learning community. 2026 feels like a pivotal year as AI-powered robotic systems begin deploying at scale for real-world tasks. This year, I hope CoRL will be the forum that connects cutting-edge research with industrial practice. Please submit your best work and join us in Austin. DM me what you'd love to see CoRL do better! corl.org/

CoRL 2026

Welcome to CoRL 2026!

corl.org

Conference on Robot Learning @corl_conf

Feb 24

Calling all researchers! 🤖The CoRL 2026 website is officially live at corl.org with key dates for your submissions: 🗓 May 25: Abstract Submission 🗓 May 28: Full Paper Submission 🗓 Nov 9-12: Conference in Austin, TX Send us your coolest work! #RobotLearning

165

15,837

Jim Fan

Yuke Zhu retweeted

Jim Fan

@DrJimFan

Feb 25

We trained a humanoid with 22-DoF dexterous hands to assemble model cars, operate syringes, sort poker cards, fold/roll shirts, all learned primarily from 20,000 hours of egocentric human video with no robot in the loop. Humans are the most scalable embodiment on the planet. We discovered a near-perfect log-linear scaling law (R² = 0.998) between human video volume and action prediction loss, and this loss directly predicts real-robot success rate. Humanoid robots will be the end game, because they are the practical form factor with minimal embodiment gap from humans. Call it the Bitter Lesson of robot hardware: the kinematic similarity lets us simply retarget human finger motion onto dexterous robot hand joints. No learned embeddings, no fancy transfer algorithms needed. Relative wrist motion retargeted 22-DoF finger actions serve as a unified action space that carries through from pre-training to robot execution. Our recipe is called "EgoScale": - Pre-train GR00T N1.5 on 20K hours of human video, mid-train with only 4 hours (!) of robot play data with Sharpa hands. 54% gains over training from scratch across 5 highly dexterous tasks. - Most surprising result: a *single* teleop demo is sufficient to learn a never-before-seen task. Our recipe enables extreme data efficiency. - Although we pre-train in 22-DoF hand joint space, the policy transfers to a Unitree G1 with 7-DoF tri-finger hands. 30% gains over training on G1 data alone. The scalable path to robot dexterity was never more robots. It was always us. Deep dives in thread:

1:18

150

283

1,772

291,298

Jim Fan

Yuke Zhu retweeted

Jim Fan

@DrJimFan

Feb 20

Announcing DreamDojo: our open-source, interactive world model that takes robot motor controls and generates the future in pixels. No engine, no meshes, no hand-authored dynamics. It's Simulation 2.0. Time for robotics to take the bitter lesson pill. Real-world robot learning is bottlenecked by time, wear, safety, and resets. If we want Physical AI to move at pretraining speed, we need a simulator that adapts to pretraining scale with as little human engineering as possible. Our key insights: (1) human egocentric videos are a scalable source of first-person physics; (2) latent actions make them "robot-readable" across different hardware; (3) real-time inference unlocks live teleop, policy eval, and test-time planning *inside* a dream. We pre-train on 44K hours of human videos: cheap, abundant, and collected with zero robot-in-the-loop. Humans have already explored the combinatorics: we grasp, pour, fold, assemble, fail, retry—across cluttered scenes, shifting viewpoints, changing light, and hour-long task chains—at a scale no robot fleet could match. The missing piece: these videos have no action labels. So we introduce latent actions: a unified representation inferred directly from videos that captures "what changed between world states" without knowing the underlying hardware. This lets us train on any first-person video as if it came with motor commands attached. As a result, DreamDojo generalizes zero-shot to objects and environments never seen in any robot training set, because humans saw them first. Next, we post-train onto each robot to fit its specific hardware. Think of it as separating "how the world looks and behaves" from "how this particular robot actuates." The base model follows the general physical rules, then "snaps onto" the robot's unique mechanics. It's kind of like loading a new character and scene assets into Unreal Engine, but done through gradient descent and generalizes far beyond the post-training dataset. A world simulator is only useful if it runs fast enough to close the loop. We train a real-time version of DreamDojo that runs at 10 FPS, stable for over a minute of continuous rollout. This unlocks exciting possibilities: - Live teleoperation *inside* a dream. Connect a VR controller, stream actions into DreamDojo, and teleop a virtual robot in real time. We demo this on Unitree G1 with a PICO headset and one RTX 5090. - Policy evaluation. You can benchmark a policy checkpoint in DreamDojo instead of the real world. The simulated success rates strongly correlate with real-world results - accurate enough to rank checkpoints without burning a single motor. - Model-based planning. Sample multiple action proposals → simulate them all in parallel → pick the best future. Gains 17% real-world success out of the box on a fruit packing task. We open-source everything!! Weights, code, post-training dataset, eval set, and whitepaper with tons of details to reproduce. DreamDojo is based on NVIDIA Cosmos, which is open-weight too. 2026 is the year of World Models for physical AI. We want you to build with us. Happy scaling! Links in thread:

1:17

176

1,234

208,690

Yuke Zhu

Yuke Zhu @yukez

Feb 20

We have seen rapid progress in humanoid control — specialist robots can reliably generate agile, acrobatic, but preset motions. Our singular focus this year: putting generalist humanoids to do real work. To progress toward this goal, we developed SONIC (nvlabs.github.io/GEAR-SONIC/), a Behavior Foundation Model for real-time, whole-body motion generation that supports teleoperation and VLA inference for loco-manipulation. Today, we’re open-sourcing SONIC on GitHub. We are excited to see what the community builds upon SONIC and to collectively push humanoid intelligence toward real-world deployment at scale. 🌐 Paper: arxiv.org/abs/2511.07820 📃 Code: github.com/NVlabs/GR00T-Whol…

3:07

353

67,356

Zi-ang Cao

Yuke Zhu retweeted

Zi-ang Cao

@ziang_cao

Feb 10

🚀 Introducing CHIP: Adaptive Compliance for Humanoid Control through Hindsight Perturbation! Current humanoids face a trade-off: they are either Agile & Stiff OR Slow & Soft. CHIP breaks this barrier. We enable on-the-fly switching between Compliant (wiping 🧼, collaborative holding 📦) and Stiff (lifting dumbbells 🏋️, opening doors 🚪💪) behaviors—all while maintaining agile skills like running! 🏃💨 Website: nvlabs.github.io/CHIP/ Join me for a deep dive on how CHIP enables adaptive control for complex tasks. 🧵↓

0:26

216

25,453

Yuke Zhu

Yuke Zhu @yukez

Feb 4

📢 New paper from GEAR team @NVIDIARobotics We released DreamZero, a World Action Model that turns video world models into zero-shot robot policies. Built on a pretrained video diffusion backbone, it jointly predicts future video frames and actions. 🌐 dreamzero0.github.io/

Joel Jang

@jang_yoel

Feb 4

Introducing DreamZero 🤖🌎 from @nvidia > A 14B “World Action Model” that achieves zero-shot generalization to unseen tasks & few-shot adaptation to new robots > The key? Jointly predicting video & actions in the same diffusion forward pass Project Page: dreamzero0.github.io/ 🧵 (1/10)

2:03

102

8,402

Haoru Xue ✈️ CVPR

Yuke Zhu retweeted

Haoru Xue ✈️ CVPR

@HaoruXue

2 Dec 2025

Reality of robotics: humanoid kung fu is solved before they can open doors with RGB. Here we are. Introducing the frontier of sim2real at NVIDIA GEAR. 100% sim data. RGB input only. Code name: 𝗗𝗼𝗼𝗿𝗠𝗮𝗻. We are opening the sim-to-real door. doorman-humanoid.github.io 🧵

1:01

511

370,478

Yuke Zhu

Yuke Zhu @yukez

20 Nov 2025

Robustness is key for real-world robot deployment — and RL is key to robustness. Proud of our work scaling GPU-based simulations and vision-based sim2real to build robust policies for humanoid loco-manipulation tasks.

Tairan He

@TairanHe99

20 Nov 2025

0:25

12,100

Yuke Zhu

Yuke Zhu @yukez

1 Nov 2025

Had a blast visiting @CMU_Robotics and gave a talk at the RI Seminar today, where I briefly mentioned our new work, PLD, on self-improving VLAs. It achieved 99.2% on LIBERO and a one-hour continuous execution of GPU assembly with a 100% success rate. Check this out!

Wenli Xiao

@_wenlixiao

31 Oct 2025

What if robots could improve themselves by learning from their own failures in the real-world? Introducing 𝗣𝗟𝗗 (𝗣𝗿𝗼𝗯𝗲, 𝗟𝗲𝗮𝗿𝗻, 𝗗𝗶𝘀𝘁𝗶𝗹𝗹) — a recipe that enables Vision-Language-Action (VLA) models to self-improve for high-precision manipulation tasks. PLD couples real-world residual reinforcement learning with standard supervised fine-tuning — letting robots discover, recover, and distill their own data flywheel. Quick 🧵

0:59

224

25,442