PhD student @UTCompSci | Learn to understand ourselves and build intelligence.🤖🧠👁️

Joined July 2023
8 Photos and videos
Pinned Tweet
🤖Co-training is everywhere (sim↔real [e.g. GR00T, LBM], human↔robot [e.g. PI, EgoScale], even non-robot data [e.g. PI, LBM]). But why does it work? How can we improve it further? Taking sim-and-real imitation learning in diffusion/flow-based models as the test bed, we performed a rigorous mechanistic analysis, drawing on theoretical insights and multi-layered experiments. 😮Key insight: it’s all about representations. - Alignment → enables transfer - Discernibility → enables adaptation ⚖️Both are necessary: it's better to have more aligned representations, but the model must be able to discern the domains. We term this structured representation alignment. ⬇️Let’s take a deep dive into that: Paper: arxiv.org/pdf/2604.13645 Website: science-of-co-training.githu…
5
66
385
62,626
It is a spiral upwards. Millions of yrs precipitation -> the data/data infrastructure we have now. As we have massive labs trying to maximize the strength, it’s beneficial to keep thinking about sample efficiency at the same time.
I feel like the obsession with continual learning / sample efficiency leads the field in the wrong direction. It's the bad career strategy of focusing on addressing your weaknesses instead of maximizing your strengths. Yes, there is an existence proof in the human brain, but it doesn't by any means guarantee that that'll be the most interesting AI. It may require $100T of R&D on chips and AI methods to get that unlock. On the other side of things, it's obvious that the coming models are extremely transformative and built on technologies that we already have. There's great reason to focus on just maximizing this. In reality, this is what the frontier labs are doing. They're going as fast as possible down the current development tree. This is good for progress and mixed for safety/geopolitics. Things like "automate white-collar work" and "replace the AI researcher job" are the guesses of labs because it's super hard to imagine futures for what these dramatic technologies will be. Don't take the labs too seriously about this being the exact goal. The exact goal is to push the frontier and monetize later. Solving continual learning, sample efficiency, etc. would be great, but it's trying to predict when a scientific breakthrough will come instead of trying to grapple with how the 100%-sure-thing coming technological revolution will change our lives. This isn't to say the Dwarkesh post is bad, it addresses some reasonable critiques, but it is the least bitter-lesson-pilled thing to be obsessed with human intelligence and how that can inform AI. We are in the AGI era of research. This is about embracing the unknown, scaling resources, and seeing what is enabled by making a series of magical tweaks to complex recipes that build frontier models. Lean into the alchemy. (It should be pretty clear that I personally, investing in open research, agree we need fundamental science; I just don't agree that this is what the "cutting edge of the frontier" is governed by.)
2
221
Really cool
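Physical AI finally stands on its own! With no teleoperation, fully autonomously, it succeeded at moshikame 5 times, reaching the kendama 6th-kyu level. And today is my birthday. I'm 39, so "san-kyu" (thank you).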
1
7
1,066
Yu Lei retweeted
Can we build generalist robots with zero teleoperation? Come participate in the discussion and weigh in at our ICRA'26 workshop, BeyondTeleop, starting at 8.45 am CEST today (June 5th)! 📍 Strauss 3
3
24
45
5,628
Yu Lei retweeted
Humanoid robotics is hitting a data wall. Teleop and mocap took us far, but they don’t scale to every object, terrain, and behavior. We’re releasing GRAIL: research.nvidia.com/labs/dai… — a fully digital pipeline for generating loco-manipulation data before the robot moves. 🧵(1/8)
4
69
347
41,662
Yu Lei retweeted
天行有常,不以尧存不以桀亡
Jun 3
World Labs CEO Dr. Fei-Fei Li: "The world is not made of words." "Language models have given machines an extraordinary command of concepts, vocabulary, and reasoning, but the physical world, virtual or real, runs on a different substrate." "Where language models learn the statistical structure of text, world models learn the statistical structure of space and time: how light falls on a surface, how a garden looks from an angle no camera has captured, how objects respond to force and follow the laws of physics." "Language gave machines a way to talk about that world. World models are how machines will finally come to understand, imagine, reason and interact with it." Full piece: drfeifei.substack.com/p/a-fu…
1
2
24
9,088
The CoRL 2026 keynote lineup is here! 🔹 Russ Tedrake — MIT; stealth startup @RussTedrake 🔹 Fei-Fei Li — Stanford; World Labs @drfeifei 🔹 Wolfram Burgard — UT Nuremberg @wolfram_burgard Join us in Austin this November. corl.org/program/keynotes
3
20
155
33,440
Yu Lei retweeted
Exciting news on GR00T: NVIDIA announces our first open humanoid robot platform, featuring Unitree H2 Plus and Sharpa hands, to accelerate academic research and facilitate cross-institutional collaboration. R&D in humanoid robotics needs broader participation. Open science is how we build the future faster, together.
NVIDIA announces the first open humanoid robot reference design built for robotics research. The NVIDIA Isaac GR00T Reference Humanoid Robot combines the @UnitreeRobotics H2 humanoid robot, @SharpaRobotics Wave five-fingered hands for dexterous manipulation, Jetson Thor onboard compute, and Isaac GR00T open software and models, giving researchers a full-stack platform from data capture to model deployment. Read the #NVIDIAGTC Taipei announcement: nvda.ws/4ef9VOr
3
14
109
14,607
Scale your humanoid motion data with motion planning in other domains, then transfer to real!! - try out the great new work from data gen czar👑 @linkevin0 What a line of work: DexMimicGen -> CP-Gen -> HumanoidMimicGen!
Humanoids need data. Lots and lots of data. Introducing HumanoidMimicGen: a method that automatically generates 1000s of humanoid loco-manipulation demonstrations from a single teleoperated demonstration.
4
1,497
Yu Lei retweeted
Dexterous hands vary widely—so do tactile modalities. 🖐️🌈 Our vision on tactile human-to-robot transfer: 🔓 Not tied to specific hardware ♻️ Reuse human tactile demos across embodiments Presenting TactAlign, a cross-sensor tactile alignment for cross-embodiment policy transfer.
4
34
172
33,903
Yu Lei retweeted
Gemini Omni is a major leap in world understanding & multimodal editing! It can take photos, video & audio and build entirely new scenes. Over time it’ll be able to handle any input & any output - starting w/ video. You can even give it your own videos & iterate on your ideas:
386
938
9,485
930,475
Yu Lei retweeted
I will be in Vienna in two weeks to give a keynote at #ICRA2026. I'll share our recent progress on building generalist humanoid robots and show some of the latest results. Check out my talk on June 3: 2026.ieee-icra.org/program/k…
3
6
101
5,506
Yu Lei retweeted
Introducing HRM-Text. An ultra-lean 1B-parameter reasoning language model designed to deliver strong general performance with a fraction of the data, compute, and infrastructure. Trained on just 40B structured tokens, HRM-Text achieves competitive performance while using ~1/1000 of the training data of comparable models. The kicker? The full model trains in roughly one day on a $1,000 budget. This opens the door to a new generation of AI that is powerful, accessible, and radically easier to adapt. Theories and research concepts once deemed too expensive to test are officially back in the game. Sapient Intelligence invites you to help us shape a new paradigm for general intelligence.
160
269
2,605
508,291
Yu Lei retweeted
We taught two F.03 robots to clean a room and make a bed in under 2 minutes - fully autonomous.
669
1,116
8,343
1,387,784
Love this.
Submit your CoRL workshop proposal! This year @RLioutikov and I wanted to make the workshop more "workshopy". Main changes are: - Half-day events only - Limited speaker slots - Challenge- and participation-driven - A post-workshop artifact (white paper, report, paper, etc.) summarizing the discussions
1
412
Yu Lei retweeted
Open-sourcing the whole package here! The last pieces of our SONIC open-source release (data collection, gr00t VLA post-training, and inference) just hit the repo! Train your autonomous policies on G1 whole-body with SONIC and gr00t N1.7! 🧑‍💻Code: github.com/NVlabs/GR00T-Whol… 📑Docs: nvlabs.github.io/GR00T-Whole…
SONIC is now open-source! Generalist whole-body teleoperation for EVERYONE! Our team has long been building comprehensive pipelines for whole-body control, kinematic planning, and teleoperation, and they will all be shared. This will be a continuous update; inference code and model are already there, training code and gr00t integration coming soon! Code: github.com/NVlabs/GR00T-Whol… Docs: nvlabs.github.io/GR00T-Whole… Site: nvlabs.github.io/GEAR-SONIC/
5
66
377
47,979
Yu Lei retweeted
GR00T-VisualSim2Real is now open source! VIRAL and DoorMan are now available with training code, simulation assets, and the full recipe for bringing visual sim-to-real loco-manipulation skills to your own humanoids. Repo: github.com/NVlabs/GR00T-Visu…
20 Nov 2025
Zero teleoperation. Zero real-world data. ➔ Autonomous humanoid loco-manipulation in reality. Introducing VIRAL: Visual Sim-to-Real at Scale. We achieved 54 autonomous cycles (walk, stand, place, pick, turn) using a simple recipe: 1. RL 2. Simulation 3. GPUs Website: viral-humanoid.github.io/ Arxiv: arxiv.org/abs/2511.15200 Deep dive with me: 🧵
6
97
618
116,831
Yu Lei retweeted
Just had a single-author paper accepted to #RSS2026! arxiv.org/abs/2604.21456 Motivated by growing interest in differentiable world models and physics simulators, we ask whether there is a unified principle for combining sampling-based global “exploration” with gradient-based local “exploitation” in trajectory and policy optimization with differentiable dynamics. By viewing control through the control-as-inference lens—recasting optimization as sampling from an unnormalized Boltzmann distribution defined by an energy function—Tempered Sequential Monte Carlo (TSMC) naturally integrates importance sampling with gradient-based Hamiltonian Monte Carlo. The key idea behind TSMC is to define a tempering path that gradually transforms an easy-to-sample prior into a complex, multi-modal posterior—or equivalently, deforms a convex energy landscape into a nonconvex one (graduated non-convexity)! We implement TSMC for both trajectory and policy optimization. On small- to medium-scale problems, it appears broadly applicable and compares favorably with state-of-the-art baselines. Excited to explore whether TSMC can scale to large-scale planning with complex, high-dimensional dynamics!
4
20
202
12,473
Yu Lei retweeted
Replying to @MingchenZhuge
@MingchenZhuge Thanks for inviting me for the talk and the panel discussion! It was super fun! Talk slides in yuandong-tian.com/talks/rsi_…. Thanks for promoting my book as well 😄

Replying to @tydsh
@tydsh always enjoy your presentations, whether at workshops or podcasts, as well as your insights on post-training, RSI, and even your sci-fi writing. 🥳🥳🥳 ~ recursive-workshop.github.io #RSI #ICLR2026 #破晓之钟
1
4
28
4,706
My dream of robot learning research.
We are working to restore mobility that was lost due to disease or spinal cord injury by allowing participants to control robotic arms with their thoughts. See how this is possible.
4
11
1,408
Yu Lei retweeted
Teaching a robot a new task typically means stopping operations, collecting teleoperated demonstrations, and retraining. That process takes hours at a minimum. We wanted to know if we could collapse it to seconds — from a single human demo, on the fly, no retraining required. Early research preview: we can.
9
14
84
7,595