Robotics. AI Health & Safety.

Joined May 2022
Often, robot demos make me a little suspicious. The Genesis AI Eno writeup made me trust the numbers instead, which is the rarer reaction. Two details did it. They grade their policies in simulation but don’t train on simulated frames, so a good score can't be the model memorizing the simulator instead of learning the task. That is a small decision with a lot of self-restraint behind it. And the hands: a 1:1, twenty-joint, soft-skinned hand isn't an aesthetic call, it's what lets gloved human demonstrations transfer without a retargeting step mangling them on the way in. The body around those hands is close to an afterthought, on purpose. Wheels, a folding panel tower, no legs. They kept the part of the human form the world's tools are shaped around and dropped the part they aren't. Covered in my article for BotNews: the hand, the brain, the 3 ms control loop, and the simulator they handed to the public while keeping the model for themselves. #AI #tech
Sphoenix retweeted
Meet Axol: a dual-arm robot designed for teams working with physical AI. Made in America. Axol is for builders who believe robots should work in the real world, not just in staged environments, and that the future of physical AI should be open, not closed.
Sphoenix retweeted
ETH Zurich just open-sourced their entire 2026 robot learning course. Not a MOOC. The actual course. Slides, lecture recordings, coding assignments, GitHub repo. The curriculum goes from imitation learning and RL all the way to Vision-Language-Action models and foundation models for robotics. Guest lectures from the co-founder of Physical Intelligence. The creator of Diffusion Policy. Pieter Abbeel. Dieter Fox. 12 weeks. Free. No signup. If you want to understand where robot intelligence is actually heading… this is the reading list the field is using right now. 📍[cvg.ethz.ch/lectures/Robot-L…] —— Weekly robotics and AI insights. Subscribe free: 22astronauts.com
Most of the manipulation fragility I see in the wild traces back to one mismatch: the vision encoder was trained on static images, the policy was trained on motion, and nobody on the stack is in charge of the verb. DynaFLIP picks that exact fight. Instead of bolting a CLIP/SigLIP/DINOv2-style backbone in front of an MLP, a Diffusion Policy, or a VLA and asking the policy to learn the dynamics from demonstrations alone, it trains the encoder itself to anticipate motion. Three signals get aligned on a shared sphere: image transitions, the language instruction, and 3D scene flow. The objective shrinks the triangle they span. A cosine regularizer keeps the triangle from cheating itself flat. InfoNCE negatives keep the three embeddings from collapsing into one point. At deployment, the model wants a single RGB frame. The flow and language are gone. The anticipation is baked into the features. The detail that keeps me thinking, given the egocentric-data-collection thread I’ve been pulling on (hat-cams, UMI rigs, haptic gloves): the training data is action-free video, robot and human. The supervision rides on watching the world change, which means human video walks in as training data. A different lever than collecting more teleop. Not a VLA. It picks no actions. It is the eye the action picker looks through, with a sliver of world-model instinct smuggled into the backbone. My article here ⬇️
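To make the geometry in that description concrete, here is a minimal sketch of a three-way alignment loss in plain Python. It is not the DynaFLIP implementation. The function names, the loss weights, the temperature, and the exact form of the cosine regularizer (here, mean `1 - cosine` over the three pairs) are all assumptions for illustration. The only parts taken from the post are the ingredients: three embeddings on a unit sphere, a term that shrinks the triangle they span, a cosine term, and InfoNCE negatives.

```python
import math

def normalize(v):
    """Project a vector onto the unit sphere."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def triangle_area(a, b, c):
    """Area of the triangle with vertices a, b, c (Gram determinant of two edges)."""
    u = [y - x for x, y in zip(a, b)]
    v = [y - x for x, y in zip(a, c)]
    gram = dot(u, u) * dot(v, v) - dot(u, v) ** 2
    return 0.5 * math.sqrt(max(gram, 0.0))

def cosine_gap(a, b, c):
    """Assumed regularizer: mean (1 - cosine) over the three pairs of unit vectors."""
    return ((1 - dot(a, b)) + (1 - dot(a, c)) + (1 - dot(b, c))) / 3

def info_nce(anchors, positives, temperature=0.1):
    """anchors[i] should match positives[i]; the rest of the batch are negatives."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [dot(a, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(anchors)

def tri_modal_loss(img, txt, flow, reg_weight=0.1, temperature=0.1):
    """Hypothetical combined loss over a batch of (image-transition, language, flow) embeddings."""
    img = [normalize(v) for v in img]
    txt = [normalize(v) for v in txt]
    flow = [normalize(v) for v in flow]
    n = len(img)
    area = sum(triangle_area(a, b, c) for a, b, c in zip(img, txt, flow)) / n
    reg = sum(cosine_gap(a, b, c) for a, b, c in zip(img, txt, flow)) / n
    nce = info_nce(img, txt, temperature) + info_nce(img, flow, temperature)
    return area + reg_weight * reg + nce
```

The area term alone can reach zero when only two of the three embeddings coincide, which is one reading of the triangle "cheating itself flat"; the cosine term in this sketch penalizes that case, and the InfoNCE term penalizes a batch whose samples all land on the same point.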
Sphoenix retweeted
My favourite thing at #ICRA26? The workshops. Because you learn what everyone has been up to. In that spirit, I will talk about our new Dexterous Manipulation work at the workshop on Dexterity with Multifingered Hands: 13:55-14:20, Stolz 2. Here is a teaser (video plays at 1x)
Sphoenix retweeted
Hey! A new vision encoder for robotics is in town 👀🤖 Instead of using models trained on static images (CLIP, SigLIP, DINO), we bake the dynamics-awareness directly into perception. It transfers well everywhere and boosts real-world OOD success by 22.5% Check it out👇
Are you still running your robot policies on vision encoders trained purely on static images? Nowadays, the standard practice in robot learning is to plug in powerful vision models like CLIP, SigLIP, or DINOv2. This inherits a quiet, convenient assumption: “Let mainstream computer vision handle perception, and the downstream policy will figure out the dynamics.” But let’s be real for a moment. Is this truly the best we can do?
We introduce DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation.⬇️
🔷 Dynamics upstream: we push motion understanding into perception.
🔷 Tri-modal-dynamics supervision: image transitions × language × 3D flow, fused via simplex-volume alignment (260K trajectories from robot & human video)
🔷 Transfers everywhere: a visual backbone for diverse policies (MLP, Diffusion Policy, VLA)
🔷 22.5% over the strongest baseline (DINOv2, SigLIP) under real-world OOD
🔷 Open-Source & easy to use
🌐 Website: dynaflip-robotics.github.io
📄 Paper: arxiv.org/abs/2605.30350
💻 Code: github.com/JU-SUK/DynaFLIP
🤗 Hugging Face: huggingface.co/jlee-larr/dyn…
Sphoenix retweeted
Heading to #CVPR2026! We present 🖐️🌏 𝐃𝐞𝐱𝐭𝐞𝐫𝐨𝐮𝐬 𝐖𝐨𝐫𝐥𝐝 𝐌𝐨𝐝𝐞𝐥𝐬 (𝐃𝐖𝐌) — a scene-action-conditioned video diffusion model that simulates human manipulation in static 3D scenes from egocentric hand motions. 👥 Byungjun Kim, Taeksoo Kim(co-first, @taeksu98), Junyoung Lee(@junc0ng), and Hanbyul Joo(@jhugestar) 📄 Paper: arxiv.org/abs/2512.17907 🌐 Project Page: snuvclab.github.io/dwm/ 💻 Code: github.com/snuvclab/dwm 📍 Main Poster - Jun 7 (Sun), 11:45–13:45 (ExHall F, #97) Also at: 🎬VideoWorldModel Workshop - Jun 3 (Wed), 9:50–10:40 (Room 705/707) 🤖 H2R Workshop - Jun 4 (Thu), 12:15–13:30 (Mile High 2A) See you in Denver!
Sphoenix retweeted
Diffusion and flow matching-based robot planners are slow and generate noisy and jerky trajectories. Delighted to share our ICRA 2026 paper, which leverages IMLE to improve planning frequency 19-fold from 4.3 Hz to 83 Hz and reduces jerk by 38% relative to flow matching. Joint work w/ Grayson Lee, Minh Bui, Shuzi Zhou, Yankai Li and Mo Chen. (1/7)
Sphoenix retweeted
The hands reportedly cost more than the body they're bolted to. Stanford, ETH Zurich, Ai2, and UC San Diego all want one anyway. @nvidia used GTC Taipei to unveil its Isaac GR00T Reference Humanoid Robot: a @UnitreeRobotics H2 Plus body with Sharpa Wave tactile hands, a Jetson Thor brain, and the open Isaac GR00T stack, sold as one pre-assembled platform. Sharpa Wave packs 22 active DOF, 1,000 tactile pixels per fingertip, and 0.005 N force sensitivity. Sharpa hasn't disclosed prices, but industry reporting (@36Kr, @chris_j_paxton ) pegs each hand at around $50,000. The headline isn't the spec sheet — it's the guest list. Four top labs adopting the same reference rig means cross-lab results finally become comparable. Ships from Unitree in late 2026. Our breakdown ↓ #Robotics #Humanoid #AI #NVIDIA #IsaacGROOT
Sphoenix retweeted
Replying to @adcock_brett
Congratulations on making history. This was so exciting for the robotics community.
Sphoenix retweeted
👁️The most dangerous AI failure in the DoW will not look like a hallucination. 👁️It will look formal, defensible, and audit-ready. 👁️Then it will be approved, inherited, and believed. 👁️I call that pathway the Reliability Kill Chain 👁️Why "human in the loop" is not enough:
Sphoenix retweeted
The World’s First Universal Model for Real-Time Action Generation. [Pure edition, One Seamless Take] An embodied extension, always by your side. Westlake Robotics × Westlake University GAE (General Action Expert) – The Universal Pretrained Model for Action. @TheHumanoidHub
Sphoenix retweeted
5 Mar 2025
Legged Locomotion… meets Skateboarding [Paper ⬇️]
Most robot movement models either rely on fixed patterns or struggle to handle complex changes. DHAL (Discrete-time Hybrid Automata Learning) takes a different approach: using reinforcement learning to teach robots when and how to switch movements in real-time:
✅ Learns when to switch between different motions without pre-labeled data
✅ Handles complex, high-dimensional movements like a quadrupedal robot on a skateboard
✅ Uses a multi-critic architecture to improve contact-based motion control
✅ Works in both simulation and real-world environments with strong results
It proves that robots can learn movement transitions on their own, without predefined rules.
Paper: arxiv.org/abs/2503.01842
Thanks to @uint8_Lau for bringing this to my attention!
Sphoenix retweeted
Meet the game-changer: LeKiwi 🥝 Crafted by @sigrobotics and @huggingface At 1/10 the cost of the best alternative out there, it's the most accessible mobile base for $300. Easier DIY to automate chores at home! 1/ 🧵 Link and details in thread 👇
Sphoenix retweeted
This is BIG.
Robotics will accelerate.
Unitree Robotics, the Hangzhou-based firm behind the viral G1 robot, has open-sourced its algorithms and hardware designs, mirroring the collaborative ethos that propelled AI breakthroughs such as DeepSeek’s open-source models.
Sphoenix retweeted
17 Feb 2025
🤖 What if robots could learn complex tasks with flexible, human-like precision? IKER makes it possible by using visual language models (VLMs) to create and refine rewards for robotic manipulation.
Iterative Keypoint Reward (IKER) is a new way to teach robots how to handle multi-step tasks through visual rewards. It uses a real-to-sim-to-real process to help robots adapt and succeed in tricky situations.
Why IKER stands out:
✅ Uses VLMs to generate Python-based visual rewards for precise object handling
✅ Helps robots plan multi-step actions, like moving obstacles before completing a task
✅ Trains in simulation with real-world scenes, then deploys in real environments
✅ Adapts tasks on the fly by learning from past attempts and adjusting for errors
It shows that with visual rewards and adaptive learning, robots can handle real-world tasks with human-like flexibility and accuracy.
Credit: Seen at @shivanshpatel35, great work from you and your colleagues! 🫶
💻 Project: iker-robot.github.io/
📑 Paper: arxiv.org/abs/2502.08643
🎬 YouTube: youtube.com/watch?v=RpejalPG…
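For a rough sense of what a "Python-based visual reward" over keypoints could look like, here is a small sketch. It is not taken from the IKER code. The function name `keypoint_reward`, the keypoint names, and the offset convention are hypothetical, chosen only to illustrate the idea of scoring a scene by how close one tracked keypoint is to a target defined relative to another.

```python
import math

def keypoint_reward(keypoints, moving, anchor, offset):
    """Negative distance between keypoint `moving` and the target `anchor + offset`.

    keypoints: dict mapping a keypoint name to an (x, y, z) position in metres.
    All names and the offset convention are hypothetical, for illustration only.
    """
    target = [a + o for a, o in zip(keypoints[anchor], offset)]
    return -math.dist(keypoints[moving], target)

# Hypothetical sub-goal: bring the shoe toe to 10 cm above the rack edge.
scene = {"shoe_toe": (0.0, 0.0, 0.0), "rack_edge": (0.3, 0.0, 0.0)}
reward_far = keypoint_reward(scene, "shoe_toe", "rack_edge", (0.0, 0.0, 0.1))

scene["shoe_toe"] = (0.3, 0.0, 0.1)
reward_at_goal = keypoint_reward(scene, "shoe_toe", "rack_edge", (0.0, 0.0, 0.1))
```

In this sketch the reward is highest (zero) when the moving keypoint sits exactly on the target and falls off with distance, so a multi-step task could be expressed as a sequence of such functions, one per sub-goal.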
Sphoenix retweeted
6 Feb 2025
Introducing Agentic Object Detection! Given a text prompt like “unripe strawberries” or “Kellogg’s branded cereal” and an image, we use an agentic workflow to reason at length and detect the specified objects. No need to label any training data. Watch the video for details.