Sphoenix

Tweets

Sphoenix

@SphoenixAI

22h

Often, robot demos make me a little suspicious. The Genesis AI Eno writeup made me trust the numbers instead, which is the rarer reaction. Two details did it. They grade their policies in simulation but don’t train on simulated frames, so a good score can't be the model memorizing the simulator instead of learning the task. That is a small decision with a lot of self-restraint behind it. And the hands: a 1:1, twenty-joint, soft-skinned hand isn't an aesthetic call, it's what lets gloved human demonstrations transfer without a retargeting step mangling them on the way in. The body around those hands is close to an afterthought, on purpose. Wheels, a folding panel tower, no legs. They kept the part of the human form the world's tools are shaped around and dropped the part they aren't. Covered in my article for BotNews: the hand, the brain, the 3 ms control loop, and the simulator they handed to the public while keeping the model for themselves. #AI #tech

Bot News

@BotNewsAI

22h

x.com/i/article/206704166066…

847

Sphoenix retweeted

Bot News

@BotNewsAI

Jun 13

x.com/i/article/206560611339…

894

Sphoenix retweeted

Saba Khalilnaji

@saba_khalilnaji

Jun 10

Meet Axol: a dual-arm robot designed for teams working with physical AI. Made in America. Axol is for builders who believe robots should work, in the real world, not just staged environments, and that the future of physical AI should be open not closed.

2:23

360

55,607

Sphoenix retweeted

Ilir Aliu

@IlirAliu_

Jun 10

ETH Zurich just open-sourced their entire 2026 robot learning course. Not a MOOC. The actual course. Slides, lecture recordings, coding assignments, GitHub repo. The curriculum goes from imitation learning and RL all the way to Vision-Language-Action models and foundation models for robotics. Guest lectures from the co-founder of Physical Intelligence. The creator of Diffusion Policy. Pieter Abbeel. Dieter Fox. 12 weeks. Free. No signup. If you want to understand where robot intelligence is actually heading… this is the reading list the field is using right now. 📍[cvg.ethz.ch/lectures/Robot-L…] —— Weekly robotics and AI insights. Subscribe free: 22astronauts.com

312

2,115

116,246

Sphoenix

@SphoenixAI

Jun 4

Most of the manipulation fragility I see in the wild traces back to one mismatch: the vision encoder was trained on static images, the policy was trained on motion, and nobody on the stack is in charge of the verb. DynaFLIP picks that exact fight. Instead of bolting a CLIP/SigLIP/DINOv2-style backbone in front of an MLP, a Diffusion Policy, or a VLA and asking the policy to learn the dynamics from demonstrations alone, it trains the encoder itself to anticipate motion. Three signals get aligned on a shared sphere: image transitions, the language instruction, and 3D scene flow. The objective shrinks the triangle they span. A cosine regularizer keeps the triangle from cheating itself flat. InfoNCE negatives keep the three embeddings from collapsing into one point. At deployment, the model wants a single RGB frame. The flow and language are gone. The anticipation is baked into the features. The detail that keeps me thinking, given the egocentric-data-collection thread I’ve been pulling on (hat-cams, UMI rigs, haptic gloves): the training data is action-free video, robot and human. The supervision rides on watching the world change, which means human video walks in as training data. A different lever than collecting more teleop. Not a VLA. It picks no actions. It is the eye the action picker looks through, with a sliver of world-model instinct smuggled into the backbone. My article here ⬇️

Bot News

@BotNewsAI

Jun 4

x.com/i/article/206259659888…

128
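The objective described in the DynaFLIP post above (three embeddings on a shared sphere, a triangle to shrink, a cosine regularizer, InfoNCE negatives) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the loss weights `lam_cos` and `lam_nce`, the temperature, and the exact form of each term are all hypothetical, chosen only to make the three ingredients concrete.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Project each row onto the unit sphere.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def triangle_area(a, b, c):
    # Area of the triangle with vertices a, b, c (each of shape [B, D]),
    # computed from the Gram determinant of the two edge vectors.
    u, v = b - a, c - a
    uu = np.sum(u * u, axis=-1)
    vv = np.sum(v * v, axis=-1)
    uv = np.sum(u * v, axis=-1)
    return 0.5 * np.sqrt(np.clip(uu * vv - uv ** 2, 0.0, None))

def info_nce(q, k, temperature=0.1):
    # Row i of q should match row i of k; other rows act as negatives.
    logits = q @ k.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

def trimodal_loss(img, lang, flow, lam_cos=0.1, lam_nce=1.0):
    # ASSUMPTION: a simple weighted sum; the real objective may differ.
    a, b, c = l2_normalize(img), l2_normalize(lang), l2_normalize(flow)
    area = float(triangle_area(a, b, c).mean())
    # Pairwise cosine term: penalizes any pair that drifts apart, so the
    # area cannot be driven down by a degenerate configuration alone.
    pairs = [(a, b), (a, c), (b, c)]
    cos_term = float(np.mean([1.0 - np.sum(x * y, axis=-1).mean() for x, y in pairs]))
    nce = float(np.mean([info_nce(x, y) for x, y in pairs]))
    return area + lam_cos * cos_term + lam_nce * nce

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))
aligned = trimodal_loss(x, x, x)
misaligned = trimodal_loss(x, rng.normal(size=(4, 16)), rng.normal(size=(4, 16)))
print(aligned < misaligned)
```

In this toy setup, a batch whose three modalities agree scores a lower loss than one whose modalities are unrelated. At deployment only the image encoder would be kept, which matches the post's point that flow and language are training-time signals.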

Sphoenix retweeted

Jeannette Bohg

@leto__jean

Jun 1

My favourite thing at #ICRA26? The workshops. Because you learn what everyone has been up to. In that spirit, I will talk about our new Dexterous Manipulation work at the workshop on Dexterity with Multifingered Hands: 13:55-14:20, Stolz 2. Here is a teaser (video plays at 1x)

0:16

450

66,375

Sphoenix retweeted

Jia-Bin Huang

@jbhuang0604

Jun 2

Hey! A new vision encoder for robotics is in town 👀🤖 Instead of using models trained on static images (CLIP, SigLIP, DINO), we bake the dynamics-awareness directly into perception. It transfers well everywhere and boosts real-world OOD success by 22.5% Check it out👇

Jusuk Lee

@jusukle

Jun 1

Are you still running your robot policies on vision encoders trained purely on static images? Nowadays, the standard practice in robot learning is to plug in powerful vision models like CLIP, SigLIP, or DINOv2. This inherits a quiet, convenient assumption: “Let mainstream computer vision handle perception, and the downstream policy will figure out the dynamics.” But let’s be real for a moment. Is this truly the best we can do? We introduce DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation.⬇️ 🔷 Dynamics upstream: we push motion understanding into perception. 🔷 Tri-modal-dynamics supervision: image transitions × language × 3D flow, fused via simplex-volume alignment (260K trajectories from robot & human video) 🔷 Transfers everywhere: a visual backbone for diverse policies (MLP, Diffusion Policy, VLA) 🔷 22.5% over the strongest baseline (DINOv2, SigLIP) under real-world OOD 🔷 Open-Source & easy to use 🌐 Website: dynaflip-robotics.github.io 📄 Paper: arxiv.org/abs/2605.30350 💻 Code: github.com/JU-SUK/DynaFLIP 🤗 Hugging Face: huggingface.co/jlee-larr/dyn…

1:37

293

38,637

Sphoenix retweeted

Byungjun Kim

@byungjun__kim

Jun 2

Heading to #CVPR2026! We present 🖐️🌏 𝐃𝐞𝐱𝐭𝐞𝐫𝐨𝐮𝐬 𝐖𝐨𝐫𝐥𝐝 𝐌𝐨𝐝𝐞𝐥𝐬 (𝐃𝐖𝐌) — a scene-action-conditioned video diffusion model that simulates human manipulation in static 3D scenes from egocentric hand motions. 👥 Byungjun Kim, Taeksoo Kim(co-first, @taeksu98), Junyoung Lee(@junc0ng), and Hanbyul Joo(@jhugestar) 📄 Paper: arxiv.org/abs/2512.17907 🌐 Project Page: snuvclab.github.io/dwm/ 💻 Code: github.com/snuvclab/dwm 📍 Main Poster - Jun 7 (Sun), 11:45–13:45 (ExHall F, #97) Also at: 🎬VideoWorldModel Workshop - Jun 3 (Wed), 9:50–10:40 (Room 705/707) 🤖 H2R Workshop - Jun 4 (Thu), 12:15–13:30 (Mile High 2A) See you in Denver!

0:21

201

24,641

Sphoenix retweeted

Ke Li 🍁

@KL_Div

Jun 2

Diffusion and flow matching-based robot planners are slow and generate noisy and jerky trajectories. Delighted to share our ICRA 2026 paper, which leverages IMLE to improve planning frequency 19-fold from 4.3 Hz to 83 Hz and reduces jerk by 38% relative to flow matching. Joint work w/ Grayson Lee, Minh Bui, Shuzi Zhou, Yankai Li and Mo Chen. (1/7)

0:24

792

94,961

Sphoenix retweeted

Bot News

@BotNewsAI

Jun 2

The hands reportedly cost more than the body they're bolted to. Stanford, ETH Zurich, Ai2, and UC San Diego all want one anyway. @nvidia used GTC Taipei to unveil its Isaac GR00T Reference Humanoid Robot: a @UnitreeRobotics H2 Plus body with Sharpa Wave tactile hands, a Jetson Thor brain, and the open Isaac GR00T stack, sold as one pre-assembled platform. Sharpa Wave packs 22 active DOF, 1,000 tactile pixels per fingertip, and 0.005 N force sensitivity. Sharpa hasn't disclosed prices, but industry reporting (@36Kr, @chris_j_paxton) pegs each hand at around $50,000. The headline isn't the spec sheet; it's the guest list. Four top labs adopting the same reference rig means cross-lab results finally become comparable. Ships from Unitree in late 2026. Our breakdown ↓ #Robotics #Humanoid #AI #NVIDIA #IsaacGROOT

Bot News

@BotNewsAI

Jun 1

x.com/i/article/206159730261…

236

Sphoenix retweeted

Bot News

@BotNewsAI

May 14

Replying to @adcock_brett

Congratulations on making history. This was so exciting for the robotics community.

296

Sphoenix retweeted

Bot News

@BotNewsAI

Apr 24

x.com/i/article/204746015521…

Sphoenix retweeted

Bot News

@BotNewsAI

Apr 16

x.com/i/article/204482858619…

326,091

Sphoenix retweeted

Bot News

@BotNewsAI

Apr 16

👁️The most dangerous AI failure in the DoW will not look like a hallucination. 👁️It will look formal, defensible, and audit-ready. 👁️Then it will be approved, inherited, and believed. 👁️I call that pathway the Reliability Kill Chain 👁️Why "human in the loop" is not enough:

Bot News

@BotNewsAI

Apr 16

x.com/i/article/204482858619…

326,074

Sphoenix retweeted

WestlakeRobotics

@westlake_robot

29 Sep 2025

The World’s First Universal Model for Real-Time Action Generation. [Pure edition, One Seamless Take] An embodied extension, always by your side. Westlake Robotics × Westlake University GAE (General Action Expert) – The Universal Pretrained Model for Action. @TheHumanoidHub

6:08

132

32,098

Sphoenix retweeted

Ilir Aliu

@IlirAliu_

5 Mar 2025

Legged Locomotion… meets Skateboarding [Paper ⬇️] Most robot movement models either rely on fixed patterns or struggle to handle complex changes. DHAL (Discrete-time Hybrid Automata Learning) takes a different approach: using reinforcement learning to teach robots when and how to switch movements in real-time: ✅ Learns when to switch between different motions without pre-labeled data ✅ Handles complex, high-dimensional movements like a quadrupedal robot on a skateboard ✅ Uses a multi-critic architecture to improve contact-based motion control ✅ Works in both simulation and real-world environments with strong results It proves that robots can learn movement transitions on their own, without predefined rules. Paper: arxiv.org/abs/2503.01842 Thanks to @uint8_Lau for bringing this to my attention!

0:23

113

512

42,516

Sphoenix retweeted

Remi Cadene

@RemiCadene

3 Mar 2025

Meet the game-changer: LeKiwi 🥝 Crafted by @sigrobotics and @huggingface. At 1/10 the cost of the best alternative out there, it's the most accessible mobile base for $300. Easier DIY to automate chores at home! 1/ 🧵 Link and details in thread 👇

0:47

554

68,459

Sphoenix retweeted

Dr Singularity

@Dr_Singularity

2 Mar 2025

This is BIG. Robotics will accelerate. Unitree Robotics, the Hangzhou-based firm behind the viral G1 robot, has open-sourced its algorithms and hardware designs, mirroring the collaborative ethos that propelled AI breakthroughs such as DeepSeek’s open-source models.

121

574

3,968

281,112

Sphoenix retweeted

Ilir Aliu

@IlirAliu_

17 Feb 2025

🤖 What if robots could learn complex tasks with flexible, human-like precision? IKER makes it possible by using visual language models (VLMs) to create and refine rewards for robotic manipulation. Iterative Keypoint Reward (IKER) is a new way to teach robots how to handle multi-step tasks through visual rewards. It uses a real-to-sim-to-real process to help robots adapt and succeed in tricky situations. Why IKER stands out: ✅ Uses VLMs to generate Python-based visual rewards for precise object handling ✅ Helps robots plan multi-step actions, like moving obstacles before completing a task ✅ Trains in simulation with real-world scenes, then deploys in real environments ✅ Adapts tasks on the fly by learning from past attempts and adjusting for errors It shows that with visual rewards and adaptive learning, robots can handle real-world tasks with human-like flexibility and accuracy. Credit: Seen at @shivanshpatel35, great work from you and your colleagues! 🫶 💻 Project: iker-robot.github.io/ 📑 Paper: arxiv.org/abs/2502.08643 🎬 YouTube: youtube.com/watch?v=RpejalPG…

1:44

161

11,691
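To make the "Python-based visual rewards" in the IKER post above concrete, here is a rough sketch of what a keypoint reward function might look like. Everything in it is an assumption for illustration: `make_targets` is a hand-written stand-in for code a VLM would generate, and the keypoint layout, the 10 cm offset, and the 2 cm tolerance are hypothetical values, not taken from the paper.

```python
import numpy as np

def make_targets(keypoints):
    # HYPOTHETICAL stand-in for VLM-generated code: "place the object
    # (keypoint 0) 10 cm above the support (keypoint 1)".
    targets = keypoints.copy()
    targets[0] = keypoints[1] + np.array([0.0, 0.0, 0.10])
    return targets

def keypoint_reward(keypoints, targets, tolerance=0.02):
    # Dense reward: negative mean distance between each tracked keypoint
    # and its target. Success: every keypoint within `tolerance` metres.
    dists = np.linalg.norm(keypoints - targets, axis=-1)
    return -float(dists.mean()), bool(np.all(dists < tolerance))

# Two 3D keypoints: an object at the origin and a support 0.5 m away.
start = np.array([[0.0, 0.0, 0.0],
                  [0.5, 0.0, 0.0]])
targets = make_targets(start)  # targets are fixed once per step of the task

reward_start, done_start = keypoint_reward(start, targets)
reward_goal, done_goal = keypoint_reward(targets, targets)
print(reward_start, done_start, reward_goal, done_goal)
```

The iterative part of the method, as the post describes it, would correspond to regenerating `make_targets` after each attempt based on what the scene looks like; that loop is omitted here.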

Sphoenix retweeted

Andrew Ng

@AndrewYNg

6 Feb 2025

Introducing Agentic Object Detection! Given a text prompt like “unripe strawberries” or “Kellogg’s branded cereal” and an image, we use an agentic workflow to reason at length and detect the specified objects. No need to label any training data. Watch the video for details.

2:10

196

691

4,478

397,732