Siyi Chen

Valts Blukis retweeted

Siyi Chen

@ChenSiyich

Jun 10

Wonderful to be back from #CVPR2026, and excited to share the release of our follow-up work, VoLo: A Physical Orchestrator for Open-Vocabulary Long-Horizon Manipulation.

VoLo introduces the idea of a physical orchestrator for open-vocabulary, long-horizon manipulation. Our goal is to move toward robots that can reason, plan, act, monitor, and recover by adaptively using VLA/WAMs, vision models, and action primitives as tools.

We introduce three main contributions:

🤖 VoLoAgent: a physical orchestrator that plans, monitors, and recovers by adaptively using, halting, and redirecting robot actions with tools.
📊 RoboVoLo: a high-fidelity benchmark with 126 open-vocabulary long-horizon manipulation tasks spanning common sense, memory/state tracking, complex references, and world knowledge.
📈 A large-scale empirical study comparing action models, code-as-policy systems, TAMP-style systems, and ablations of the VoLoAgent orchestrator, complemented by real-robot experiments.

This work was done during my internship at @NVIDIA and would not have been possible without my brilliant collaborators: Hugo Hadfield, Alexander Zook, @mikacuy, @luke_ch_song, @erwincoumans, @xuningy, Faisal Ladhak, @qu_1006, @BirchfieldStan, Jonathan Tremblay, and @robovalts. Huge thanks to everyone!

🔗 Project: chicychen.github.io/VoLo/
🔗 Previous work, SpaceTools: spacetools.github.io/

#Robotics #EmbodiedAI #VisionLanguageModels #VLAModels #RobotLearning #NVIDIA #CVPR2026 #LongHorizonManipulation #AI #ComputerVision

Valts Blukis retweeted

Xuning Yang @xuningy

Jun 1

🎉 We added 2 SOTA WAMs to the RoboLab Leaderboard 🎉

Current leaders on RoboLab-120 (specific instr.):
🥇 Cosmos3-Nano-Policy (39.7%)
🥈 π0.5 (28.1%)
🥉 DreamZero (28.1%)

→ See full results at: research.nvidia.com/labs/srl…
→ All policy clients available at: github.com/NVlabs/RoboLab/

Valts Blukis retweeted

Vineet Bhat @vineet_2104

Jun 1

Presenting BOP-Ask at #CVPR2026 this Saturday in Denver!

📍 33M QA pairs. 6 tasks. 8 VLMs benchmarked against human annotators.

Most VLMs stop at perception. BOP-Ask pushes them into fine-grained interaction.

🔗 bop-ask.github.io/

#ComputerVision #Robotics #VLM #AI
