@NVIDIA is working on one of the hardest problems in Physical AI, so you don’t have to: generalist robotic pick-and-place.
We are excited to introduce GraspGenX at #CVPR2026: a foundation model for robotic grasping that works out of the box with unseen robots, novel objects, and unfamiliar environments.
Unlike Vision-Language-Action (VLA) models or dedicated grasp networks that require expensive, embodiment-specific training, GraspGenX is cross-embodiment and works zero-shot. You simply pass a "robot prompt" alongside an image of the object to generate actions.
🚀 Key Highlights:
1) Scaling: Trained on over 2 billion 6-DoF grasp rollouts generated entirely in physics simulation, a scale that would be practically impossible to reach with real-world teleoperation.
2) Zero-Shot Transfer: Works out of the box with several grippers widely used in research and industry.
3) Built for the Agentic Era: Features native Model Context Protocol (MCP) support, a client-server architecture, and a skills.md file, so it plugs directly into LLM and agentic robotics workflows.
4) Full Pipeline Integration: Pair it with other open foundation models such as SAM3 and with advanced motion solvers such as cuRoboV2 for end-to-end deployment in entirely unknown environments.
If you are currently executing pick-and-place with a VLA or WAM, you can use GraspGenX to generate sim-verified trajectory data and inject it into your pipeline. No need to waste precious real-world engineering hours on data collection for standard manipulation tasks.
🌐 Website: graspgenx.github.io/
💻 Code: github.com/NVlabs/GraspGenX
📄 Paper: arxiv.org/abs/2606.00998
📍 CVPR: Poster 619, June 6, 1:45 session, ExHall F
This work was led by the incredible @BeiningH (Princeton), in collaboration with a phenomenal team at NVIDIA: @erwincoumans, @yu_wei_chao, @balakumar_, @clembow, and Stan Birchfield.
#CVPR2026