🤖 Introducing InternVLA-A1 — now fully open-sourced!
Many VLA models follow instructions well in static scenes… but struggle in dynamic environments (conveyor belts, rotating platforms, multi-robot setups). Why? They see the present—but can’t imagine the future.
The InternVLA-A1 solution is to unify perception, imagination, and action in one model:
✅ Scene understanding: Image + text → task parsing
✅ Task imagination: Predict future frames → reason about dynamics
✅ Guided control: Execute actions steered by visual foresight
Powered by InternData-A1, a large-scale, high-quality simulated dataset, InternVLA-A1 stays robust under complex backgrounds, lighting, and distractions.
🔥 See it in action:
1️⃣ High-speed conveyor: track, predict, and stably grasp or flip packages
2️⃣ Rotating platform: task-aware recognition & precise pick-up of diverse items
📊 Outperforms π0 and GR00T N1.5 on general manipulation benchmarks!
✨ Model, data, and code are all open!
Models:
modelscope.cn/models/InternR…
Datasets:
modelscope.cn/datasets/Inter…
GitHub:
github.com/InternRobotics/In…