Vision-language-action (VLA) submissions at ICLR grew 18x in a single year, but World Action Models (WAMs) are showing more promising results in inference speed and adaptability.
ICLR is the premier gathering of professionals dedicated to the advancement of the branch of artificial intelligence called representation learning.
As Physical AI has gone mainstream, a ton of research has focused on leveraging VLAs to translate the intelligence of LLMs into robotics tasks.
But VLAs are slow. Shengshu's MotuBrain, a WAM, achieved 96% on RoboTwin 2.0 with an architecture that supports policy learning, world modeling, video generation, inverse dynamics, and joint video-action prediction in a single model.
"These results show that unified world action models can scale in generality, predictive accuracy, and real-world deployability."
It's crazy that MotuBrain runs at 11 Hz and adapts to new humanoid embodiments with only 50 to 100 trajectories!
Link in the comments