We did a fun and timely Halloween experiment benchmarking our VLA models' robust reasoning capabilities! 🎃
There's a lot of interest in reasoning for VLA models, but I personally felt that most tasks the community benchmarks on (1) do not require meaningful reasoning capabilities, or (2) are somewhat unrealistic and do not represent real-world tasks. So we decided to use object counting and manipulation as a real benchmark; it's quite common and realistic, but I haven't seen much work in this area. End-to-end imitation learning would fail because of the combinatorially many combinations of requests you can make of the robot.
Our VLA model can count and follow language commands fairly robustly -- all in an end-to-end architecture without external memory modules or counting logic. The model also robustly handles external disturbances to the scene (like shuffling the candy baskets). It's a small, cute experiment we did to benchmark reasoning, but it's pretty fun, so we thought we'd share!
🎃 Halloween is coming. Our hardworking team is lining up for sweet treats, of course, served by Dynasaur!
The DYNA VLA model now has robust agentic reasoning capabilities, allowing it to serve arbitrary combinations and counts of candies! Pure imitation learning can’t work given the combinatorially many possibilities.
No video edits. Uninterrupted, real-life, as always 🤖
Happy Halloween from DYNA!🍬