Vision-language models can control robots, but what if the prompt is too complex for the robot to follow directly?
We developed a system that lets robots “think through” complex instructions, feedback, and interjections. We call it the Hierarchical Interactive Robot (Hi Robot).