Useful eval: compositional failure, not just FPS. When a requested behavior is missing or ambiguous, does the primitive refuse, recover, or choose a plausible wrong motion? For robotics, report contact-rich success under perturbation recovery latency.