Very happy to share our new paper, This&That, a dynamic robot video generation model conditioned on language and simple gestures! We also propose the Diffusion Video to Action (DiVA) model, which translates generated videos into robot actions in the rollout environment.