To build multi-player games with video models, we likely need a map. One challenge here is the action binding problem, which we solve with simple RoPE-based attention biasing.
While existing multi-actor models specialize in one game, we generalize to 46 games and diverse actions!
Introducing ActionParty: the first video world model that controls up to 7 players simultaneously on the same screen across 46 game environments.
We tackle the action binding problem in video diffusion, ensuring each player's action is applied to the right subject. 🧵