In the second before a play develops, a basketball player can instantly recognize the defensive scheme (perception), anticipate how the defense will rotate (causal reasoning), simulate several possible outcomes (simulation), and choose the best move (decision).
Today's video AI is far from this kind of understanding. These models can describe what they see, but they cannot explain why something happened, predict what comes next, or decide how to respond. We introduce SVI-Bench to measure these capabilities and to push toward models that can reason over real-world, multi-agent video.