Another crazy CVPR 2026 world model result.
“Envisioning the Future” forecasts where points in a scene will move, step by step, from a single image. No dense video needed.
- 3,000x faster than video models, with 10x fewer parameters, and 5x more accurate under a fixed compute budget.
- An autoregressive diffusion model rolls sparse point trajectories forward in short, predictable steps, modeling the uncertainty that grows with each step.
- Why it matters: Rollouts get cheap enough to simulate thousands of futures and plan over them, hitting 78% billiard planning accuracy vs 16% for the best dense video baseline.
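
To make the planning idea concrete, here is a toy sketch of sampling-based planning over cheap stochastic rollouts. It is not the paper's model: `step` is a hypothetical stand-in that moves a single point at constant velocity plus Gaussian noise, where the paper would sample each step from its learned diffusion model. The names `rollout`, `success_rate`, and `plan`, and every parameter value, are assumptions made for illustration.

```python
import random

def step(points, rng, noise=0.02):
    """Hypothetical one-step sampler: advance each (x, y, vx, vy) point by its
    velocity plus Gaussian noise. Stands in for a learned stochastic step."""
    return [
        (x + vx + rng.gauss(0.0, noise), y + vy + rng.gauss(0.0, noise), vx, vy)
        for (x, y, vx, vy) in points
    ]

def rollout(points, n_steps, rng):
    """Autoregressive rollout: feed each sampled state back in as the next input,
    so per-step noise accumulates over the horizon."""
    trajectory = [points]
    for _ in range(n_steps):
        points = step(points, rng)
        trajectory.append(points)
    return trajectory

def success_rate(action, target, n_futures=500, n_steps=10, radius=0.1, seed=0):
    """Fraction of sampled futures in which the point ends within `radius` of `target`.
    `action` is the initial velocity (vx, vy) given to a point starting at the origin."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_futures):
        start = [(0.0, 0.0, action[0], action[1])]
        x, y, _, _ = rollout(start, n_steps, rng)[-1][0]
        if (x - target[0]) ** 2 + (y - target[1]) ** 2 <= radius ** 2:
            hits += 1
    return hits / n_futures

def plan(candidates, target, **kwargs):
    """Pick the candidate action whose sampled futures reach the target most often."""
    return max(candidates, key=lambda a: success_rate(a, target, **kwargs))

if __name__ == "__main__":
    candidates = [(0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
    best = plan(candidates, target=(1.0, 1.0))
    print("best action:", best)
```

The design point is that planning here is just "sample many futures per action, score them, take the best", which only pays off when a single rollout is cheap. Swapping the toy `step` for a real learned sampler would leave `rollout`, `success_rate`, and `plan` unchanged.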