🤔 SAGE started out of curiosity about why, as of mid-2025, agentic post-training had not made a big impact on developing video models.
💡 With entertainment videos as our testbed, we find that one can turn great "direct" models into awesome "any-horizon agents" with effortful RL!
🎥 Introducing SAGE, an agentic system for long video reasoning on entertainment videos—sports, vlogs, & more. It learns when to skim, zoom in, & answer questions directly. On our SAGE-Bench eval, SAGE with a Molmo 2 (8B)-based orchestrator lifts accuracy from 61.8% → 66.1%. 🧵