🚀 Introducing Video-MMMU: Evaluating Knowledge Acquisition from Professional Videos
🎥 Knowledge-intensive Videos:
Spanning 6 professional disciplines (Art, Business, Science, Medicine, Humanities, Engineering) and 30 diverse subjects, Video-MMMU challenges models to learn and apply college-level knowledge from videos.
❓ Knowledge Acquisition-based QA Design:
QA pairs are aligned with the three stages of cognitive learning:
· Perception: Identifying knowledge.
· Comprehension: Understanding the underlying concepts.
· Adaptation: Applying the knowledge to practical scenarios.
📊 Quantitative Knowledge Acquisition Assessment (Δknowledge):
A novel metric that quantifies how much a model's performance improves after it watches a video, providing unique insights into its knowledge acquisition capability.
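As a rough illustration, Δknowledge can be read as a normalized accuracy gain. The sketch below assumes the definition Δknowledge = (Acc_after - Acc_before) / (100 - Acc_before) × 100, with both accuracies in percent. Under that definition, the improvement after watching the video is expressed as a share of the headroom the model had before watching it. The function name delta_knowledge and the example numbers are hypothetical, not taken from the benchmark.

```python
def delta_knowledge(acc_before: float, acc_after: float) -> float:
    """Normalized accuracy gain in percent (assumed definition, see lead-in).

    acc_before: accuracy (%) on the questions without watching the video.
    acc_after:  accuracy (%) on the same questions after watching the video.
    """
    if not (0.0 <= acc_before <= 100.0 and 0.0 <= acc_after <= 100.0):
        raise ValueError("accuracies must be percentages in [0, 100]")
    # Assumed definition: gain divided by the room left for improvement.
    headroom = 100.0 - acc_before
    if headroom == 0.0:
        # A model already at 100% has nothing left to gain; the ratio is undefined.
        raise ValueError("acc_before is 100%, so the normalized gain is undefined")
    return (acc_after - acc_before) / headroom * 100.0


if __name__ == "__main__":
    # Hypothetical numbers, not results reported for any model.
    print(delta_knowledge(40.0, 55.0))  # 25.0
```

Dividing by the headroom keeps a model that starts near the ceiling from looking weak simply because it had little room to improve; a negative value means accuracy dropped after watching the video.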
Why It Matters
🚀 Pushing the Boundaries
Video-MMMU moves beyond video perception and understanding toward knowledge acquisition from video, positioning videos as a powerful medium for transmitting knowledge.
📚 Cognitive-Level Insights
Video-MMMU introduces three cognitive tracks (Perception, Comprehension, and Adaptation) that mirror the stages of human learning, providing a structured framework to evaluate how effectively models acquire, understand, and apply knowledge.
🧠 Bridging the Gap
Video-MMMU uncovers critical limitations in current large multimodal models (LMMs) and provides insights for advancing their ability to acquire knowledge from video.
Project Page: videommmu.github.io/
arXiv: arxiv.org/html/2501.13826v1