🎥 Can AI video models truly understand physics? The newly released Physics-IQ benchmark, developed by INSAIT and Google DeepMind and led by Saman Motamed, PhD student at INSAIT, has sparked major discussion across the AI community following its presentation at #ICCV2025.

🔬 The work marks a significant step forward in understanding the physical reasoning limits of today’s generative video models, paving the way for future AI systems that not only generate realistic videos but also reason about the physical world with accuracy and depth.

📊 Physics-IQ provides a comprehensive benchmark of 396 real-world videos covering diverse physical scenarios, from fluid dynamics to solid mechanics, challenging AI models to predict future frames and interactions beyond surface-level visual cues.

🤔 The findings were eye-opening: even state-of-the-art models like #Sora, #Runway, and #VideoPoet create visually stunning clips but fail to capture true physical dynamics, revealing the gap between perception and understanding.

🚀 The project has been met with great interest from the research community, highlighting the importance of integrating experiential and interactive learning into next-generation video models.

📂 Explore the open-source dataset, evaluation code, and results (links in comments).

#GenerativeAI #VideoAI #AIResearch #PhysicsInAI #PhysicalReasoning #AIUnderstanding #AIBenchmark #OpenSourceAI #FutureOfAI #AIInnovation #INSAIT
📢 Motion Prompting: Controlling Video Generation with Motion Trajectories 📽️

Developed by Google DeepMind, this cutting-edge framework introduces motion trajectories as a fundamental control signal 🚀 AI models have long struggled with real-world physics, but frameworks like this bring us closer to replicating the complexity of motion dynamics with precision.

✨ Key Highlights:

✅ Spatio-Temporal Trajectories: The model leverages point trajectories to encode motion across time and space. This representation supports both sparse (object-specific) and dense (scene-wide) motions, ensuring precise control of motion patterns across various levels of granularity.

✅ Motion Prompt Expansion: Converts simple user inputs (e.g., mouse drags) into complex, semi-dense motion trajectories. This allows users to specify high-level intentions like "rotate the head of a cat" or "sweep sand across a surface," which the system translates into detailed motion paths.

✅ Track Embeddings: Each trajectory is encoded into a spatial-temporal volume with unique embeddings, enabling seamless representation of motion. This structure dynamically adapts to varying motion densities and ensures spatial consistency while preserving occlusion details.

✅ Unified Framework: Unlike existing methods that rely on task-specific pipelines, this model achieves versatility. From object manipulation to camera motion control, motion transfer, and drag-based editing, it handles diverse video generation tasks in a single architecture.

✅ Camera Control: By integrating monocular depth estimation, the framework computes 3D point clouds from input frames. It projects these points into camera trajectories, allowing for realistic orbital or dynamic camera movements without needing explicit pose annotations.

✅ Motion Transfer: Extracts motion trajectories from a source video and applies them to a target object or scene. For example, the motion of a monkey's chewing can be seamlessly transferred to animate tree foliage, demonstrating robust cross-domain adaptability.

✅ Emergent Behaviors: Displays advanced physical understanding with emergent phenomena like realistic hair tossing or sand displacement. These behaviors indicate the model’s ability to simulate real-world physics without explicitly being trained for it.

✅ State-of-the-Art Results: Outshines baselines such as Image-Conductor and DragAnything on the DAVIS dataset, with superior metrics for appearance quality (PSNR, SSIM, LPIPS, FVD) and motion accuracy (End-Point Error).

Project Page: motion-prompting.github.io/i…
Paper: arxiv.org/abs/2412.02700

#MotionPrompting #GenerativeAI #VideoDiffusion #PhysicsInAI #SpatioTemporalControl
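To make the Camera Control and End-Point Error highlights above more concrete, here is a minimal sketch of the general idea, not the paper's implementation. It assumes a pinhole camera with a known focal length, a depth map already produced by some monocular depth estimator, and an orbit about the vertical axis through the scene's mean depth. The function names (`unproject`, `orbit_tracks`, `end_point_error`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def unproject(depth, focal):
    """Lift an (H, W) depth map to (H*W, 3) camera-space points (pinhole model, assumed)."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0  # assume the principal point is the image centre
    x = (u - cx) * depth / focal
    y = (v - cy) * depth / focal
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)


def orbit_tracks(depth, focal, angles):
    """Return 2D point tracks of shape (T, N, 2) for a camera orbiting the scene.

    Rotating the points about a pivot is equivalent to moving the camera the
    opposite way; the pivot at mean depth is an illustrative choice.
    """
    h, w = depth.shape
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    pts = unproject(depth, focal)
    pivot = np.array([0.0, 0.0, pts[:, 2].mean()])
    tracks = []
    for a in angles:
        c, s = np.cos(a), np.sin(a)
        rot_y = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
        p = (pts - pivot) @ rot_y.T + pivot
        z = np.clip(p[:, 2], 1e-6, None)  # guard against division by zero
        uv = np.stack([focal * p[:, 0] / z + cx, focal * p[:, 1] / z + cy], axis=-1)
        tracks.append(uv)
    return np.stack(tracks)


def end_point_error(pred, target):
    """Mean Euclidean distance between predicted and reference 2D points."""
    return float(np.linalg.norm(np.asarray(pred) - np.asarray(target), axis=-1).mean())


if __name__ == "__main__":
    depth = np.full((4, 5), 2.0)                 # toy constant-depth scene
    angles = np.linspace(0.0, 0.2, 3)            # small orbit, in radians
    tracks = orbit_tracks(depth, focal=10.0, angles=angles)
    print(tracks.shape)                          # (frames, points, 2)
    print(end_point_error(tracks[-1], tracks[0]))
```

Each row of `tracks` is one frame's set of 2D point positions, which is the kind of dense trajectory signal the post describes feeding to the video model. The same `end_point_error` helper can compare tracks recovered from a generated video against the requested ones.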