#POTD |
#SIGGRAPHAsia2025 | Customizing Camera-Controllable Video Diffusion Models with Multi-View Performance Captures
📍
@Yuancheng_Xu0,
@wenqi_xian,
@maleewahaha et al. (Eyeline Labs × Netflix)
🎥 Why do customized video diffusion models lose identity as soon as the camera moves?
Because most personalization relies on single-view data and weak camera control. Once the viewpoint changes, identity consistency collapses, which makes cinematic use nearly impossible.
🔷 Method
The paper introduces a new customization pipeline built on multi-view performance capture and 4D Gaussian Splatting (4DGS). 4DGS is repurposed as a data generator, producing identity-consistent videos with precise 3D camera trajectories and controllable lighting. A two-stage training strategy separates camera-conditioned video generation from subject-specific multi-view identity learning.
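To make "precise 3D camera trajectories" concrete, here is a minimal sketch of the kind of per-frame camera poses a 4DGS renderer could be driven with. It is illustrative only and not the authors' code: the function names `look_at` and `orbit_trajectory`, the circular orbit path, and the OpenGL-style convention (camera looks down its negative z axis) are all assumptions.

```python
import math

def look_at(eye, target, up=(0.0, 1.0, 0.0)):
    """Hypothetical helper: world-to-camera rotation rows and translation.

    Assumes an OpenGL-style camera that looks down its -z axis.
    """
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    def normalize(a):
        n = math.sqrt(dot(a, a))
        return tuple(x / n for x in a)

    forward = normalize(sub(target, eye))
    right = normalize(cross(forward, up))
    true_up = cross(right, forward)
    # Rows of the rotation matrix: camera x, y, z axes expressed in world space.
    rotation = (right, true_up, tuple(-x for x in forward))
    # Translation so that the eye position maps to the camera origin.
    translation = tuple(-dot(row, eye) for row in rotation)
    return rotation, translation

def orbit_trajectory(num_frames, radius, height, target=(0.0, 0.0, 0.0)):
    """Hypothetical helper: one full circular orbit around `target`."""
    poses = []
    for i in range(num_frames):
        angle = 2.0 * math.pi * i / num_frames
        eye = (target[0] + radius * math.cos(angle),
               target[1] + height,
               target[2] + radius * math.sin(angle))
        poses.append(look_at(eye, target))
    return poses

# Example: 8 camera poses on a 3 m orbit, 1 m above the subject.
poses = orbit_trajectory(num_frames=8, radius=3.0, height=1.0)
```

Each pose pairs one rendered frame with known extrinsics, which is the sort of supervision camera-conditioned training needs. Real capture setups would also carry intrinsics and arbitrary (non-orbit) paths.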
🔷 Results
The model achieves strong multi-view identity preservation under full 3D camera motion, accurate camera controllability, and robust lighting adaptation. It further supports multi-subject generation, scene customization, real-life video adaptation, and motion/layout control, all key requirements for virtual production.
By decoupling camera learning from identity learning, this work bridges volumetric capture and video diffusion for filmmaking-ready generation.
Why it matters:
This paper moves video diffusion closer to real virtual production 🎬, where identity, camera motion, lighting, and interaction must all work together. It shows how generative models can move beyond single-view personalization toward cinematic, controllable video synthesis.
More in comments 👇
#PaperOfTheDay #VideoDiffusion #VirtualProduction #CameraControl #MultiView #GaussianSplatting #GenerativeAI #ComputerVision