🚀 Introducing 𝐒𝐚𝐟𝐞𝐖𝐚𝐭𝐜𝐡! 🚀
While generative models 👾🎥 like Sora and Veo 2 have shown us some stunning videos recently, they also make it easier to produce harmful content (sexual🔞, violent🙅♂️, deepfakes🧟♂️).
🔥 𝐒𝐚𝐟𝐞𝐖𝐚𝐭𝐜𝐡 is here to help 😎: the first MLLM-based video guardrail model designed to follow customized safety policies and provide guardrails with precise explanations in a zero-shot manner.
In addition, we also introduce SafeWatch-Bench📊, a 2M high-quality video guardrail dataset covering over 30 unsafe video scenarios from various real-world platforms and SOTA generative models to comprehensively cover all potential risks.
🧐Why SafeWatch?
👉1. Strong policy-following: trained on diverse videos and policy taxonomies, yielding high generalizability to unseen scenarios and subtle policy definitions.
👉2. High Inference Speed: introducing two plug-and-play modules to process policies in parallel and prune irrelevant video tokens, reducing inference costs and eliminating positional bias.
👉3. In-depth explanations: trained on high-quality explanations from SafeWatch-Bench📊 labeled by a rigorous multi-agent consensus pipeline and verified by human experts.
We evaluate SafeWatch on a large variety of guardrail tasks:
1️⃣ On both real-world and generative video subsets of SafeWatch-Bench, SafeWatch outperforms SOTAs, including GPT-4o, by 29.2% and 27.2% on average, while requiring much less inference time.
2️⃣ On 5 existing video guardrail benchmarks, SafeWatch achieves 87.1% accuracy, consistently outperforming previous SOTAs.
3️⃣ On 4 new video categories and unseen policy taxonomies, as well as 4 different prompting tasks, SafeWatch maintains high accuracy and outperforms GPT-4o (renowned for its zero-shot generalizability).
🔥🔥 Our project has been released:
👉Paper link:
arxiv.org/pdf/2412.06878
👉Project page:
safewatch-aiguard.github.io
👉Code (coming soon):
github.com/BillChan226/SafeW…