🚨 A CHINESE LAB HAS JUST HUMILIATED HALF THE VIDEO INDUSTRY.
Meituan's LongCat team just dropped LongCat-Video-Avatar 1.5 — a production-ready, open-source framework for audio-driven avatar generation.
The input: one photo. One audio file. The output: a fully lip-synced speaking avatar.
Many AI avatar tools fall apart after a few seconds: faces morph, lip sync breaks, identity drifts. LongCat 1.5 was built to target exactly these failure modes, with a focus on stability, long-form generation, and multi-character interaction.
It supports news broadcasting, education, entertainment, singing, e-commerce, and multi-person conversation — in both Chinese and English, across realistic and animated visual styles.
It is MIT-licensed. You can use it commercially.
What used to require a camera crew, a studio, and post-production editing now runs from a GitHub repo.
Version 1.5 also upgrades its audio encoder to Whisper-Large for sharper lip sync, and uses step distillation to cut inference to 8 sampling steps, which makes generation significantly faster than before.
The tools that charge thousands for this workflow now have a free competitor.
This is LongCat-Video-Avatar 1.5. It is worth your attention.