Google just dropped Gemini Omni, their latest video model.
Think of it as the video version of Nano Banana: not a Veo 4 successor, but a conversational, multimodal layer for video editing and generation.
The highlights (from the release):
▸ Video-to-video editing
▸ Character & style consistency
▸ World knowledge. Inherits Gemini’s grasp of history, physics, biology, and culture. Strong text rendering too.
▸ Natural language camera control
▸ Avatars. Scan yourself in the Gemini app, drop yourself into any scene.
My findings from a few days of quick testing:
▸ Video quality and realism have stepped up, but still trail Kling and Seedance
▸ Image-to-character likeness is weak. Feeding it a reference image of a person doesn’t reliably hold the face.
▸ Video-to-video works…ish. It preserves most elements of the source scene, but my transformation tests came back conservative. It’s not aggressively reimagining the input.
▸ The standout is avatars. H/t to @venturetwins for surfacing this; it’s buried in the Gemini UI. You scan your face, say a few numbers, and it builds an avatar you can drop into scenes. Quality from a quick iPhone capture was genuinely impressive.
The open question: why?
This feels aimed at the void Sora left behind, but is there a void there? Right now the avatar feature is locked to yourself. No social sharing, no pulling up other people’s avatars even with consent.
Once the API comes out, I’ll be curious to see the full set of inputs the model accepts, and whether it has the same web search toggle Nano Banana has for sourcing live data.
What do you think?
#googleio