BREAKING: OpenAI announces Sora, a new text-to-video model.
It can generate 1 min long high quality video that already rivals Runway.
Sora is currently being tested by experts to identify any risks or problems.
No paper yet but here's what we know about the architecture:
- Uses a diffusion model, starting with noise and refining it into video.
- Builds on transformer architecture for scaling.
- Represents videos as patches (similar to tokens in text models), allowing a range of data types.
- Incorporates techniques from DALL·E 3 for following text instructions closely.