Still wonder why folk don’t believe me when I tell them we will have real-time generative media creation in a few years. Check out the ramp in capability we are already seeing.
Going to be even more interesting stuff as we go into 2023. Most exciting area in tech.
2/ Our approach proposes a new way to learn compact video representations using causal attention in time. We use a bidirectional masked transformer to generate video tokens from text. This scales really well! We can generate 30-second videos (128x128 at 8fps) in 22 seconds!
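A quick back-of-the-envelope check on those numbers, using only the figures quoted in the tweet above (the variable names are mine, and nothing is known here about the hardware used): at this resolution and frame rate, generation already runs a bit faster than playback.

```python
# Figures quoted in the tweet above; variable names are illustrative.
clip_seconds = 30   # length of the generated video
fps = 8             # playback frame rate (at 128x128)
wall_seconds = 22   # reported generation time

frames = clip_seconds * fps                    # total frames generated
gen_frames_per_sec = frames / wall_seconds     # generation throughput
realtime_factor = clip_seconds / wall_seconds  # > 1 means faster than playback

print(f"{frames} frames, {gen_frames_per_sec:.1f} frames/s generated, "
      f"{realtime_factor:.2f}x real time")
# prints: 240 frames, 10.9 frames/s generated, 1.36x real time
```

That is real time only for 128x128 at 8fps; higher resolutions and frame rates would need many more tokens per second of video, so this is a rough sanity check, not a benchmark.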