Gordon Wetzstein

Gordon Wetzstein

8 Photos and videos

Tweets

Lu Jiang retweeted

Gordon Wetzstein

@GordonWetzstein

4 Sep 2025

How do we generate videos on the scale of minutes, without drifting or forgetting about the historical context? We introduce Mixture of Contexts. Every minute-long video below is the direct output of our model in a single pass, with no post-processing, stitching, or editing. 1/4

1:31

607

157,128

Ceyuan Yang

Lu Jiang retweeted

Ceyuan Yang

@CeyuanY

14 Apr 2025

Glad to share Seaweed-7B, a cost-effective foundation model for video generation. Our tech report highlights the key designs that significantly improve compute efficiency and performance given limited resources, achieving comparable quality against other industry-level models. To unleash the power of the foundation model, Seaweed-7B further enables a wide range of downstream applications including image-to-video generation, human video generation, subject-consistent video generation, video-audio joint generation, long video generation and storytelling, real-time generation, super-resolution generation, camera controlled generation. Check out our webpage and report for more details: Webpage: seaweed.video/ Paper: seaweed.video/seaweed.pdf It's a wonderful journey of the last year. Thanks to all teammates for their contributions, sincerely.

1:31

515

77,430

AK

Lu Jiang retweeted

@_akhaliq

28 Mar 2025

Synthetic Video Enhances Physical Fidelity in Video Synthesis A turtle swimming in a green background. video matting illustration

0:05

100

16,628

Ceyuan Yang

Lu Jiang retweeted

Ceyuan Yang

@CeyuanY

14 Mar 2025

We propose Long Context Tuning (LCT) for scene-level video generation to bridge the gap between current single-shot generation and real-world narrative video productions. Homepage: guoyww.github.io/projects/lo… Report: arxiv.org/abs/2503.10589

4:29

103

46,814

Junfei Xiao

Lu Jiang retweeted

Junfei Xiao

@never1andd

13 Jan 2025

Want the deep dive? • arXiv: arxiv.org/abs/2501.06173 • Project Page: videoauteur.github.io See how VideoAuteur CookGen are shaping long narrative video generation. Big shout out to my co-authors and advisors: @fncheng2333 @liangkegui @YuilleAlan @roadjiang

494

AK

Lu Jiang retweeted

@_akhaliq

15 Jan 2025

Seaweed APT Diffusion Adversarial Post-Training for One-Step Video Generation Existing diffusion and autoregressive generative models require repeated neural network evaluations. It is extremely slow for the high-resolution video generation task, as a few-second video can take many minutes to generate. Our work is the first to demonstrate the generation of an entire video using a single neural function evaluation (1NFE) by using our proposed adversarial post-training technique. Our model generates 2 seconds of 1280x720 24fps videos in real-time. We showcase some of the results below:

3:05

203

21,199

Lu Jiang

Lu Jiang @roadjiang

23 Dec 2023

Interesting comparison between our VideoPoet and other competitive models. The comparison is incredibly helpful and reinforces my belief that VideoPoet excels in generating larger motions. We know the exact reasons for this and are working on improving single frame quality.

Anu Aakash @anuaakash

22 Dec 2023

Google VideoPoet, Runway, Pika & Genmo Google recently announced Video Poet. Google's VideoPoet is a large language model (LLM) that is capable of a wide variety of video generation tasks, including: - text-to-video - image-to-video - video stylization - video inpainting and outpainting - video-to-audio. I tried some of their text-to-image prompts (from their demo) in Pika, Runway and Genmo. Here are the results: 10 examples 1/10 Two teddy bears holding hands, walking down rainy 5th avenue.

0:03

1,286

Lu Jiang

Lu Jiang @roadjiang

11 Dec 2023

Excited to be at #NeurIPS2023 this week! Can't wait to reconnect with colleagues and make new connections. If you're up for a coffee chat, feel free to reach out. Find me at our spotlight/posters. arxiv.org/abs/2306.17842 Tue 12 5:15 p.m. arxiv.org/abs/2306.00983 Wed 13 10:45 a.m

SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with...

In this work, we introduce Semantic Pyramid AutoEncoder (SPAE) for enabling frozen LLMs to perform both understanding and generation tasks involving non-linguistic modalities such as images or...

arxiv.org

904

Agrim Gupta

Lu Jiang retweeted

Agrim Gupta

@agrimgupta92

11 Dec 2023

We introduce W.A.L.T, a diffusion model for photorealistic video generation. Our model is a transformer trained on image and video generation in a shared latent space. 🧵👇

0:44

247

1,233

431,121

Lu Jiang

Lu Jiang @roadjiang

20 Nov 2023

😲While preparing the meta-review for #aaai24, I stumbled upon a new form of parallelism. It wasn't about the paper's concepts, but rather in the review comments, where two reviewers listed identical comments, word for word, over 200 matching words. #PeerReview #AIResearch

ALT Shocked Joey GIF

853

Lu Jiang

Lu Jiang @roadjiang

20 Nov 2023

#plagiarism #AcademicTwitter #aaai

436

Lu Jiang

Lu Jiang @roadjiang

2 Aug 2023

📢 Call for Papers! International Journal of Computer Vision (IJCV) invites submissions for its special issue on "Generative Models for Content Creation and Manipulation." 🗓️ Manuscript Submission Deadline: February 28, 2024 🔗 Check it out here: springer.com/journal/11263/u…

534

Lu Jiang

Lu Jiang @roadjiang

4 Jul 2023

Fascinating research by Google reveals the power of Language Models (LLMs) like PaLM or GPT in tackling visual tasks using in-context learning. This novel method enables LLMs to perform image generation tasks without requiring any parameter updates. #palm #GPT4 #LLMs

249

149,805

more replies

Lu Jiang

Lu Jiang @roadjiang

5 Jul 2023

Images can be described using multilingual words and utilized to reconstruct the image with varying levels of fidelity.

2,595

Lu Jiang

Lu Jiang @roadjiang

5 Jul 2023

Paper link: huggingface.co/papers/2306.1… Can GPT solve visual tasks by in-context learning? This paper shows this seems plausible as long as we can translate the image (or other non-linguisitic modality) into a language that the LLM can comprehend.

Paper page - SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs

Join the discussion on this paper page

huggingface.co

14,241

Lu Jiang

Lu Jiang @roadjiang

21 Feb 2023

Which model do you think is more responsible in generating images with genders: #dalle2 or #stablediffusion? Our new paper on Gender Presentation Differences in Text-to-Image Models compares these models!" Check out our webpage: salt-nlp.github.io/GEP/ #AIart #generativeart

555

Lu Jiang

Lu Jiang @roadjiang

21 Feb 2023

@StevenyzZhang @Diyi_Yang Greg Turk

345

Lu Jiang

Lu Jiang @roadjiang

21 Feb 2023

Text-to-image models can generate images based on text, but we don't fully understand how they see gender. Our work proposes a new way to study this. We also propose a metric called GEP to estimate differences automatically. #GenderPresentationDifferences #TextToImage #AI

277

Lu Jiang

Lu Jiang @roadjiang

21 Feb 2023

@Diyi_Yang @StevenyzZhang

273