Professor at Stanford University & Co-founder at Rhoda AI

Joined January 2010
100 Photos and videos
Gordon Wetzstein retweeted
Aside from the official code release, I am thrilled to share that Spectral Progressive Diffusion a.k.a. SPEED (arxiv.org/abs/2605.18736) is now integrated into @lmsysorg's SGLang (@sgl_project)! 🚀 Instead of always running diffusion at full resolution, SPEED progressively grows resolution across denoising steps, drastically cutting token count and achieving >2× speedup with no quality loss. SPEED is now supported in SGLang for FLUX.1 & 2, Z-Image, Qwen-Image, and Wan. Support for Ideogram 4 incoming. Try it out now: docs.sglang.io/docs/sglang-d…. [1/4]
Today we release the code and a demo for our recent Spectral Progressive Diffusion paper🎉 Play around with it anytime! Just as what we have been doing also, we hope that it encourages the integration of our plug-and-play framework into latest and greatest image and video generation models!! We also included an agent skill wrapper in the repo, to making things easier. 🛜Project website: howardxiao.ca/speed/ 📄Paper: arxiv.org/abs/2605.18736 💻Github (with ComfyUI): github.com/howardhx/speed 🤗Demo (HuggingFace): huggingface.co/spaces/howard…
2
12
65
22,446
Speed up your diffusion or flow matching model using our plug & play SPEED module! No distillation necessary.
Today we release the code and a demo for our recent Spectral Progressive Diffusion paper🎉 Play around with it anytime! Just as what we have been doing also, we hope that it encourages the integration of our plug-and-play framework into latest and greatest image and video generation models!! We also included an agent skill wrapper in the repo, to making things easier. 🛜Project website: howardxiao.ca/speed/ 📄Paper: arxiv.org/abs/2605.18736 💻Github (with ComfyUI): github.com/howardhx/speed 🤗Demo (HuggingFace): huggingface.co/spaces/howard…
1
1
13
2,095
Gordon Wetzstein retweeted
Excited to release the code and model weights for our recent paper "Foveated Diffusion: Efficient Spatially Adaptive Image and Video Generation"! We hope this inspires continued research on mixed-resolution diffusion and efficient token allocation for image and video generation. 📰 Paper: arxiv.org/abs/2603.23491 💻 Code: github.com/bchao1/foveated_d… 🌐 Website: bchao1.github.io/foveated-di… 🤗 Weights: huggingface.co/bchao1/foveat… 🚀 Demo: huggingface.co/spaces/bchao1… We put together a fun demo on Hugging Face Spaces — try it out here: huggingface.co/spaces/bchao1…. Sketch your own tokenization layouts and pick model variants to run mixed-resolution diffusion live!
3
11
55
5,321
Gordon Wetzstein retweeted
I integrated our recent work, Spectral Progressive Diffusion a.k.a. SPEED (arxiv.org/abs/2605.18736), with the open-source @ideogram_ai model released yesterday. Turns out our method works out of the box and speeds up inference by up to 1.6× while preserving the high image quality! [1/4]
8
20
132
11,130
🔍 Ill-posed inverse problems (deblurring, inpainting, phase retrieval) need strong priors, and diffusion models are great ones. But how you use them matters. Posterior samplers fold the prior & data term into the reverse diffusion, needing costly score backprop or inner MCMC. MAP variable-splitting methods decouple them, but lack a dual variable to fix constraint violations. DDiff tackles both. At #CVPR2026 🧵👇
3
11
132
8,514
In short, DDiff is an ADMM-inspired diffusion solver that delivers: ✅ Higher image quality across metrics ✅ Robustness to high measurement noise ✅ Better measurement fidelity / lower residual ✅ Faster sampling ✅ Latent-diffusion compatibility ✅ Provable convergence under mild assumptions, a first for diffusion-based posterior optimization
1
6
806
The era of ultra-high-resolution imaging has arrived. Modern image sensors exceeding 200 MP resolution are common in smartphones, with over 400 MP sensors under development. However, the large number of pixels poses significant challenges for acquisition and processing, especially on edge devices. Which pixels should be acquired, and when, for bandwidth-efficient imaging and perception? We introduce Policy-based Foveated Imaging and Perception, an on-device, real-time, predictive, and task-aware framework that dynamically allocates sensor resolution to prioritize important regions under specific perception objectives. This paper will be presented at #SIGGRAPH2026! [1/6]
9
16
128
17,976
We further demonstrate that our framework works in real-world settings on a 200 MP Samsung ISOCELL HP2 prototype. For both object tracking and scene text recognition tasks, our sensor attention policy reproduces the smooth pursuit and saccadic scanpaths seen in simulations, achieving 200 MP bandwidth-efficient sensing under real-world conditions with sensor noise. [5/6]
1
1
3
731
Work led by @howard_xhc, together with @jan_on_x and @boyang_deng. 📄 Paper: arxiv.org/abs/2606.02565 🌐 Project page: howardxiao.ca/foveated/ Please check out our paper Fast Forward on July 19 and our paper presentation on July 23 at 2:50 PM at #SIGGRAPH2026!
5
569
Gordon Wetzstein retweeted
Pixel diffusions are getting hot🔥, but are they really good at texture details? Here's a one-shot comparison (no cherry picking) - HiDream: plain x0 prediction - AsymFLUX.2: plain AsymFlow prediction - L2P: velocity prediction through detailer head
1
12
113
9,224
Gaussian splatting models the structure of the observable parts of a 3D scene well, but what about the unobserved part? Video generation models should generate this!
🚀Tired of floaters, flickering, and blur in 3DGS? We introduce a geometry-informed video generator that refines 3DGS renderings in the wild. 🎥✨ We let the video model actually "see" the rendering process using a Gaussian Primitive buffer. #CVPR2026 @CVPR Project page: research.zhuliyuan.net/proje… 🔥 Highlights: ✅ Geometry-Buffer-conditioned video generator ✅ Refines optimization-based & feed-forward 3DGS ✅ Novel artifact simulation pipeline ✅ Highly efficient bidirectional processing ⚡ Thread 👇
3
40
5,341
High-fidelity generation is hitting a scaling crisis as DiT compute grows with image resolution and video length. But do we need high-resolution denoising at every step? We introduce Spectral Progressive Diffusion, a plug-and-play framework for efficient image and video generation that directly exploits the spectral autoregression property of diffusion to grow resolution during denoising. [1/7]
22
65
415
87,568
The spectral-domain perspective of our approach further enables a new editing capability: keeping low-frequency structure and modifying high-frequency details. This allows for efficient texture and style edits while better preserving geometry compared to SDEdit-style editing. [6/7]
1
1
14
2,804
Work led by @howard_xhc, together with @BrianCChao and @YarivLior. 📄 Paper: arxiv.org/abs/2605.18736 🌐 Project page: howardxiao.ca/speed/
1
3
25
2,642