Thrilled to share that I'll be joining Harvard and the Kempner Institute as an Assistant Professor starting Fall 2026!
I'll be recruiting students this year for the Fall 2026 admissions cycle. Hope you apply!
Excited to share that I’ve joined @MBZUAI as an Assistant Professor of Computer Vision this fall!
If you’re interested in CV4Science: building the next generation of foundation models & discovery tools for science, consider applying to MBZUAI. I’ll be recruiting PhD students!
After a year of severe injury and complex surgery, I wrote about my journey (physical, emotional, and everything in between), hoping it might help others feel less alone in their own recovery.
tender-aster-768.notion.site…
Bored of linear recurrent memories (e.g., linear attention) and want a scalable, nonlinear alternative?
Our new paper “Test-Time Training Done Right” proposes LaCT (Large Chunk Test-Time Training), a highly efficient, massively scalable nonlinear memory with:
Pure PyTorch (no custom kernels)
10× GPU FLOPs utilization compared to previous nonlinear test-time training (TTT) methods
🧠 Huge memory size (up to 40% of model params)
Project page with code: tianyuanzhang.com/projects/t…
(videos generated with our AR video diffusion)
1/9
Congrats to my great roommate during PhD, who got a best paper award when I was still trying to figure out how to write an organized vision paper. Yucheng is so nice and kind, and has so much passion and insight for research. I cannot imagine how lucky his first batch of students are
Thrilled to share that I’ll be joining the Computer Science Department at NYU Shanghai as an Assistant Professor starting Fall 2025! @nyushanghai
I’ll be recruiting PhD students across the entire NYU network, including @nyushanghai, @nyutandon, and @NYU_Courant, to build efficient ML systems (algorithms, models, kernels, and more). I’ll also be hosting multiple RAs and interns (remote friendly). If you're interested, DMs are open!
Really cool work on using MLLMs to analyze city-scale image collections and automatically find the changes over time. Really curious to see all the interesting findings discovered by it!
Curious about how cities have changed in the past decade? We use MLLMs to analyse 40 million Street View images to answer this. Do you know that "juice shops became a thing in NYC" and "miles of overpasses were painted BLUE in SF"? More at boyangdeng.com/visual-chroni… (vid)
Thrilled to introduce Video Depth Anything to support Depth Estimation for super-long videos (over 5 minutes).
It enjoys all the benefits of #DepthAnything: high-quality, fast, robust, etc.
Proj Page: videodepthanything.github.io…
github.com/NVIDIA/Cosmos
Cosmos is a developer-first platform designed to help physical AI builders accelerate their development. It has pre-trained world foundation models (diffusion & autoregressive) in different sizes and video tokenizers. They are open models with permissive licenses. Try it out, and let us know how we can better help.
Also check out the paper, which might be part of the first wave to explore this idea, "Diffusion Models Without Attention" (arxiv.org/abs/2311.18257) with @thoma_gu and @srush_nlp.
Why can’t more non-Chinese researchers show even a bit of care about the racism against Chinese students in the NeurIPS keynote?
Why canāt more US researchers pay any attention to the suffering of international students?
I remain very disappointed, esp. at many established people.
Someone confronted them on the spot, and they said, “Maybe there is one, maybe they are common, who knows what. I hope it was an outlier.” Even this explanation is full of implicit racial bias. See the full conv: dropbox.com/scl/fi/2dtji0z84…
1/3 Today, an anecdote shared by an invited speaker at #NeurIPS2024 left many Chinese scholars, myself included, feeling uncomfortable. As a community, I believe we should take a moment to reflect on why such remarks in public discourse can be offensive and harmful.
NeurIPS acknowledges that the cultural generalization made by the keynote speaker today reinforces implicit biases by making generalisations about Chinese scholars. This is not what NeurIPS stands for. NeurIPS is dedicated to being a safe space for all of us. We want to address the comment made during the invited talk this afternoon, as it is something that NeurIPS does not condone and it doesn't align with our code of conduct. We are addressing this issue with the speaker directly.
NeurIPS is dedicated to being a diverse and inclusive place where everyone is treated equally.
Why do we extract diffusion features from noisy images? Isn’t that destroying information?
Yes, it is - but we found a way to do better.
Here’s how we unlock better features, no noise, no hassle
🧵
What happens when you train a video generation model to be conditioned on motion?
Turns out you can perform "motion prompting," just like you might prompt an LLM! Doing so enables many different capabilities. Here’s a few examples. Check out this thread 🧵 for more results!
Introducing CAT4D!
CAT4D transforms any real or generated video into dynamic 3D scenes with a multi-view video diffusion model.
The outputs are dynamic 3D models that we can freeze and look at from novel viewpoints, in real-time! Be sure to try our interactive viewer!
I am on the job market for industry and academic roles. My research focuses on identifying, designing, and building efficient, scalable, sustainable, and affordable abstractions and infrastructure for generative modeling research. I also minor in law, and do AI policy work.
We've released our paper "Generating 3D-Consistent Videos from Unposed Internet Photos"! Video models like Luma generate pretty videos, but sometimes struggle with 3D consistency. We can do better by scaling them with 3D-aware objectives. 1/N
page: genechou.com/kfcw
Curious whether video generation models (like #SORA) qualify as world models?
We conduct a systematic study to answer this question by investigating whether a video gen model is able to learn physical laws.
There are three key messages to take home:
1️⃣ The model generalises perfectly for in-distribution data, but fails at out-of-distribution generalization. For combinatorial scenarios, a scaling law is observed.
2️⃣ The models fail to abstract general rules and instead try to mimic the closest training example.
3️⃣ The model prioritizes different attributes when referencing training data: color > size > velocity > shape.
This work is a joint effort with our outstanding intern @YangYue_THU.
Paper: arxiv.org/abs/2411.02385
Webpage: phyworld.github.io/
New work on ARC-AGI: We achieved open model SOTA by finetuning Llama3-8B on synthetically generated ARC-like problems! Our method: Prompting LLMs to create both (1) input grid generators and (2) input-output transformations in Python to create problems grounded in code!
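For readers wondering what a problem "grounded in code" might look like, here is a rough sketch, not the authors' actual pipeline. It assumes a problem is just a handful of input/output grid pairs produced by one generator function and one transformation function. The names `generate_input`, `transform`, and `make_problem` are hypothetical, and the mirror transformation is only a stand-in for whatever an LLM would be prompted to write.

```python
import random

Grid = list[list[int]]


def generate_input(rng: random.Random, size: int = 5, n_colors: int = 3) -> Grid:
    """Hypothetical input generator: a random grid of color indices (0 = background)."""
    return [[rng.randint(0, n_colors) for _ in range(size)] for _ in range(size)]


def transform(grid: Grid) -> Grid:
    """Hypothetical transformation: mirror every row left to right."""
    return [list(reversed(row)) for row in grid]


def make_problem(seed: int, n_train: int = 3) -> dict:
    """Build one ARC-like problem: n_train demonstration pairs plus one test pair."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_train + 1):
        grid = generate_input(rng)
        pairs.append({"input": grid, "output": transform(grid)})
    return {"train": pairs[:n_train], "test": pairs[n_train:]}
```

Because every output is computed from its input by the same function, each generated pair is correct by construction, which is presumably the appeal of generating the problems as code rather than as raw grids.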