Low-light 3D reconstruction is a very challenging problem — image noise makes it difficult to establish correspondences, predict camera poses, and reconstruct scene geometry. We show that leveraging 3D foundation models enables scene reconstruction in the dark!
Congratulations, @KellyKZhu! It was great to work with Kelly during her MSc, and I'm excited for her to start her PhD at CMU. See here for all of Kelly's work, and more exciting things to come soon: kellyzhu.ca/
Seeing research come to life 👀
Kelly Zhu, MSc graduate in computer science, works across AI, robotics and computer vision — and shares what comes next ↓
uoft.me/cs6
Really cool project led by @HaojunQiu! We show a patch-based image generation method with closed-form diffusion (i.e., analytical denoising—no neural network). It's *super* efficient and even scales to gigapixel generation! #CVPR2026
📢📢📢We introduce Efficient-SID⚡️: training-free single-image diffusion model that generates images by sampling directly from an input image's patch distribution. Our method enables megapixel generation in <1s and scales to gigapixel generation. We also enable stylization, editing, and other applications. The outputs are constrained to follow exactly the patch distribution of the input — something that is very difficult to do with large models!
#CVPR2026 Highlight
🌐 haojunqiu.github.io/efficien…
📄 arxiv.org/abs/2606.04299
[1/6]
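For readers wondering what "closed-form diffusion" can look like, here is a minimal sketch. It assumes the data distribution is a uniform mixture over patches taken from one image, and that the noise is Gaussian with standard deviation `sigma`. Under those assumptions the optimal denoiser is a softmax-weighted average of the patches, so no neural network is needed. The function name `closed_form_denoise` and the toy patches are hypothetical; this is not the Efficient-SID implementation, which may differ substantially, especially in how it achieves its speed.

```python
import math


def closed_form_denoise(noisy_patch, data_patches, sigma):
    """Posterior mean E[x | noisy_patch], assuming x is uniform over
    data_patches and noisy_patch = x + sigma * standard Gaussian noise."""
    # Log-weight of each data patch: -||noisy - x_i||^2 / (2 * sigma^2)
    logits = [
        -sum((n - p) ** 2 for n, p in zip(noisy_patch, patch)) / (2.0 * sigma ** 2)
        for patch in data_patches
    ]
    # Numerically stable softmax over the log-weights
    max_logit = max(logits)
    weights = [math.exp(logit - max_logit) for logit in logits]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted average of the data patches, one coordinate at a time
    dim = len(noisy_patch)
    return [
        sum(w * patch[d] for w, patch in zip(weights, data_patches))
        for d in range(dim)
    ]


# Toy "image" with two 2-pixel patches (hypothetical data)
patches = [[0.0, 0.0], [1.0, 1.0]]
high_noise = closed_form_denoise([0.5, 0.5], patches, sigma=1.0)
low_noise = closed_form_denoise([0.9, 0.9], patches, sigma=0.01)
```

At high noise the estimate blends the patches; as `sigma` shrinks it snaps to the nearest patch, which is one way to see why outputs stay within the input's patch distribution.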
The workshop on Geometry-Free Novel View Synthesis and Controllable Video Models is happening now in room 607 at #CVPR2026. Currently listening to @du_yilun!
geofreenvs.github.io/
Attending #CVPR2026, Denver?
We have a stellar speaker lineup for “GeoFreeNVS: Geometry-Free Novel View Synthesis and Controllable Video Models.”
Come early. We anticipate seats will fill up FAST!
1/2
Image diffusion models like Flux natively output at 1k resolution, but what if we want to generate much higher-resolution images (6k)? SEGA modifies the RoPE encodings during the diffusion process to generate high-resolution images, with no fine-tuning required!
🚀 🚀 🚀 Excited to share our new paper:
Remember to be Curious: Episodic Context and Persistent Worlds for 3D Exploration
What does it take for an agent to stay curious in a 3D world?
The answer is memory.
🌐 Project: recuriosity.github.io/
📄 Paper: arxiv.org/abs/2605.22814
💻 Code: github.com/recuriosity/recur…
Learning a compact, token-based representation of 4D objects enables multiple applications including image-to-4D, video-to-4D, 3D tracking, and more! Work led by @anagh_malik with collaborators at Apple — also check it out at CVPR 2026!
📢📢📢 Velox 🚀: Learning Representations of 4D Geometry and Appearance
In our #CVPR2026 paper, we introduce a method for learning a native 4D representation, useful for many downstream tasks, such as video-to-4D, 3D tracking, cloth simulation, and others!
🌐: apple.github.io/ml-velox
📝: arxiv.org/abs/2605.04527
High-fidelity generation is hitting a scaling crisis as DiT compute grows with image resolution and video length. But do we need high-resolution denoising at every step?
We introduce Spectral Progressive Diffusion, a plug-and-play framework for efficient image and video generation that directly exploits the spectral autoregression property of diffusion to grow resolution during denoising.
[1/7]
I will be presenting "Generative Re-Photography with Video Models" at several places over the next month. Hit me up if you are around!
MIT Media Lab-Apr 1
Harvard-Apr 2
MIT CSAIL-Apr 7
Cornell Tech-Apr 9
Princeton-Apr 10
CMU-Apr 13
Stanford-Apr 24
Berkeley-Apr 28
Congratulations to Dr. @sherwinbahmani for successfully defending his thesis! It was wonderful to work together (and shout-out to co-supervisor @taiyasaki). Onwards and upwards!
P.S. Check out all Sherwin's amazing work: sherwinbahmani.github.io/
Excited to share that Rhoda AI is now out of stealth mode. We've been building a foundation model for physical AI that generalizes to new tasks with a surprisingly small amount of embodiment-specific data. This is enabled by our direct video-action model (DVA).
1/N
After operating in stealth for the last 18 months at @rhodaai, we’re excited today to finally show the world what we’ve been working on. We believe we’re on a path to physical AGI with the launch of our brand new foundation model, the Direct Video Action (DVA) model.
To bring generalist intelligent robots to the real world, we have to overcome the data scarcity problem.
At Rhoda, we are solving it by reformulating robot policies as video generation.
Today, we introduce the Direct Video-Action Model (DVA)
EGSR'26 Call for Papers is out!
Doing research on rendering? Physics-based, neural, stylized... Submit your work to EGSR!
Papers deadline: April 8th 2026
egsr2026.inria.fr/
ICCP 2026 is coming to @Princeton, July 13-15! Paper submissions are open, with a deadline of April 10. Accepted papers will be published in the ICCP Proceedings or an IEEE PAMI Special Issue.
Take a sneak peek at already confirmed speakers on our website: iccp2026.iccp-conference.org!
#ICCP2026