Building AI that learns by interacting with the world. Associate Professor @ MIT, leading the Scene Representation Group (scenerepresentations.org).

Joined February 2016
Pinned Tweet
Introducing MilliVid, our new method for long-context video generation! MilliVid creates videos that are consistent over long time spans, without using retrieval heuristics or 3D maps! (1/n) davidcharatan.com/millivid/#
An fantastic scenario analysis of what future is in store for Europe if it doesn't change its approach to AI and take it seriously as the transformative technology that it is. Beyond LLMs, we are about to witness a revolution in robotics and automation - we aren't close to humanoid robots in homes yet, but much of the classical know-how of robot automation that was so essential to building Europe's industrial base will be disrupted by a new kind of robot workflow that, as of today, is dominated by American companies. Just as with LLMs before, this kind of robot automation will depend on compute as the key resource - if Europe does not (1) get compute and (2) do everything in its power to foster AI research & development in Europe, it will be relegated to *buying the full stack of automation - robot hardware and the software that controls them* - from the US and China.
Most of Europe has not yet absorbed what AI is about to do to us. The few who have are not saying it loudly enough. We wrote Europe 2031: a five-year scenario of the continent's slide into irrelevance, how AI is driving it, and what can still be done to change course.
Yikes, *A* fantastic example, but of course X won't let me edit 🥲
Yet, at the same time, my key criticism of this report is that it focuses only on the short term. Europe is in the position it is in today b/c it has never seriously built a research -> R&D -> product pipeline. The US has numerous mechanisms for seeding research topics that eventually emerge as products *decades* later: Consider the DARPA Grand Challenge for self-driving cars, which, more than *20 years ago*, planted the seed of autonomous driving research. A decade later, the US had an ecosystem of talent and experience to rely on when this technology started to become ready for prime time, and today, we have Waymo. Further, the US is built on *concentration* of talent and capital, something that is very much antithetical to Europe, and also to my home country Germany, which - due to an ancient political tradition - chooses to distribute resources across the whole country, whereas the US fosters hubs such as MIT & Harvard or the Bay Area to ensure the critical mass of R&D needed to actually get ambitious projects off the ground. I believe that the German way is better for society *if there are enough resources to go around*, which unfortunately is no longer the case in Europe. In my opinion, for Europe to have a shot, it would need to invest in a single university and a surrounding industry and startup ecosystem that concentrates talent and resources in a desirable location. There will be another revolution *after* this current "AI" wave has blown over, 10-15 years from now. But if Europe doesn't change course, we will miss that one, too.
Also, shoutout to some related / relevant work: Of course, FramePack by Lvmin Zhang! Then, inspiring work on flexible tokenization by folks such as @ShivamDuggal4, Roman Bachmann, @JRAllardice, David Mizrahi, @andrew_atanov, @_xwen_, @BingchenZhao, some of it in @zamir_ar's lab!
A really cool idea! The question of how we can train sequence models such that they remember things that are T timesteps in the past without backpropping through T timesteps remains one of the core problems in ML, and this looks like an inspiring approach!
We never really knew how to train nonlinear RNNs well… BPTT struggled with vanishing grads (no long-range memory) and sequential rollout (hard to parallelize). What if instead an oracle told us the optimal memory state m_t at each step? Then the RNN could do one-step supervised learning on (m_t, x_{t+1}) → m_{t+1} labels. We call this Supervised Memory Training (SMT): a replacement for BPTT that trains RNNs without unrolling them. SMT is time-parallelizable and solves vanishing gradients. Website: akarshkumar.com/smt/ arXiv: arxiv.org/abs/2606.06479
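To make the one-step idea concrete, here is a toy sketch in plain Python. It is not the authors' implementation: the scalar tanh cell, the function names, the squared-error loss, and especially the "oracle" (here just a teacher cell with known parameters) are all assumptions made for illustration. The post does not say how the oracle memory states are actually obtained. The point is only that each (m_t, x_{t+1}) → m_{t+1} pair is an independent regression example, so no gradient flows across timesteps.

```python
import math
import random

def cell(a, b, m, x):
    """One step of a toy scalar nonlinear RNN: m_next = tanh(a*m + b*x)."""
    return math.tanh(a * m + b * x)

def make_oracle_sequence(a_true, b_true, xs):
    """Hypothetical oracle: memory states produced by a teacher cell.

    This is a stand-in; a real oracle would come from somewhere else.
    """
    ms = [0.0]
    for x in xs:
        ms.append(cell(a_true, b_true, ms[-1], x))
    return ms  # ms[t + 1] is the memory after consuming xs[t]

def train_one_step(xs, ms, lr=0.5, epochs=2000):
    """Fit (a, b) by regressing each (ms[t], xs[t]) pair onto ms[t + 1].

    xs[t] plays the role of x_{t+1} in the post's notation. Every pair is
    an independent example, so the sequence is never unrolled.
    """
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for t in range(n):  # iterations are independent of each other
            pred = cell(a, b, ms[t], xs[t])
            # Gradient of 0.5 * (pred - target)^2 through the tanh.
            dz = (pred - ms[t + 1]) * (1.0 - pred * pred)
            grad_a += dz * ms[t]
            grad_b += dz * xs[t]
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def one_step_mse(a, b, xs, ms):
    """Mean squared one-step prediction error against the oracle states."""
    n = len(xs)
    return sum((cell(a, b, ms[t], xs[t]) - ms[t + 1]) ** 2 for t in range(n)) / n

random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(200)]
ms = make_oracle_sequence(0.8, -0.5, xs)
a_hat, b_hat = train_one_step(xs, ms)
```

Because the inner loop over t has no dependency between iterations, it could be batched across time, which is the property the post describes as time-parallelizable.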
My students @RyuHyunwoooo and @evnkimm are presenting their paper “Scaling View Synthesis Transformers” today at 11:45 am at poster session 5. They are also brilliant and can chat about lots of things in the broader embodied intelligence landscape. Come by!!
I am incredibly grateful to be awarded the PAMI Young Researcher award. CVPR this year was amazing fun; I am very excited to be part of this community and feel honored for this vote of support in my students' and my work :) These are exciting times and I can't wait for next year!
Congratulations @vincesitzmann for winning the outstanding Young researcher @CVPR #CVPR2026 PAMI-TC awards! I’m sure many more awards to come,
Thanks a lot, Dima, I am so grateful to be in a research community with folks like you around!
Many of my students / collaborators are at CVPR - find them & chat!
@ottogin1, diffusion models
@ericmchen1, latent actions & robotics
@RyuHyunwoooo, latent actions & robotics
@ekim2339, robotics, view synthesis
@SimulatedAnneal, robotics
@twmitchel, latent actions & video
Oh no I tagged the wrong Evan! This is the right one: @evnkimm sorry to both Evans!!
Vincent Sitzmann retweeted
If you’re @CVPR: come by our tutorial tomorrow, June 4th 8:30-5:00, on Analytic Understanding of Diffusion Models. We’ll be covering how and why diffusion models generalize, learning about state-of-the-art analytical theories for their behavior, and covering key open questions.