Sr. Staff RS at @GoogleDeepMind. Gemini Omni Team. Priors: GNNs, Structured World Models, Neural Assets, Veo Ingredients/References, Veo Robotics

Joined June 2009
Pinned Tweet
4 May 2020
My PhD thesis "Deep Learning with Graph-Structured Representations" is now available for download: hdl.handle.net/11245.1/1b63b… -- It covers a range of emerging topics in Deep Learning: from graph neural nets (and graph convolutions) to structure discovery (objects, relations, events)
42
600
3,179
Thomas Kipf retweeted
I believe that exploring and making mistakes is key to learning and research.
6
10
99
10,371
Genuinely amazed by how many generalist visual capabilities one can squeeze out of this model
May 29
A quick test of using Omni to edit a video and add labelled bounding boxes around objects. > Add a labelled bounding box around the monster truck and the flag
1
5
61
8,538
Thomas Kipf retweeted
Gemini Omni can create action replays from different angles. I referenced a video clip with agent in Google Flow. Then asked it to give new angles that follow the original video timing, environment and movement. This test came really close to real-time consistency!
20
53
415
15,948
Thomas Kipf retweeted
World Models ftw :)
I uploaded a screenshot of Google Maps to Gemini Omni with a route drawn on it. Then I prompted it to create a first person view of someone driving a taxi cab along the route in the reference image. Pretty close to the real thing.
1
3
53
8,526
Getting a sloth into the office for this interview was the hardest part, I heard.
We sat down with @OfficialLoganK @nbrichtova @doomie @gbarthmaron to talk about Gemini Omni Flash. It was pretty wild.
1
28
4,158
Thomas Kipf retweeted
Super excited to see Gemini Omni finally out in the world! Having been part of this project since its inception, I've seen how its native multimodal capabilities can redefine what's possible. We're truly entering the "Nano Banana era" for video generation. Give it a try!
4
7
60
4,103
Gemini Omni allows me to step into an alternative timeline where Graph Convolutional Nets (GCNs) made it to the big stage 🙃 Jokes aside: excited to finally share how far we've come with multimodal reference conditioning.
9
9
143
9,762
... and it is *so* fast ⚡️⚡️⚡️
Welcome to Gemini 3.5 Flash, our most powerful model to date. It pushes the frontier of intelligence, speed, and cost, putting 3.5 Flash in a class of its own. We spent the last 6 months making sure Flash is great for real-world use cases. It's available everywhere now!
4
1
33
3,727
Confession: I never had a single work-related sleepless night or ever pulled an all-nighter during my career incl. PhD. Don’t sacrifice your health. Sleep is a superpower — your brain on 8hrs of sleep is a lot smarter than your brain on sleep deprivation. Don’t listen to people who tell you to chronically sacrifice sleep for work. Sacrificing sleep for your kids/family is a different story.
Replying to @npparikh
I doubt all those things are really possible. In fact, I believe you are not doing a good PhD unless you have sleepless nights. Definitely just working on your thesis is possible if you follow a 9-6 schedule, but a good PhD, which involves exploring, collabs, etc., needs extra hours
28
70
1,077
105,780
You can work long hours (if you want to) and still prioritize sleep.
1
41
4,539
Thomas Kipf retweeted
This week on The Information Bottleneck, we are hosting @wellingmax 🥳🥳🥳 Max is one of the most influential ML researchers of the last two decades - Professor at UvA, ex-VP at Qualcomm and MSR, co-founder of ELLIS, two Test of Time awards, and advisor to a long line of people who shaped modern generative modeling (@dpkingma, @TacoCohen , @tkipf). His own work is behind a lot of the machinery the field is still building on - VAEs, graph convolutional networks, and equivariant neural networks, to name a few. He's now CTO and co-founder of CuspAI, designing new materials for carbon capture and PFAS removal using generative models plus physics-based simulation. What would you ask him? Drop questions below.
6
11
86
10,879
Thomas Kipf retweeted
Introducing Gemini Robotics ER 1.6, our new SOTA robotics model 🤖 which excels at visual and spatial reasoning, now available via the Gemini API!
74
180
2,211
117,866
Veo's Reference-to-Video capability is still #1 👑
BREAKING: Veo 3.1 Fast and Veo 3.1 by @GoogleDeepMind are in 1st and 2nd place on Multi-Image to Video Arena. These models can successfully reference multiple input images to create a video that users love. At an average generation time of 48 seconds, they are also the two fastest video generation models. Huge congrats to the @GoogleDeepMind team for this achievement!
1
2
2,412
Thomas Kipf retweeted
We've signed an agreement with Google and Broadcom for multiple gigawatts of next-generation TPU capacity, coming online starting in 2027, to train and serve frontier Claude models.
614
1,321
20,823
3,017,621
Thomas Kipf retweeted
To build multi-player games with video models, we likely need a map. One challenge here is the action binding problem, which we solve with simple RoPE-based attention biasing. While existing multi-actor models specialize in one game, we generalize to 46 games and diverse actions!
Introducing ActionParty: the first video world model that controls up to 7 players simultaneously on the same screen across 46 game environments. We tackle the action binding problem in video diffusion, ensuring each player's action is applied to the right subject. 🧵
2
6
47
6,751
Thomas Kipf retweeted
Gemma 4 outperforms models over 10x their size! (note the x-axis is log scale!)
146
237
2,950
217,972