Lucas Beyer (bl16)

Tobias Weyand retweeted

Lucas Beyer (bl16)

@giffmana

May 22

Can't believe we're getting this before GTA 6

CHRIS FIRST

@chrisfirst

May 21

I uploaded a screenshot of Google Maps with a route drawn on it to Gemini Omni. Then I prompted it to create a first-person view of someone driving a taxi cab along the route in the reference image. Pretty close to the real thing.

Tobias Weyand @0xtob

22 Oct 2025

Attending #ICCV2025? Come chat with us about our Minerva dataset that tests if models can truly reason about videos! 🕵️‍♀️ @ahmetius and @SachitMenon will be presenting the dataset at Poster Session 5 tomorrow (Thurs, Oct 23) morning. Find them at poster #391.

Tobias Weyand @0xtob

10 May 2025

We're excited to release Minerva 🕵️‍♀️, a benchmark to evaluate if AI can truly reason about videos, from spotting game-changing moments in sports 🏀 to understanding character motivations in short films 🍿. We provide the "why" behind the answers! Pointers below 👇

Tobias Weyand @0xtob

22 Oct 2025

Here's our video about the paper: youtube.com/watch?v=gOdVhJ_M…

Tobias Weyand @0xtob

28 Aug 2025

Our team is hiring! If you have experience in video understanding and/or generation, join us @GoogleDeepMind and help push the frontiers with Veo and Gemini!

Mikhail Sirotenko @sirotenko_m

26 Aug 2025

We're hiring at @GoogleDeepMind! Looking for a talented Research Engineer to help build the future of video generation and understanding (Veo and Gemini). Apply here: job-boards.greenhouse.io/dee…

Tobias Weyand @0xtob

18 Jun 2025

Excited that our Minerva and Neptune datasets are both featured in the Gemini 2.5 tech report! Minerva is among the most challenging video benchmarks with a large gap between SotA (Gemini 2.5 Pro, 67.6%) and humans (92.5%). github.com/google-deepmind/n…

GitHub - google-deepmind/neptune

github.com

Antoine Yang @AntoineYang2

17 Jun 2025

The newly generally available Gemini 2.5 Flash and Pro are even better at video understanding than the versions we shared in the blog a month ago, see more details in the tech report 😀

Tobias Weyand retweeted

Boqing Gong @BoqingGo

10 Jun 2025

Excited! VideoPrism-Base/Large are publicly available now: github.com/google-deepmind/v… Check it out if you need a versatile video encoder for video-language or video-native tasks. Feedback appreciated!

GitHub - google-deepmind/videoprism: Official repository for "VideoPrism: A Foundational Visual...

Official repository for "VideoPrism: A Foundational Visual Encoder for Video Understanding" (ICML 2024) - google-deepmind/videoprism

github.com

Google AI

@GoogleAI

25 Mar 2024

Introducing VideoPrism, a single model for general-purpose video understanding that can handle a wide range of tasks, including classification, localization, retrieval, captioning and question answering. Learn how it works at goo.gle/49ltEXW

ALT VideoPrism is a general-purpose video encoder that enables state-of-the-art results over a wide spectrum of video understanding tasks, including classification, localization, retrieval, captioning, and question answering, by producing video representations from a single frozen model.

Tobias Weyand @0xtob

12 May 2025

Gemini 2.5 Pro sets the state of the art on our newly released Minerva video reasoning benchmark by scoring 63.5%. 📜 Paper: arxiv.org/abs/2505.00681v1 📊 Dataset: github.com/google-deepmind/n…

MINERVA: Evaluating Complex Video Reasoning

Multimodal LLMs are turning their focus to video benchmarks, however most video benchmarks only provide outcome supervision, with no intermediate or interpretable reasoning steps. This makes it...

arxiv.org

JB Alayrac @jalayrac

12 May 2025

A lot of work went into making Gemini 2.5 SOTA at video understanding; check out this 🧵 for more details! Looking back at where we were a year ago, the progress really feels phenomenal! So many things to unlock and enable from video 🎥, and we are only getting started!

Tobias Weyand @0xtob

10 May 2025

The newly released Gemini 2.5 Pro (Preview 05/06) sets the state of the art on Minerva with 63.5% accuracy. Human accuracy is 92.5%. developers.googleblog.com/en…

Google for Developers Blog - News about Web, Mobile, AI and Cloud

developers.googleblog.com

Tobias Weyand @0xtob

10 May 2025

Listen to the @agi_breakdown episode on Minerva here: aibreakdown.org/arxiv-paper-…

Arxiv paper - MINERVA: Evaluating Complex Video Reasoning - Welcome

aibreakdown.org

Tobias Weyand @0xtob

4 Dec 2024

Excited to share Long-Video Masked Autoencoder (LVMAE), which our team just published at @NeurIPSConf! We boost the context length of video models using an adaptive decoder and a dual-masking strategy and achieve SotA on several video benchmarks. Paper: arxiv.org/abs/2411.13683

Google AI

@GoogleAI

4 Dec 2024

Training video understanding models on longer contexts is computationally intensive. To address this, we present a novel approach that reduces the computational load while also improving the quality of the learned representations. More at: goo.gle/4fW5aIc

ALT Illustration of the adaptive tokenizer.

Tobias Weyand @0xtob

14 Nov 2024

Thank you @JeffDean, very much appreciate the boost! This is really a team effort with my amazing colleagues @NagraniArsha, Mingda Zhang, @raminia, Rachel Hornung, @nitesh_ai, @under_fitting, Austin Meyers, @zhouxy2017, @BoqingGo, @CordeliaSchmid, @sirotenko_m, @ZhuZhu66595.

Jeff Dean

@JeffDean

14 Nov 2024

A nice new benchmark for long video understanding by Tobias Weyand @0xtob and others. This is likely to be one of the new frontiers of capabilities for large-scale multimodal models, and it's great to have a new benchmark to assess others in this area.

Tobias Weyand @0xtob

12 Nov 2024

Excited that our work on long video understanding is being featured by @GoogleAI!

Google AI

@GoogleAI

12 Nov 2024

Can #AI truly understand long videos? Tobias Weyand & the Google Research team are testing the limits w/ Neptune, an open-source benchmark for long video understanding. Dive into the details & see how AI tackles temporal reasoning, cause & effect, & more →goo.gle/4esTTNM

Tobias Weyand @0xtob

23 Sep 2024

The other day I let my kids talk to Gemini Live. Today my 3-year-old asked my 6-year-old, "Can you tell me a joke?" The 6-year-old: "Sorry, I'm just a language model."

Tobias Weyand @0xtob

16 Sep 2024

Excited to share what our team has been working on! With expanding context lengths, frontier models are able to process longer and longer videos. But how well do they really understand them? Today we release Neptune, a challenging benchmark for long video understanding.

Google AI

@GoogleAI

16 Sep 2024

Datasets for evaluation of long video understanding are rare. So with this in mind, today we describe Neptune, an open-source evaluation dataset that includes multiple-choice and open-ended questions for videos of variable lengths up to 15 minutes. More →goo.gle/3B41nZV

ALT The Neptune data pipeline.

Tobias Weyand @0xtob

20 Aug 2024

New long video understanding benchmark from my colleagues @GoogleDeepMind pushing LLMs to their limits!

Dima Damen @CVPR @dimadamen

20 Aug 2024

Can current LLMs solve video reasoning questions like: "Over one hour, when does the camera holder go down stairs...?" Watch the teaser... Can you distinguish up/down stairs? P.S. Stairs are not visible when you go down any. youtu.be/Ddgvr4OReL4 Hour-Long PerceptionTest VQA @eccvconf

Tobias Weyand retweeted

Google AI

@GoogleAI

23 Jul 2024

Congratulations to the authors of "VideoPoet: A Large Language Model for Zero-Shot Video Generation" for winning one of this year's @icmlconf Best Paper Awards! #ICML2024 Paper: openreview.net/forum?id=LRkJ… Blog post: goo.gle/4atanoj

Tobias Weyand @0xtob

10 Jul 2024

The @DeutschesMuseum Bonn has an extremely good exhibition on AI that's great for both kids and adults. My favorite is a giant display showing, in real time, the activations of a 16-layer CNN trained on ImageNet as you show it animal figurines. deutsches-museum.de/bonn/aus…

Tobias Weyand @0xtob

10 Jul 2024

Another fun one: A Gradient Descent arcade machine that teaches the basic concept in a gamified way.