Joined July 2015
Boqing Gong retweeted
We want to speak directly to the concern many of you have expressed, and we owe you a clear explanation of what happened, why it happened, and where we stand now. We understand this situation caused genuine alarm and we take that seriously. In preparing the NeurIPS 2026 handbook, we included a link to a US government sanctions tool that covers a significantly broader set of restrictions than those NeurIPS is actually required to follow. This error was due to miscommunication between the NeurIPS Foundation and our legal team; there was never an intention to restrict participation beyond our mandatory compliance obligations. The responsibility for that error is ours as an organization, and we deeply apologize for the alarm and impact this miscommunication had on our community. We have updated the link and clarified the text of our policy, which is consistent with that of ACM and IEEE, as well as other international conferences and NeurIPS in the past. As in previous years, NeurIPS welcomes submissions from all compliant institutions and individuals. We want to reiterate that NeurIPS is a community-driven event, created by and for the community, and strives to be inclusive. The NeurIPS 2026 organizing committee was particularly saddened to learn of this institutional miscommunication. The organizing committee has taken on the responsibility of running the conference this year with the goal of fostering open communication, knowledge sharing, and global scientific discourse. We thank the community for bringing this issue to our attention and working with us through this situation.
Boqing Gong retweeted
🚨 New #CVPR2026 collaboration with Google DeepMind --> Ego2Web bridges egocentric video perception and web execution, enabling agents that see first-person, real-world video of the user’s surroundings and take actions on the web grounded in that video:
▪️ Introduces a task where agents must ground egocentric (first-person) video into concrete web actions (requires visual grounding → entity extraction → planning → real website execution).
▪️ Covers realistic cross-domain tasks, e.g., e-commerce (find/buy items you saw), media retrieval (find related videos), knowledge lookup (identify & query entities), maps/local (locate places from visual cues).
▪️ Proposes Ego2WebJudge to automatically evaluate whether web agent results are correctly grounded in the video context.
▪️ Reveals concrete failure modes across 6 strong agents (GPT-5.4, Claude, Gemini-based agents, etc.): weak visual grounding, brittle cross-modal reasoning, and planning breakdowns (only ~58% success rate).
Details 👇👇
Introducing Ego2Web from Google DeepMind and UNC Chapel Hill, accepted to #CVPR2026. AI agents can browse the web. But can they act based on what you see? Existing benchmarks focus only on web interaction while ignoring the real world. Ego2Web bridges egocentric video perception and web execution, enabling agents that can see through first-person video, understand real-world context, and take actions on the web grounded in the egocentric video. This opens a path toward AI assistants that operate seamlessly across physical and digital environments. We hope Ego2Web serves as an important step for building more capable, perception-driven agents. 🧵👇
Boqing Gong retweeted
Mar 25
Ego2Web: A Web Agent Benchmark Grounded in Egocentric Videos. Paper: huggingface.co/papers/2603.2…
Bye, @NeurIPSConf , for now.
NeurIPS is aware of the community's concerns regarding the list of sanctions. NeurIPS is an inclusive community focused on free scientific discourse. We deeply value the research that comes from everyone in our community. The present concerns are not about science or academic freedom. They are about legal requirements that apply to the NeurIPS Foundation, which is responsible for complying with sanctions. We are actively consulting legal counsel to fully understand the legal constraints and we will update the NeurIPS community as soon as we have reliable guidance from our lawyers.
Overheard: a top physician was worried about his lab. He felt his past successes were largely because his math was a little better than his peers'. AI is making that advantage less relevant…
“Do you have a spouse? Is it female?” Asked in total sincerity; it turned out they just wanted to say something about teenage girls and wanted me to fact-check it with someone, such as my wife. I was amused and amazed; I respect the extreme level of caution.
An old friend, surprised to learn of my move to BU: “Why academia? Come back to industry to make money!” It reminded me how lucky I’ve been to have a capable, considerate, supportive, amazing wife. Yet family support is not enough; AI/CS academics need funding support from industry, etc.
Our 1.5-yr-old lab at Boston U has its very first tradition: celebrating paper submissions rather than paper acceptances! Here is the cake for ECCV submissions. Yum :-)
… and here is the cake for undergrad MS researchers who’ll leave our lab for PhD training! Amazing offers from Stanford, Princeton, Rutgers, UCSD, and Columbia.
Boqing Gong retweeted
I'm recruiting multiple PhD students this cycle to join me at Harvard University and the Kempner Institute! My interests span vision and intelligence, including 3D/4D, active perception, memory, representation learning, and anything you're excited to explore! Deadline: Dec 15th.
Boqing Gong retweeted
Thanks for the support Yann! Indeed, our Segment Anything team is still doing open source research, even though we are not in FAIR anymore! We just open sourced SAM 3 and SAM 3D! @nikhilaravi @cfeichtenhofer @PengchuanZ
27 Nov 2025
SAM 2 & 3 are very cool. I wish I could claim credit for any of it, but I can't. My contribution to it has been minimal and very indirect, beside moral support. The Perception team that produces SAM was actually moved from FAIR to the product division of MSL several months ago.
Boqing Gong retweeted
18 Nov 2025
Introducing AI Marketing Agent 1.0 – the first AI agent that actually executes your marketing. Multi-channel campaign drafted. SEO content that ranks. Daily analytics briefs with next steps. Social listening & engagement support. ...and 50 core workflows translated from real human expertise. It's Claude Code... but for marketing. 🧵
Reviewers needed for top journals (PAMI, IJCV, TMLR, etc.)! Qualified & interested? Self-nominate via forms.gle/kbNBUWPLbncPZrRS7 (I appreciate you forwarding this to your **qualified** academic friends)
10 Sep 2025
Honoring the memory of Prof. Margrit Betke, an amazing mentor, a true leader, and a kind colleague. Please share your thoughts in this doc to be passed along to Margrit's family: lnkd.in/gmbbHYzF

If you have a fond memory or a note that you'd like to share about Margrit Betke, please add it to this document before 9/20: lnkd.in/gmbbHYzF We wish to share it in some form with her family during her memorial on Sept 20th!
Margrit Betke, a @BU_CAS professor of computer science, is being remembered as a brilliant BU scholar and a devoted mentor by colleagues and current and former students. She passed away on August 13. Read more ➡️ spr.ly/6019fIsSN
20 Jun 2025
Digital twin of (the future of) our physical world?
20 Jun 2025
World Simulator, reimagined: now alive with humans, robots, and their vibrant society unfolding in 3D real-world geospatial scenes across the globe! 🚀 One day soon, humans and robots will co-exist in the same world. To prepare, we must address:
1️⃣ How can robots cooperate or compete intelligently?
2️⃣ How do humans build social bonds and communities?
3️⃣ How can both co-exist in an open, dynamic world?
Announcing Virtual Community Project, a social-physical world simulator where human characters and robotic agents can interact, grow, and co-evolve within open-world societies, stretching from London to New York and beyond! Key features include:
✅ Unified multi-agent physics simulations for rich social-physical interactions between humans and robots
✅ Massive auto-generated 3D scenes grounded in real-world geospatial data
✅ Agent communities populated by robots and LLM-driven human characters with rich appearances, personalities, and social ties.
🌍 Enter our Virtual Community, an open world to study embodied AI at scale, one social-physical world model at a time!
🔗 Project: virtual-community-ai.github.…
💻 Code: github.com/UMass-Embodied-AG…
Paper: virtual-community-ai.github.…
1/n
10 Jun 2025
Join us if you are at CVPR and can get up early. :-) I'm giving a talk, "BabyVLM: Democratizing Pretraining of Vision Large Language Models," tomorrow (Wednesday).
* 9:30 AM
* Room 101B
* Computer Vision in the Wild Workshop
10 Jun 2025
Excited! VideoPrism-Base/Large are publicly available now: github.com/google-deepmind/v… Check it out if you need a versatile video encoder for video-language or video-native tasks. Feedback appreciated!
25 Mar 2024
Introducing VideoPrism, a single model for general-purpose video understanding that can handle a wide range of tasks, including classification, localization, retrieval, captioning and question answering. Learn how it works at goo.gle/49ltEXW

ALT VideoPrism is a general-purpose video encoder that enables state-of-the-art results over a wide spectrum of video understanding tasks, including classification, localization, retrieval, captioning, and question answering, by producing video representations from a single frozen model.

Boqing Gong retweeted
🚀 Excited to announce our 4th Workshop on Computer Vision in the Wild (CVinW) at @CVPR 2025! 🔗 computer-vision-in-the-wild.… ⭐ We have invited a great lineup of speakers: Prof. Kaiming He, Prof. @BoqingGo, Prof. @CordeliaSchmid, Prof. @RanjayKrishna, Prof. @sainingxie, Prof. @YunzhuLiYZ, and Prof. @furongh to talk about the exciting research that brings vision to the wild! 🌎 Join top researchers tackling real-world vision challenges, from dynamic environments to embodied agents! See you all at #CVPR2025! #ComputerVision #AI