Staff Research Scientist, NVR TW @NVIDIAAI @NVIDIA (Project Lead: DoRA, EoRA, 4D-RGPT) | Ph.D. @GeorgiaTech | Multimodal AI | github.com/cmhungsteve

Joined July 2011
Min-Hung (Steve) Chen retweeted
SpatialClaw: NVIDIA drops a training-free spatial reasoning agent that uses code as its action interface. A VLM writes Python in a persistent kernel, composes perception tools, inspects results, and revises its plan, with no fine-tuning needed. +11.2 points over prior agents on 20 benchmarks.
Min-Hung (Steve) Chen retweeted
We are presenting Fast-ThinkAct at 10:45am this morning at Poster #469. Welcome to drop by and discuss!
🚀 Excited to share that our paper Fast-ThinkAct has been accepted to #CVPR2026! 🎉 Efficient Vision-Language-Action reasoning via verbalizable latent planning, enabling embodied agents to think fast internally without lengthy textual reasoning. ⚡ Achieves 9.3× faster inference (89% latency reduction) than ThinkAct-7B, bringing Reasoning VLA closer to real-time robotic control. 📄 arxiv.org/abs/2601.09708 🎥 jasper0314-huang.github.io/f… 🙌 Huge congrats to @chipinhxyz, @yunzeman, @ZhidingYu, @CMHungSteven, @jankautz, Yu-Chiang Frank Wang, @FuEnYang1 #EmbodiedAI #PhysicalAI #VLA #Robotics #NVIDIAResearch @NVIDIAAI @NVIDIARobotics
🚀 4D-RGPT is a #CVPR2026 Highlight from @NVIDIA! 🌌 Amid #Cosmos3 #PhysicalAI momentum, we tackle: 🎥 region-level 4D video understanding 🎯 regions 📏 depth 🌀 motion ⏱️ time 🖼️ Main poster + 5 workshops in Denver 📍 Jun 7, 11:45–1:45, ExHall F #225 📦 Code, model weights & R4D-Bench are out 👇 @CVPR @NVIDIAAI
🙌 Huge thanks again to our amazing team: @cajoeyang, @RHachiuma, @Sifei30488L, @subhashree_r, @RaymondYeh, Yu-Chiang Frank Wang 🧵 Want more technical details? Check our earlier 4D-RGPT threads: 1) x.com/CMHungSteven/status/20… 2) x.com/CMHungSteven/status/20… This update focuses on new releases and the #CVPR2026 poster/workshops.

🚨 Now live as a CVPR 2026 Highlight!!! The R4D-Bench dataset and the official NVlabs GitHub just dropped. 1K region-level 4D VQA pairs ready to download today! Thread links 👇 #CVPR2026 #4DRGPT #R4DBench
🙏 Thanks @HuggingPapers and @askalphaxiv for sharing 4D-RGPT earlier! alphaXiv: x.com/askalphaxiv/status/200… DailyPapers: x.com/HuggingPapers/status/2…

NVIDIA just released 4D-RGPT on Hugging Face. A CVPR 2026 Highlight model for region-level 4D video understanding. It learns depth and motion from an expert at training time, with no extra cost at inference.
Min-Hung (Steve) Chen retweeted
As NVIDIA pushed its first-ever open-weights autopilot model, Alpamayo-R1, a key frontier is native 4D understanding: not just “describe the video” but reason about depth, motion, and 3D interactions over time, even for a specific region. This paper introduces 4D-RGPT (also from NVIDIA!), which tackles this by distilling 4D perception (depth and motion cues) from a frozen expert into an MLLM during training, using both latent-feature and explicit-signal distillation, plus timestamp positional encodings, with training-only modules (so no extra inference cost). On R4D-Bench, a new benchmark they developed for region-level understanding, this approach achieves a 4.3% gain across the benchmark! It also tops non-region-based 3D & 4D benchmarks by 5.3%.
T4V @CVPR is starting!!
What comes after today’s visual backbones? At T4V @CVPR 2026, we’re bringing the community together for a focused half-day workshop on Transformers for Vision and Multimodal AI, covering image, video, 3D, MLLMs, efficient attention, SSMs/Mamba, and the next generation of visual architectures. 📍 Wed June 3 · Room 607 🕐 1:45–5:40 pm (Denver local time) Invited speakers: @RanjayKrishna, @thoma_gu, @sherryyangML, @jcniebles, @liuzhuang1234, and @TongPetersb. Join us at CVPR: sites.google.com/view/t4v-cv… @UNC @NVIDIAAI @NVIDIAAIDev @AIatMeta @ImagineEnpc @NJU1902 @BaskinEng #CVPR2026 #T4V #MultimodalAI #NVIDIA
Min-Hung (Steve) Chen retweeted
Come to the T4V Workshop next week at CVPR for the latest developments in Vision Transformers and Multimodal AI!
T4V @CVPR starts today 🚀🚀 1:45–5:40pm, Room 607. Join talks/discussion on Transformers for image, video, 3D, MLLMs, efficient attention & SSMs/Mamba
Excited that our co-authored papers V2V-LLM and V2V-GoT were accepted to #ICRA2026! @HsukuangChiu led the work and will present both papers next week in Vienna. Please check his original post for papers/code/datasets and presentation details. #Robotics #AutonomousDriving
We will be presenting V2V-GoT and V2V-LLM at the 🇦🇹 #ICRA2026 conference and the 🇺🇸 #CVPR2026 workshop next week! 🚗🤖🚗 We explore the intersection of Cooperative Autonomous Driving and Multimodal LLMs, enabling vehicles to perform human-like physical reasoning. ⬇️ Threads
🌐 Project Page: eddyhkchiu.github.io/v2vgot.…
📄 V2V-LLM paper: arxiv.org/abs/2502.09980
📄 V2V-GoT paper: arxiv.org/abs/2509.18053
💻 Code: github.com/eddyhkchiu/V2V-Go…
🤗 Hugging Face Dataset & Model: huggingface.co/datasets/eddy…

Min-Hung (Steve) Chen retweeted
Thank you for all the interest in illoca Tracing Paper. The previous video was about why we built it. This one shows what architects can do with it: Sketch. Mark up. Describe ideas in text, or show them with references. Turn intent into editable 2D and 3D designs, so architects can spend less time rebuilding and more time designing.
Min-Hung (Steve) Chen retweeted
EoRA is great, useful code and work by @nbasyl_tw, @CMHungSteven and the @nvidia teams. (CUDA kernel supported)
🚀 #ICLR2026 workshops are running Apr 26–27! 🎉 Presenting EoRA tomorrow (Apr 27) at ICBINB TTU! 🧠 Full thread, slides, and one-line GPTQModel integration 👇 If useful, please: → Cite: arxiv.org/abs/2410.21271 → Star: github.com/NVlabs/EoRA See you at the workshops! 🙏 @iclr_conf @ICBINBWorkshop @NVIDIAAI @NVIDIAAIDev #ICLR #ICLR2026Workshop #MachineLearning