building agents and poses @nvidia | Robograd @CMU_Robotics | Investing in startups

Joined June 2009
Pinned Tweet
Check out the full suite for full mocap-to-robotics pretraining. SOMA has anatomically correct joint definitions and much more detailed mesh keypoints than MHR/SMPL. It is foundational for all body-pose downstream tasks. More on its capabilities soon.
#NVIDIA just released a whole ecosystem for human(oid) motion and robot learning from human data. 🚀🦾
Data, as we all know, is the key to scaling AI models. To accelerate the field of Embodied AI, we have open-sourced a full stack of models and tools to capture, generate, retarget, and simulate human(oid) motion data at scale, along with a massive high-quality dataset and a standard human skeletal representation, SOMA, to make them all seamlessly communicate with each other. The entire suite is available under the Apache 2.0 license.
1️⃣ SOMA: A universal interface to unify all parametric human body models (SOMA-shape, SMPL, MHR, etc.) into a standard skeletal representation, eliminating the need for custom adapters or model-specific retargeting. 🔗 lnkd.in/gsxhiJnn
2️⃣ Kimodo: High-fidelity, controllable text-to-motion generation for both humans and humanoid robots. 🔗 lnkd.in/gCc84XnX
3️⃣ GEM: A global human pose estimation method from in-the-wild videos, natively compatible with SOMA. 🔗 lnkd.in/g_QAvRjn
4️⃣ Bones-SEED: A massive dataset of 150k motions in SOMA format, including data already retargeted for the Unitree G1, created with our partners at Bones Studio. 🔗 lnkd.in/gfx-QD-w 🔗 lnkd.in/gyNdTwQx
5️⃣ SOMA Retargeter: A dedicated tool for seamless motion retargeting from the SOMA skeleton to the Unitree G1. 🔗 lnkd.in/gqz9Na-H
6️⃣ ProtoMotions: Our high-performance simulation framework for training digital human(oid)s via RL, now with native SOMA support. 🔗 lnkd.in/gmvMikMU
This is just the beginning, and we have much more in the pipeline. Excited to see what the community builds next!
#NVIDIA #GTC #GTC2026 #Robotics #EmbodiedAI #PhysicalAI @NVIDIAAI
Harsh retweeted
That is what they told me
Harsh retweeted
Looking forward to taking our exciting partnership with Nvidia to the next level.
Jun 12
Huge congratulations to the @SpaceX team on a historic IPO debut. Fueling the next frontier of space and AI. 🌌 NVIDIA's partnership with SpaceX spans nearly a decade, from hand-delivering the world's first #NVIDIADGX-1 supercomputer in 2016 to the custom DGX Spark handoff at Starbase. Together, we've been pushing the boundaries of accelerated computing to help power the future of space exploration.
Harsh retweeted
Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use. Its capabilities exceed those of any model we’ve ever made generally available.
Harsh retweeted
Actually, both of these two NVIDIA live demos at CVPR are powered by flashdreams!
World models are moving beyond offline generation towards interactive, real-time experiences. Introducing ⚡FlashDreams⚡: an open-source high-performance inference and serving library built for autoregressive world models: 🔥 Up to 3.10× faster LingBot-World inference 🔥 Up to 2.12× faster Self-Forcing inference 🔥 Up to 1.40× faster Wan2.1 inference 🔥 8 integrated models 🔥 Multi-GPU, streaming, low-latency serving 🔥 Agentic skills that teach you how to use it FlashDreams is designed for a new generation of AI systems that continuously evolve over time while responding to user interactions. It powers applications across robotics, autonomous vehicle simulation, gaming, and virtual worlds. Github: github.com/NVIDIA/flashdream… Docs: nvidia.github.io/flashdreams Research page: research.nvidia.com/labs/sil… Join the #flashdreams Discord channel at discord.gg/yTdHDqFP FlashDreams is also the runtime backbone behind NVIDIA OmniDreams (github.com/nv-tlabs/omni-dre…) 1/n #AI #WorldModels #FastInference #PhysicalAI #OpenSource #NVIDIA
📣 @SKhynix and @NVIDIA announce a multiyear technology partnership to codevelop next-generation memory for the global AI factory buildout. SK hynix will codevelop memory for NVIDIA's platforms — from NVIDIA Vera Rubin to Jetson Thor — while advancing fab digital twins using @NVIDIAOmniverse libraries and applying NVIDIA CUDA-X and PhysicsNeMo to accelerate semiconductor design and manufacturing. Read the press release: nvda.ws/4e43e0p
How it started vs. how it ended. You can easily see who plays on a pixel screen vs. OTB.
Harsh retweeted
Jun 7
American Open Source is so back. 9 / 30 of the models on page 1 of Huggingface are published by Nvidia.
Harsh retweeted
Jun 3
This week at #CVPR2026, NVIDIA Research is presenting three papers across physical AI that offer groundbreaking solutions for training at scale across diverse applications: → GraspGen-X: the first foundation model for zero-shot grasping, trained on billions of simulated grasps → LCDrive: a model that replaces expensive text-based reasoning with compact latent representations → NitroGen: a generalized gameplay AI foundation model that harnesses NVIDIA Isaac GR00T to help train embodied agents Learn more: nvda.ws/4ubwjgk
Harsh retweeted
NitroGen just won CVPR Best Paper Honorable Mention!! We are making strides towards general-purpose embodied agents that master not only real-world physics, but also all possible physics across a multiverse of simulations. It’s been 4 years since MineDojo, our first embodied agent in Minecraft, won NeurIPS Best Paper. Congrats to everyone on the team!!
Harsh retweeted
GRAIL addresses the holy grail of robotics: humanoid-object interaction data! We are releasing a large-scale humanoid-object interaction dataset (22k motions), code to generate more, and all the models. #NVIDIA #HumanoidRobotics #EmbodiedAI
Humanoid robotics is hitting a data wall. Teleop and mocap took us far, but they don’t scale to every object, terrain, and behavior. We’re releasing GRAIL: research.nvidia.com/labs/dai… — a fully digital pipeline for generating loco-manipulation data before the robot moves. 🧵(1/8)
Harsh retweeted
We’re going all in on World Models. Today we’re launching the 1X World Model Lab.
The bet is simple: You can’t fine-tune your way to AGI. And you definitely can’t fine-tune your way to robots that can operate in the physical world. General-purpose humanoids need models that understand space, motion, objects, causality, affordances, physics, and action before they ever see a specific task.
The frontier is not better VLA wrappers. The frontier is embodied world models.
The 1X World Model Lab will focus on large-scale embodied world model pretraining: building the most generalizable foundation model for humanoid robots from the ground up.
The next frontier in AI requires scaling:
web-scale media
egocentric human videos
sim
dexterous remote operated robot data
on-policy NEO data
→ real-world deployment for robot data collection and RL → abundance of data → physical AI
The robot collects data. The model gets better. The robot gets better. Repeat.
To lead this, we brought in one of the best for the mission: @_sam_sinha_ , as Head of World Models. Sam was a founding research scientist at Luma AI and has been at the frontier of scaling multimodal generative video models his whole career.
If you’re the best in the world at large-scale pretraining, video models, robotics, RL, infra, or data, and you want your models to move atoms, not just pixels, join us. Send background evidence of exceptional ability to: wmlab@1x.tech
We’re building the model that makes autonomous labor real.
Harsh retweeted
May 28
Attention, all you geniuses with products, but no marketing skills. Today we’re launching the Founder Starter Kit—4 skills that will help you look and sound like a legit company, including: > Build-a-Brand > App Screens > Product Sizzle > Founder Video Available for Claude via the Pika MCP.
Harsh retweeted
🚀 4D-RGPT is a #CVPR2026 Highlight from @NVIDIA! 🌌 Amid #Cosmos3 #PhysicalAI momentum, we tackle: 🎥 region-level 4D video understanding 🎯 regions 📏 depth 🌀 motion ⏱️ time 🖼️ Main poster 5 workshops in Denver 📍Jun 7, 11:45–1:45, ExHall F #225 📦 Code, Model weights & R4D-Bench are out 👇 @CVPR @NVIDIAAI
Harsh retweeted
From a young age, I have always wanted to be the exit liquidity for shareholders of artificial intelligence companies
Harsh retweeted
buying into the anthropic IPO at $1T valuation would obviously be an incredible deal, 22x multiple on ARR, huge room to grow, countless markets untapped, mythos as of yet unmonetized. kind of thing people dump whole retirement portfolios into. which is why it'll be $3T
Anthropic has confidentially submitted a draft S-1 registration statement to the Securities and Exchange Commission. Pending completion of SEC review, this gives us the option to pursue an initial public offering. Read more: anthropic.com/news/confident…
Harsh retweeted
🎉 We added 2 SOTA WAMs to the RoboLab Leaderboard 🎉 Current leaders on RoboLab-120 (specific instr.): 🥇Cosmos3-Nano-Policy (39.7%) 🥈π0.5 (28.1%) 🥉DreamZero (28.1%) → See full results at: research.nvidia.com/labs/srl… → All policy clients available at: github.com/NVlabs/RoboLab/
Harsh retweeted
I’m excited to share what our team has been building at @NVIDIAAI since I joined: Cosmos 3, an omnimodal world model for Physical AI. Project: research.nvidia.com/labs/cos… HF: huggingface.co/collections/n… Code: github.com/NVIDIA/cosmos
Harsh retweeted
Introducing NVIDIA Cosmos 3
We released NVIDIA Cosmos 3 last night. And today, seeing it take the top spots across 8 open model leaderboards feels surreal. We spent months working towards this moment. Here’s the breakdown:
The Leaderboard Wins
World Reasoning
🏆 #1 open model on VANTAGE-Bench for vision AI
🏆 #1 overall on Traffic Anomaly Reasoning (TAR)
World Generation
🏆 #1 open model on Artificial Analysis Image-to-Video leaderboard
🏆 #1 open model on Artificial Analysis Text-to-Image leaderboard
🏆 #1 open model on PAI-Bench for physical AI synthetic data generation
🏆 #1 open model on Physics-IQ, which measures accuracy on physical laws
🏆 #1 open model on R-Bench for world generation quality
World Action
🏆 #1 on RoboArena for specialized policy
🏆 #1 on RoboLab for action generation
But the leaderboards are only part of the story. The real story is why we built Cosmos 3 in the first place.
The Problem
Training robots and autonomous systems in the real world is painfully hard. Robots need to try the same thing numerous times before they succeed reliably. Self-driving cars need rare edge cases that may never happen naturally. Smart machines need to understand physics, motion, contact, failure, and surprise. And real-world data is slow, expensive, and sometimes dangerous to collect. At some point, the answer cannot just be “collect more data.” You can’t collect your way out of an infinite physical world. You have to generate it. That… was the question behind Cosmos: Can one model understand the physical world deeply enough to reason about it, simulate it, and generate actions inside it?
What We Built
Cosmos 3 is the first omni-model for physical AI. It can understand and generate across: language · images · video · audio · action sequences. It is not just a VLM. Not just a video generator. Not just a robot policy model. It is all of them, in one single model. That matters because physical AI has been fragmented for a long time. Cosmos 3 is our attempt to collapse that fragmentation. Depending on how you configure the inputs and outputs, the same model can act as a vision-language model, a video/world generator, a world simulator, or a world-action model. No separate architecture required.
The Architecture
Under the hood, Cosmos 3 uses a dual-tower Mixture-of-Transformers architecture. One tower is autoregressive, for reasoning. It handles next-token prediction for language and discrete understanding. The other tower is diffusion-based, for generation. It denoises images, video, audio, and action trajectories. Two towers. Dual-stream joint attention. One shared world representation. Each modality gets its own tools: visual encoders, video VAEs, audio VAEs, and action projectors that can map different embodiments into a unified action space. Action is a first-class modality in Cosmos 3. That’s what makes it more than a video model. It doesn’t just predict and generate what the world might look like. It can connect reasoning and world modeling to physically grounded action.
Why This Matters
One of the most interesting findings from the ablation work is that training action domains together creates positive transfer. That means adding more embodiments does not just add more use cases. It can actually make the model better. This is the heart of why omnimodal training matters. A shared world representation is not just convenient. It can make each individual task stronger. That’s the part that feels like the beginning of something much bigger.
The part I’m most excited about is that Cosmos 3 is fully open. Developers get the models, scripts, optimization, inference endpoints, post-training recipes, datasets, and benchmarks. Everything is available under the Linux Foundation’s OpenMDW 1.1 License. You can use Cosmos 3 out of the box. You can use the VLM, world model, or world-action pieces separately. You can post-train it for your own domain, embodiment, or accuracy target. That’s what makes this feel different. Cosmos 3 is not just a model release. It is the foundation for building intelligence for autonomous machines.
For me, Cosmos 3 feels like a step toward a world where physical AI development becomes much more scalable and accessible - to a new age of developers and agents. That’s what we built Cosmos 3 for. I cannot wait to see what you build with it.
Download Models on Hugging Face huggingface.co/collections/n…
Customize Models on GitHub github.com/NVIDIA/cosmos
Read the Tech Blog to Learn More developer.nvidia.com/blog/de…
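For readers curious what "two towers, dual-stream joint attention" could look like, here is a toy sketch in plain Python. It is not the Cosmos 3 implementation: the post gives no layer details, so everything here is an assumption, including the hypothetical names `make_tower` and `joint_attention`, the single attention head, the random weights, and the absence of a causal mask on the autoregressive stream. The only idea it illustrates is that each stream is projected with its own tower's weights, while attention runs over the concatenated sequence so both streams can see each other.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def make_tower(dim, rng):
    """Hypothetical per-tower weights: separate Q, K, V projections."""
    return {
        name: [[rng.gauss(0.0, 0.5) for _ in range(dim)] for _ in range(dim)]
        for name in ("q", "k", "v")
    }


def joint_attention(ar_tokens, diff_tokens, ar_tower, diff_tower):
    """Project each stream with its own tower, then attend over both jointly."""
    dim = len(ar_tokens[0])
    n_ar = len(ar_tokens)
    # Tower-specific projections; list "+" concatenates the two token streams.
    q = matmul(ar_tokens, ar_tower["q"]) + matmul(diff_tokens, diff_tower["q"])
    k = matmul(ar_tokens, ar_tower["k"]) + matmul(diff_tokens, diff_tower["k"])
    v = matmul(ar_tokens, ar_tower["v"]) + matmul(diff_tokens, diff_tower["v"])
    # Scaled dot-product attention over the full concatenated sequence.
    scale = 1.0 / math.sqrt(dim)
    scores = [[scale * sum(x * y for x, y in zip(qi, kj)) for kj in k] for qi in q]
    weights = [softmax(r) for r in scores]
    mixed = matmul(weights, v)
    # Split the result back into the two streams.
    return mixed[:n_ar], mixed[n_ar:], weights


rng = random.Random(0)
dim = 4
ar_tokens = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(3)]    # e.g. text tokens
diff_tokens = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(2)]  # e.g. noisy latents
ar_out, diff_out, weights = joint_attention(
    ar_tokens, diff_tokens, make_tower(dim, rng), make_tower(dim, rng)
)
print(len(ar_out), len(diff_out), len(weights[0]))  # prints: 3 2 5
```

Each of the 5 attention rows spans all 5 tokens, which is the "joint" part: a reasoning token can attend to generation latents and vice versa, even though the two streams never share projection weights.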
Harsh retweeted
It all starts with the @NVIDIARTXSpark Superchip. RTX Spark reinvents the personal computer for agents, creating and gaming. Learn more → nvidia.com/en-us/products/rt…