building @hebbyrobotics; prev Robotics PhD @Columbia @NTUsg @join_ef

Joined May 2018
31 Photos and videos
Pinned Tweet
To understand what it takes to build a humanoid robot with model-based control, we finetuned @physical_int 's (PI) Pi05 model for our custom use case and environment. We incurred ~$10K in hardware costs, compared to the typical ~$20K set up (DROID/ALOHA). Here are the lessons and challenges we faced building the first working prototype (shown in the video) in 3 months. Part 1: Hardware, Software, Model Selection, Custom Embodiment, Inference, Embedded Hardware, Hierarchical Planner Part 2: Model Evaluation, Data Collection, Model Training, Simulation and Teleoperation We hope sharing our experience accelerates the learning of others who are in a similar starting point.
3
11
114
18,152
Brandon retweeted
NVIDIA showcased Newton at GTC again earlier this year. If you are familiar with MuJoCo or Isaac Sim, Newton is a new open-source, GPU-accelerated physics engine aimed at robotics and contact-rich manipulation, built with DeepMind and Disney Research. It also runs as a physics backend inside the Isaac ecosystem. I finally had time to dig through the examples, and the RJ45 plug simulation (example_contacts_rj45_plug.py) caught my attention. The latch deflects and clicks, and the cable behaves like a real 1D deformable. We've been working on cables, connectors, and other contact-heavy interactions, so seeing this as a first-class example was a pleasant surprise. I extended it to test insertion and removal, with and without pressing the latch, and visualized the result in Rerun. Pull without pressing the tab, and the latch holds the plug in place; press it, and the plug comes free. It also interested me how code-first the examples feel. We've spent real time trying to drive both MuJoCo and Isaac Sim with coding agents, and the recurring friction is being able to reason spatially to modify the simulation setup. It feels like there's real potential here. I'm curious if Newton might be better suited for agent-generated robotics simulations. If you're looking for something like this too, Newton is definitely worth a try.
3
13
74
6,790
Brandon retweeted
Jun 11
“don’t train your own model” is common ai advice. it's wrong. your token bill's the proof. today, we’re excited to launch castform into open preview. castform is the easiest way for you to train your own model, on your own data. open-weights models are performant and much cheaper. when trained on your task & proprietary data, they beat closed models. the thing standing between you and that was weeks of plumbing & years of ml expertise. with castform, model training is as simple as prompt engineering. @castformai bring your agent traces or raw corpora. castform turns it into training data, picks the right algorithmic recipes, manages gpus, and gives you an ide to watch and chat with your model as it learns. see what you can build with castform👇
222
233
2,525
416,273
WAIRED killed it with this one, and where did you say I could get my own Casty?
Jun 11
“don’t train your own model” is common ai advice. it's wrong. your token bill's the proof. today, we’re excited to launch castform into open preview. castform is the easiest way for you to train your own model, on your own data. open-weights models are performant and much cheaper. when trained on your task & proprietary data, they beat closed models. the thing standing between you and that was weeks of plumbing & years of ml expertise. with castform, model training is as simple as prompt engineering. @castformai bring your agent traces or raw corpora. castform turns it into training data, picks the right algorithmic recipes, manages gpus, and gives you an ide to watch and chat with your model as it learns. see what you can build with castform👇
1
1
183
Make motors go brr
Are you really in tech if you’ve never touched (or hauled) a datacenter rack before
2
252
If you need a domain specific model that outperforms frontier models and reduces inference cost, @googrish and the @castformai team have the best product for it.
3
259
We're in Boston this week for #RoboticsSummit, and NYC next week. Who should I meet to exchange notes and explore overlaps on vertical robotics? We're building robots for datacenters and are actively thinking about early commercialization angles, continuous post-training pipelines fed by fleet data and scalable verification for RL.
4
278
👀
Glad to see this project getting some recognition, thanks @dwarkesh_sp People are surprised when I say I worked on datacenters when I was at Jane Street “Why do you need to run datacenters?” “Aren’t you a software engineer?” When I first joined the team it was daunting, I was a fresh grad and knew nothing about DC ops Over time I realized it was one of the best positions I could find myself in early in my career Beyond the technical complexity of running critical infrastructure, supporting the entire firm led to collaborations across multiple teams I was often reminded how much we take physical infrastructure for granted, and how much impact I can have solving problems with new tech in a traditional industry
2
273
The next frontier will be autonomous datacenters ran by AI and robots.
Jane Street just showed the inside of their AI training data center in Texas. 4,032 GPUs. 56 racks. 8,000 km of fiber. liquid cooling running through every server because air cooling can't handle the heat anymore. but the part that got me was the origin story. Ron Minsky, who co-heads their technology group. said their first compute cluster was literally six Dell boxes stacked on top of each other at the end of a desk row. they called it "the hive." the trading systems sat out in the room with the traders because they wanted to be able to unplug them if something went wrong. at one point, someone vacuuming the office unplugged a live trading system in the middle of the day. from six Dell boxes and a vacuum cleaner incident to a liquid-cooled GPU data center processing trades in under 100 nanoseconds. that's a 20-year arc.
4
534
We'll be in SF next week for Data Center Expo. Do I know anyone deploying robots in industrial or similar settings? We're building robots for datacenters and are working together with our first enterprise partner. Would love to share notes and explore overlaps. Real-world RL, engineering deployment pipelines for continuously improving models, and early commercialization are top of mind.
3
10
404
mandatory day zero shot
3
1
25
1,510
Brandon retweeted
Digital twins for datacenters are still hard to build in 2026 As agents control systems more autonomously, the need for high-level observability and visualization will only increase We need to know quickly how agents are running to make sure they do not go off the rails
1
6
264
Who’s working on continuous improvement model posttraining tools in the open source? (For AI robotics models) Sounds like the kind of thing that would benefit from co-developing with the community. Would love to meet others building their own.
Excited to share LWD: Learning While Deploying. Our robots learn while doing real tasks—restocking groceries, brewing Gongfu tea, making cocktails, making juice, and packing shoes. Deployment is no longer just evaluation; it becomes the training loop. 🧵
4
502
Congrats on the new MolmoAct 2 release by @allen_ai! A few features that stood out for those considering this for real-world deployments: 1. YAM embodiment unlock 720h teleoperated dataset 720 hours of bimanual YAM data is a meaningful contribution. The YAM embodiment is a simple bimanual arm setup for dexterous tasks, very similar to the dual PiperX and Trossen WidowX arms. Anyone building on @physical_int 's Pi05 or similar models with a YAM-type robot now has significantly more data to fine-tune from, which should reduce the fine-tuning samples needed for a custom task, assuming the target task and environment fall within the dataset's distribution. The dataset spans household, factory, and coffee-shop settings with high object and scene variation. @cortexairobot was the data vendor. Hoping the appendix detailing the quality control protocol gets released. 2. Depth reasoning as a reproducible recipe, but only with layer-level access MolmoAct2-Think shows one way to inject depth information into the action model. Before producing an action, the model predicts a compact discrete depth representation that conditions the action expert through per-layer KV conditioning. The mechanism requires surgical access to the VLM's intermediate attention states at every layer, something only possible with fully open architectures. 3. Swappable VLM backbones for converting VLM -> VLM-ER The released training recipe effectively decouples the perception backbone from the action head. You can pick a VLM optimized for your task domain rather than accepting a generic vision encoder. Hypothetical example: for warehouse sorting where success hinges on reading tiny, cluttered, blurry SKU labels, start from a VLM fine-tuned for OCR (e.g., a custom Qwen-VL or InternVL variant) instead of a generalist web-scale VLM. Apply the MolmoAct2-ER training recipe to that backbone to produce an "OCR-VL-ER" variant, then attach a flow-matching Action Expert. The result is a bespoke VLA that inherits your perception fine-tuning, optimized for label-reading manipulation rather than generic open-world scenes. This assumes catastrophic forgetting is minimized and the backbone retains most of the baseline capabilities it had before fine-tuning. With this recipe, you can swap in domain-specific backbones (medical imaging, industrial inspection, high-res OCR) and convert them into action models entirely from open components.
May 5
Robotics models often struggle outside controlled environments. Ours is built to work in real ones. Today we're launching MolmoAct 2, which can assist with a host of chores & lab tasks, plus the MolmoAct 2-Bimanual YAM dataset—the largest open robotics dataset of its kind. 🧵
6
18
2,428
Brandon retweeted
Simulations are core to robotics research, but spinning up custom scenes is still tedious and has a steep learning curve We built mujoco workbench (mwb): cli agent skills for codex/claude code to scaffold and debug sim scenes from natural language
3
6
39
3,972
"A robotics-shaped take-off curve." Cracking distribution is as hard as the technical challenges to commercialize early, especially when customers need to see a plausible trajectory, hardware budgets are tight, and data collection needs funding. "Channel construction is the most underestimated lever in the stack." Creating the data flywheel that powers continuous improvement and eventual task reliability is what closes the gap to real-world deployments that actually create value. "Deployment-system engineering matters as much as model architecture." Similar to the systems around early LLM applications and AI voice agents, a model's capabilities are only as good as the scaffolds around it. This is where engineering depth and iteration speed compound into capability. The teams that earn the right to a first deployment reliable enough to kickstart the data flywheel will crack adoption in a new vertical.
2
2
13
2,379
congrats! stoked to have you building embodied autonomy here in SG
[Major life updates] 🎉 After 4 incredible years of my PhD at @UW @uwcse with @fox_dieter17849 and @RanjayKrishna, I'm joining @NUSComputing as an Assistant Professor this August, under the Presidential Young Professorship scheme! More details 🧵👇
4
388
Sims are an essential piece of any researcher's experimental loop, both for training data and for evaluation. However, spinning up quick scenes still requires a learning curve and reading extensive documentation. This is true even for the small but critical fraction of training data needed for diversity, or for quick, directionally correct evaluation checks. In response, we built MuJoCo Workbench (MWB), a CLI and set of agent skills to prototype custom scenes with coding agents like Codex and Claude Code. Repo is in the next post. It's an attempt to make building diverse scenes a delightful experience, and to maximize what coding agents can do in researchers' hands to accelerate their experimental loops. How it works: 1. Install the bundled agent skills. 2. Describe what you want, and the agent scaffolds a working sim for you. No MuJoCo experience is required to get started. The skills teach the agent the mwb CLI, the scene layout conventions, and the debug tools, so it can iterate on behavior without you needing to know the plumbing. We're working on extending MWB with a built-in integration for real-time inference on open-source VLAs/VAMs like Pi05. If you recognize the problems mentioned here and want to learn more, reach out.
4
13
66
5,324