PhD Student @UCLA

Joined June 2014
35 Photos and videos
Pinned Tweet
MolmoSpaces provides singular scale and diversity. We built a benchmark that puts that scale to use. MolmoSpaces-Bench evaluates zero-shot policies across thousands of previously unseen environments under systematic variation, providing insights that go beyond a success rate. More below:
Feb 11
Introducing MolmoSpaces, a large-scale, fully open platform benchmark for embodied AI research. 🤖 230k indoor scenes, 130k object models, & 42M annotated robotic grasps, all in one ecosystem.
Wall-OSS is now the #1 policy on the zero-shot MolmoSpaces evals. A lot of details in their paper, I recommend checking it out.
We are open-sourcing Wall-OSS-0.5. Pretrain Once, Act Anywhere. Wall-OSS-0.5 is a VLA model for real-world robotic manipulation, exploring whether pretraining alone can produce robot capabilities directly testable on physical hardware before task-specific fine-tuning.
Key technical highlights:
• Gradient-bridged co-training
• Vision-Aligned RVQ Action Tokenizer
• Action-Space Supervision
• DMuon distributed optimizer
In zero-shot real-robot evaluation, the pretrained checkpoint achieved task-progress scores above 80 on multiple tasks, including Block Sorting, Fruit Sorting, Ring Stacking, and Rope Tightening.
Paper, code, blog, and uncut videos: x2robot.com/oss#resources
Omar Rayyan retweeted
Robotics is still data-starved. Collecting high-quality robot demonstrations remains brutally slow and expensive. Introducing COBALT: a cloud-native teleoperation platform designed for large-scale robot learning. We are democratizing data collection by leveraging the hardware everyone already owns: the smartphone. All you need is to download an app (today)! Read on for more!
Omar Rayyan retweeted
TiPToP is #1 on MolmoSpaces! Outperforming VLAs including MolmoAct2 and π₀.₅, and WAMs like DreamZero. It's the only method that uses inference-time search and zero robot data. We didn't do any benchmark-specific tuning.
Omar Rayyan retweeted
Just merged an amazing contribution by @omarrayyann to mjlab's viser viewer: checkpoint hot-swapping! You can now browse and load any checkpoint mid-session without restarting, and it works with both local checkpoints and W&B runs.
Omar Rayyan retweeted
Benchmarking, evaluating, and developing robotics code is difficult, and part of this is because no simulator really reflects the diversity and scale of real embodiments. Enter MolmoSpaces from AI2: a massive open ecosystem with a range of 230,000 handcrafted and procedurally-generated home environments, including 48,000 manipulable objects. Crucially, MolmoSpaces provides simulation environments which work for both navigation and manipulation. We talked to the team: @YejinKim4, @omarrayyann, and Max Argus, to tell us more. Watch Episode 69 of RoboPapers, with @micoolcho and @DJiafei, now!
Omar Rayyan retweeted
Jensen approves! Herculean efforts from @YejinKim4, @omarrayyann, Max Argus & team! This has a decent chance of becoming a super important benchmark for robotics going forward. Check out this @RoboPapers episode with the MolmoSpaces folks.
Check out our MolmoBot release and the open-sourced foundational models trained entirely in simulated MolmoSpaces homes!
Mar 11
Today, a step forward in open robotics - our results show that sim-to-real zero shot transfer for manipulation is possible. MolmoBot is our open model suite for robotics, trained entirely in simulation on MolmoSpaces. 🧵
Omar Rayyan retweeted
The MolmoSpaces leaderboard is now open for submissions! When we created this benchmark for zero-shot real-to-sim eval in diverse homes, we didn't expect things to heat up so quickly. But they did, thanks to @jang_yoel and team at GEAR toppling PI to take the crown in the task-general category. Congrats 🎉 You can evaluate and submit your model to this leaderboard: molmospaces.allen.ai/leaderb…

DreamZero is #1 on both MolmoSpaces and RoboArena 🏆
What makes this notable: DreamZero-DROID is trained from scratch using only the DROID dataset. No pretraining on large-scale robot data, unlike competing VLAs. This demonstrates the strength of video-model backbones for generalist robot policies (VAMs/WAMs).
More broadly, training only on real data and evaluating on (1) transparent, distributed benchmarks like RoboArena or (2) scalable sim-benchmarks like MolmoSpaces is an exciting step toward fairer and more reproducible evaluation of generalist policies, one that the community can hillclimb together to measure progress.
Special thanks to the Ai2 MolmoSpaces team (@notmahi @omarrayyann @YejinKim4 Max Argus) and the RoboArena team (@pranav_atreya) for helping with the set-up and getting these evaluations! Special shout out to @youliangtan @NadunRanawakaA @chuning_zhu, who led these efforts from the GEAR side :)
We also release our DreamZero-AgiBot checkpoint & post-training code to enable very efficient few-shot adaptation. Post-train on just ~30 minutes of play data for your specific robot, and see the robot do basic language following and pick-and-place 🤗 (See YAM experiments in our paper for more detail). We also provide the entire codebase & preprocessed dataset to replicate the DreamZero-DROID checkpoint.
🌐 dreamzero0.github.io 💻 github.com/dreamzero0/dreamz… RoboArena: robo-arena.github.io/leaderb… MolmoSpaces: molmospaces.allen.ai/leaderb…
MolmoSpaces-Bench leaderboard is now live! Test your generalist policies to see how they compare across tasks and environments. Feel free to reach out if you need help setting it up. molmospaces.allen.ai/leaderb…
You can get more insights than just the success rate (e.g. AR policies like DreamZero and pi0-Fast generate smoother trajectories) and cross-compare policy performance across objects.
Also thanks to @youliangtan @jang_yoel for their DreamZero API. Their world action model now leads the benchmark with zero sim data. x.com/jang_yoel/status/20275…

MolmoSpaces also comes with 42M grasps that cover 48K objects across 250K scenes, allowing large-scale functional trajectory generation in MuJoCo and IsaacSim.
We are open-sourcing our mjcf2grasp pipeline, which lets you generate and verify grasps starting from an MJCF file: github.com/allenai/molmospac…
Omar Rayyan retweeted
Also check out MolmoSpaces-Bench from @omarrayyann! Our contact-anchored policies (CAPs) perform well zero-shot across diverse environments and objects. Omar is the rockstar behind our sim env for CAP, enabling us to train and evaluate multiple models in a day.
Replying to @omarrayyann
It's hard to find true zero-shot end-to-end policies: ones that work without any fine-tuning in fully novel, simulated environments, even for single tasks! We test two policy families, the π family from @physical_int and the recent Contact-Anchored Policies (CAP) from NYU & UCB. On all our tasks, we are making steady progress, but we are nowhere close to saturation yet.
Omar Rayyan retweeted
How general are your general robotic policies? Today, we're releasing MolmoScenes-Bench to help explore this question. You can spin up ~1k envs in ~700 unique simulated homes and within hours find out how well your zero-shot policy generalizes to these unseen scenes 🧵
Another example is prompt sensitivity in language-conditioned models. On the exact same tasks, early π models fail more often when given queries that are less frequent in the DROID dataset; newer π models almost entirely close this gap.