Researcher @XSquareRobot. Prev: MPhil @HKUniversity | Ex-intern Horizon Robotics & CASIA

Joined April 2020
Jun 10
【XRZero-G0 is Coming!!!】Follow our work!🧐🧐🧐🧐🧐
🥳Our third open-source release is here: XRZero-G0. After WALL-OSS-0.5 and WALL-WM, we’re open-sourcing XRZero-G0 to Scale Robot Learning with Interfaces, Data Quality and Ratios.
XRZero-G0 enables robot-free data collection, trainable policy generation, and real-robot evaluation through a closed-loop pipeline: Collection → Inspection → Training → Evaluation.
Key highlights:
• 2,000 hours of validated multimodal demonstrations
• ~85% effective data yield in controlled settings
• 10:1 robot-free / real-robot data mixing law
• Up to 20x reduction in real-robot data needs
• Zero-shot transfer across robot embodiments
Built for scalable, reproducible embodied AI research.
Project: x2robot.com/x2go
Paper: arxiv.org/abs/2604.13001
Code: github.com/X-Square-Robot/XR…
Dataset: 📷huggingface.co/datasets/x-sq…
@ComWjm @_akhaliq @HuggingPapers @ModelScope2022 @Xianbao_QIAN @XRoboHub @TheHumanoidHub @chris_j_paxton @IlirAliu_
James retweeted
Introducing WALL-WM, our open-source World Model for embodied AI and the next piece of our open-source robotics stack.
Carving World Action Modeling at the Event Joints
Read the blog: x2robot.com/en/pages/wm
Why it matters
WALL-WM shifts robot world modeling from fixed-length action chunks to event-grounded video-action pretraining. It learns around events like reaching, contact, grasping, lifting, moving, and placing, so language, vision, and action align more naturally.
Why you should care
WALL-WM brings together:
• Event-grounded VLA pretraining
• Prior-aligned video-action architecture
• Wan-based video tower with a randomly initialized action DiT
• Multi-view perception with sight-cone masking, tube patch masking, and Camera RoPE
• Event Mode for variable-length execution
• Unified Mode with Staircase Decoding
• DMuon for large-scale training
The goal: help robots learn what physically matters, not just what happens in the next fixed slice of time.
Code (coming soon): github.com/X-Square-Robot/wa…
#opensource #EmbodiedAI
James retweeted
We are open-sourcing Wall-OSS-0.5. Pretrain Once, Act Anywhere.
Wall-OSS-0.5 is a VLA model for real-world robotic manipulation, exploring whether pretraining alone can produce robot capabilities that are directly testable on physical hardware before task-specific fine-tuning.
Key technical highlights:
• Gradient-bridged co-training
• Vision-Aligned RVQ Action Tokenizer
• Action-Space Supervision
• DMuon distributed optimizer
In zero-shot real-robot evaluation, the pretrained checkpoint achieved task-progress scores above 80 on multiple tasks, including Block Sorting, Fruit Sorting, Ring Stacking, and Rope Tightening.
Paper, code, blog, and uncut videos: x2robot.com/oss#resources
James retweeted
X Square Robot Unveils New Embodied AI Model, Says Robots Will Arrive in Homes in 35 Days
Backed by Alibaba, ByteDance, Xiaomi and Meituan, X Square Robot unveiled a next-generation embodied AI foundation model for home robots and said its first deployments in everyday households will begin within 35 days.
X Square Robot on Tuesday unveiled WALL-B, a new embodied AI foundation model designed for deployment in real-world homes, marking what the company described as a major step toward bringing general-purpose robots into daily family life.
At a launch event themed "Born to Bot, Bot to Family," the company also introduced its World Unified Model (WUM) architecture, a training framework that combines vision, language, action and physical prediction within a single system from the outset. X Square said the model is intended to help robots operate in the far more unpredictable setting of a home, where tasks, layouts and interactions vary from moment to moment.
"Robots in factories and in homes are completely different. In factories, they repeat the same action 10,000 times without variation. In a home, however, they need to perform 10,000 different actions, each unique and non-repetitive. Therefore, the challenge of a truly intelligent robot lies not in repeating a single action, but in the ability to execute new, untrained movements within unstructured environments. Deploying robots in the home is one of the most significant technical hurdles of our time," said Qian Wang, founder and CEO of X Square Robot.
WALL-B is the first real-world implementation of the World Unified Model architecture. Unlike modular systems that train perception, language and control separately, the World Unified Model, according to X Square Robot, optimizes those capabilities jointly from the very beginning. The company said this allows physical prediction, including force, friction and collision dynamics, to emerge within the model itself rather than being layered on afterward.
"We train all capabilities (vision, language, action, and prediction) within the same network from day one. Much like infants, who do not learn to see, move and speak in isolated, sequential stages, but instead see, move, listen and act simultaneously while receiving feedback, we have integrated all these capabilities into a unified whole," said Wang Hao, CTO of X Square.
X Square Robot said the development of WALL-B rests on two pillars. The first is a data strategy that prioritizes training in authentic, non-staged home environments to cover the “long-tail” distribution of real-world scenarios. Unlike models trained primarily on synthetic data or laboratory datasets, this strategy exposes WALL-B to the natural clutter of lived-in spaces (misplaced items, temporary occlusions, unexpected obstacles, and spontaneous human activity), so that the training data reflects real-world conditions rather than a simplified version of them. The second is a physics-aware predictive mechanism that anticipates physical outcomes before an action is taken, enabling the model to plan for contact dynamics rather than merely react to them.
Developing the in-house WUM architecture on physical robotic platforms reflects the company’s accumulated experience in bridging sim-to-real gaps across varied operational contexts.
Wang commented that the current AI model is still at an "intern" stage and makes errors that require remote assistance. For instance, it may mistakenly place slippers in the kitchen or pause while wiping a table to "think". However, the model operates nonstop 24 hours a day and becomes increasingly "intelligent" as each day of operation generates new data.
In 35 days, on May 25, X Square Robot will officially bring its robots into everyday homes, underscoring the company’s long-term commitment to the home robotics sector.
James retweeted
Right now, this service is exclusive to Shenzhen, and we plan to expand to more Chinese cities, bringing intelligent, AI-powered services into more homes!
The future of home cleaning just landed in Shenzhen, and it is walking right into your living room. 🤖🏠 @XSquareRobot and 58.com officially launched China’s first robot home service, moving embodied AI from the lab to your front door. When you book a cleaning on the 58.com app, a professional cleaner now shows up with an X Square robot partner to tag-team the house. The human handles the tricky tasks that need real judgment, while the robot takes over repetitive tasks like wiping tables and tidying up surfaces. X Square is using an end-to-end foundation model, which means the robot actually perceives and plans its own moves instead of just following a script. By testing in the messy reality of a real home, they are proving that if a robot can master a living room, it can handle almost any physical space. This pilot is part of a massive push to turn these machines into reliable partners that can actually assist in our daily lives.
Jan 13
Come join us! We are hiring for multiple positions, including World Model, VLA, and Large Model Infrastructure roles. Referral code: 32DTKC9
We’re thrilled to announce approximately US$140 million (RMB 1 billion) in additional Series A funding, with investment from ByteDance and HongShan, along with several other strategic Chinese partners. Building on previous backing from Alibaba Group and Meituan, this milestone reinforces our leadership in embodied AI. Check it out: prn.to/4qfVNbj
James retweeted
In November 2025, Quanta X1 made history by completing the world's first autonomous food delivery in a real-world open environment, powered entirely by our end-to-end VLA model, the WALL-A foundation model. The future of autonomous last-mile delivery isn't just coming. It's already on the move. #embodiedAI
25 Dec 2025
Merry Christmas🎄🎄🎄
Who knew robots could spread Christmas cheer? 🎄 Our Quanta X1 tried:
• Folding tablecloths
• Decorating the tree
• Lighting candles
A few festive fumbles were part of the fun. Robots don’t have all the answers, and teaching them is exciting.🤖 Merry Christmas from X Square Robot!
James retweeted
23 Dec 2025
We’ve teamed up with @XSquareRobot to integrate WALL-OSS, a powerful new VLA foundation model, into LeRobot!
James retweeted
Can a robot flick a single playing card perfectly?🤔 Thin, slippery, flexible: too little force → it won’t fly; too much → it bends or jams. Watch this: our end-to-end model can nail it!
28 Sep 2025
I got my first 100 citations on Google Scholar! A big thank you to all the people and researchers who have contributed to this milestone.
James retweeted
Igniting VLMs toward the Embodied Space. We're excited to introduce WALL-OSS🧠🤖, an end-to-end embodied foundation model and our first open-source step. Paper, code, blog, uncut videos: x2robot.com/en/research/68bc…
12 Aug 2025
Our company X Square Robot's (@XSquareRobot) Wall-A model shines at the World Robotics Conference with "one brain, multiple uses" - seamlessly handling greetings, sachet-making, housework, package sorting, and industrial assembly! #EmbodiedAI
7 Jun 2025
🚗✨ Excited to share our CVPR 2025 paper: "Momentum-Aware Driving (MomAD)" - a breakthrough in end-to-end autonomous driving! 📄 Paper: arxiv.org/abs/2503.03125 👨‍💻 Code: github.com/adept-thu/MomAD #CVPR2025 #AutonomousDriving #AI #ComputerVision #SelfDriving #MachineLearning
17 Dec 2024
We have developed OMEGA🧐, a new air-ground robot navigation system, to promote the development of flying cars, which are expected to solve modern traffic congestion problems. 🤖OMEGA (RA-L 2024.12) Homepage: jmwang0117.github.io/OMEGA/
1 Oct 2024
We developed OccRWKV, an efficient semantic occupancy network inspired by Receptance Weighted Key Value (RWKV). Project: jmwang0117.github.io/OccRWKV… Code: github.com/jmwang0117/OccRWK… Paper: arxiv.org/abs/2409.19987 #RWKV
21 Aug 2024
OccMamba is here! Title: OMEGA: Efficient Occlusion-Aware Navigation for Air-Ground Robots in Dynamic Environments via State Space Model Homepage: jmwang0117.github.io/OMEGA/ Paper link: arxiv.org/abs/2408.10618 #mamba #Robotics
29 Jul 2024
Nice!
27 Jul 2024
RX1 Humanoid:
- Open-source, full-human-scale dual-arm robot
- <$1,000
- Teleoperation and pick & place objects
@_buildspace @_nightsweekends
James retweeted
Over the past few weeks, I took a deep dive into using physical interfaces to feed and interact with a real-time img2img diffusion pipeline built on Stream Diffusion and SDXL Turbo. What really captivated me was using my hands, objects, art supplies, tools, and light to create images and scenes.
𝗣𝗵𝘆𝘀𝗶𝗰𝗮𝗹 𝗠𝗲𝗱𝗶𝗮 𝗧𝗼𝗼𝗹𝘀
I experimented with clay, manipulating different types and colors alongside a selection of prompts. I also used a magnifying glass, tracked in real time, to focus the diffusion process on specific areas. Combining these tools created a dynamic and inspiring experience. Using magic clay to layer shapes and colors as a base for revealing landscapes and hidden worlds, and the magnifying glass to focus on and reveal their details, was particularly effective.
𝗣𝗵𝘆𝘀𝗶𝗰𝗮𝗹 𝗟𝗶𝗴𝗵𝘁
I used light as my method of interaction with the img2img diffusion. This approach felt special right away: there was something magical about holding a physical light source and seeing it influence the generated visuals. I iterated on this technique with themes like Rococo architecture, flowers, Brutalist architecture, hidden worlds, and origami landscapes.
𝗜𝗻𝗸 𝗛𝘆𝗯𝗿𝗶𝗱 𝗙𝗼𝗿𝗺𝗮𝘁𝘀
I also used ink in milk as a means of physical interaction with the diffusion pipeline. As I drop ink into milk, shapes come alive instantly. By learning to manipulate the combination of physical and digital elements, I steered the generated output toward my areas of interest. These iterations extended beyond ink in milk to the format in which the elements are contained: a circular plate or a triptych of small stainless steel trays. These formats provide a structured yet flexible framework for exploring themes and narratives across multiple visual spaces. It's magical.
Some of that last iteration was captured in this insightful article by Fast Company: lnkd.in/e6UUCTyr
#stablediffusion #realtime #ai