In our conference submission, we evaluate AXIS as a growable data engine for robot manipulation through three questions:
1. Does AXIS pretraining improve π0.5 on downstream LIBERO-Plus robustness tasks, beyond a matched-volume baseline?
2. Does the gain scale with AXIS data volume, from 25% to 50% to 100%?
3. Which perturbation axes benefit the most, and do they match the diversity targeted by our augmentation pipeline?
Here, “AXIS” refers to our growable manipulation dataset snapshot built around a Franka Research 3 robot: 207 tabletop tasks across 7 scene categories, 50k human demonstrations, and 60k task/scene variants produced through cleaning and semantic-preserving augmentation.
Findings below 🧵
AXIS Weekly
This week was about making the AXIS loop more scalable end to end: automating data-to-model workflows, testing recovery-driven training, expanding TaskGen coverage, and preparing the dataset and model stack for release.
Key updates:
- Data-to-model automation: We used scripts to speed up and standardize several repetitive but critical workflows.
- Continuous-growth training: We completed multi-data-scale training and success-rate comparisons across several failure tasks.
- Failure task expansion: A new batch of failure tasks has been pushed to testing, widening the evaluation range for ablations across data scale, data quality, and randomization.
- TaskGen: Articulated-object generation is now merged into the automatic generation pipeline.
- Model and release prep: We finished the first round of fine-tuning, evaluation, and benchmarking, completed the dataset’s conference submission, and are now improving experimental results for release.
Details below 🧵