Refined the stacking path.
Out: MuJoCo sim, synthetic data, scripted policies. In: real-world RL only.
1. ACT on VR teleop demos, target 30-50% on 2-cube (matches SmolVLA paper baseline for ACT)
2. TOPReward for automated reward via VLM logits (Chen et al, Feb 2026)
3. ZPRL bottleneck latent online RL on frozen ACT to push to 70% (Yu et al, May 2026)
4. Then 3-cube stacking, territory not yet published in the modern imitation learning corpus
Pivot from yesterday's plan. Sometimes the right call is to drop the wrong path before shipping.
PS: the behind-the-scenes clip is in French, that's how Iris and I talk. Also yes, that's me pinching my finger in the gripper, occupational hazard.