Our method cuts 3D primitives out of the outputs of a feed-forward reconstruction model (π³) and then glues them across time.
Each primitive’s motion is represented compactly as a single SE(3) pose, inferred from estimated 2D correspondences via an optimisation pipeline.
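
The text above does not specify the optimisation objective. As an illustration only, one common way to recover a rigid pose from 2D correspondences is to minimise reprojection error. The sketch below is not the pipeline described here: it assumes a pinhole camera with known intrinsics `K`, a primitive given as 3D points in the camera frame, and matched 2D pixel locations in a later frame. All function names (`exp_so3`, `project`, `fit_pose`) are hypothetical, and the solver is a plain Gauss-Newton loop with a numerical Jacobian.

```python
import numpy as np


def exp_so3(w):
    """Rodrigues formula: axis-angle vector (3,) -> rotation matrix (3, 3)."""
    theta = np.linalg.norm(w)
    W = np.array([[0.0, -w[2], w[1]],
                  [w[2], 0.0, -w[0]],
                  [-w[1], w[0], 0.0]])
    if theta < 1e-8:
        return np.eye(3) + W  # first-order approximation near identity
    return (np.eye(3)
            + np.sin(theta) / theta * W
            + (1.0 - np.cos(theta)) / theta**2 * (W @ W))


def project(points, K):
    """Pinhole projection of (N, 3) camera-frame points to (N, 2) pixels."""
    uvw = points @ K.T
    return uvw[:, :2] / uvw[:, 2:3]


def residuals(params, points, targets, K):
    """Reprojection residuals for a pose given as [axis-angle (3), translation (3)]."""
    R, t = exp_so3(params[:3]), params[3:]
    return (project(points @ R.T + t, K) - targets).ravel()


def fit_pose(points, targets, K, iters=20, eps=1e-6):
    """Estimate one SE(3) pose (R, t) for a primitive by Gauss-Newton.

    Assumption: the motion is small enough that starting from the identity
    pose lies inside the basin of convergence. No robust loss is used.
    """
    params = np.zeros(6)
    for _ in range(iters):
        r = residuals(params, points, targets, K)
        # Central-difference Jacobian of the residuals w.r.t. the 6 parameters.
        J = np.empty((r.size, 6))
        for i in range(6):
            d = np.zeros(6)
            d[i] = eps
            J[:, i] = (residuals(params + d, points, targets, K)
                       - residuals(params - d, points, targets, K)) / (2 * eps)
        # Gauss-Newton step: least-squares solution of J @ step = -r.
        step = np.linalg.lstsq(J, -r, rcond=None)[0]
        params = params + step
    return exp_so3(params[:3]), params[3:]
```

Calling `fit_pose(points, targets, K)` returns a rotation matrix and translation vector that together form the single rigid transform applied to every point of the primitive. With real, noisy correspondences one would typically add a robust loss and outlier rejection, which this sketch omits.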