Human3R: Everyone Everywhere All at Once
Note: I recorded the video from the interactive demo on their project page (linked in the comment below).
Abstract (excerpt):
Human3R jointly recovers global multi-person SMPL-X bodies ("everyone"), dense 3D scenes ("everywhere"), and camera trajectories in a single forward pass ("all-at-once").
Our method builds upon the 4D online reconstruction model CUT3R and uses parameter-efficient visual prompt tuning to preserve CUT3R's rich spatiotemporal priors while enabling direct readout of multiple SMPL-X bodies.
Human3R is a unified model that eliminates heavy dependencies and iterative refinement. After being trained on the relatively small-scale synthetic dataset BEDLAM for just one day on one GPU, it achieves superior performance with remarkable efficiency: it reconstructs multiple humans in a one-shot manner, along with 3D scenes, in one stage, at real-time speed (15 FPS) with a low memory footprint (8 GB).