View-centric space matters.
Most image-to-3D models operate in canonical space.
That works for instances — but not for scenes.
🔑 Key difference:
Scene generation is about
#spatialreasoning.
In canonical space:
• Left/right in the image becomes meaningless
• Different views collapse to the same 3D layout
• Fine for instances, bad for scene layout
👁️ View-centric space fixes this:
2D image structure now directly corresponds to 3D spatial layout.
💡 We find both are necessary:
Scene context attention view-centric space
Remove either just not work.
🧵 [3/6]