After a year of teamwork, we're thrilled to introduce Depth Anything 3 (DA3)! 🚀
Aiming for human-like spatial perception, DA3 extends monocular depth estimation to any-view scenarios, including single images, multi-view images, and video.
In pursuit of minimal modeling, DA3 reveals two key insights:
💎 A plain transformer (e.g., vanilla DINO) is enough. No specialized architecture.
✨ A single depth-ray representation is enough. No complex 3D tasks.
Three series of models have been released: the main DA3 series, a monocular metric estimation series, and a monocular depth estimation series.
The core team members, aside from me:
@HaotongLin, Sili Chen, Jun Hao Liew, and @donydchen.
👇(1/n)
#DepthAnything3