The answer is in the depth maps.
It is worth understanding that absolutely all processing up to and including the depth maps happens at the image level.
Everything is images.
Multi-view stereo works on pairs of images.
For each stereo pair, a well-known algorithm estimates a disparity image.
Disparity images have the same resolution as the original images. Technically you could upscale here, but surely no one does that. Some genius might downscale to speed things up and filter, but I hope no one does that either.
The disparity is then converted into a depth map image. Again, the depth map has exactly the same resolution, or lower. It can be lower because at this point the image has to be stored as float or double, which can increase memory use. (Disparity doesn't need a high bit depth.)
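To make the disparity to depth step concrete, here is a minimal sketch. It assumes a rectified stereo pair, where depth = focal length (in pixels) x baseline / disparity. The function name `disparity_to_depth` and the numbers are made up for illustration, not taken from any particular software.

```python
import numpy as np

def disparity_to_depth(disparity, focal_px, baseline_m):
    """Convert a disparity image (pixels) to a depth map (metres).

    Assumes a rectified pair: depth = focal_px * baseline_m / disparity.
    Non-positive disparities are treated as invalid and become NaN.
    """
    disparity = np.asarray(disparity, dtype=np.float32)
    depth = np.full(disparity.shape, np.nan, dtype=np.float32)
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# Tiny 2x3 disparity image stored as 16-bit integers; 0 marks "no match".
disparity = np.array([[0, 10, 20], [40, 50, 100]], dtype=np.uint16)
depth = disparity_to_depth(disparity, focal_px=1000.0, baseline_m=0.1)

# Same resolution, but the float32 depth map needs twice the bytes of uint16.
print(depth.shape == disparity.shape, disparity.nbytes, depth.nbytes)
```

The depth map keeps the pixel grid of the disparity image, and the only thing that grows is the storage per pixel.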
Next is “depth fusion”:
A single stereo pair can't resolve discontinuities, and a lot of areas in the raw depth map derived from disparity contain NaNs or zeroes.
To repair or recover such areas, depth information from the nearest cameras (or from other stereo pairs) can be used.
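A heavily simplified sketch of that hole-filling idea follows. It assumes the neighbouring depth maps have already been reprojected into the reference camera's view, which is the hard part and is skipped here, and that invalid pixels are marked as NaN (zeroes would need converting first). Real fusion also checks consistency between views; `fill_holes` is a hypothetical helper, not how any specific package does it.

```python
import numpy as np

def fill_holes(reference, neighbours):
    """Fill NaN pixels of `reference` with the mean of valid neighbour depths.

    `neighbours` must already be reprojected into the reference view.
    Pixels that are NaN in every neighbour stay NaN.
    """
    stack = np.stack(neighbours).astype(np.float32)
    valid = ~np.isnan(stack)
    count = valid.sum(axis=0)                      # valid neighbours per pixel
    total = np.where(valid, stack, 0.0).sum(axis=0)
    mean = total / np.maximum(count, 1)            # avoid division by zero
    holes = np.isnan(reference) & (count > 0)
    return np.where(holes, mean, reference)

nan = np.nan
reference = np.array([[2.0, nan], [nan, nan]], dtype=np.float32)
neighbour_a = np.array([[2.1, 3.0], [nan, nan]], dtype=np.float32)
neighbour_b = np.array([[1.9, 3.2], [4.0, nan]], dtype=np.float32)

fused = fill_holes(reference, [neighbour_a, neighbour_b])
print(fused)
```

Valid reference pixels are left alone, and only the holes take values from the other views.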
At the end we again have depth maps that are stored as ordinary images (even if they are PLY files).
Now, depending on the algorithm, these depth maps can be blended into a single mega point cloud (a depth map stores the exact distance from the sensor to a point in the 3D scene) and meshed with some Poisson meshing algorithm. Or more advanced methods can be used that skip the mega point cloud.
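As a rough illustration of how a depth map turns into points, here is a sketch of back-projection for one camera. It assumes a simple pinhole model with no lens distortion, and depth measured along the optical axis (Z) rather than along the ray; the intrinsics `fx`, `fy`, `cx`, `cy` and the helper name are illustrative. Merging clouds from several cameras would additionally need each camera's pose from alignment.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map to an (N, 3) array of camera-space points.

    Assumes a pinhole camera and depth measured along the optical axis (Z).
    Pixels with NaN or non-positive depth are skipped.
    """
    depth = np.asarray(depth, dtype=np.float64)
    rows, cols = np.indices(depth.shape)
    valid = np.isfinite(depth) & (depth > 0)
    z = depth[valid]
    x = (cols[valid] - cx) * z / fx
    y = (rows[valid] - cy) * z / fy
    return np.column_stack([x, y, z])

# 2x2 depth map with one NaN and one zero (both invalid).
depth = np.array([[2.0, np.nan], [0.0, 4.0]])
points = depth_to_points(depth, fx=100.0, fy=100.0, cx=0.5, cy=0.5)
print(points.shape)  # (2, 3)
```

Every valid pixel yields at most one 3D point, so the depth map resolution caps the number of points each camera can contribute.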
In any case, it makes a big difference whether you have 2500x2000 points per camera or 5000x4000 points per camera.
Old methods that go depth -> dense point cloud -> mesh, combined with low-resolution images and bad alignment, can fuse everything into a smooth blob.
Modern methods that mesh directly from depth, combined with more precise alignment, can actually benefit.
At original resolution, thin details or thin surfaces can simply be blended into a single value because of reprojection errors from SfM.
2x resolution gives 4x more samples per camera and a higher chance that such a detail will earn its own vertex in the mesh.
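The arithmetic behind the 4x figure, as a quick check. This counts one potential depth sample per depth-map pixel, which is an upper bound, since invalid pixels contribute nothing.

```python
def samples_per_camera(width, height):
    """Upper bound on depth samples: one per depth-map pixel."""
    return width * height

half_res = samples_per_camera(2500, 2000)
full_res = samples_per_camera(5000, 4000)

print(half_res, full_res, full_res // half_res)  # 5000000 20000000 4
```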
Is this a universal solution?
No. If your cameras are too low-res or too high-res, upscaling the images will not gain you much.
Upscaling 42 MP Sony A7R III images only adds processing time to facial scans, with minimal improvement in reconstructed detail, and that improvement comes mixed with amplified noise.
AI upscaling can be even more destructive: such methods often distort original details, and the “dreamed” details become noise because they are inconsistent across the images from different cameras.
And if you have alignment issues, upscaled images most likely won't improve the final mesh.