Smoothness, Synthesis, and Sampling

Re-thinking Unsupervised Multi-View Stereo with DIV Loss

University of California, Santa Barbara

Our DIV loss yields substantially more precise object boundaries and fewer artifacts when training unsupervised multi-view stereo networks.

Abstract

Despite significant progress in unsupervised multi-view stereo (MVS), the core loss formulation has remained largely unchanged since its introduction. However, we identify fundamental limitations of this core loss and propose three major changes to improve the modeling of depth priors, occlusion, and view-dependent effects. First, we eliminate prominent stair-stepping and edge artifacts in predicted depth maps using a clamped depth-smoothness constraint. Second, we propose a learned view-synthesis approach to generate the image used in the photometric loss, avoiding hand-coded heuristics for handling view-dependent effects. Finally, we sample additional views for supervision beyond those used as MVS input, challenging the network to predict depth that matches unseen views. Together, these contributions form an improved supervision strategy we call DIV loss. The key advantage of our DIV loss is that it can be easily dropped into existing unsupervised MVS training pipelines, resulting in significant improvements on competitive reconstruction benchmarks and drastically better qualitative performance around object boundaries, all for minimal training cost.
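To give a feel for the first contribution, below is a minimal NumPy sketch of a clamped, edge-aware depth-smoothness term. The function name, the clamp threshold `tau`, and the exponential edge weighting are illustrative assumptions; the paper's exact formulation may differ.

```python
import numpy as np

def clamped_smoothness_loss(depth, image, tau=0.1):
    """Illustrative clamped, edge-aware depth-smoothness loss (assumed form).

    Penalizes first-order depth gradients, down-weighted at image edges,
    with the per-pixel penalty clamped at tau so that large, genuine depth
    discontinuities incur only a bounded cost instead of being smoothed away.
    """
    # First-order finite differences of depth and image intensity.
    dx_d = np.abs(np.diff(depth, axis=1))
    dy_d = np.abs(np.diff(depth, axis=0))
    dx_i = np.abs(np.diff(image, axis=1))
    dy_i = np.abs(np.diff(image, axis=0))
    # Edge-aware weights: relax smoothness where the image has strong edges.
    wx = np.exp(-dx_i)
    wy = np.exp(-dy_i)
    # Clamp the penalty so sharp depth edges are not over-penalized.
    return (np.minimum(dx_d, tau) * wx).mean() + (np.minimum(dy_d, tau) * wy).mean()
```

A constant depth map costs nothing, while a sharp depth discontinuity contributes at most `tau` per pixel, which is the intuition behind avoiding the stair-stepping artifacts of an unclamped smoothness prior.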

BibTeX

@inproceedings{rich2024divloss,
  title={Smoothness, Synthesis, and Sampling: Re-thinking Unsupervised Multi-View Stereo with {DIV} Loss},
  author={Alex Rich and Noah Stier and Pradeep Sen and Tobias H\"ollerer},
  booktitle={Proceedings of the European Conference on Computer Vision ({ECCV})},
  year={2024}
}