Albeit remarkable progress has been made to improve the accuracy and completeness of multi-view stereo (MVS), existing methods still suffer from either sparse reconstructions of low-textured surfaces or heavy computational burden. In this paper, we propose a Confidence-based Large-scale Dense Multi-view Stereo (CLD-MVS) method for high resolution imagery. Firstly, we formulate MVS as a multi-view depth estimation problem, and employ a normal-aware efficient PatchMatch stereo to estimate the initial depth and normal map for each reference view. A self-supervised deep learning method is then developed to predict the spatial confidence for multi-view depth maps, which is combined with cross-view consistency to generate the ground control points. Subsequently, a confidence-driven and boundary-aware interpolation scheme using static and dynamic guidance is adopted to synthesize dense depth and normal maps. Finally, a refinement procedure which leverages synthesized depth and normal as prior is conducted to estimate cross-view consistent surface. Experiments show that the proposed CLD-MVS method achieves high geometric completeness while preserving fine-scale details. In particular, it has ranked No. 1 on the ETH3D high-resolution MVS benchmark in terms of F1-score.
- Multi-view stereo
- static and dynamic guidance