Abstract:
The encoder-decoder network (ED-Net) is a commonly employed choice for existing depth completion methods, but its working mechanism is ambiguous.In this paper, we visualize the internal feature maps to analyze how the network densifies the input sparse depth.We find that the encoder feature of ED-Net focus on the areas with input depth points around.To obtain a dense feature and thus estimate complete depth, the decoder feature tends to complement and enhance the encoder feature by skip-connection to make the fused encoder-decoder feature dense, resulting in the decoder feature also exhibits sparse.However, ED-Net obtains the sparse decoder feature from the dense fused feature at the previous stage, where the ``dense$\Rightarrow$sparse'' process destroys the completeness of features and loses information.To address this issue, we present a depth feature upsampling network (DFU) that explicitly utilizes these dense features to guide the upsampling of a low-resolution (LR) depth feature to a high-resolution (HR) one.The completeness of features is maintained throughout the upsampling process, thus avoiding information loss.Furthermore, we propose a confidence-aware guidance module (CGM), which is confidence-aware and performs guidance with adaptive receptive fields (GARF), to fully exploit the potential of these dense features as guidance.Experimental results show that our DFU, a plug-and-play module, can significantly improve the performance of existing ED-Net based methods with limited computational overheads, and new SOTA results are achieved.Besides, the generalization capability on sparser depth is also enhanced.Project page: https://npucvr.github.io/DFU.
Chat is not available.