Abstract:
While recent transformer-based 3D instance segmentation approaches show promising results, they often fail to correctly identify instances with similar appearances. They also delineate edges ambiguously, which leads to the misclassification of many points adjacent to those edges. In this work, we introduce a novel framework, called $\textbf{EASE}$, that overcomes these challenges and improves the perception of complex 3D instances. We first propose a semantic guidance network that leverages rich semantic knowledge from a language model as intelligent priors, enhancing the functional understanding of real-world instances beyond what geometric information alone provides. Specifically, we explicitly guide the basic instance queries with text embeddings of each instance so that they learn deep semantic details. Further, we employ an edge prediction module that encourages the segmentation network to be edge-aware: we extract voxel-wise edge maps from point features and use them as auxiliary information for learning edge cues. In extensive experiments on the large-scale benchmarks ScanNetV2, ScanNet200, S3DIS, and STPLS3D, EASE outperforms existing state-of-the-art models.
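The abstract does not state how the voxel-wise edge maps are supervised. The following is a minimal sketch, assuming edge targets are derived from ground-truth instance labels: points are voxelized by flooring their coordinates, each voxel takes the majority instance id of its points, and a voxel is marked as an edge if any of its six face neighbours belongs to a different instance. The function names `voxelize_instances` and `voxel_edge_labels`, the 6-neighbourhood, and the majority vote are all hypothetical choices for illustration, not EASE's published procedure.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Voxel = Tuple[int, int, int]
Point = Tuple[float, float, float]

# Hypothetical choice: 6-connected (face) neighbourhood on the voxel grid.
NEIGHBOUR_OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                     (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def voxelize_instances(points: List[Tuple[Point, int]],
                       voxel_size: float) -> Dict[Voxel, int]:
    """Map each occupied voxel to the majority instance id of its points."""
    votes: Dict[Voxel, Counter] = defaultdict(Counter)
    for (x, y, z), inst in points:
        key = (math.floor(x / voxel_size),
               math.floor(y / voxel_size),
               math.floor(z / voxel_size))
        votes[key][inst] += 1
    # most_common(1) returns [(id, count)]; keep only the id
    return {key: c.most_common(1)[0][0] for key, c in votes.items()}


def voxel_edge_labels(instance_of: Dict[Voxel, int]) -> Dict[Voxel, int]:
    """Label an occupied voxel 1 if a face neighbour holds another instance, else 0."""
    edges: Dict[Voxel, int] = {}
    for (x, y, z), inst in instance_of.items():
        is_edge = 0
        for dx, dy, dz in NEIGHBOUR_OFFSETS:
            other = instance_of.get((x + dx, y + dy, z + dz))
            # Empty voxels are ignored; only a different occupied instance counts.
            if other is not None and other != inst:
                is_edge = 1
                break
        edges[(x, y, z)] = is_edge
    return edges
```

For example, four unit voxels in a row, where the first two belong to instance 1 and the last two to instance 2, yield edge labels only on the two voxels that touch across the boundary. Under this assumption, such binary targets could supervise a predicted voxel-wise edge map with a standard binary classification loss.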