2 L. Tang et al.
proposals. In addition, Tao et al. [25] proposed a two-stage, seg-level supervised
method for 3D instance and semantic segmentation: a segment grouping network
first generates pseudo point-level labels for whole scenes, and these pseudo
labels are then used as the ground truth to train the network. However, such
simple pseudo label generation strategies cannot produce high-quality pseudo
labels, which leads to poor 3D instance segmentation results.
In this paper, we propose a simple yet effective weakly supervised 3D in-
stance segmentation framework, which can achieve impressive results with one
point annotation per instance. For weakly supervised point cloud instance
segmentation with few annotated labels, our intuition is twofold: (1) Under
sparse annotations, effective label propagation is essential to produce high-quality
pseudo labels, especially in 3D instance segmentation. (2) Weakly supervised 3D
instance segmentation is more challenging than weakly supervised 3D semantic
segmentation, so we consider introducing the object volume constraint to im-
prove the instance segmentation results. Specifically, we first use an unsuper-
vised method [15] to oversegment the point cloud into superpoints and build the
superpoint graph. In this way, point-level labels can be extended to superpoint-
level labels. Then, we propose an inter-superpoint affinity mining module to
generate high-quality pseudo labels based on a few annotated superpoint-level
labels. Based on the superpoint graph, we leverage the semantic and spatial in-
formation of adjacent superpoints to adaptively learn inter-superpoint affinity,
which can be used to propagate superpoint labels along the superpoint graph
via semantic-aware random walk. Finally, we propose a volume-aware instance
refinement module to improve instance segmentation performance. Using the
model trained with superpoint-level propagation, we obtain coarse instance
segmentation results through superpoint clustering and further infer the object
volume information from the instance segmentation results. The object volume
information contains the number of voxels and the radius of the object. The
inferred object volume information is regarded as the ground truth of the
corresponding instance to retrain the network. In the test phase, we use the
predicted object volume information in a volume-aware instance clustering
algorithm to segment high-quality instances. Extensive experiments on the
ScanNet-v2 [6] and S3DIS [1] datasets demonstrate the effectiveness of our method.
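To make the propagation step concrete, the following is a minimal sketch of random-walk label propagation over a superpoint graph. It is not the learned, semantic-aware formulation of this paper: the affinity matrix is simply taken as an input, and the restart weight `alpha`, the function name `propagate_labels`, and the closed-form random-walk-with-restart solution are all assumptions made for illustration.

```python
import numpy as np

def propagate_labels(affinity, seed_labels, num_classes, alpha=0.9):
    """Illustrative random walk with restart on a superpoint graph.

    affinity:    (N, N) non-negative matrix; zero for non-adjacent superpoints.
                 (In the paper this affinity is learned; here it is given.)
    seed_labels: (N,) int array; class id for annotated superpoints, -1 otherwise.
    Returns an (N,) int array of pseudo labels (-1 if no seed is reachable).
    """
    affinity = np.asarray(affinity, dtype=float)
    seed_labels = np.asarray(seed_labels)
    n = affinity.shape[0]

    # Row-normalise the affinity into a transition matrix.
    row_sum = affinity.sum(axis=1, keepdims=True)
    transition = affinity / np.maximum(row_sum, 1e-12)

    # One-hot seed matrix; rows of unlabeled superpoints stay zero.
    seeds = np.zeros((n, num_classes))
    labeled = seed_labels >= 0
    seeds[labeled, seed_labels[labeled]] = 1.0

    # Stationary scores S solve S = alpha * T @ S + (1 - alpha) * seeds.
    scores = np.linalg.solve(np.eye(n) - alpha * transition, (1 - alpha) * seeds)

    pseudo = scores.argmax(axis=1)
    pseudo[scores.max(axis=1) <= 1e-12] = -1   # no seed reachable
    pseudo[labeled] = seed_labels[labeled]     # annotated superpoints keep their label
    return pseudo
```

On a hand-built chain of four superpoints in which the two end superpoints are annotated and the middle edge is weak, each unlabeled superpoint takes the label of the seed it is strongly connected to.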
The main contributions of our paper are as follows:
– We present an inter-superpoint affinity mining module that considers the
semantic and spatial relations to adaptively learn inter-superpoint affinity for
random-walk based label propagation.
– We present a volume-aware instance refinement module, which guides the
superpoint clustering on the superpoint graph to segment instances by using
the object volume information.
– Our simple yet effective framework achieves state-of-the-art weakly supervised
3D instance segmentation performance on the popular ScanNet-v2 and S3DIS
datasets.
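As an illustration of the object volume information used by the refinement module (the number of voxels and the radius of an object), the sketch below computes both quantities from the points of one coarse instance. The voxel size, the definition of the radius as the largest distance from the centroid, and the function name `object_volume` are assumptions for this example; the exact definitions used in the method may differ.

```python
import numpy as np

def object_volume(points, voxel_size=0.02):
    """Illustrative volume descriptor for one instance.

    points:     (N, 3) array of xyz coordinates belonging to the instance.
    voxel_size: edge length of the voxel grid (assumed value, same unit as points).
    Returns (num_voxels, radius).
    """
    points = np.asarray(points, dtype=float)

    # Count the distinct voxels occupied by the instance.
    voxel_idx = np.floor(points / voxel_size).astype(np.int64)
    num_voxels = len(np.unique(voxel_idx, axis=0))

    # Radius: largest distance from the instance centroid (assumed definition).
    centroid = points.mean(axis=0)
    radius = float(np.linalg.norm(points - centroid, axis=1).max())
    return num_voxels, radius
```

Values computed this way from coarse instances could serve as regression targets when retraining, in the spirit of the second contribution above.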