keypoint detection at a large scale. In a separate line of
research, it has been shown that incorporating the color modality
can improve accuracy for applications such as point cloud
registration [23].
The main hypothesis underlying this work is that incorporating
the color modality (in addition to the geometric modality) can
improve overall performance, because more information is passed
on to the downstream components of the system. Although descriptors
that can incorporate color information exist (e.g., [32]), to
the best of the authors’ knowledge there currently exists no
effective keypoint detector that can extract color-rich key-
points to feed to the descriptor. For instance, geometric-
based keypoint detectors can fail to extract keypoints on a
flat surface with color texture. While some methods (e.g.,
SIFT-3D) can extract color-rich keypoints, they do so at the
expense of losing geometric information (i.e., they can only
respond to one modality). This limitation can be linked to
the lack of an effective non-maximum suppression algorithm
that combines the two modalities; providing such an algorithm
is one of the key contributions of this work.
To this end, we propose an efficient multi-modal keypoint
detector, named the CEntroid Distance (CED) keypoint detector,
that utilizes both geometric and photometric information.
The proposed CED detector comprises an intuitive
and effective saliency measure, the centroid distance,
that can be used in both 3D space and color space, and
a multi-modal non-maximum suppression algorithm that
can select keypoints with high saliency in two or more
modalities. The proposed saliency measure directly leverages
the distribution of points in a local neighborhood and
does not require normal estimation or eigenvalue decomposition.
The proposed CED detector is evaluated in terms
of repeatability and computational efficiency (running time)
against state-of-the-art keypoint detectors on both synthetic
and real-world datasets. Results demonstrate that our pro-
posed CED keypoint detector requires minimal computa-
tional time while attaining high repeatability. In addition,
to showcase one of the potential applications of the pro-
posed method, we further investigate the task of colored
point cloud registration. Results show that our CED detector
outperforms state-of-the-art handcrafted and learning-based
keypoint detectors in the evaluated scenes.
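As an illustration of the centroid-distance idea, the following minimal Python sketch computes, for each point, the distance between a per-point value and the centroid of that value over the point's spherical neighborhood. It is not the authors' implementation: the function name `centroid_distance`, the brute-force neighbor search, the absence of any normalization, and the choice to define neighborhoods geometrically while taking the centroid in either 3D space or color space are all assumptions made for illustration.

```python
import numpy as np

def centroid_distance(coords, values, radius):
    """Distance from each point's value to the centroid of its neighbors' values.

    Hypothetical helper (not from the paper). Neighborhoods are spheres of the
    given radius in `coords`; the centroid is taken over `values`, which may be
    the 3D coordinates themselves or per-point colors (an assumption).
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    # Brute-force pairwise squared distances, O(N^2); fine for a small demo.
    diff = coords[:, None, :] - coords[None, :, :]
    sq_dist = np.einsum("ijk,ijk->ij", diff, diff)
    # Neighbor indicator matrix; each point is its own neighbor.
    neighbor = (sq_dist <= radius ** 2).astype(float)
    # Row-wise mean of the neighbors' values gives the local centroid.
    centroids = neighbor @ values / neighbor.sum(axis=1, keepdims=True)
    return np.linalg.norm(values - centroids, axis=1)

# Demo: a flat 5 x 5 grid with unit spacing in the z = 0 plane.
grid = np.array([[x, y, 0.0] for x in range(5) for y in range(5)])
geo_saliency = centroid_distance(grid, grid, radius=1.1)
```

On this regular grid the geometric value vanishes at interior points, whose neighborhoods are symmetric, and grows toward borders and corners. Passing per-point RGB values as `values` would give the analogous color-space measure under the same assumptions.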
Contributions. The paper’s contributions are fourfold:
• We propose an efficient multi-modal keypoint detector
that can extract both geometry-salient and color-salient
keypoints in a colored point cloud, with the potential to
be extended and applied to point clouds with multiple
modalities (e.g., colored by multispectral images).
• We propose to use an intuitive and effective measure
for keypoint saliency, the distance to centroid, which
directly leverages the distribution of points and does
not require normal estimation or eigenvalue decomposition.
• We develop a multi-modal non-maximum suppression
algorithm that can select keypoints with high saliency
in two or more modalities.
• We demonstrate through experiments on four datasets
that the proposed keypoint detector can outperform the
state-of-the-art handcrafted and learning-based key-
point detectors.
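To make the idea of suppression across modalities concrete, the sketch below shows one generic possibility in Python. The text above does not spell out the suppression rule, so everything here is an assumption for illustration: the function name `multimodal_nms`, the per-modality thresholds, and the rule that a point is kept if it is a thresholded local maximum in at least one modality. It should not be read as the algorithm proposed in this paper.

```python
import numpy as np

def multimodal_nms(coords, saliencies, radius, thresholds):
    """Generic multi-modal non-maximum suppression (illustrative only).

    A point is kept if, in at least one modality, its saliency exceeds that
    modality's threshold and is the largest within `radius` (assumed rule).
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    neighbor = np.einsum("ijk,ijk->ij", diff, diff) <= radius ** 2
    keep = np.zeros(len(coords), dtype=bool)
    for sal, thr in zip(saliencies, thresholds):
        sal = np.asarray(sal, dtype=float)
        # Largest saliency inside each point's neighborhood (self included).
        local_max = np.where(neighbor, sal[None, :], -np.inf).max(axis=1)
        keep |= (sal >= local_max) & (sal > thr)
    return np.flatnonzero(keep)

# Demo: five points on a line; one is salient in geometry, another in color.
line = np.array([[float(x), 0.0, 0.0] for x in range(5)])
geo = [0.1, 0.9, 0.1, 0.1, 0.1]
col = [0.1, 0.1, 0.1, 0.8, 0.1]
kept = multimodal_nms(line, [geo, col], radius=1.1, thresholds=[0.5, 0.5])
```

In the demo, the point that is salient only in geometry and the point that is salient only in color both survive, which a single-modality suppression step would not achieve.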
2. Related Work
3D keypoint detectors can be categorized as those extending
designs originally developed for 2D images [10, 19],
and those native to 3D point clouds [4, 20, 31, 40] and 3D
meshes [3, 34, 38]. Following the design in 2D images, the
Harris family [10] computes covariance matrices of surface
normals or intensity gradients in 3D space, and in the combined
3D and color space (herein referred to as 6D space). SIFT [19]
applies the difference-of-Gaussian operator in scale space
to find keypoints with locally maximal response. However,
for 3D point clouds, the number and positions of points
within the spherical region are not fixed, making it hard
to obtain gradients. In 3D space, Normal Aligned Radial
Feature (NARF) [29] measures saliency from surface
normal and distance changes between neighboring points.
Intrinsic Shape Signature (ISS) [40] and KeyPoint Quality
(KPQ) [20] perform eigenvalue decomposition of the
scatter matrix of neighboring points and threshold on the
ratios between eigenvalues. Heat Kernel Signature (HKS) [31]
and Laplace-Beltrami Scale-space (LBSS) [34] measure the
saliency from the response to the Laplace-Beltrami opera-
tor in the neighborhood. Local Surface Patches (LSP) [4]
leverages local principal curvatures to construct the Shape
Index (SI) [7] as the measure of saliency. As in SIFT,
MeshDoG [38] and Salient Points (SP) [3] apply the
difference-of-Gaussian operator to construct a scale space
in which saliency is measured. We refer readers to the
comprehensive evaluation in [33] for more details.
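For readers unfamiliar with the eigenvalue-based detectors mentioned above, the following Python sketch illustrates an eigenvalue-ratio test in the style of ISS. It is a simplified illustration under stated assumptions: the scatter matrix is unweighted, the function name `eigenvalue_ratio_test` is hypothetical, and the threshold values `gamma21` and `gamma32` are placeholders rather than values taken from [40] or [20].

```python
import numpy as np

def eigenvalue_ratio_test(point, neighbors, gamma21=0.975, gamma32=0.975):
    """Eigenvalue-ratio check on the scatter matrix of a neighborhood.

    Simplified, unweighted variant for illustration; thresholds are
    placeholder values (assumptions), not taken from the cited papers.
    """
    offsets = np.asarray(neighbors, dtype=float) - np.asarray(point, dtype=float)
    # 3 x 3 scatter matrix of the neighbors about the query point.
    scatter = offsets.T @ offsets / len(offsets)
    # eigvalsh returns ascending eigenvalues; reverse to lam1 >= lam2 >= lam3.
    lam = np.linalg.eigvalsh(scatter)[::-1]
    eps = 1e-12  # guards against division by zero for degenerate neighborhoods
    passes = (lam[1] / max(lam[0], eps) < gamma21) and \
             (lam[2] / max(lam[1], eps) < gamma32)
    return bool(passes), lam

origin = np.zeros(3)
# Neighborhood with clearly different spread along each axis: passes.
anisotropic = [[3, 0, 0], [-3, 0, 0], [0, 2, 0], [0, -2, 0], [0, 0, 1], [0, 0, -1]]
ok_aniso, lam_aniso = eigenvalue_ratio_test(origin, anisotropic)
# Neighborhood with equal spread along each axis: rejected.
isotropic = [[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 1], [0, 0, -1]]
ok_iso, lam_iso = eigenvalue_ratio_test(origin, isotropic)
```

The test rejects neighborhoods whose spread is similar along two principal directions, since the principal axes are then ambiguous. The eigendecomposition it requires at every candidate point is the step that methods working directly on the point distribution aim to avoid.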
In summary, the existing methods often apply an operator
to obtain point normals, curvatures, and gradients in the
local region, and threshold on either a combination of the
obtained measures or the eigenvalues of the covariance
matrices. In contrast, our proposed method directly leverages
the point distribution and statistics in 3D space and
color space, without the need for normal estimation or
eigenvalue decomposition.
Learning-based approaches, such as USIP [18] and
3DFeat-Net [37], have also been studied. 3DFeat-Net [37]
learns a 3D feature detector and descriptor for point cloud
matching using weak supervision, whereas USIP [18] trains
a feature proposal network with probabilistic Chamfer loss
in an unsupervised manner.