to remove the translations and utilizes the orthogonality of
rotation matrices to eliminate the rotations. Consequently,
our novel SE(3)-equivariant feature extractor concurrently
produces pose-preserving and pose-invariant representations,
supporting both coarse- and fine-grained registers to effortlessly perform registration.
We evaluate our method on ModelNet40 [13], which is
composed of various object models. Following RPMNet
[14], we pre-process the dataset to simulate real-world conditions, including sensor noise, independent scans, and partially overlapping point clouds. Furthermore, we evaluate performance over multiple initial angle ranges to show the influence of initial poses. Experimental results demonstrate that our method outperforms state-of-the-art methods and achieves reliable performance under simulated real-world scenarios. In addition, ablations (Sec. IV-F) confirm that our feature extractor meets the requirements of both global and local registers while remaining time-efficient.
To sum up, the contributions of this work are as follows:
• We apply a coarse-to-fine pipeline to withstand the impact of both initial pose differences and distribution variances.
• We introduce a novel SE(3)-equivariant feature extractor that simultaneously produces representations for both global and local registers.
• Our method outperforms state-of-the-art methods on ModelNet40 under circumstances simulating the real world, across different initial pose difference ranges.
II. RELATED WORK
A. Local Registration Methods
Local approaches are often used under circumstances
where transformations are known to be small in magnitude.
Iterative Closest Point (ICP) [5] iteratively matches the closest points as correspondences and minimizes the distance between them, which often causes the result to converge to a local minimum. To resolve this problem, a variety of strategies have been proposed to deal with outliers [15], handle noise [16], or devise better distance metrics [17], [18]. However, the limitations of matching points in Euclidean space have led recent work to perform matching in feature space. PPFNet [19], 3DSmoothNet [20],
SpinNet [21], and FCGF [22] follow this idea and solve
the Procrustes problem based on the correspondences paired
by their representations. Moreover, DCP [6], IDAM [7],
RPMNet [14], DGR [23], ImLoveNet [24], and DetarNet
[25] use the ground truth poses to supervise point matching
and feature learning. Predator [26] and REGTR [27] further
leverage the ground truth overlap regions. Another branch
of work such as D3Feat [28], DeepVCP [29], PRNet [30],
and GeoTransformer [31] leverages key points to improve time efficiency. The remaining challenge is that perfect correspondences rarely exist in real-world situations, so recent work [14], [32] utilizes soft matching [33] to operate under these conditions. Even so, these local methods
still fail to handle large initial perturbations.
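To make the ICP loop discussed above concrete, here is a minimal point-to-point sketch in NumPy/SciPy (an illustrative simplification, not any cited implementation): correspondences come from a nearest-neighbour query, and each update is the closed-form Procrustes (Kabsch/SVD) fit.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(src, tgt, iters=50, tol=1e-6):
    """Minimal point-to-point ICP (illustrative sketch).

    Alternates nearest-neighbour matching with a closed-form
    Procrustes (SVD) update; like all local methods, it can
    converge to a local minimum under large initial error.
    """
    R, t = np.eye(3), np.zeros(3)
    tree = cKDTree(tgt)
    prev_err = np.inf
    for _ in range(iters):
        moved = src @ R.T + t
        dist, idx = tree.query(moved)   # closest points as correspondences
        matched = tgt[idx]
        # Closed-form rigid fit (Kabsch): align centred point sets via SVD.
        mu_m, mu_t = moved.mean(0), matched.mean(0)
        H = (moved - mu_m).T @ (matched - mu_t)
        U, _, Vt = np.linalg.svd(H)
        d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
        R_step = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        t_step = mu_t - R_step @ mu_m
        R, t = R_step @ R, R_step @ t + t_step      # compose with running transform
        err = dist.mean()
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return R, t
```

Given a target that is a small rigid motion of the source, a few iterations typically recover the transform; under a large initial rotation, most nearest-neighbour matches are wrong and the loop stalls in a local minimum, which is exactly the failure mode noted above.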
B. Global Registration Methods
Unlike local approaches, global approaches are designed to
be invariant to the initial transformation error. Some methods
such as GO-ICP [34], GOGMA [35], and GOSMA [36]
search the SE(3) space using branch-and-bound techniques.
Other methods [37]–[39] match features with robust optimization. However, these methods are unsuitable for real-time applications due to their long computation times. Fast Global Registration (FGR) [40] addresses this issue, achieving speeds similar to those of many local methods. To further improve the accuracy of the registration
result, recent work handles the registration problem via
learned global representations. DeepGMR [8] represents the
global feature through GMM distributions and EquivReg
[9] takes rotation-equivariant implicit feature embeddings as
its global representations. Nevertheless, these learning-based
methods often struggle with distribution differences.
C. Group Equivariant Neural Networks
Some research concentrates on designing group-equivariant neural networks as a means of resisting group transformations. For instance, Convolutional Neural Networks (CNNs) [41] are translation-equivariant, so their outputs remain consistent across 2D translations of the same image. To counteract the effect of rotation, recent studies [11], [42], [43] construct kernels from steerable functions.
However, these constrained kernels limit the flexibility of
the network. Other studies [10], [44] obtain the equivariance
property by lifting the input space to higher-dimensional
spaces where the group is contained. These approaches are time-intensive and demand more computational resources because they integrate over the entire group. Vector Neuron [12] presents a novel SO(3)-equivariant framework whose major advantage is the ability to incorporate the SO(3)-equivariance property into existing networks such as PointNet [45] or DGCNN [46]. We will later see how we design our SE(3)-equivariant feature extractor based on this simple idea and use the extracted representations to cope with registration under notable initial transformations and distribution variances.
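The mechanism behind Vector Neuron's equivariance is easy to verify numerically: each feature is a list of 3D vectors, and a learnable weight mixes channels without mixing the x/y/z coordinates, so a rotation applied to the input commutes with the layer. The sketch below illustrates the idea for a single linear layer; the shapes and names are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)

# Vector-Neuron-style linear layer: features are C vectors in R^3
# (shape C x 3); the weight mixes channels, never coordinates.
def vn_linear(X, W):
    return W @ X          # (C_out x C_in) @ (C_in x 3) -> (C_out x 3)

C_in, C_out = 8, 5
W = rng.normal(size=(C_out, C_in))
X = rng.normal(size=(C_in, 3))

# A random test rotation built via QR (illustrative, not from the paper).
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R = Q * np.sign(np.linalg.det(Q))   # flip sign if needed so det(R) = +1

# Equivariance check: rotate-then-map equals map-then-rotate,
# i.e. (W X) R^T == W (X R^T), since W acts on the left and R on the right.
lhs = vn_linear(X @ R.T, W)
rhs = vn_linear(X, W) @ R.T
```

Because the weight multiplies on the channel side and the rotation on the coordinate side, the two operations commute exactly, which is what lets this construction be dropped into existing backbones.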
III. COARSE-TO-FINE REGISTRATION
As illustrated in Fig. 2, our coarse-to-fine registration pipeline begins by extracting global and local feature representations. The global representations are fed into the global register to estimate a rough alignment between the input point clouds. Operating on the roughly aligned point clouds, the local register then refines this alignment using correspondences formed by matching the local representations.
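The pipeline can be sketched as the composition of two rigid transforms. In the hedged sketch below, `extract`, `global_register`, and `local_register` are hypothetical stand-ins for the components described above, not the paper's actual modules; the point is only how the coarse and fine estimates compose.

```python
import numpy as np

def coarse_to_fine_register(src, tgt, extract, global_register, local_register):
    """Two-stage registration sketch (component names hypothetical).

    Each register returns a rigid transform (R, t); the final result
    is their composition x -> R1 (R0 x + t0) + t1.
    """
    g_src, l_src = extract(src)          # global and local representations
    g_tgt, l_tgt = extract(tgt)
    R0, t0 = global_register(g_src, g_tgt)               # coarse alignment
    src_rough = src @ R0.T + t0
    R1, t1 = local_register(src_rough, l_src, tgt, l_tgt)  # fine refinement
    return R1 @ R0, R1 @ t0 + t1
```

Plugging in trivial stand-ins (e.g. a centroid-aligning global register and an identity local register) shows the composition: the returned transform is R1 R0 with translation R1 t0 + t1.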
A. Preliminaries
A function f : U → V is equivariant to a set of transformations G if, for any g ∈ G, f and g commute, i.e., f(g·u) = g·f(u), ∀u ∈ U. For instance, convolution layers are translation-equivariant because the outcome of applying a 2D translation to the input of a convolution layer is identical to that of applying the 2D translation to the feature