to remove the translations and utilizes the orthogonality of
rotation matrices to eliminate the rotations. Consequently,
our novel SE(3)-equivariant feature extractor concurrently
produces pose-preserving and pose-invariant representations,
supporting both coarse- and fine-grained registers to effortlessly perform registration.
We evaluate our method on ModelNet40 [13], which is
composed of various object models. Following RPMNet
[14], we pre-process the dataset to simulate real-world conditions, including sensor noise, independent scans, and partially overlapping point clouds. Furthermore, we evaluate performance over multiple initial angle ranges to show the influence of initial poses. Experimental results demonstrate that our method outperforms state-of-the-art methods and achieves reliable performance under simulated real-world scenarios. In addition, ablations (Sec. IV-F) confirm that our feature extractor meets the requirements of both global and local registers while remaining time-efficient.
To sum up, the contributions of this work are as follows:
• We apply a coarse-to-fine pipeline to withstand the impact of both initial pose differences and distribution variances.
• We introduce a novel SE(3)-equivariant feature extractor that simultaneously produces representations for both global and local registers.
• Our method outperforms state-of-the-art methods on ModelNet40 under circumstances simulating the real world, across different initial pose difference ranges.
II. RELATED WORK
A. Local Registration Methods
Local approaches are often used under circumstances
where transformations are known to be small in magnitude.
Iterative Closest Point (ICP) [5] iteratively matches the closest points as correspondences and minimizes the distance between them, which often causes the result to converge to a local minimum. To resolve this problem, a variety of strategies have been proposed to deal with outliers [15], handle noise [16], or devise better distance metrics [17], [18]. However, the limitations of matching points in Euclidean space have led recent work to perform matching in feature space. PPFNet [19], 3DSmoothNet [20],
SpinNet [21], and FCGF [22] follow this idea and solve
the Procrustes problem based on the correspondences paired
by their representations. Moreover, DCP [6], IDAM [7],
RPMNet [14], DGR [23], ImLoveNet [24], and DetarNet
[25] use the ground truth poses to supervise point matching
and feature learning. Predator [26] and REGTR [27] further
leverage the ground truth overlap regions. Another branch
of work such as D3Feat [28], DeepVCP [29], PRNet [30],
and GeoTransformer [31] leverages key points to improve time efficiency. The remaining challenge is that perfect correspondences rarely exist in real-world situations, so recent work [14], [32] utilizes soft matching [33] to operate under these conditions. Even so, these local methods
still fail to handle large initial perturbations.
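To make the ICP loop discussed above concrete, here is a minimal point-to-point sketch in NumPy/SciPy (an illustrative simplification, not any cited implementation): correspondences come from a nearest-neighbour query, and each update is the closed-form Procrustes (Kabsch/SVD) fit.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(src, tgt, iters=50, tol=1e-6):
    """Minimal point-to-point ICP (illustrative sketch).

    Alternates nearest-neighbour matching with a closed-form
    Procrustes (SVD) update; like all local methods, it can
    converge to a local minimum under large initial error.
    """
    R, t = np.eye(3), np.zeros(3)
    tree = cKDTree(tgt)
    prev_err = np.inf
    for _ in range(iters):
        moved = src @ R.T + t
        dist, idx = tree.query(moved)   # closest points as correspondences
        matched = tgt[idx]
        # Closed-form rigid fit (Kabsch): align centred point sets via SVD.
        mu_m, mu_t = moved.mean(0), matched.mean(0)
        H = (moved - mu_m).T @ (matched - mu_t)
        U, _, Vt = np.linalg.svd(H)
        d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
        R_step = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
        t_step = mu_t - R_step @ mu_m
        R, t = R_step @ R, R_step @ t + t_step      # compose with running transform
        err = dist.mean()
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return R, t
```

Given a target that is a small rigid motion of the source, a few iterations typically recover the transform; under a large initial rotation, most nearest-neighbour matches are wrong and the loop stalls in a local minimum, which is exactly the failure mode noted above.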
B. Global Registration Methods
Unlike local approaches, global approaches are designed to
be invariant to the initial transformation error. Some methods
such as GO-ICP [34], GOGMA [35], and GOSMA [36]
search the SE(3) space using branch-and-bound techniques.
Other methods [37]–[39] match features with robust optimization. However, these methods are unsuitable for real-time applications due to their long computation times. Fast Global Registration (FGR) [40] addresses this issue, achieving speeds similar to those of many local methods. To further improve the accuracy of the registration
result, recent work handles the registration problem via
learned global representations. DeepGMR [8] represents the
global feature through GMM distributions and EquivReg
[9] takes rotation-equivariant implicit feature embeddings as
its global representations. Nevertheless, these learning-based
methods often struggle with distribution differences.
C. Group Equivariant Neural Networks
Some research concentrates on designing group-equivariant neural networks as a means of resisting group transformations. For instance, Convolutional Neural Networks (CNNs) [41] are translation-equivariant, so their outputs remain consistent across 2D translations of the same image. To counteract the effect of rotation, recent studies [11], [42], [43] construct kernels from steerable functions.
However, these constrained kernels limit the flexibility of
the network. Other studies [10], [44] obtain the equivariance
property by lifting the input space to higher-dimensional
spaces where the group is contained. These approaches are time-intensive and demand more computational resources because they integrate over the entire group. Vector Neuron [12] presents a novel SO(3)-equivariant framework whose major advantage is the ability to incorporate the SO(3)-equivariance property into existing networks such as PointNet [45] or DGCNN [46]. We will later see how we design our SE(3)-equivariant feature extractor based on this simple idea and use the extracted representations to cope with registration under notable initial transformations and distribution variances.
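The mechanism behind Vector Neuron's equivariance is easy to verify numerically: each feature is a list of 3D vectors, and a learnable weight mixes channels without mixing the x/y/z coordinates, so a rotation applied to the input commutes with the layer. The sketch below illustrates the idea for a single linear layer; the shapes and names are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)

# Vector-Neuron-style linear layer: features are C vectors in R^3
# (shape C x 3); the weight mixes channels, never coordinates.
def vn_linear(X, W):
    return W @ X          # (C_out x C_in) @ (C_in x 3) -> (C_out x 3)

C_in, C_out = 8, 5
W = rng.normal(size=(C_out, C_in))
X = rng.normal(size=(C_in, 3))

# A random test rotation built via QR (illustrative, not from the paper).
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R = Q * np.sign(np.linalg.det(Q))   # flip sign if needed so det(R) = +1

# Equivariance check: rotate-then-map equals map-then-rotate,
# i.e. (W X) R^T == W (X R^T), since W acts on the left and R on the right.
lhs = vn_linear(X @ R.T, W)
rhs = vn_linear(X, W) @ R.T
```

Because the weight multiplies on the channel side and the rotation on the coordinate side, the two operations commute exactly, which is what lets this construction be dropped into existing backbones.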
III. COARSE-TO-FINE REGISTRATION
As illustrated in Fig. 2, our coarse-to-fine registration pipeline begins by extracting global and local feature representations. The global representations are fed into the global register to estimate a rough alignment between the input point clouds. Operating on the roughly aligned point clouds, the local register then refines this alignment using correspondences formed by matching the local representations.
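The pipeline can be sketched as the composition of two rigid transforms. In the hedged sketch below, `extract`, `global_register`, and `local_register` are hypothetical stand-ins for the components described above, not the paper's actual modules; the point is only how the coarse and fine estimates compose.

```python
import numpy as np

def coarse_to_fine_register(src, tgt, extract, global_register, local_register):
    """Two-stage registration sketch (component names hypothetical).

    Each register returns a rigid transform (R, t); the final result
    is their composition x -> R1 (R0 x + t0) + t1.
    """
    g_src, l_src = extract(src)          # global and local representations
    g_tgt, l_tgt = extract(tgt)
    R0, t0 = global_register(g_src, g_tgt)               # coarse alignment
    src_rough = src @ R0.T + t0
    R1, t1 = local_register(src_rough, l_src, tgt, l_tgt)  # fine refinement
    return R1 @ R0, R1 @ t0 + t1
```

Plugging in trivial stand-ins (e.g. a centroid-aligning global register and an identity local register) shows the composition: the returned transform is R1 R0 with translation R1 t0 + t1.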
A. Preliminaries
A function f : U → V is equivariant to a set of transformations G if, for any g ∈ G, f and g commute, i.e., f(g·u) = g·f(u), ∀u ∈ U. For instance, convolution layers are translation-equivariant because the outcome of applying a 2D translation to the input of a convolution layer is identical to that of applying the 2D translation to the feature