DCL-Net: Deep Correspondence Learning
Network for 6D Pose Estimation
Hongyang Li1∗, Jiehong Lin1,2∗, and Kui Jia1,3†
1South China University of Technology
2DexForce Co. Ltd.
3Peng Cheng Laboratory
{eeli.hongyang,lin.jiehong}@mail.scut.edu.cn, kuijia@scut.edu.cn
Abstract. Establishment of point correspondence between camera and
object coordinate systems is a promising way to solve 6D object poses.
However, surrogate objectives of correspondence learning in 3D space are
a step away from the true ones of object pose estimation, making the
learning suboptimal for the end task. In this paper, we address this short-
coming by introducing a new method of Deep Correspondence Learning
Network for direct 6D object pose estimation, shortened as DCL-Net.
Specifically, DCL-Net employs dual newly proposed Feature Disengage-
ment and Alignment (FDA) modules to establish, in the feature space,
partial-to-partial correspondence and a complete-to-complete one for the par-
tial object observation and its complete CAD model, respectively, which
result in aggregated pose and match feature pairs from the two coordinate
systems; these two FDA modules thus bring complementary advantages.
The match feature pairs are used to learn confidence scores for measur-
ing the qualities of deep correspondence, while the pose feature pairs
are weighted by confidence scores for direct object pose regression. A
confidence-based pose refinement network is also proposed to further
improve pose precision in an iterative manner. Extensive experiments
show that DCL-Net outperforms existing methods on three benchmark-
ing datasets, including YCB-Video, LineMOD, and Occlusion-LineMOD;
ablation studies also confirm the efficacy of our novel designs. Our code is
released publicly at https://github.com/Gorilla-Lab-SCUT/DCL-Net.
Keywords: 6D Pose Estimation, Correspondence Learning
1 Introduction
6D object pose estimation is a fundamental task of 3D semantic analysis with
many real-world applications, such as robotic grasping [7,44], augmented reality
[27], and autonomous driving [8,9,21,42]. The non-linearity of the rotation space
SO(3) makes it hard to handle this nontrivial task through direct pose regression
from object observations [6,11,15,18,24-26,39,45,47]. Many of the data-driven
methods [3,14,20,23,28,31,33,34,38,41] thus achieve the estimation by learning
point correspondence between camera and object coordinate systems.
∗Equal contribution
†Corresponding author
arXiv:2210.05232v1 [cs.CV] 11 Oct 2022
Fig. 1. Illustrations of two kinds of point correspondence between the camera coordinate
system (cam) and the object coordinate system (obj): (a) partial-to-partial correspondence,
mapping the partial observation (cam) to a partial prediction (obj); (b) complete-to-complete
correspondence, mapping the complete CAD model (obj) to a complete prediction (cam).
Best viewed in the electronic version.
Given a partial object observation in the camera coordinate system, along with
its CAD model in the object coordinate one, we show in Fig. 1 two possible ways
to build point correspondence: i) inferring the observed points in the object
coordinate system for partial-to-partial correspondence; ii) inferring the sampled
points of the CAD model in the camera coordinate system for complete-to-complete
correspondence. These two kinds of correspondence offer different advantages.
The partial-to-partial correspondence is of higher quality than the complete-to-
complete one, due to the difficulty of shape completion; the latter, however, is more
robust for recovering the poses of objects under severe occlusions, which the former
can hardly handle.
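To make the two constructions concrete, the following is a minimal NumPy sketch of the ground-truth targets that each kind of correspondence would regress, assuming the pose (R, t) maps object coordinates to camera coordinates as p_cam = R p_obj + t; the function and variable names are illustrative, not from the paper.

```python
import numpy as np

def partial_to_partial_targets(obs_cam: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Map observed (partial) points from the camera frame to the object frame.

    obs_cam: (N, 3) points of the partial observation in camera coordinates.
    Returns the (N, 3) targets a partial-to-partial network would regress,
    i.e. p_obj = R^T (p_cam - t) for each point.
    """
    return (obs_cam - t) @ R  # row-vector form of R^T (p - t)

def complete_to_complete_targets(cad_obj: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Map (complete) CAD-model points from the object frame to the camera frame.

    cad_obj: (M, 3) points sampled from the CAD model in object coordinates.
    Returns the (M, 3) targets in camera coordinates, including parts that
    are occluded in the actual observation, i.e. p_cam = R p_obj + t.
    """
    return cad_obj @ R.T + t
```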
While these methods are promising in solving 6D poses from point corre-
spondence (e.g., via a PnP algorithm), their surrogate correspondence objec-
tives are a step away from the true ones of estimating 6D object poses, thus
making their learning suboptimal for the end task [40]. To this end, we present
a novel method that realizes the above two ways of correspondence establishment
in the feature space via dual newly proposed Feature Disengagement and Align-
ment (FDA) modules, and directly estimates object poses from feature pairs of
the two coordinate systems, which are weighted by confidence scores measuring the
qualities of deep correspondence. We term our method Deep Correspondence
Learning Network, shortened as DCL-Net. Fig. 2 gives the illustration.
For the partial object observation and its CAD model, DCL-Net first ex-
tracts their point-wise feature maps in parallel; then dual Feature Disengage-
ment and Alignment (FDA) modules are designed to establish, in feature space,
the partial-to-partial correspondence and the complete-to-complete one between
camera and object coordinate systems. Specifically, each FDA module takes as
inputs two point-wise feature maps, and disengages each feature map into indi-
vidual pose and match ones; the match feature maps of two systems are then
used to learn an attention map for building deep correspondence; finally, both
pose and match feature maps are aligned and paired across systems based on the
attention map, resulting in pose and match feature pairs, respectively. Since the
two sets of correspondence bring complementary advantages, DCL-Net aggregates
them by fusing the respective pose and match feature pairs of the two
FDA modules. The aggregated match feature pairs are used to learn confidence
scores for measuring the qualities of deep correspondence, while the pose ones
are weighted by the scores to directly regress object poses. A confidence-based
pose refinement network is also proposed to further improve the results of
DCL-Net in an iterative manner. Extensive experiments show that DCL-Net outper-
forms existing methods for 6D object pose estimation on three widely used
datasets, including YCB-Video [4], LineMOD [16], and Occlusion-LineMOD [3];
remarkably, on the more challenging Occlusion-LineMOD, our DCL-Net outper-
forms the state-of-the-art method [13] with an improvement of 4.4% on the met-
ric of ADD(S), revealing the strength of DCL-Net in handling occlusion.
Ablation studies also confirm the efficacy of individual components of DCL-Net.
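As a concrete reading of the pipeline described above, the following PyTorch sketch shows one plausible realization of an FDA module; the class name, layer sizes, and the choice of sharing the disengagement heads across the two inputs are our assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FDASketch(nn.Module):
    """Simplified Feature Disengagement and Alignment (FDA) sketch.

    Each input feature map is disengaged into a pose part and a match part;
    the match parts build an attention map that softly aligns features
    across the two coordinate systems.
    """

    def __init__(self, dim: int = 128):
        super().__init__()
        self.pose_head = nn.Conv1d(dim, dim, 1)   # disengage: pose features
        self.match_head = nn.Conv1d(dim, dim, 1)  # disengage: match features

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor):
        # feat_a: (B, C, Na) point-wise features from one coordinate system;
        # feat_b: (B, C, Nb) point-wise features from the other.
        pose_a, match_a = self.pose_head(feat_a), self.match_head(feat_a)
        pose_b, match_b = self.pose_head(feat_b), self.match_head(feat_b)

        # Attention over b's points for every point of a, built from the
        # match features only: (B, Na, Nb).
        attn = torch.softmax(torch.bmm(match_a.transpose(1, 2), match_b), dim=-1)

        # Align b's features to a's points, then pair them channel-wise.
        pose_b_aligned = torch.bmm(pose_b, attn.transpose(1, 2))    # (B, C, Na)
        match_b_aligned = torch.bmm(match_b, attn.transpose(1, 2))  # (B, C, Na)
        pose_pair = torch.cat([pose_a, pose_b_aligned], dim=1)      # (B, 2C, Na)
        match_pair = torch.cat([match_a, match_b_aligned], dim=1)   # (B, 2C, Na)
        return pose_pair, match_pair
```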
Our technical contributions are summarized as follows:
- We design a novel Feature Disengagement and Alignment (FDA) module to
establish deep correspondence between two point-wise feature maps from
different coordinate systems; more specifically, FDA module disengages each
feature map into individual pose and match ones, which are then aligned
across systems to generate pose and match feature pairs, respectively, such
that deep correspondence is established within the aligned feature pairs.
- We propose a new method of Deep Correspondence Learning Network for
direct regression of 6D object poses, termed as DCL-Net, which employs
dual FDA modules to establish, in feature space, partial-to-partial corre-
spondence and complete-to-complete one between camera and object coor-
dinate systems, respectively; these two FDA modules bring complementary
advantages.
- Match feature pairs of the dual FDA modules are aggregated and used for
learning confidence scores that measure the qualities of correspondence, while
pose feature pairs are weighted by the scores for estimation of the 6D pose;
a confidence-based pose refinement network is also proposed to iteratively
improve pose precision. A sketch of this weighted regression follows this list.
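To fix ideas, here is a hypothetical sketch of the confidence-weighted regression step; the 6D rotation parameterization and all layer sizes are assumptions for illustration, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConfidenceWeightedPoseHead(nn.Module):
    """Hypothetical confidence-weighted pose regression sketch.

    Aggregated match feature pairs predict per-correspondence confidence
    scores; aggregated pose feature pairs are pooled with those scores and
    regressed to a rotation and a translation.
    """

    def __init__(self, dim: int = 256):
        super().__init__()
        self.conf_head = nn.Sequential(nn.Conv1d(dim, 1, 1), nn.Sigmoid())
        self.pose_mlp = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, 9))

    def forward(self, pose_pair: torch.Tensor, match_pair: torch.Tensor):
        # pose_pair, match_pair: (B, C, N) aggregated feature pairs.
        conf = self.conf_head(match_pair)                    # (B, 1, N) confidence scores
        w = conf / (conf.sum(dim=-1, keepdim=True) + 1e-8)   # normalized weights
        pooled = (pose_pair * w).sum(dim=-1)                 # (B, C) weighted pooling
        out = self.pose_mlp(pooled)                          # (B, 9)
        rot6d, t = out[:, :6], out[:, 6:]                    # 6D rotation + translation

        # Gram-Schmidt: turn the 6D output into an orthonormal rotation matrix.
        a1, a2 = rot6d[:, :3], rot6d[:, 3:]
        b1 = F.normalize(a1, dim=-1)
        b2 = F.normalize(a2 - (b1 * a2).sum(-1, keepdim=True) * b1, dim=-1)
        b3 = torch.cross(b1, b2, dim=-1)
        R = torch.stack([b1, b2, b3], dim=-1)                # (B, 3, 3)
        return R, t, conf
```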
2 Related Work
6D Pose Estimation from RGB Data This body of work can be broadly
categorized into three types: i) holistic methods [11,15,18], which directly estimate
object poses; ii) keypoint-based methods [28,33,34], which establish 2D-3D cor-
respondence via 2D keypoint detection, followed by a PnP/RANSAC algorithm
to solve the poses; iii) dense correspondence methods [3,20,23,31], which make
dense pixel-wise predictions and vote for the final results.
Due to the loss of geometry information, these methods are sensitive to lighting
conditions and appearance textures, and are thus inferior to RGB-D methods.
6D Pose Estimation from RGB-D Data Depth maps provide rich geometry
information complementary to the appearance information from RGB images. Traditional
methods [3,16,32,37,43] solve object poses by extracting features from RGB-
D data and performing correspondence grouping and hypothesis verification.
Earlier deep methods, such as PoseCNN [45] and SSD-6D [19], first learn coarse poses
from RGB images, and then refine them on point clouds using ICP [2] or
MCN [22]. Recently, learning deep features of point clouds has become an effective
way to improve pose precision, especially for direct regression methods [39,47],
which strive to enhance pose embeddings with deep geometry features, given
the difficulty of learning rotations in a nonlinear space. Wang et
al. present DenseFusion [39], which fuses local features of RGB images and point
clouds in a point-wise manner, and thus explicitly reasons about appearance
and geometry information to make the learning more discriminative; to cope with
incomplete and noisy shape information, Zhou et al. propose PR-GCN [47],
which polishes point clouds and enhances pose embeddings via a Graph Convolutional
Network. On the other hand, dense correspondence methods show the advantages
of deep networks in building point correspondence in Euclidean space; for
example, He et al. propose PVN3D [14] to regress dense keypoints, and achieve
remarkable results. While promising, these methods are usually trained with
surrogate objectives instead of the true ones of estimating 6D poses, making the
learning suboptimal for the end task.
Our proposed DCL-Net borrows the idea from dense correspondence meth-
ods by learning deep correspondence in feature space, and weights the feature
correspondence based on confidence scores for direct estimation of object poses.
Besides, the learned correspondence is also utilized by an iterative pose refine-
ment network for precision improvement.
3 Deep Correspondence Learning Network
Given the partial object observation X_c in the camera coordinate system, along
with the object CAD model Y_o in the object coordinate one, our goal is to
estimate the 6D pose (R, t) between these two systems, where R ∈ SO(3) stands
for a rotation, and t ∈ R^3 for a translation.
Fig. 2 gives the illustration of our proposed Deep Correspondence Learning
Network (dubbed DCL-Net). DCL-Net first extracts point-wise features of
X_c and Y_o (cf. Sec. 3.1), then establishes correspondence in feature space via
dual Feature Disengagement and Alignment modules (cf. Sec. 3.2), and finally
regresses the object pose (R, t) with confidence scores based on the learned
deep correspondence (cf. Sec. 3.3). The training objectives of DCL-Net are given
in Sec. 3.4. A confidence-based pose refinement network is also introduced to
iteratively improve pose precision (cf. Sec. 3.5).
3.1 Point-wise Feature Extraction
We represent the inputs of the object observation X_c and its CAD model Y_o as
(I_{X_c}, P_{X_c}) and (I_{Y_o}, P_{Y_o}), with N_X and N_Y sampled points, respectively, where
P denotes a point set, and I denotes the RGB values of the points in P.
As shown in Fig. 2, we use two parallel backbones to extract their point-wise
features F_{X_c} and F_{Y_o}, respectively. Following [12], both backbones are built
on 3D Sparse Convolutions [10], whose volumetric features are then
converted to point-level ones; more details of the architectures are given in
the supplementary material. Note that for each object instance, F_{Y_o} can be
pre-computed during inference for efficiency.
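The voxelize-then-devoxelize conversion described above can be sketched as follows; the backbone interface and the voxel size are assumptions for illustration. Since F_{Y_o} depends only on the CAD model, the same routine can be run once per object instance and its output cached for inference.

```python
import torch

def extract_point_features(backbone, points, colors, voxel_size=0.005):
    """Sketch of point-wise feature extraction with a sparse-conv backbone.

    `backbone` stands in for a 3D sparse convolutional network; its
    (voxel_coords, voxel_feats) -> voxel_feats interface is an assumption.
    Volumetric output features are converted back to point-level ones by
    indexing each point's voxel.
    """
    # Voxelize: assign each point the integer index of the voxel it falls into.
    coords = torch.floor(points / voxel_size).long()                        # (N, 3)
    vox_coords, point2vox = torch.unique(coords, dim=0, return_inverse=True)

    # Average the RGB values of the points inside each voxel.
    n_vox = vox_coords.shape[0]
    vox_rgb = torch.zeros(n_vox, 3).index_add_(0, point2vox, colors)
    counts = torch.zeros(n_vox, 1).index_add_(0, point2vox, torch.ones(points.shape[0], 1))
    vox_rgb = vox_rgb / counts

    vox_feats = backbone(vox_coords, vox_rgb)  # (V, C) volumetric features
    return vox_feats[point2vox]                # (N, C) point-wise features
```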