ery possibility. This creates a vulnerability to differences be-
tween the distribution of poses in training and testing data,
as well as gaps in appearance domains. This limits the ro-
bustness and accuracy of the system.
To improve the visible feature-pose constraints, recent
methods (Mo et al. 2022; Dong et al. 2019; Zeng et al. 2022)
solve the ambiguity of multiple ground-truth poses relating
to the same visible feature by modeling symmetric objects.
After that, some methods (Sun et al. 2022; Mo et al. 2022) attempt to leverage instance segmentation to mitigate the impact of visible feature differences arising from camera intrinsic factors; however, the difference in object pose (camera extrinsics) still hampers the model's capacity to accurately regress the pose from visible features. Besides, several direct methods (Wang et al. 2020, 2021) introduce an auxiliary loss to regress intermediate geometric features, such as 2D-3D correspondences, akin to indirect methods. However, these geometric features do not constitute complete constraints that enable the network to regress the pose parameters from them.
To solve these issues, we propose a geometric constraints (Geo6D) learning approach that introduces a reformulated pose transformation to establish robust constraints in both camera and object frames via a relative offset representation. Specifically, the proposed Geo6D constraints
are built upon the pose transformation formula: the 3D coordinates of a rigid object's points in different frames can be transformed into one another via the pose. To address the distribution gap, we introduce a reference point and reformulate the pose transformation formula from a camera-frame 3D coordinate representation (the offset from the camera to each visible point) to a relative offset from each visible point to the selected reference point. To make the formula learning-friendly and mathematically correct during network fitting, we separate the variables by coordinate frame as explicit geometric constraints, as illustrated in Fig. 1. For the
camera frame variables, we supply and linearize all required variables in the camera frame and feed them to the network as input. For the object frame constraints, we introduce an additional regression output head to predict the corresponding relative offset in the object frame.
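In our own notation (the symbols below are ours, not necessarily the paper's), one consistent reading of this relative-offset reformulation can be written out explicitly. For a visible point with coordinates $p^{o}$ in the object frame and $p^{c}$ in the camera frame, and a selected reference point $p_{r}$:

```latex
% Standard rigid pose transform (object frame -> camera frame):
p^{c} = R\,p^{o} + t
% The same transform applied to the chosen reference point:
p^{c}_{r} = R\,p^{o}_{r} + t
% Subtracting cancels the translation t, leaving a constraint that
% couples the relative offsets in the two frames through R alone:
p^{c} - p^{c}_{r} = R\,\left(p^{o} - p^{o}_{r}\right)
```

Under this reading, the camera-frame offset on the left is computable from the input alone, while the object-frame offset on the right is a quantity an additional output head can regress.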
We encapsulate the Geo6D mechanism as a plugin that rebuilds the input and output targets of the network, and integrate it with two pose estimation networks. Extensive experiments demonstrate that our method enhances accuracy and stability and reduces the amount of training data required, without sacrificing training or inference efficiency. It requires only 10% of the training data to reach performance comparable to training on the full set.
Furthermore, we analyze the impact of the Geo6D mecha-
nism from the perspective of the loss function.
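As a rough illustration of such an input/target rebuild (a sketch in our own hypothetical names, not code from the paper), one could compute camera-frame relative offsets as extra network inputs and object-frame relative offsets as regression targets, using the visible points' centroid as one simple choice of reference point:

```python
# Hypothetical sketch of a Geo6D-style input/target rebuild.
# Function names and the centroid reference choice are ours.

def mat_vec(R, p):
    """Apply a 3x3 matrix (list of rows) to a 3-vector."""
    return [sum(R[i][j] * p[j] for j in range(3)) for i in range(3)]

def build_geo6d_io(points_cam, R):
    """Rebuild network inputs/targets around a reference point.

    points_cam : visible points in the camera frame.
    R          : ground-truth rotation (object -> camera), train time.
    Note the translation t never appears: it cancels in the
    relative offsets p_c - r_c = R (p_o - r_o).
    """
    n = len(points_cam)
    # Reference point: centroid of the visible points.
    ref_cam = [sum(p[i] for p in points_cam) / n for i in range(3)]
    # Camera-frame offsets are computable from the input alone.
    inputs = [[p[i] - ref_cam[i] for i in range(3)] for p in points_cam]
    # Object-frame offsets (targets): rotate by R^T, since
    # p_o - r_o = R^T (p_c - r_c).
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    targets = [mat_vec(Rt, d) for d in inputs]
    return inputs, targets
```

At test time only the camera-frame inputs are available; the network's predicted object-frame offsets stand in for the targets.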
To summarize, our main contributions are:
• Introducing a pose transformation formula in a relative offset representation to establish explicit geometric constraints for direct methods.
• Proposing the Geo6D mechanism, a plugin module that
processes input data and optimization targets to adhere to
the geometric constraints, making the network learning-
friendly and mathematically correct.
• Extensive experimental results demonstrate that the proposed Geo6D effectively improves the accuracy of existing direct pose estimation methods, achieving state-of-the-art overall results and reducing the training data requirement, thus making it more practical for real-world applications.
Related work
Indirect 6D pose estimation
Indirect methods first predict intermediate geometric information and then exploit projection constraints to estimate the 6D pose via an optimization procedure. Recent methods (Peng et al. 2019; He et al. 2020, 2021) introduce the
keypoints mechanism in 6D pose estimation and then esti-
mate the 6D pose by a least-squares fitting algorithm, which
takes advantage of the geometric constraints of rigid ob-
jects to train the keypoint prediction network. Different from
the keypoints-based methods, 2D-3D correspondence-based
methods (Su et al. 2022; Hodan, Barath, and Matas 2020;
Haugaard and Buch 2022; Li, Wang, and Ji 2019; Park, Patten, and Vincze 2019; Rad and Lepetit 2017) first establish correspondences between 2D coordinates in the image plane and 3D coordinates in the object coordinate system with a neural network and then solve the 6D pose with a PnP or RANSAC algorithm. However, these indirect methods are optimized only for the first stage rather than the final pose regression, which is suboptimal compared with direct
methods. Moreover, the optimization is time-consuming and
computationally expensive in practical applications.
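The least-squares fitting step used by keypoint-based methods can be sketched with the standard Kabsch procedure (a generic textbook version, not code from any of the cited papers): given predicted keypoints in the camera frame and their known object-frame counterparts, the rigid pose minimizing the squared residual has a closed form via SVD.

```python
# Generic Kabsch-style least-squares pose fit from 3D keypoint pairs.
import numpy as np

def fit_pose(obj_kps, cam_kps):
    """Fit R, t minimizing sum ||R @ a_i + t - b_i||^2.

    obj_kps : (N, 3) keypoints in the object frame (a_i).
    cam_kps : (N, 3) predicted keypoints in the camera frame (b_i).
    """
    mu_o = obj_kps.mean(axis=0)
    mu_c = cam_kps.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets.
    H = (obj_kps - mu_o).T @ (cam_kps - mu_c)
    U, _, Vt = np.linalg.svd(H)
    # Correct an improper rotation (reflection) if the determinant is -1.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_c - R @ mu_o
    return R, t
```

The closed form is what makes this stage cheap; the cost criticized above comes from RANSAC-style robust variants and iterative refinement layered on top.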
Direct 6D pose estimation
To estimate 6D pose efficiently, recent approaches (Mo et al.
2022; Jiang et al. 2022; Li et al. 2018; Wang et al. 2020,
2021) directly regress the final 6D pose parameters from
the neural network instead of intermediate results. DenseFusion (Wang et al. 2019) extracts visible-region features from RGB-D images using two separate backbones for the 2D and 3D spaces and fuses them with a dense fusion network. Uni6D (Jiang et al. 2022) simplifies the architecture to a single homogeneous backbone that processes RGB-D data, introducing extra UV data into the input to preserve the projection constraints. Since the
corresponding visible features of a symmetric object are subject to visual ambiguity, multiple ground-truth poses can map to the same visible features, which confuses network fitting. ES6D individually models different types of symmetric objects to resolve this many-to-one pose mapping. Besides, the camera intrinsics are another factor shaping the visible features of the object; Uni6Dv2 adopts an instance segmentation method to mitigate the impact of visible feature differences caused by differing camera intrinsics. However, the pose parameters are unbounded, and the mapping from visible features to poses cannot be covered exhaustively, so it remains fragile for object poses unseen in the test scene.
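The extra-UV-input idea attributed to Uni6D above amounts to giving the network each pixel's image-plane coordinates as additional channels alongside RGB-D. A minimal sketch (the normalization choice and function name are ours, not Uni6D's):

```python
# Per-pixel UV coordinate channels to concatenate with RGB-D input,
# so the network retains the projection geometry of each pixel.
def uv_channels(height, width):
    """Return normalized per-pixel (u, v) coordinate maps in [0, 1]."""
    u = [[x / (width - 1) for x in range(width)] for _ in range(height)]
    v = [[y / (height - 1) for _ in range(width)] for y in range(height)]
    return u, v
```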
To enhance network training, some methods (Wang et al. 2020, 2021) leverage intermediate geometric features, i.e., 2D-3D correspondences, akin to indirect methods, as an