[11]–[13] utilize DNNs to detect the keypoints of each object
and subsequently compute the 6D pose parameters using
Perspective-n-Point (PnP) for 2D keypoints or least-squares fitting for 3D keypoints.
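For 3D keypoints, the least-squares step is commonly the closed-form Kabsch/Umeyama alignment; the minimal NumPy sketch below is for illustration only (the function and variable names are placeholders), while the 2D case would instead call a PnP solver such as OpenCV's cv2.solvePnP.

```python
import numpy as np

def kabsch_pose(src_kpts, dst_kpts):
    """Closed-form least-squares rigid transform (R, t) mapping
    src_kpts (N, 3) onto dst_kpts (N, 3) via SVD (Kabsch/Umeyama)."""
    src_c, dst_c = src_kpts.mean(axis=0), dst_kpts.mean(axis=0)
    H = (src_kpts - src_c).T @ (dst_kpts - dst_c)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))          # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t
```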
While DNN-based methods can estimate poses much more rapidly, they still struggle to achieve high accuracy due to errors in
segmentation or regression. To achieve higher accuracy and
stability, many works have adopted pose refinement methods,
of which the most common is the Iterative Closest Point
(ICP) [14] algorithm. Given an estimated pose, ICP finds the nearest neighbor of each source point in the target point cloud, treats that neighbor as the corresponding point, and iteratively re-solves for the optimal rigid transformation. Works such as [8], [15] additionally use DNNs to extract richer features for better performance.
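For reference, the core ICP loop can be sketched compactly on top of the kabsch_pose helper above; the iteration budget and convergence tolerance are illustrative choices, not taken from [14].

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_refine(src, dst, R, t, iters=30, tol=1e-6):
    """Refine an initial pose (R, t) aligning src (N, 3) to dst (M, 3)."""
    tree = cKDTree(dst)                     # nearest-neighbor search structure
    prev_err = np.inf
    for _ in range(iters):
        moved = src @ R.T + t               # apply the current pose estimate
        dist, idx = tree.query(moved)       # closest target point per source point
        R, t = kabsch_pose(src, dst[idx])   # re-solve the pose on these matches
        if abs(prev_err - dist.mean()) < tol:
            break                           # stop once the mean error plateaus
        prev_err = dist.mean()
    return R, t
```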
However, as pose estimation networks have improved, the accuracy gains contributed by these refinement methods have steadily diminished. The limited accuracy of existing registration methods
can be attributed to their reliance on incomplete point clouds to
register entire object mesh point clouds, resulting in numerous
erroneous correspondences. Moreover, although color information is widely used in 6D pose estimation, its potential to enhance registration accuracy remains largely unexplored: conventional methods do not effectively exploit color and are designed primarily for the large-scale optimization problem of point cloud registration rather than for the small-scale problem of pose refinement, leaving this direction largely untapped.
Our refinement method mainly contains two modules.
Firstly, we propose a point cloud completion network to fully
utilize the point cloud and RGB data. The network's composite encoder has two branches: a local branch that fuses RGB and point cloud information at each corresponding pixel, and a global branch that extracts a feature of the whole point cloud. The decoder follows [16] and employs
a multistage point generation structure. Additionally, we add
a keypoint detection module to the point cloud completion
network during the training process to improve the sensitivity
of the completed point cloud to pose accuracy, leading to better
pose optimization.
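To make the two-branch design concrete, the schematic PyTorch sketch below shows one plausible encoder of this kind; the layer widths, MLP depths, and max-pooling choice are illustrative placeholders, not the exact configuration of our network.

```python
import torch
import torch.nn as nn

class CompositeEncoder(nn.Module):
    """Schematic two-branch encoder: a local branch fuses per-pixel RGB and
    point features, and a global branch summarizes the whole point cloud."""
    def __init__(self, d_rgb=32, d_pts=32, d_glob=256):
        super().__init__()
        self.rgb_mlp = nn.Sequential(nn.Linear(3, d_rgb), nn.ReLU())
        self.pts_mlp = nn.Sequential(nn.Linear(3, d_pts), nn.ReLU())
        self.glob_mlp = nn.Sequential(nn.Linear(3, d_glob), nn.ReLU())

    def forward(self, rgb, pts):
        # rgb, pts: (B, N, 3) colors and coordinates at corresponding pixels
        local = torch.cat([self.rgb_mlp(rgb), self.pts_mlp(pts)], dim=-1)
        glob = self.glob_mlp(pts).max(dim=1).values    # PointNet-style pooling
        glob = glob.unsqueeze(1).expand(-1, pts.shape[1], -1)
        return torch.cat([local, glob], dim=-1)        # per-point fused feature
```

Secondly, to use color and point cloud data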
in registration and to enhance method stability, we propose
a novel method named Color supported Iterative KeyPoint
(CIKP), which samples the point cloud surrounding each keypoint and leverages both RGB and point cloud information to refine object keypoints iteratively.
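As a simplified, schematic illustration (not the exact CIKP procedure), one color-supported refinement step in this spirit could be sketched as follows, reusing kabsch_pose from above; the neighborhood radius and color weight lam are placeholder parameters.

```python
import numpy as np
from scipy.spatial import cKDTree

def cikp_like_step(model_kpts, model_xyz, model_rgb,
                   scene_xyz, scene_rgb, R, t, radius=0.02, lam=0.5):
    """One hypothetical step: around each posed model keypoint, gather nearby
    scene points, match them to the model in a joint (xyz, lam*rgb) space,
    and re-solve the pose from all matches."""
    posed = model_xyz @ R.T + t                       # model under current pose
    tree = cKDTree(np.hstack([posed, lam * model_rgb]))
    src, dst = [], []
    for kp in model_kpts @ R.T + t:                   # posed keypoints
        near = np.linalg.norm(scene_xyz - kp, axis=1) < radius
        if not near.any():
            continue                                  # keypoint not observed
        feats = np.hstack([scene_xyz[near], lam * scene_rgb[near]])
        _, idx = tree.query(feats)                    # color-aware matching
        src.append(model_xyz[idx])
        dst.append(scene_xyz[near])
    return kabsch_pose(np.vstack(src), np.vstack(dst))
```

The continue branch above already hints at the failure mode discussed next: keypoints whose neighborhoods are missing from the observed cloud contribute nothing to the fit.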
However, CIKP struggles to refine every keypoint when the observed point cloud is incomplete, which limits its performance. To address
this issue, we introduce a combination of our completion
network and the CIKP method, referred to as Point Cloud
Completion and Keypoint Refinement with Fusion (PCKRF).
This integrated approach enables the refinement of the initial
pose prediction from the pose estimation network. We further
conduct extensive experiments on the YCB-Video [10] and Occlusion LineMOD [6] datasets to evaluate our method. The results
demonstrate that our method can be effectively integrated with
most existing pose estimation techniques, leading to improved
performance in most cases.
Our main contributions are threefold:
• PCKRF: A pipeline that combines our completion network and the CIKP method, utilizing RGBD information and keypoints throughout the refinement.
• A novel point cloud completion network that includes a composite encoder and a keypoint detection module.
• A novel iterative pose refinement method, CIKP, that uses both RGB and point cloud information based on keypoint refinement.
Experiments demonstrate that PCKRF exhibits superior stability compared to existing approaches when optimizing initial poses with relatively high precision, and it achieves promising results even in challenging scenarios involving textureless and symmetric objects.
II. RELATED WORKS
A. Pose Estimation
Pose estimation methods can be categorized into two types
based on their optimization goal: holistic and keypoint-based
methods. Holistic methods predict the 3D position and orientation of objects directly from the provided RGB and/or
depth images. Traditional template-based methods construct
a rigid template for an object from different viewpoints and
compute the best-matched pose for the given image [17],
[18]. Recently, some works have utilized DNNs to directly regress
or classify the 6D pose of objects. PoseCNN [10] used a
multi-stage network to predict pose. It first utilized Hough
Voting to determine the center location of objects and then
directly regressed 3D rotation parameters. SSD-6D [19] first
detected objects in the images and then classified their poses.
DenseFusion [8] fused RGB and depth values at the per-pixel level, which has strongly influenced subsequent RGBD-based 6D pose estimation methods. However, the non-linearity of the rotation space makes it challenging for such direct regression losses to converge. Recently, Neural Radiance Fields have also been employed for 6D pose estimation, showing considerable research potential [20].
Pose estimation using only point cloud information is also
called point cloud registration. Recently, the advancements
in deep neural networks, particularly in three-dimensional
geometry with methods like PointNet [21] and DGCNN [22],
have significantly propelled the progress of deep point cloud
registration. These methods center on using deep neural networks to extract features from cross-source point clouds; the extracted features then serve as the basis for correspondence-based registration or are used to directly regress transformation matrices. Techniques like SpinNet [23] aim
to extract robust point descriptors through specialized neural
network designs, focusing on feature learning. However, its
reliance on a voxelization preprocessing step poses a challenge
when dealing with cross-modality point clouds. Another approach, D3Feat [24], constructs features based on k-nearest
neighbors. Nonetheless, this descriptor tends to struggle when
confronted with significant density disparities. Beyond these