PCKRF: Point Cloud Completion and Keypoint
Refinement with Fusion Data for 6D Pose
Estimation
Yiheng Han1∗, Irvin Haozhe Zhan2∗, Long Zeng3, Yu-Ping Wang4, Ran Yi5, Minjing Yu6, Matthieu Gaetan Lin2, Jenny Sheng2 and Yong-Jin Liu†
Abstract—Some robust point cloud registration approaches
with controllable pose refinement magnitude, such as ICP and
its variants, are commonly used to improve 6D pose estimation
accuracy. However, the effectiveness of these methods gradually diminishes as deep learning techniques advance and initial pose estimates become more accurate, primarily because they are not specifically designed for pose refinement. In this paper,
we propose Point Cloud Completion and Keypoint Refinement
with Fusion Data (PCKRF), a new pose refinement pipeline for
6D pose estimation. The pipeline consists of two steps. First, it
completes the input point clouds via a novel pose-sensitive point
completion network. The network uses both local and global
features with pose information during point completion. Then, it
registers the completed object point cloud with the corresponding
target point cloud by our proposed Color supported Iterative
KeyPoint (CIKP) method. The CIKP method introduces color
information into registration and registers a point cloud around
each keypoint to increase stability. The PCKRF pipeline can be
integrated with existing popular 6D pose estimation methods,
such as the full flow bidirectional fusion network, to further im-
prove their pose estimation accuracy. Experiments demonstrate
that our method exhibits superior stability compared to existing
approaches when optimizing initial poses with relatively high
precision. Notably, the results indicate that our method effectively
complements most existing pose estimation techniques, leading to
improved performance in most cases. Furthermore, our method
achieves promising results even in challenging scenarios involving
textureless and symmetrical objects. Our source code is available
at https://github.com/zhanhz/KRF.
Index Terms—Pose estimation, pose refinement, point cloud completion, data fusion.
I. INTRODUCTION
This work was partially supported by Beijing Natural Science Foundation (L222008) and Natural Science Foundation of Guangdong Province (2022A1515011234).
1Yiheng Han is with the Faculty of Information Technology, Beijing
University of Technology, Beijing, China. hanyiheng@bjut.edu.cn
2I.H. Zhan, M.G. Lin, J. Sheng and Y-J Liu are with BNRist, MOE-Key Laboratory of Pervasive Computing, Department of Computer Science and Technology, Tsinghua University, Beijing, China. {zhanhz20, yh-lin21, cqq22}@mails.tsinghua.edu.cn, liuyongjin@tsinghua.edu.cn
3Long Zeng is with the Department of Advanced Manufacturing, Shen-
zhen International Graduate School, Tsinghua University, Shenzhen, China.
zenglong@sz.tsinghua.edu.cn
4Y-P Wang is with the Beijing Institute of Technology, Beijing, China.
wyp_cs@bit.edu.cn
5Ran Yi is with the Shanghai Jiao Tong University, Shanghai, China.
ranyi@sjtu.edu.cn
6Minjing Yu is with the College of Intelligence and Computing, Tianjin
University, Tianjin, China. minjingyu@tju.edu.cn
∗Joint first authors. †Corresponding author.
Fig. 1. Steps of our method: Given an input RGBD image (a) (the bottom half shows the depth map) and an initial pose, we transform the visible point cloud (shown in blue; the known object point cloud is shown in black) and the keypoints (shown in orange; ground-truth keypoints are shown in green) into the object coordinate system (b). After completing the visible point cloud and sampling points (purple) around each keypoint within a sphere of radius r (c), we iteratively register the purple and black point clouds (d) and obtain the refined keypoints (shown in red) (e). We then use least-squares fitting to obtain the final pose. The model transformed by the final pose is shown in (f). The refined keypoints are evidently closer to the ground truth than the original keypoints.
6D object pose estimation is an essential component in
various applications, including robotic manipulation [1],
[2], augmented reality [3], and autonomous driving [4], [5].
It has received extensive attention and has led to many
research works over the past decade. Nonetheless, the task
presents considerable challenges due to sensor noise, occlusion
between objects, varying lighting conditions, and symmetries
of objects.
Traditional methods [6], [7] attempted to extract hand-
crafted features from the correspondences between known
RGB images and object mesh models. However, these methods
are less effective in heavy occlusion scenes or on low-texture
objects. With the rapid development of deep learning, Deep
Neural Networks (DNN) are now applied to the 6D object
pose estimation task and demonstrate significant performance
improvements. Specifically, some methods [8]–[10] use DNNs
to directly regress the translation and rotation of each object.
However, the non-linearity of the rotation results in poor
generalization of these methods. More recently, works like
[11]–[13] utilize DNNs to detect the keypoints of each object
and subsequently compute the 6D pose parameters using
Perspective-n-Point (PnP) for 2D keypoints or Least Squares
methods for 3D keypoints.
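For 3D keypoints, this least-squares step has a closed-form solution, commonly known as the Kabsch (or Umeyama) algorithm. The sketch below shows that standard computation; the function name and interface are illustrative rather than taken from any specific codebase.

import numpy as np

def fit_pose_from_keypoints(model_kps, pred_kps):
    # Closed-form least-squares rigid transform (R, t) such that
    # R @ model_kps[i] + t ~= pred_kps[i], via SVD (Kabsch/Umeyama).
    mu_m = model_kps.mean(axis=0)
    mu_p = pred_kps.mean(axis=0)
    H = (model_kps - mu_m).T @ (pred_kps - mu_p)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_p - R @ mu_m
    return R, t

Given the model keypoints and their detected 3D counterparts, (R, t) is the estimated 6D pose.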
While DNN methods can solve the problem more rapidly,
they are still unable to achieve high accuracy due to errors in
segmentation or regression. To achieve higher accuracy and
stability, many works have adopted pose refinement methods,
of which the most common is the Iterative Closest Point
(ICP) [14] algorithm. Given an estimated pose, the method
tries to find the nearest neighbor of each point of the source
point cloud in the target point cloud, considers it as the
corresponding point, and solves for the optimal transforma-
tion iteratively. Moreover, works like [8], [15] use DNNs to
extract more features for better performance. However, as pose estimation networks have improved, the gains delivered by these refinement methods have steadily shrunk. The limited accuracy of existing registration methods
can be attributed to their reliance on incomplete point clouds to
register entire object mesh point clouds, resulting in numerous
erroneous correspondences. Moreover, although color information is widely used in 6D pose estimation, its potential to enhance registration accuracy remains largely unexplored: conventional methods do not effectively exploit color, and they are designed primarily for the large-scale optimization problem of general point cloud registration rather than for the small-scale problem of pose refinement.
Our refinement method mainly contains two modules.
Firstly, we propose a point cloud completion network to fully
utilize the point cloud and RGB data. Our composite encoder
of the network has two branches: the local branch fuses the
RGB and point cloud information at each corresponding pixel,
and the global branch extracts the feature of the whole point
cloud. The decoder of the network follows [16] and employs
a multistage point generation structure. Additionally, we add
a keypoint detection module to the point cloud completion
network during the training process to improve the sensitivity
of the completed point cloud to pose accuracy, leading to better
pose optimization. Secondly, to use color and point cloud data
in registration and to enhance method stability, we propose
a novel method named Color supported Iterative KeyPoint
(CIKP), which samples the point cloud surrounding each keypoint and leverages both RGB and point cloud information to
refine object keypoints iteratively. However, when the input point cloud is incomplete, CIKP cannot reliably refine all keypoints, which limits its performance. To address this issue, we combine our completion network with the CIKP method into Point Cloud Completion and Keypoint Refinement with Fusion Data (PCKRF).
This integrated approach enables the refinement of the initial
pose prediction from the pose estimation network. We further
conduct extensive experiments on YCB-Video [10] and Occlu-
sion LineMOD [6] datasets to evaluate our method. The results
demonstrate that our method can be effectively integrated with
most existing pose estimation techniques, leading to improved
performance in most cases.
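As a concrete illustration of the composite encoder described above, the following hedged sketch uses simple per-point MLPs as stand-ins for the actual feature extractors; all layer sizes, and the use of MLPs rather than an image CNN for the RGB branch, are assumptions for illustration only.

import torch
import torch.nn as nn

class CompositeEncoder(nn.Module):
    # Two branches: a local branch fusing per-pixel RGB and point features,
    # and a global branch pooled over the whole cloud (PointNet-style).
    def __init__(self, rgb_dim=32, pt_dim=32, global_dim=256):
        super().__init__()
        self.rgb_mlp = nn.Sequential(nn.Linear(3, rgb_dim), nn.ReLU())
        self.pt_mlp = nn.Sequential(nn.Linear(3, pt_dim), nn.ReLU())
        self.global_mlp = nn.Sequential(nn.Linear(3, 128), nn.ReLU(),
                                        nn.Linear(128, global_dim))

    def forward(self, xyz, rgb):
        # xyz, rgb: (B, N, 3) point coordinates and per-point colors.
        local = torch.cat([self.pt_mlp(xyz), self.rgb_mlp(rgb)], dim=-1)
        glob = self.global_mlp(xyz).max(dim=1, keepdim=True).values
        glob = glob.expand(-1, xyz.shape[1], -1)     # broadcast to every point
        return torch.cat([local, glob], dim=-1)      # per-point fused feature

In such a design, the completion decoder and the keypoint detection head would both consume this fused per-point feature.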
Our main contribution is threefold:
• PCKRF: a pipeline that combines our completion network and the CIKP method, utilizing RGBD information and keypoints throughout the refinement.
• A novel point completion network that includes a composite encoder and an auxiliary keypoint detection module.
• CIKP: a novel iterative pose refinement method that uses both RGB and point cloud information and operates on keypoints (a sketch follows this list).
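To make the CIKP idea concrete, here is a hedged sketch of one refinement pass; the sphere radius, the color weighting in the correspondence search, and the per-patch rigid update are illustrative assumptions rather than the paper's exact formulation.

import numpy as np
from scipy.spatial import cKDTree

def rigid_fit(src, dst):
    # Least-squares rigid transform mapping src onto dst (Kabsch).
    mu_s, mu_d = src.mean(0), dst.mean(0)
    U, _, Vt = np.linalg.svd((src - mu_s).T @ (dst - mu_d))
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, mu_d - R @ mu_s

def refine_keypoints(scene_xyz, scene_rgb, model_xyz, model_rgb,
                     keypoints, radius=0.02, color_weight=0.1, iters=5):
    # For each keypoint: register the scene points inside a sphere of
    # `radius` against the model cloud using color-aware (xyz + weighted
    # rgb) nearest neighbors, then carry the keypoint along.
    tree = cKDTree(np.hstack([model_xyz, color_weight * model_rgb]))
    refined = []
    for kp in keypoints:
        mask = np.linalg.norm(scene_xyz - kp, axis=1) < radius
        if mask.sum() < 3:
            refined.append(kp)             # too little support: keep as-is
            continue
        patch, rgb = scene_xyz[mask], scene_rgb[mask]
        for _ in range(iters):             # iterative local registration
            _, idx = tree.query(np.hstack([patch, color_weight * rgb]))
            R, t = rigid_fit(patch, model_xyz[idx])
            patch = patch @ R.T + t
            kp = R @ kp + t                # move the keypoint with its patch
        refined.append(kp)
    return np.asarray(refined)

Registering a whole neighborhood rather than a single point is what gives the method its stability; the final pose is then fitted from the refined keypoints by least squares, as in the earlier Kabsch sketch.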
Experiments demonstrate that our PCKRF exhibits superior
stability compared to existing approaches when optimizing
initial poses with relatively high precision. Notably, the results
indicate that our method can be effectively integrated with
most existing pose estimation techniques, leading to improved
performance in most cases. Furthermore, our method achieves
promising results even in challenging scenarios involving
textureless and symmetrical objects.
II. RELATED WORKS
A. Pose Estimation
Pose estimation methods can be categorized into two types
based on their optimization goal: holistic and keypoint-based
methods. Holistic methods predict the 3D position and ori-
entation of objects directly from the provided RGB and/or
depth images. Traditional template-based methods construct
a rigid template for an object from different viewpoints and
compute the best-matched pose for the given image [17],
[18]. Recently, some works utilized DNNs to directly regress
or classify the 6D pose of objects. PoseCNN [10] used a
multi-stage network to predict pose. It first utilized Hough
Voting to determine the center location of objects and then
directly regressed 3D rotation parameters. SSD-6D [19] first
detected objects in the images and then classified their poses.
DenseFusion [8] fused RGB and depth values at the per-pixel level, which strongly influenced subsequent RGBD-based 6D pose estimation methods. However, the non-linearity of the
rotation makes it challenging for the loss function to converge.
Recently, Neural Radiance Fields have also been employed
for 6D pose estimation, showcasing significant inspiration and
research potential [20].
Pose estimation using only point cloud information is also
called point cloud registration. Recently, the advancements
in deep neural networks, particularly in three-dimensional
geometry with methods like PointNet [21] and DGCNN [22],
have significantly propelled the progress of deep point cloud
registration. These methods are centered around the idea of
utilizing deep neural networks to extract features from cross-
source point clouds. These extracted features then serve as
the basis for registrations or are directly used to regress
transformation matrices. SpinNet [23], for example, aims to extract robust point descriptors through a specialized network design focused on feature learning. However, its reliance on a voxelization preprocessing step poses a challenge when dealing with cross-modality point clouds. Another ap-
proach, D3Feat [24], constructs features based on k-nearest
neighbors. Nonetheless, this descriptor tends to struggle when
confronted with significant density disparities.
Fig. 2. The upper diagram shows the PCKRF pipeline and the lower diagram shows the architecture of our point cloud completion network. In the preprocessing step, we use the segmentation result and the pose of the target object given by the pose estimation network to obtain the partial point cloud in the object coordinate system. The PCKRF pipeline first completes the partial point cloud with the point completion network and then refines the initial pose with our CIKP method. In the point cloud completion network, the Feature Extractor fuses the point cloud and RGB color at each corresponding pixel, and the Keypoint Detector predicts the offset from each point to each keypoint to improve the sensitivity of the completed point cloud to pose accuracy. The loss function of the completion network jointly optimizes the keypoint detector loss Lkp and the completion decoder loss Lcd.
Beyond these point descriptor-centered methodologies, several strategies em-
phasize feature matching. For instance, Deep Global Regis-
tration (DGR) [25] employs a UNet architecture to discern
whether a point pair corresponds, reinterpreting the feature-
matching challenge as a binary classification task. Alterna-
tively, transformation learning approaches directly estimate
transformations through neural networks. Feature-metric regis-
tration (FMR) [26] introduces a technique that aligns two point
clouds by minimizing their feature metric projection error,
offering a unique approach to point cloud registration. More
recently, attempts have been made to leverage the Transformer
for aggregating context between two point clouds, followed
by estimating correspondences through the utilization of dual
normalization [27] or some end-to-end pipelines without key
points [28]. An incremental method [29] combined with deep-learned components has also achieved excellent results. Moreover, diffusion models have also been applied to point cloud registration [30]. To further verify the effectiveness of our
pipeline and its performance on texture-free objects, we se-
lected a representative work [27], modified our framework,
and conducted testing experiments using only the point cloud
information.
Keypoint-based methods provide one way to address the
above problems. Keypoints differ from superpoints [27], [31], [32], which rely on clustering or patching for point cloud registration without a prior model. Each keypoint is calculated
based on the specific geometric features of the given model.
YOLO-6D [33] employed the popular object detection model
YOLO to predict 8 points and a center point for each bounding
box of the object projected onto the 2D image. The method
then computed the 6D pose using the PnP algorithm. PVNet
[11] predicted a unit vector to each keypoint for each pixel,
then voted the 2D location for each keypoint and calculated
the final pose using the PnP algorithm. PVN3D [12] used
additional depth information to detect 3D keypoints via Hough
Voting and calculated the 6D pose parameters with the Least
Squares method. In order to fully exploit the RGB and depth
data, FFB6D [34] proposed a novel feature extraction network
that applies fusion at both the encoding and decoding layers.
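As an illustration of this 3D keypoint voting idea, the sketch below uses a Gaussian-kernel mean-shift pass over per-point votes; the kernel, bandwidth, and update rule are illustrative and not the exact clustering used in [12].

import numpy as np

def vote_keypoint(points, offsets, bandwidth=0.01, iters=10):
    # Each visible point casts a vote (its position plus its predicted
    # offset to the keypoint); a mean-shift pass locates the densest
    # vote cluster.
    votes = points + offsets
    center = votes.mean(axis=0)                    # initialize at the centroid
    for _ in range(iters):
        w = np.exp(-np.sum((votes - center) ** 2, axis=1) / (2 * bandwidth ** 2))
        center = (w[:, None] * votes).sum(axis=0) / w.sum()
    return center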
B. Pose Refinement
Most of the methods mentioned above apply pose refine-
ment techniques to further improve the accuracy of their
results. The most commonly used method is ICP [14], but