Enhancing Generalizable 6D Pose Tracking
of an In-Hand Object with Tactile Sensing
Yun Liu∗,1,2, Xiaomeng Xu∗,1, Weihang Chen3, Haocheng Yuan4, He Wang5, Jing Xu3, Rui Chen3, and Li Yi1,2,6
Abstract—When manipulating an object to accomplish com-
plex tasks, humans rely on both vision and touch to keep track
of the object’s 6D pose. However, most existing object pose
tracking systems in robotics rely exclusively on visual signals,
which hinders a robot’s ability to manipulate objects effectively.
To address this limitation, we introduce TEG-Track, a tactile-
enhanced 6D pose tracking system that can track previously
unseen objects held in hand. From consecutive tactile signals,
TEG-Track optimizes object velocities from marker flows when
slippage does not occur, or regresses velocities using a slippage es-
timation network when slippage is detected. The estimated object
velocities are integrated into a geometric-kinematic optimization
scheme to enhance existing visual pose trackers. To evaluate our
method and to facilitate future research, we construct a real-
world dataset for visual-tactile in-hand object pose tracking.
Experimental results demonstrate that TEG-Track consistently
enhances state-of-the-art generalizable 6D pose trackers in syn-
thetic and real-world scenarios. Our code and dataset are available
at https://github.com/leolyliu/TEG-Track.
Index Terms—Force and Tactile Sensing, Sensor Fusion, Visual
Tracking
I. INTRODUCTION
Accurate 6D pose tracking of objects is essential
for enabling effective robotic manipulation. Prior re-
search [1]–[3] has demonstrated impressive precision and
robustness for tracking known objects using 3D object mod-
els. Recent studies have further shifted their focus towards
developing generalizable 6D pose tracking methods that can
handle novel object instances from known [4]–[6] or even
unknown [7]–[9] object categories. In this paper, we contribute
to the development of such generalizable 6D pose tracking
techniques, specifically addressing the in-hand setup shown in
Figure 1 that is commonly encountered in robot manipulation
tasks. Our goal is to consecutively track the 6D pose of an
in-hand object starting from its initial 6D pose.
Manuscript received: July 18, 2023; Revised: October 17, 2023; Accepted: November 20, 2023.
This paper was recommended for publication by Editor Pascal Vasseur upon evaluation of the Associate Editor and Reviewers’ comments.
Project supported by the Young Scientists Fund of the National Natural Science Foundation of China (Grant No. 62203258).
∗Yun Liu and Xiaomeng Xu are co-first authors.
Li Yi is the corresponding author.
1Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
2Shanghai Qizhi Institute, Shanghai, China
3Department of Mechanical Engineering, Tsinghua University, Beijing, China
4Northwestern Polytechnical University, Xi'an, China
5Center on Frontiers of Computing Studies, Peking University, Beijing, China
6Shanghai AI Laboratory, Shanghai, China
Digital Object Identifier (DOI): see top of this page.
In scenarios
where objects are manipulated by robot hands, relying solely
on robot proprioception could prove challenging, particularly
when external forces from collisions or multi-agent interac-
tions occur. Therefore, it is critical to have an accurate in-
hand object tracker that can precisely capture the object’s
state, especially in contexts involving in-hand manipulation
or rich environmental contacts such as peg-in-hole insertion.
Furthermore, this research could significantly benefit human-
robot collaboration [10]–[12], where sudden changes in the
object’s kinematic state caused by interactions are common.
Existing generalizable 6D pose tracking methods face chal-
lenges in in-hand manipulation scenarios. Compared with
scenes without robot manipulation, visual sensing of the in-hand object becomes more distorted and less informative due to in-hand occlusions, which can impede existing methods that rely solely on visual signals such as RGB-D images. As a
remedy, tactile sensing could be integrated into the tracking
process. By equipping the robot hand with tactile sensors such
as GelSight [13], we can capture high-quality geometric and
motion signals from contact areas. Such tactile information can complement visual sensing that is degraded by occlusions, and the rapid advancement of tactile sensor technologies [13]–[16] makes this integration feasible and promising. Moreover, precise tactile sensing captures accurate motions of the object's contact regions, providing strong cues about object pose changes.
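To make this concrete, consider the standard rigid-body relation (our illustrative notation, not necessarily the formulation adopted later in the paper): if the object moves with linear velocity $v$ and angular velocity $\omega$ about a reference point $c$, a contact point $p_i$ tracked through tactile marker flow satisfies
\[
\dot{p}_i = v + \omega \times (p_i - c), \qquad
(\hat{v}, \hat{\omega}) = \arg\min_{v,\,\omega} \sum_{i=1}^{N} \big\| \dot{p}_i - v - \omega \times (p_i - c) \big\|^2 ,
\]
so that, when no slippage occurs, the observed marker-flow velocities $\dot{p}_i$ constrain the object velocity through a small least-squares problem.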
Therefore, we propose TEG-Track, a general framework
for enhancing generalizable 6D pose tracking of an in-hand
object with tactile sensing. First, from tactile sensing alone,
TEG-Track learns tactile kinematic cues that indicate the
kinematic states of the object. TEG-Track then combines these cues with visual sensing, integrating the object kinematic states into existing generalizable visual pose trackers through a geometric-kinematic optimization strategy. TEG-Track can be easily
plugged into various generalizable pose trackers, including
template-based (introduced in Section V-B), regression-based
[5], and keypoint-based [7] approaches.
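To illustrate the plug-in nature of this design, the following minimal sketch (Python, with hypothetical function and variable names that are not taken from the released code) shows one simple way tactile-estimated object velocities could be fused with a visual tracker's per-frame estimate; a weighted blend on SO(3) and R^3 stands in here for the full geometric-kinematic optimization:

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def kinematic_prediction(prev_pose, lin_vel, ang_vel, dt):
    """Predict the current pose by integrating tactile-estimated
    object velocities (lin_vel, ang_vel in the world frame) over dt."""
    prev_rot, prev_trans = prev_pose          # 3x3 rotation, (3,) translation
    pred_rot = R.from_rotvec(ang_vel * dt).as_matrix() @ prev_rot
    pred_trans = prev_trans + lin_vel * dt
    return pred_rot, pred_trans

def fuse_poses(visual_pose, kinematic_pose, w_vis=0.5):
    """Blend the visual tracker's estimate with the kinematic prediction.
    A weighted average (translations in R^3, rotations interpolated on
    SO(3)) stands in for the geometric-kinematic optimization."""
    (rot_v, t_v), (rot_k, t_k) = visual_pose, kinematic_pose
    trans = w_vis * t_v + (1.0 - w_vis) * t_k
    delta = R.from_matrix(rot_v @ rot_k.T).as_rotvec()  # kinematic -> visual
    rot = R.from_rotvec(w_vis * delta).as_matrix() @ rot_k
    return rot, trans

# Hypothetical usage inside a tracking loop:
#   pred = kinematic_prediction(prev_pose, lin_vel, ang_vel, dt)
#   pose = fuse_poses(visual_tracker(rgbd_frame, prev_pose), pred)
```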
To evaluate TEG-Track, we curate synthetic and real-world
datasets due to the lack of datasets supporting generalizable
visual-tactile in-hand object pose tracking research. Since
existing datasets [17], [18] only support single-frame object pose estimation and are small in scale, we collect a large-scale synthetic object pose tracking dataset with large in-hand motion variations to evaluate TEG-Track across diverse situations. Furthermore, to examine TEG-Track in real
scenarios, we contribute a real-world visual-tactile in-hand