Enhancing Generalizable 6D Pose Tracking
of an In-Hand Object with Tactile Sensing
Yun Liu*,1,2, Xiaomeng Xu*,1, Weihang Chen3, Haocheng Yuan4, He Wang5, Jing Xu3, Rui Chen3, and Li Yi1,2,6
Abstract—When manipulating an object to accomplish com-
plex tasks, humans rely on both vision and touch to keep track
of the object’s 6D pose. However, most existing object pose
tracking systems in robotics rely exclusively on visual signals,
which hinder a robot’s ability to manipulate objects effectively.
To address this limitation, we introduce TEG-Track, a tactile-
enhanced 6D pose tracking system that can track previously
unseen objects held in hand. From consecutive tactile signals,
TEG-Track optimizes object velocities from marker flows when
slippage does not occur, or regresses velocities using a slippage es-
timation network when slippage is detected. The estimated object
velocities are integrated into a geometric-kinematic optimization
scheme to enhance existing visual pose trackers. To evaluate our
method and to facilitate future research, we construct a real-
world dataset for visual-tactile in-hand object pose tracking.
Experimental results demonstrate that TEG-Track consistently
enhances state-of-the-art generalizable 6D pose trackers in syn-
thetic and real-world scenarios. Our code and dataset are available
at https://github.com/leolyliu/TEG-Track.
Index Terms—Force and Tactile Sensing, Sensor Fusion, Visual
Tracking
I. INTRODUCTION
ACCURATE 6D pose tracking of objects is essential
for enabling effective robotic manipulation. Prior re-
search [1]–[3] has demonstrated impressive precision and
robustness for tracking known objects using 3D object mod-
els. Recent studies have further shifted their focus towards
developing generalizable 6D pose tracking methods that can
handle novel object instances from known [4]–[6] or even
unknown [7]–[9] object categories. In this paper, we contribute
to the development of such generalizable 6D pose tracking
techniques, specifically addressing the in-hand setup shown in
Figure 1 that is commonly encountered in robot manipulation
tasks. Our goal is to consecutively track the 6D pose of an
in-hand object starting from its initial 6D pose.

Manuscript received: July 18, 2023; Revised: October 17, 2023; Accepted: November 20, 2023.
This paper was recommended for publication by Editor Pascal Vasseur upon evaluation of the Associate Editor and Reviewers' comments.
Project supported by the Young Scientists Fund of the National Natural Science Foundation of China (Grant No. 62203258).
Yun Liu and Xiaomeng Xu are co-first authors. Li Yi is the corresponding author.
1 Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
2 Shanghai Qizhi Institute, Shanghai, China
3 Department of Mechanical Engineering, Tsinghua University, Beijing, China
4 Northwestern Polytechnical University, Xi'an, China
5 Center on Frontiers of Computing Studies, Peking University, Beijing, China
6 Shanghai AI Laboratory, Shanghai, China

In scenarios
where objects are manipulated by robot hands, relying solely
on robot proprioception could prove challenging, particularly
when external forces from collisions or multi-agent interac-
tions occur. Therefore, it is critical to have an accurate in-
hand object tracker that can precisely capture the object’s
state, especially in contexts involving in-hand manipulation
or rich environmental contacts such as peg-hole insertion.
Furthermore, this research could significantly benefit human-
robot collaboration [10]–[12], where sudden changes in the
object’s kinematic state caused by interactions are common.
Existing generalizable 6D pose tracking methods face chal-
lenges in in-hand manipulation scenarios. Compared with
scenes without robot manipulation, visual sensing of the in-hand
object becomes more distorted and less informative due to
in-hand occlusions, which impedes existing methods that rely
heavily on visual signals such as RGB-D images. As a
remedy, tactile sensing could be integrated into the tracking
process. By equipping the robot hand with tactile sensors such
as GelSight [13], we can capture high-quality geometric and
motion signals from contact areas. Such tactile information can
complement visual sensing corrupted by occlusions, and the rapid
advancement of tactile sensor technologies [13]–[16] makes this
integration feasible and promising. Moreover, precise tactile
sensing captures accurate motions for object contact regions,
providing strong clues for understanding object pose changes.
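To make this cue concrete, the constraint that contact-region motion places on the object pose can be illustrated with a minimal sketch. This is an illustrative assumption rather than TEG-Track's actual formulation: it presumes matched 3D contact points from two consecutive tactile frames (for example, marker positions lifted to 3D with the sensor geometry) and fits their rigid motion with a standard least-squares (Kabsch) solution.

import numpy as np

def fit_rigid_motion(p_prev: np.ndarray, p_curr: np.ndarray):
    """Least-squares rigid fit: find R, t with p_curr ~ p_prev @ R.T + t.
    p_prev, p_curr are (N, 3) arrays of matched contact points (N >= 3)."""
    c_prev, c_curr = p_prev.mean(axis=0), p_curr.mean(axis=0)
    H = (p_prev - c_prev).T @ (p_curr - c_curr)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_curr - R @ c_prev
    return R, t

Dividing the fitted rotation and translation by the frame interval yields angular- and linear-velocity cues of the kind discussed in Section III-B; when the contact slips, such a fit is no longer valid, which is why a learned slippage-estimation branch is also needed.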
Therefore, we propose TEG-Track, a general framework
for enhancing generalizable 6D pose tracking of an in-hand
object with tactile sensing. First, from tactile sensing alone,
TEG-Track learns tactile kinematic cues that indicate the
kinematic states of the object. Combining these cues with visual sensing,
TEG-Track then integrates object kinematic states with exist-
ing generalizable visual pose trackers through a geometric-
kinematic optimization strategy. TEG-Track can be easily
plugged into various generalizable pose trackers, including
template-based (introduced in Section V-B), regression-based
[5], and keypoint-based [7] approaches.
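As a rough sketch of what this pluggability assumes (the names below are illustrative, not the authors' actual interface), each of these trackers can be viewed as exposing a single per-frame call that the tactile enhancement wraps around:

from typing import Protocol
import numpy as np

class GeneralizableVisualTracker(Protocol):
    """Hypothetical per-frame interface a visual pose tracker is assumed to expose."""
    def track(self, observation: dict, prev_pose: np.ndarray) -> np.ndarray:
        """Return the current object pose as a 4x4 matrix, given a visual
        observation (e.g., an RGB-D frame) and the previous pose estimate."""
        ...

Under such an abstraction, template-based, regression-based, and keypoint-based trackers look identical to the fusion stage, which is what allows tactile cues to be added without modifying the trackers themselves.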
To evaluate TEG-Track, we curate synthetic and real-world
datasets due to the lack of datasets supporting generalizable
visual-tactile in-hand object pose tracking research. Since
existing datasets [17], [18] only support single-frame object
pose estimation and are small in scale, we collect a large-scale
synthetic object pose tracking dataset with large in-hand motion
variations to test TEG-Track under diverse conditions.
Furthermore, to examine TEG-Track in real
scenarios, we contribute a real-world visual-tactile in-hand
object pose tracking dataset including 200 trajectories covering
17 instances from 5 object categories with careful per-frame
object pose annotations. Experiments demonstrate that TEG-Track
consistently improves the performance of different generalizable
visual pose trackers in both synthetic and real scenarios.
Compared to the state-of-the-art generalizable pose tracker
BundleTrack [7] on our real evaluation set, TEG-Track achieves
30.9% and 21.4% decreases in the average rotation and
translation errors, respectively.

Fig. 1. We propose a general in-hand object pose tracking framework TEG-Track and evaluate it on our synthetic and real datasets. Our approach enhances generalizable visual trackers such as BundleTrack [7] with tactile sensing. Here we visualize the tracking task as tracking the object's 3D bounding box: green boxes denote ground-truth poses, whereas red boxes denote estimated poses.
In summary, our main contributions are threefold: 1) To
the best of our knowledge, we are among the first to explore
generalizable in-hand object pose tracking combining visual
and tactile sensing. 2) We present TEG-Track, a visual-tactile
framework that learns tactile kinematic cues from tactile
sensing and then incorporates them into various visual pose
trackers with consistent performance gains. 3) We
construct the first fully-annotated visual-tactile in-hand object
pose tracking dataset in real-world scenarios to facilitate future
research.
II. RELATED WORK
Generalizable Visual Pose Tracking. Different from
instance-level object pose tracking [3], [19], generalizable
object pose tracking methods [4]–[9] aim to track the pose of
an unseen object without its 3D model and can be divided into
regression-based and keypoint-based methods. Regression-
based approaches [5], [8] directly use a neural network to
regress 6D object motion from RGB [8] or point cloud [5]
sequences, while keypoint-based methods [4], [6], [7], [9] are
two-stage approaches that first detect object keypoints and then
estimate object pose differences between frames by keypoint
matching. In terms of generalizability, category-level trackers
[4]–[6] are limited to objects from known object categories
during test time, while category-agnostic ones [7]–[9] can
track an arbitrary object without the category information.
However, visual signals are the only input for these methods,
which impedes their application to robot manipulation scenarios
with heavy visual occlusions.
Visual-Tactile 3D Perception and Datasets. Tremendous
efforts [20]–[28] have been made to combine visual and tactile
signals to deal with several 3D perception tasks other than pose
tracking. To reconstruct the 3D shape of the object in contact,
a line of studies [20], [21], [24], [29] first reconstructs a coarse
object mesh by visual sensing and then refines the details with
tactile information, and others [22], [23], [27], [28] further
design iterative strategies to search online for a local object
region with the most informative tactile signals. To estimate
the pose of a static object during active robot movement, a
multi-stage method [25] leverages visual and tactile sensing
alternately in different robot states. Various visual-tactile
datasets have been collected to facilitate studies on object
shape reconstruction [30], [31], in-hand object pose estimation
[17], [18], and object grasping [32], [33]. We present the
first real-world visual-tactile dataset supporting research on
object pose tracking.
Object Pose Estimation and Tracking via Tactile Feed-
back. Due to the relatively low quality of visual sensing,
previous works have explored object pose estimation and
tracking via tactile feedback, but limited to instance-level
tracking with a major focus on static grasps. To estimate
the object pose in a single frame, a tactile-only method [34]
combines multiple tactile images via proprioception. Recent
studies [35], [36] in this field combine visual and tactile
sensing to achieve object pose estimation. Wen et al. [35]
generate pose hypotheses from visual point clouds and then
prune them via a hand-object collision check, and Caddeo et al.
[36] encode different modalities into learnable features and
fuse them in the feature space. To track the object pose with a
known object model, Álvarez et al. [37] separately predict the
object pose using visual and tactile signals alone, then fuse
the two predictions with an extended Kalman filter. Another
method [38] fuses visual and tactile point clouds and then
aligns the merged point cloud to the object model.
III. METHOD
In this section, we introduce TEG-Track in detail. As illus-
trated in Figure 2, the key idea is leveraging tactile kinematic
cues learned from tactile sensing to boost visual pose trackers
through a geometric-kinematic optimization strategy. We first
revisit generalizable visual pose trackers in Section III-A. We
then present tactile kinematic cues that estimate the kinematic
state of the in-hand object from tactile images and marker
flows in Section III-B. Finally, we propose a geometric-
kinematic optimization strategy to integrate object kinematic
states with various visual pose trackers in Section III-C.
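Before detailing each component, the overall per-frame flow can be summarized in a simplified sketch. The blending step below is a deliberately crude stand-in for the geometric-kinematic optimization of Section III-C, and visual_tracker, slippage_net, and the tactile-cue helpers are hypothetical placeholders rather than the actual implementation:

import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def predict_pose(prev_pose, omega, v, dt):
    """Propagate a 4x4 pose with angular velocity omega and linear velocity v."""
    pred = np.eye(4)
    pred[:3, :3] = Rotation.from_rotvec(omega * dt).as_matrix() @ prev_pose[:3, :3]
    pred[:3, 3] = prev_pose[:3, 3] + v * dt
    return pred

def fuse_poses(pose_vis, pose_kin, w_kin=0.5):
    """Blend two 4x4 poses: slerp for rotation, linear blend for translation."""
    rots = Rotation.from_matrix(np.stack([pose_vis[:3, :3], pose_kin[:3, :3]]))
    fused = np.eye(4)
    fused[:3, :3] = Slerp([0.0, 1.0], rots)(w_kin).as_matrix()
    fused[:3, 3] = (1 - w_kin) * pose_vis[:3, 3] + w_kin * pose_kin[:3, 3]
    return fused

def track_step(prev_pose, rgbd, tactile, dt, visual_tracker, slippage_net):
    # 1) Tactile kinematic cue: fit velocities from marker flows when the grasp
    #    is stable, otherwise regress them with a slippage-estimation network.
    if tactile.slippage_detected:
        omega, v = slippage_net(tactile)
    else:
        omega, v = tactile.velocity_from_marker_flow()
    # 2) Visual estimate from any pluggable generalizable pose tracker.
    pose_vis = visual_tracker.track(rgbd, prev_pose)
    # 3) Combine the kinematic prediction with the visual estimate.
    return fuse_poses(pose_vis, predict_pose(prev_pose, omega, v, dt))

In the actual method, this naive blend is replaced by the geometric-kinematic optimization strategy described in Section III-C.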
A. Generalizable Visual Pose Trackers
A generalizable pose tracker aims at transferring the learned
pose tracking policy to novel objects without their 3D models.
For instance, CAPTRA [5] trains one network per object
category, and uses it to track an unseen object from the
same category during test time. Such a pose tracker can be
designed in a template-based, regression-based, or keypoint-
based manner. A general object-centric representation (com-
plete object model, NOCS Map [39], object keypoints, etc.) is
commonly regarded as an intermediate object feature bridging
visual inputs and the 3D object pose. Though
generalizable visual pose trackers have achieved impressive
results, their heavy reliance on visual sensing makes it difficult
to handle heavy occlusion in in-hand situations. TEG-
Track leverages a generalizable visual pose tracker to provide