Enhancing Generalizable 6D Pose Tracking
of an In-Hand Object with Tactile Sensing
Yun Liu∗,1,2, Xiaomeng Xu∗,1, Weihang Chen3, Haocheng Yuan4, He Wang5, Jing Xu3, Rui Chen3, and Li Yi1,2,6
Abstract—When manipulating an object to accomplish com-
plex tasks, humans rely on both vision and touch to keep track
of the object’s 6D pose. However, most existing object pose
tracking systems in robotics rely exclusively on visual signals,
which hinders a robot’s ability to manipulate objects effectively.
To address this limitation, we introduce TEG-Track, a tactile-
enhanced 6D pose tracking system that can track previously
unseen objects held in hand. From consecutive tactile signals,
TEG-Track optimizes object velocities from marker flows when
slippage does not occur, or regresses velocities using a slippage es-
timation network when slippage is detected. The estimated object
velocities are integrated into a geometric-kinematic optimization
scheme to enhance existing visual pose trackers. To evaluate our
method and to facilitate future research, we construct a real-
world dataset for visual-tactile in-hand object pose tracking.
Experimental results demonstrate that TEG-Track consistently
enhances state-of-the-art generalizable 6D pose trackers in syn-
thetic and real-world scenarios. Our code and dataset are available
at https://github.com/leolyliu/TEG-Track.
Index Terms—Force and Tactile Sensing, Sensor Fusion, Visual
Tracking
I. INTRODUCTION
Accurate 6D pose tracking of objects is essential
for enabling effective robotic manipulation. Prior re-
search [1]–[3] has demonstrated impressive precision and
robustness for tracking known objects using 3D object mod-
els. Recent studies have further shifted their focus towards
developing generalizable 6D pose tracking methods that can
handle novel object instances from known [4]–[6] or even
unknown [7]–[9] object categories. In this paper, we contribute
to the development of such generalizable 6D pose tracking
techniques, specifically addressing the in-hand setup shown in
Figure 1 that is commonly encountered in robot manipulation
tasks. Our goal is to consecutively track the 6D pose of an
in-hand object starting from its initial 6D pose.
Manuscript received: July 18, 2023; Revised: October 17, 2023; Accepted: November 20, 2023.
This paper was recommended for publication by Editor Pascal Vasseur upon evaluation of the Associate Editor and Reviewers’ comments.
Project supported by the Young Scientists Fund of the National Natural Science Foundation of China (Grant No. 62203258).
∗Yun Liu and Xiaomeng Xu are co-first authors.
Li Yi is the corresponding author.
1Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
2Shanghai Qizhi Institute, Shanghai, China
3Department of Mechanical Engineering, Tsinghua University, Beijing, China
4Northwestern Polytechnical University, Xi'an, China
5Center on Frontiers of Computing Studies, Peking University, Beijing, China
6Shanghai AI Laboratory, Shanghai, China
Digital Object Identifier (DOI): see top of this page.
In scenarios
where objects are manipulated by robot hands, relying solely
on robot proprioception could prove challenging, particularly
when external forces from collisions or multi-agent interac-
tions occur. Therefore, it is critical to have an accurate in-
hand object tracker that can precisely capture the object’s
state, especially in contexts involving in-hand manipulation
or rich environmental contacts such as peg-in-hole insertion.
Furthermore, this research could significantly benefit human-
robot collaboration [10]–[12], where sudden changes in the
object’s kinematic state caused by interactions are common.
Existing generalizable 6D pose tracking methods face chal-
lenges in in-hand manipulation scenarios. Compared with
scenes without robot manipulation, visual sensing of the in-hand object becomes more distorted and less informative due to in-hand occlusions, which can impede existing methods that rely solely on visual signals such as RGB-D images. As a
remedy, tactile sensing could be integrated into the tracking
process. By equipping the robot hand with tactile sensors such
as GelSight [13], we can capture high-quality geometric and
motion signals from contact areas. Such tactile information can complement visual sensing that is degraded by occlusions, and the rapid advancement of tactile sensor technologies [13]–[16] makes this integration feasible and promising. Moreover, precise tactile sensing captures accurate motions of the object's contact regions, providing strong cues about object pose changes.
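To make this concrete, consider the standard rigid-body relation (our illustrative notation, not necessarily the formulation adopted later in the paper): if the object moves with linear velocity $v$ and angular velocity $\omega$ about a reference point $c$, a contact point $p_i$ tracked through tactile marker flow satisfies
\[
\dot{p}_i = v + \omega \times (p_i - c), \qquad
(\hat{v}, \hat{\omega}) = \arg\min_{v,\,\omega} \sum_{i=1}^{N} \big\| \dot{p}_i - v - \omega \times (p_i - c) \big\|^2 ,
\]
so that, when no slippage occurs, the observed marker-flow velocities $\dot{p}_i$ constrain the object velocity through a small least-squares problem.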
Therefore, we propose TEG-Track, a general framework
for enhancing generalizable 6D pose tracking of an in-hand
object with tactile sensing. First, from tactile sensing alone,
TEG-Track learns tactile kinematic cues that indicate the
kinematic states of the object. TEG-Track then combines these cues with visual sensing, integrating the object kinematic states into existing generalizable visual pose trackers through a geometric-kinematic optimization strategy. TEG-Track can be easily
plugged into various generalizable pose trackers, including
template-based (introduced in Section V-B), regression-based
[5], and keypoint-based [7] approaches.
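To illustrate the plug-in nature of this design, the following minimal sketch (Python, with hypothetical function and variable names that are not taken from the released code) shows one simple way tactile-estimated object velocities could be fused with a visual tracker's per-frame estimate; a weighted blend on SO(3) and R^3 stands in here for the full geometric-kinematic optimization:

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def kinematic_prediction(prev_pose, lin_vel, ang_vel, dt):
    """Predict the current pose by integrating tactile-estimated
    object velocities (lin_vel, ang_vel in the world frame) over dt."""
    prev_rot, prev_trans = prev_pose          # 3x3 rotation, (3,) translation
    pred_rot = R.from_rotvec(ang_vel * dt).as_matrix() @ prev_rot
    pred_trans = prev_trans + lin_vel * dt
    return pred_rot, pred_trans

def fuse_poses(visual_pose, kinematic_pose, w_vis=0.5):
    """Blend the visual tracker's estimate with the kinematic prediction.
    A weighted average (translations in R^3, rotations interpolated on
    SO(3)) stands in for the geometric-kinematic optimization."""
    (rot_v, t_v), (rot_k, t_k) = visual_pose, kinematic_pose
    trans = w_vis * t_v + (1.0 - w_vis) * t_k
    delta = R.from_matrix(rot_v @ rot_k.T).as_rotvec()  # kinematic -> visual
    rot = R.from_rotvec(w_vis * delta).as_matrix() @ rot_k
    return rot, trans

# Hypothetical usage inside a tracking loop:
#   pred = kinematic_prediction(prev_pose, lin_vel, ang_vel, dt)
#   pose = fuse_poses(visual_tracker(rgbd_frame, prev_pose), pred)
```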
To evaluate TEG-Track, we curate synthetic and real-world
datasets due to the lack of datasets supporting generalizable
visual-tactile in-hand object pose tracking research. Since
existing datasets [17], [18] only support single-frame object pose estimation and are small in scale, we collect a large-scale synthetic object pose tracking dataset with large in-hand motion variations to evaluate TEG-Track across diverse situations. Furthermore, to examine TEG-Track in real
scenarios, we contribute a real-world visual-tactile in-hand