goods. While a large body of prior work exists on stable object placing [28]–[34], none of these works investigates the contribution of tactile feedback to stable placing. Rather, they rely either on vision systems, which are prone to occlusions and require external sensors, or on accurate scene descriptions, which demand cumbersome manual labor. We attempt to fill this gap by investigating the impact of tactile sensing in this simple yet challenging scenario.
We propose an effective pipeline for translating taxel-
based measurements into useful features for learning a pose
correction signal to ensure optimal object placement. A
placing action is optimal if the object's placing normal (orthogonal to the object's placing face) is collinear with the normal of the support surface, e.g., a table. Our method comprises a
deep convolutional neural network that predicts a corrective
rotation action for the gripper. Given the current tactile sensor readings and, optionally, additional signals, e.g., F/T information, we predict a rotation matrix w.r.t. the current
gripper frame (cf. Fig. 3(a)). The z-axis of this predicted
frame corresponds to the object’s placing normal. This
prediction is subsequently used to plan a hand movement
to align the object’s placing normal with the table’s normal.
After this single-step prediction and alignment, we attempt to
place the object on the surface while keeping the previously
determined orientation fixed. The major challenge here is
to predict the object’s placing normal solely from tactile
and proprioceptive sensors instead of employing traditional
extrinsic vision-based methods. To assess the importance of
learning-based placing policies for this problem, we compare
our method to two classical baseline approaches.
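To make the alignment step concrete, the sketch below computes a corrective world-frame rotation from the network's prediction, assuming the predicted rotation matrix is expressed in the gripper frame and the table normal points along the world z-axis. The function name and the use of SciPy's rotation utilities are illustrative assumptions, not part of our released codebase.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R


def corrective_rotation(R_pred_in_gripper, R_gripper_in_world,
                        table_normal_world=np.array([0.0, 0.0, 1.0])):
    """Illustrative sketch: world-frame rotation that aligns the predicted
    placing normal (z-axis of the network output) with the table normal."""
    # Predicted placing normal, expressed in the world frame.
    placing_normal = R_gripper_in_world @ R_pred_in_gripper[:, 2]

    # Axis-angle rotation that maps the placing normal onto the table normal.
    axis = np.cross(placing_normal, table_normal_world)
    sin_angle = np.linalg.norm(axis)
    cos_angle = float(np.dot(placing_normal, table_normal_world))
    if sin_angle < 1e-8:
        if cos_angle > 0.0:
            return np.eye(3)  # already aligned, no correction needed
        # Anti-parallel: rotate by pi about any axis orthogonal to the normal.
        axis = np.cross(table_normal_world, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-8:
            axis = np.cross(table_normal_world, [0.0, 1.0, 0.0])
        return R.from_rotvec(np.pi * axis / np.linalg.norm(axis)).as_matrix()
    angle = np.arctan2(sin_angle, cos_angle)
    return R.from_rotvec(angle * axis / sin_angle).as_matrix()


# The corrected gripper orientation, kept fixed while lowering the object:
# R_target = corrective_rotation(R_pred, R_gripper) @ R_gripper
```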
Our main contributions are twofold: (i) the development and training of tactile-based policies for stable object placing without requiring any extrinsic visual feedback, and (ii) an open-source suite comprising our dataset, CAD models, pretrained models, and the codebase of all methods (both classical and deep-learning-based) from our extensive real-robot experiments. Overall, our study confirms that tactile sensing can be a powerful and valuable low-cost addition to robotic manipulators: tactile signals provide features that increase reliability and robot dexterity.
II. RELATED WORK
Object placing. Stable object placing is a crucial skill
for autonomous robotic systems. Many prominent tasks in
the robotics community, such as object rearrangement or
assembly, require robotic pick & place sequences that heavily
rely on this skill. The authors of [29] propose a model-based, pointcloud-conditioned approach to stable object placing that matches polygon models of the object and the environment. Similarly, [28] uses pointcloud observations to extract meaningful feature representations for learning to place new objects in stable configurations. More recently, [32] proposes
to exploit learned keypoint representations from raw RGB-
D images for solving category-level manipulation tasks. [31]
also uses a combination of vision and learning for manipu-
lating unknown objects. [33] presents a planning algorithm
for stable object placement in cluttered scenes but requires a fully specified environment. Closely related to our work is [34], which presents an iterative learning-based approach for placing household objects onto flat surfaces but relies on a system of three external depth cameras for input.
While most of these works deal with the problem of
generating stable placing poses for unknown objects, the
main difference from our work lies in the input modality: none of them considers tactile sensing. Instead, they all
rely on single or even multiple depth/RGB images. Relying
on image data might be problematic due to gripper-object
occlusions in highly cluttered scenes, especially if the object
ends up inside the gripper without any prior knowledge of its
pose. Additionally, external sensing systems require careful
and precise calibration w.r.t. the robot, which is often tedious,
time-consuming, and error-prone. In contrast, tactile sensors
directly provide contact information between the object and the gripper, independently of the surrounding environment.
In-hand object pose estimation. Due to the inherent difficulty of estimating a grasped object's pose and its importance for tasks like pick & place or in-hand manipulation, multiple methods for in-hand object pose estimation have been developed. The authors of [35] rely solely on tactile sensors and match their signals with a local patch of the object's geometry, thereby estimating its pose. Other
works [36], [37] make use of both visual and tactile inputs.
While [36] requires visual input only to initialize a particle filter, [37] employs an extended Kalman filter that continuously fuses vision & touch. Recent progress in deep
learning has fostered data-driven methods for in-hand object
pose estimation. [38]–[40] present end-to-end approaches
based on RGB images. While [38], [39] directly output pose
predictions, [40] learns observation models that can later be
exploited in optimization frameworks. [41] fuses vision and touch in an approach that self-assesses the reliability of
each modality, while [42] exploits a learned tactile observa-
tion model in combination with a Bayes filter. Following the
success of recent deep learning approaches, we propose to learn a direct end-to-end mapping from tactile input to the grasped object's placing normal. Note that we are not interested in estimating the object's full 6D pose; we focus only on aligning the object with the placing surface. Moreover, our method does not require repeated measurements or filtering and only needs to be queried once. Finally, it requires only a very small training set.
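As an illustration only, the following sketch shows what such a single-query, end-to-end mapping could look like. It regresses a unit placing normal directly from taxel readings rather than the full rotation matrix our network predicts; the input shapes, layer sizes, and class name are assumptions and do not correspond to the architecture evaluated in this paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PlacingNormalNet(nn.Module):
    """Hypothetical sketch: regress a unit placing normal (gripper frame)
    from a stack of taxel images, e.g., two 16x16 sensor pads as channels."""

    def __init__(self, in_channels: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 4 * 4, 128), nn.ReLU(),
            nn.Linear(128, 3),  # unnormalized normal direction
        )

    def forward(self, taxels: torch.Tensor) -> torch.Tensor:
        # Single forward pass at placing time; no filtering or repeated queries.
        return F.normalize(self.head(self.features(taxels)), dim=-1)


# Usage (illustrative): normals = PlacingNormalNet()(taxel_batch)
# with taxel_batch of shape (batch, 2, 16, 16).
```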
Insertion. Stable object placing is also related to tactile in-
sertion. Successfully completing both tasks requires a suitable alignment between object & table, or between peg & hole. Several
works approach challenging insertion tasks using tactile
sensors [2]–[4], [43], [44]. The authors of [43] leverage
vision-based tactile sensors for precisely localizing small
objects inside the gripper. This information is subsequently
exploited for small-part insertions using classical control. [2],
[3] also focus on solving tight insertion tasks using learned
tactile representations. Both exploit the tactile measurements
as a feedback signal to predict residual control commands.
Recently, [4] demonstrated tactile insertion through active ex-