II. RELATED WORK
A. In-hand manipulation
In-hand manipulation has been studied extensively [1]–[7].
Many approaches require the object pose, which can be obtained
with marker tags [3], [4] or a 6D object pose estimator [5].
Alternatively, deep learning methods can produce an end-to-end
policy that does not require pose estimation. For example, [1]
learns a controller to reorient a Rubik's cube
and [7] presents a controller that can reorient many distinct
objects, but both rely on vision and complex multi-fingered
grippers. In contrast, [2] demonstrates how simpler hardware
(parallel-jaw grippers) can use extrinsic dexterity to reorient
objects in-hand. However, their approach is limited to simple cuboids
and requires a 6D object pose estimator. [6] shows how a
compliant gripper can robustly reorient objects in-hand using
handcrafted open-loop primitives that do not require object
pose or shape estimation.
B. Tactile sensing for shape reconstruction and localization
Reconstructing object shape from vision and tactile data is
a common research objective [8]–[10]. These works typically
rely heavily on vision and tend to use tactile data simply
to detect contact. However, many scenarios with heavy
occlusion preclude good vision data, leading to a growing
interest in reconstructing and/or localizing objects with just
tactile data.
Low-resolution tactile sensors [11]–[13] are typically used
to obtain binary contact information only. The development
of vision-based high-resolution tactile sensors, such as
GelSight [14], has greatly increased our ability to reconstruct
and localize objects without vision. The GelSight outputs an
RGB image of the tactile imprint, and photometric stereo
is used to convert this image to a depth image. Others
learn this mapping from data [15], or choose instead to
learn a binary segmentation mask [16] to reduce noise. [14]
showed one of the first uses of the GelSight for small-object
localization, which was later improved in [15], [17]. However, these methods
require building a complete tactile map of the object before
using it for localization.
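
To make this pipeline concrete, the following sketch (in Python, assuming NumPy) integrates a per-pixel surface-normal map into a depth image with the classic Frankot-Chellappa frequency-domain method. It presumes the RGB-to-normal calibration has already been performed, and is a generic illustration rather than the exact procedure of any sensor in [14]–[16].

    import numpy as np

    def normals_to_depth(normals):
        # normals: (H, W, 3) unit surface normals obtained from a
        # calibrated RGB-to-normal mapping (calibration assumed given).
        nx, ny, nz = normals[..., 0], normals[..., 1], normals[..., 2]
        nz = np.clip(nz, 1e-6, None)        # avoid division by zero
        p, q = -nx / nz, -ny / nz           # gradients dz/dx, dz/dy
        h, w = p.shape
        wx, wy = np.meshgrid(np.fft.fftfreq(w) * 2 * np.pi,
                             np.fft.fftfreq(h) * 2 * np.pi)
        denom = wx**2 + wy**2
        denom[0, 0] = 1.0                   # placeholder; DC term set below
        Z = (-1j * wx * np.fft.fft2(p)
             - 1j * wy * np.fft.fft2(q)) / denom
        Z[0, 0] = 0.0                       # depth recovered up to an offset
        return np.real(np.fft.ifft2(Z))
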
Most works assume each tactile imprint is collected by
making and breaking contact with the object. This introduces
uncertainty in the relative position of tactile imprints. Instead,
[13] maintains constant contact with the object to reduce
robot movement and obtain higher fidelity data. However,
they use a low-resolution sensor and only consider a bowl-shaped
object fixed in space. [18] slides a GelSight along a freely
moving cable to estimate the cable's pose, but their
method is specific to cables. By rolling objects in-hand, we
also continuously collect tactile data, but do so for a more
general set of freely-moving objects.
Most of these works narrow their scope either by recon-
structing the shape of an unknown object in a fixed pose,
or by localizing an object of known shape and unknown
pose. Recently, [19] proposed removing these limitations by
simultaneously doing shape reconstruction and localization
from tactile data. However, they only show results for
pushing planar objects.
Gaussian process implicit surfaces (GPIS) are commonly
used to build a probabilistic estimate of an object’s shape
[20]. [13], [21] use the variance of the GPIS to guide
the exploration towards the regions of highest uncertainty.
However, the goal of these approaches is to reconstruct
the object shape as accurately as possible. In contrast, we
focus on exploration that prioritizes regions of the object
that are likely to aid in solving an insertion task, making
our exploration more efficient. [22] also prioritizes exploring
regions that help solve a task, but their approach is limited
to grasping an object that is fixed during exploration.
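
To make this distinction concrete, the sketch below shows the generic variance-guided loop that GPIS exploration methods such as [13], [21] build on: fit a GP to signed-distance observations, then touch next where the predictive variance is highest. The kernel, length scale, and synthetic sphere contacts are illustrative assumptions, not the setup of those works; our approach instead replaces this acquisition criterion with a task-driven one.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    rng = np.random.default_rng(0)

    # Synthetic stand-in for tactile data: contacts on a 2 cm sphere
    # (signed distance 0) plus one interior point to anchor the sign.
    surface = rng.normal(size=(40, 3))
    surface *= 0.02 / np.linalg.norm(surface, axis=1, keepdims=True)
    X = np.vstack([surface, np.zeros((1, 3))])
    y = np.concatenate([np.zeros(40), [-0.02]])

    # GPIS: a GP over signed distance; its zero level set is the shape.
    gpis = GaussianProcessRegressor(kernel=RBF(length_scale=0.02),
                                    alpha=1e-6)
    gpis.fit(X, y)

    # Variance-guided exploration: probe where the predictive std is
    # largest, i.e. where the data so far constrain the surface least.
    candidates = rng.uniform(-0.03, 0.03, size=(2000, 3))
    _, std = gpis.predict(candidates, return_std=True)
    next_probe = candidates[np.argmax(std)]
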
C. Insertion task
Much prior work on the peg insertion task is specific to the
object geometry [23]–[25]. For example, [26] assumes a
cylindrical peg shape; [27] uses vision and tactile data to
enable insertion of complex shapes but assumes known peg
geometry; [28] learns from forces measured during human
demonstration, but requires successful demonstration of the
specific peg shape. In contrast, our work assumes no prior
knowledge of the peg shape, and instead learns an explicit
peg object model. There is some prior work that aims to be
robust to different peg geometries, such as [29], but the pegs
are fixed to the end-effector.
D. Roller Grasper
In this work, we use a simulated version of the Tactile-
Enabled Roller Grasper hardware from [30]. This gripper
has 7 degrees of freedom, shown in Fig. 3, and each roller is
equipped with a custom GelSight sensor [31]. The sensor’s
camera is inside the roller, fixed to the stator such that it
points towards the grasping point, while the elastomer covers
the rotating roller.
Because the Roller Grasper rolls objects in-hand, it can
continuously collect tactile information about an object's
surface. This continuity reduces the uncertainty in the relative
position between tactile imprints, simplifying the reconstruction
of object shape compared to linkage-based grippers that capture
discontinuous data.
Several works have studied the Roller Grasper. [3] pro-
posed a velocity controller and an imitation learning con-
troller to reorient objects. [30] built a closed loop controller
that keeps an object centered in the Roller Grasper during
manipulation using contact patch data from the tactile sensor.
We build on these works by using the velocity controller
to determine the roller actions that produce a desired object
angular velocity. However, we do not assume a known
object model or use marker tags, and instead simultaneously
reconstruct and localize the object in a task-driven manner.
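
For intuition, the core of such a velocity controller is a no-slip rolling constraint: each roller's surface velocity must match the object's surface velocity at its contact point, v_i = ω × r_i. The sketch below illustrates this with assumed geometry; the roller radius, contact points, and rolling directions are placeholders, not the parameters of the hardware in [30].

    import numpy as np

    ROLLER_RADIUS = 0.025  # m; illustrative, not the real hardware

    def roller_surface_velocities(omega_obj, contacts):
        # No-slip rolling: roller surface moves with the object surface
        # at each contact.  contacts are relative to the object center.
        return np.cross(omega_obj, contacts)  # v_i = omega x r_i

    def roller_spin_rates(surface_vels, roll_dirs):
        # Project each required surface velocity onto the roller's
        # rolling direction, then convert linear speed to angular rate.
        tangential = np.sum(surface_vels * roll_dirs, axis=1)
        return tangential / ROLLER_RADIUS

    # Example: spin the object at 0.5 rad/s about z between two rollers.
    omega_obj = np.array([0.0, 0.0, 0.5])
    contacts = np.array([[0.02, 0.0, 0.0], [-0.02, 0.0, 0.0]])
    roll_dirs = np.array([[0.0, 1.0, 0.0], [0.0, -1.0, 0.0]])
    v = roller_surface_velocities(omega_obj, contacts)
    print(roller_spin_rates(v, roll_dirs))  # [0.4 0.4] rad/s
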
III. APPROACH
We propose a method that leverages tactile sensing to
reorient objects of unknown shape in order to complete a
task. We focus on settings with high occlusion where the
robot cannot see the object it is manipulating, and must
instead rely on its sense of touch and proprioception. We