Our approach is inspired by the recent advances in legged locomotion [5, 6] using reinforcement
learning. The core of these works is to learn a compressed representation of different terrain properties
(called extrinsics) for walking, which is jointly trained with the control policy. During deployment,
the extrinsics is estimated online and the controller can perform rapid adaptation to it. Our key
insight is that, despite the diversity of real-world objects, for the task of in-hand object rotation, the
important physical properties such as local shape, mass, and size as perceived by the fingertips can
be compressed to a compact representation. Once such a compressed representation (extrinsics) of
different objects is learned, the controller can estimate it online from proprioception history and use
it to adaptively manipulate a diverse set of objects.
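To make this idea concrete, the following is a minimal PyTorch-style sketch of an extrinsics encoder that compresses object properties into a compact vector, and of a policy conditioned on that vector. The module names, layer sizes, and dimensions (prop_dim, extrinsics_dim, proprio_dim, action_dim) are illustrative assumptions, not the exact architecture used in this work.

import torch
import torch.nn as nn

class ExtrinsicsEncoder(nn.Module):
    # Compresses privileged object properties (e.g., mass, size, friction,
    # local shape features) into a compact extrinsics vector z.
    def __init__(self, prop_dim=9, extrinsics_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(prop_dim, 64), nn.ReLU(),
            nn.Linear(64, extrinsics_dim),
        )

    def forward(self, obj_props):
        return self.net(obj_props)

class AdaptivePolicy(nn.Module):
    # Policy conditioned on proprioception and the extrinsics vector z.
    def __init__(self, proprio_dim=32, extrinsics_dim=8, action_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(proprio_dim + extrinsics_dim, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, proprio, z):
        return self.net(torch.cat([proprio, z], dim=-1))

In simulation, the encoder and policy can be trained jointly with reinforcement learning, since the privileged object properties are available there.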
Specifically, we encode the object’s intrinsic properties (such as mass and size) into an extrinsics vector,
and train an adaptive policy with it as an input. The learned policy can robustly and efficiently rotate
different objects in simulation environments. However, we do not have access to the extrinsics when
we deploy the policy in the real world. To tackle this problem, we use rapid motor adaptation [6] to learn an adaptation module which estimates the extrinsics vector, using the discrepancy between
observed proprioception history and the commanded actions. This adaptation module can also be
trained solely in simulation via supervised learning. The concept of estimating physical properties
using proprioceptive history has been widely used in locomotion [5, 6, 7] but has not yet been
explored for in-hand manipulation.
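A minimal sketch of this second, supervised training phase is given below, reusing the encoder and policy from the earlier sketch. The history length, dimensions, and the rollout helper collect_rollout are hypothetical placeholders for illustration, not the exact training setup.

import torch
import torch.nn as nn

class AdaptationModule(nn.Module):
    # Regresses the extrinsics vector z from a window of recent proprioception
    # and commanded actions, using no privileged object information.
    def __init__(self, proprio_dim=32, action_dim=16, history_len=30, extrinsics_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear((proprio_dim + action_dim) * history_len, 256), nn.ReLU(),
            nn.Linear(256, extrinsics_dim),
        )

    def forward(self, history):
        # history: (batch, history_len, proprio_dim + action_dim)
        return self.net(history.flatten(start_dim=-2))

def train_adaptation(adapt_module, encoder, optimizer, collect_rollout, steps=1000):
    # Supervised regression in simulation: the target z comes from the
    # privileged encoder, the input is proprioception/action history only.
    for _ in range(steps):
        history, obj_props = collect_rollout()   # hypothetical simulated rollout helper
        with torch.no_grad():
            z_target = encoder(obj_props)        # privileged "ground-truth" extrinsics
        z_pred = adapt_module(history)
        loss = nn.functional.mse_loss(z_pred, z_target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

At deployment time, only the adaptation module and the policy are used: the module estimates z from the observed history, and the policy acts on it.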
Experimental results on a multi-finger Allegro Hand [8] show that our method can successfully rotate over 30 objects of diverse shapes, sizes (from 4.5 cm to 7.5 cm), masses (from 5 g to 200 g), and
other physical properties (e.g., deformable or soft objects) in the real world. We also observe that an
adaptive and smooth finger gait emerges from the learning process. Our approach shows the surprising
effectiveness of using only proprioceptive sensory signals for adaptation to different objects, even without the use of vision or tactile sensing. To further understand the underlying mechanisms of our approach, we study the estimated extrinsics when manipulating different objects. We find interpretable extrinsics values that correlate with mass and scale changes, and that a low-dimensional structure of the embedding does exist, both of which are critical to the generalization ability of our method.
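One simple way to probe such structure, sketched below, is to run PCA on logged extrinsics estimates and check how the leading component correlates with object mass and scale. This assumes that the estimated extrinsics vectors and the corresponding object properties have been recorded from rollouts; it is an illustrative analysis, not the exact procedure used in our experiments.

import numpy as np

def analyze_extrinsics(z, mass, scale):
    # z: (N, d) estimated extrinsics from rollouts; mass, scale: (N,) logged properties.
    z_centered = z - z.mean(axis=0)
    # PCA via SVD: how much variance do the first few components explain?
    _, s, vt = np.linalg.svd(z_centered, full_matrices=False)
    explained = (s ** 2) / np.sum(s ** 2)
    print("variance explained per component:", np.round(explained, 3))
    # Correlation of the leading component with the logged physical properties.
    pc1 = z_centered @ vt[0]
    print("corr(pc1, mass): ", np.corrcoef(pc1, mass)[0, 1])
    print("corr(pc1, scale):", np.corrcoef(pc1, scale)[0, 1])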
2 Related Work
Classic Control for In-Hand Manipulation.
Dexterous in-hand manipulation has been an active
research area for decades [9]. Classic control methods usually need an analytical model of the object and robot geometry to perform motion planning for object manipulation. For example, [10, 11] rely on such a model to plan finger movements to rotate objects. [12] assumes objects are piece-wise smooth and uses finger tracking to rotate objects. [13, 14] demonstrate reorientation of different
objects in simulation by generating trajectories using optimization. There have also been attempts to deploy systems in the real world. For example, [15] calculates precise contact locations to plan a contact sequence for twirling objects. [16] plans over a set of predefined grasp strategies to achieve object reorientation using two multi-fingered hands. [17, 18] use throwing or external
forces to perturb the object in the air and re-grasp it. Recently, works such as [19, 20] perform in-grasp manipulation without breaking contact. [4] demonstrates complex object reorientation skills using a non-anthropomorphic hand by leveraging compliance and an accurate pose tracker. The diversity of objects these methods can manipulate is still limited due to the intrinsic complexity of the physical world. In contrast to traditional control approaches, which may use heuristics or simplified models to solve this task, we use model-free reinforcement learning to train an adaptive policy and rely on online adaptation to achieve generalization.
Reinforcement Learning for In-Hand Manipulation.
To get around the need for an accurate object model and physical property measurements, there has been growing interest in recent years in using reinforcement learning directly in the real world for dexterous in-hand manipulation. [21] learns simple in-grasp rolling for cylindrical objects. [22, 23] learn a dynamics model and plan over it for rotating objects on the palm. [24, 25] use human demonstrations to accelerate the learning
process. However, since reinforcement learning is very sample-inefficient, the learned skills are rather simple or have limited object diversity. Although complex skills such as reorienting a diverse set of objects [26, 27, 28] and tool use [29, 30] can be obtained in simulation, transferring the results to the real world remains challenging. Instead of directly training a policy in the real world, our approach learns the policy entirely in simulation and aims to transfer it directly to the real world.
Sim-to-Real Transfer via Domain Randomization.
Several works aim to train reinforcement
learning policies using a simulator and directly deploy it in a real-world system. Domain randomiza-