In-Hand Object Rotation via Rapid Motor Adaptation
Haozhi Qi,1,2 Ashish Kumar,1 Roberto Calandra,2 Yi Ma,1 Jitendra Malik1,2
1UC Berkeley 2Meta AI
https://haozhi.io/hora/
[Figure 1 image: left panel "Train in Simulation" shows the cylindrical training objects; right panel "Directly Deploy in the Real World" shows the real test objects, annotated with masses from 5 g to 200 g and fingertip-axis diameters from about 4.3 cm to 8.7 cm.]

Figure 1: Left: Our controller is trained only in simulation on simple cylindrical objects of different sizes and weights. Right: Without any real-world fine-tuning, the controller can be deployed on a real robot to manipulate a diverse set of objects with different shapes, sizes, and weights (object mass and the shortest/longest diameter along the fingertips are shown in the figure), using only proprioceptive information. Emergence of natural, stable finger gaits can be observed in the learned control policy (videos on the project website).
Abstract: Generalized in-hand manipulation has long been an unsolved challenge of robotics. As a small step towards this grand goal, we demonstrate how to design and learn a simple adaptive controller to achieve in-hand object rotation using only fingertips. The controller is trained entirely in simulation on only cylindrical objects, which then – without any fine-tuning – can be directly deployed to a real robot hand to rotate dozens of objects with diverse sizes, shapes, and weights over the z-axis. This is achieved via rapid online adaptation of the robot's controller to the object properties using only proprioception history. Furthermore, natural and stable finger gaits automatically emerge from training the control policy via reinforcement learning. Code and more videos are available at our website (https://haozhi.io/hora/).

Keywords: In-Hand Manipulation, Object Rotation, Reinforcement Learning
1 Introduction
Humans are remarkably good at manipulating objects in-hand – they can even adapt to new objects of different shapes, sizes, masses, and materials with no apparent effort. While several works have shown in-hand object rotation with real-world multi-fingered hands for a single or a few objects [1, 2, 3, 4], truly generalizable in-hand manipulation remains an unsolved challenge of robotics.
In this paper, we demonstrate that it is possible to train an adaptive controller capable of rotating diverse objects over the z-axis with the fingertips of a multi-fingered robot hand (Figure 1). This task is a simplification of the general in-hand reorientation task, yet it is still quite challenging for robots, since at all times the fingers must maintain a dynamic or static force closure on the object to prevent it from falling (the object cannot rest on any other supporting surface such as the palm).
Equal Contribution.
6th Conference on Robot Learning (CoRL 2022), Auckland, New Zealand.
arXiv:2210.04887v1 [cs.RO] 10 Oct 2022
Our approach is inspired by recent advances in legged locomotion via reinforcement learning [5, 6]. The core of these works is to learn a compressed representation of different terrain properties (called extrinsics) for walking, which is jointly trained with the control policy. During deployment, the extrinsics vector is estimated online and the controller rapidly adapts to it. Our key insight is that, despite the diversity of real-world objects, for the task of in-hand object rotation the important physical properties – such as local shape, mass, and size as perceived by the fingertips – can be compressed into a compact representation. Once such a compressed representation (extrinsics) of different objects is learned, the controller can estimate it online from proprioception history and use it to adaptively manipulate a diverse set of objects.
Specifically, we encode the object's intrinsic properties (such as mass and size) into an extrinsics vector and train an adaptive policy that takes it as an input. The learned policy can robustly and efficiently rotate different objects in simulation. However, we do not have access to the extrinsics when we deploy the policy in the real world. To tackle this problem, we use rapid motor adaptation [6] to learn an adaptation module that estimates the extrinsics vector from the discrepancy between the observed proprioception history and the commanded actions. This adaptation module can also be trained solely in simulation via supervised learning. The concept of estimating physical properties from proprioceptive history has been widely used in locomotion [5, 6, 7] but has not yet been explored for in-hand manipulation.
Experimental results on a multi-finger Allegro Hand [8] show that our method can successfully rotate over 30 objects of diverse shapes, sizes (from 4.5 cm to 7.5 cm), masses (from 5 g to 200 g), and other physical properties (e.g., deformable or soft objects) in the real world. We also observe that an adaptive and smooth finger gait emerges from the learning process. Our approach shows the surprising effectiveness of using only proprioceptive signals for adaptation to different objects, even without vision or tactile sensing. To further understand the underlying mechanisms of our approach, we study the estimated extrinsics when manipulating different objects. We find interpretable extrinsics values that correlate with changes in mass and scale, and we find that a low-dimensional structure of the embedding does exist; both are critical to our generalization ability.
2 Related Work
Classic Control for In-Hand Manipulation. Dexterous in-hand manipulation has been an active research area for decades [9]. Classic control methods usually need an analytical model of the object and robot geometry to perform motion planning for object manipulation. For example, [10, 11] rely on such a model to plan finger movements that rotate objects. [12] assumes objects are piece-wise smooth and uses finger tracking to rotate them. [13, 14] demonstrate reorientation of different objects in simulation by generating trajectories using optimization. There have also been attempts to deploy such systems in the real world. For example, [15] calculates precise contact locations to plan a sequence of contacts for twirling objects. [16] plans over a set of predefined grasp strategies to achieve object reorientation using two multi-fingered hands. [17, 18] use throwing or external forces to perturb the object in the air and re-grasp it. Recently, works such as [19, 20] perform in-grasp manipulation without breaking contact. [4] demonstrates complex object reorientation skills with a non-anthropomorphic hand by leveraging compliance and an accurate pose tracker. The diversity of objects these systems can manipulate is still limited due to the intrinsic complexity of the physical world. In contrast to traditional control approaches, which may use heuristics or simplified models to solve this task, we use model-free reinforcement learning to train an adaptive policy and rely on adaptation to achieve generalization.
Reinforcement Learning for In-Hand Manipulation. To get around the need for an accurate object model and measured physical properties, there has been growing interest in the last few years in using reinforcement learning directly in the real world for dexterous in-hand manipulation. [21] learns simple in-grasp rolling for cylindrical objects. [22, 23] learn a dynamics model and plan over it to rotate objects on the palm. [24, 25] use human demonstrations to accelerate the learning process. However, since reinforcement learning is very sample-inefficient, the learned skills are rather simple or limited in object diversity. Although complex skills such as reorienting a diverse set of objects [26, 27, 28] and tool use [29, 30] can be obtained in simulation, transferring the results to the real world remains challenging. Instead of directly training a policy in the real world, our approach learns the policy entirely in simulation and aims to transfer it directly to the real world.
Sim-to-Real Transfer via Domain Randomization. Several works aim to train reinforcement learning policies in a simulator and directly deploy them on a real-world system.
[Figure 2 diagram: three panels – Base Policy Training (the object property encoder $\mu$ maps object position, scale, mass, center of mass, and coefficient of friction to $z_t$; the base policy $\pi$ outputs $a_t$ from $o_t$ and $z_t$), Adaptation Module Training (the adaptation module $\phi$ predicts $\hat{z}_t$ from proprioception and action history; $\pi$ is copied and frozen while $\phi$ is trained with the loss $\|z_t - \hat{z}_t\|_2^2$), and Deployment ($\pi$ and $\phi$, both copied and frozen, run at 20 Hz). Trainable modules are shown in red.]
Figure 2: An overview of our approach at different training and deployment stages. In Base Policy Learning, we jointly optimize $\mu$ and $\pi$ using PPO [33]. The observation $o_t$ only contains the three past joint positions and commanded actions. Next, in Adaptation Module Learning, we freeze the policy $\pi$ and use supervised learning to train $\phi$, which uses proprioception and action history to estimate the extrinsics vector $z_t$. During Deployment, the base policy $\pi$ uses the extrinsics $\hat{z}_t$ estimated and updated online by $\phi$.
Domain randomization [31] varies the simulation parameters during training to expose the policy to diverse simulated environments so that it can be robustly deployed in the real world. Representative examples are [1] and [2], which leverage massive computational resources and large-scale reinforcement learning to learn agile object reorientation skills and to solve a Rubik's Cube with a single robot hand. However, they still focus on manipulating a limited number of objects. [3] learns a finger-gaiting behavior efficiently and transfers it to a real robot with the hand facing downwards, but they do not use only the fingertips and the objects considered are all cubes. Our approach focuses on generalization to a diverse set of objects and can be trained within a few hours.
Sim-to-Real via Adaptation. Instead of relying on domain randomization, which is agnostic to the current environment parameters, [32] performs system identification via an initial calibration, or online adaptive control, to estimate the system parameters for sim-to-real transfer. However, learning the exact physical values and aligning simulation with the real world may be sub-optimal because of the intrinsic inaccuracy of physics simulations. An alternative is to learn a low-dimensional embedding that encodes the environment parameters [5, 6] and is then used by the control policy to act. This paradigm has enabled robust and adaptive locomotion policies, but it is not straightforward to apply it to the in-hand manipulation task. Our approach demonstrates how to design the reward and training environment to obtain a natural and stable controller that can transfer to the real world.
3 Rapid Motor Adaptation for In-Hand Object Rotation
An overview of our approach is shown in Figure 2. During deployment (Figure 2, bottom), our policy infers a low-dimensional embedding of the object's properties, such as size and mass, from proprioception and action history; this embedding is then used by our base policy to rotate the object. We first describe how we train the base policy with object properties provided by the simulator, and then we discuss how to train an adaptation module that is capable of inferring these properties.
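To make the deployment flow concrete, the following is a minimal PyTorch-style sketch of the online control loop; the hand-driver API, the history length, and the module interfaces are illustrative assumptions, not the released implementation:

```python
# Minimal sketch of the deployment loop; module classes, the `hand` driver API,
# and the history length are illustrative assumptions, not the released code.
import collections
import torch

HIST_LEN = 30          # assumed proprioception/action history length
CTRL_HZ = 20           # policy and adaptation module both run at 20 Hz

def deploy(base_policy, adaptation_module, hand):
    """Run the frozen base policy pi with extrinsics estimated online by phi."""
    a_prev = hand.read_joint_positions()                  # 16-dim tensor (hypothetical API)
    q_hist = collections.deque([a_prev] * HIST_LEN, maxlen=HIST_LEN)
    a_hist = collections.deque([a_prev] * HIST_LEN, maxlen=HIST_LEN)
    while True:
        q_t = hand.read_joint_positions()                 # current joint positions q_t
        q_hist.append(q_t)
        a_hist.append(a_prev)
        # phi: full (q, a) history -> online extrinsics estimate z_hat_t (8-dim)
        z_hat = adaptation_module(torch.stack(list(q_hist)),
                                  torch.stack(list(a_hist)))
        # pi: last three joint positions and commanded actions + z_hat_t -> PD target a_t
        o_t = torch.cat(list(q_hist)[-3:] + list(a_hist)[-3:])   # 96-dim observation
        a_t = base_policy(o_t, z_hat)
        hand.send_position_target(a_t)                    # inner PD loop runs at 300 Hz
        a_prev = a_t
        hand.sleep_until_next_tick(CTRL_HZ)               # keep the 20 Hz control rate
```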
3.1 Base Policy Training
Privileged Information. In this paper, privileged information refers to object properties such as position, size, mass, coefficient of friction, and center of mass. This information, denoted by a 9-dim vector $e_t \in \mathbb{R}^9$ at timestep $t$, can be accurately measured in simulation. We provide it as an input to the policy, but instead of using $e_t$ directly, we use an 8-dim embedding (called extrinsics in [6]) $z_t = \mu(e_t)$, which gives better generalization behavior as we show in Section 5.
Base Policy. Our control policy $\pi$ takes as input the current robot joint positions $q_t \in \mathbb{R}^{16}$, the action $a_{t-1} \in \mathbb{R}^{16}$ predicted at the last timestep, and the extrinsics vector $z_t \in \mathbb{R}^8$, and outputs the target of the PD controller (denoted $a_t$). We also augment the observation with two additional past timesteps so that velocity and acceleration information is available. Formally, the base policy outputs $a_t = \pi(o_t, z_t)$ where $o_t = (q_{t-2:t}, a_{t-3:t-1}) \in \mathbb{R}^{96}$.
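For concreteness, here is a minimal PyTorch-style sketch of the encoder $\mu$ and the base policy $\pi$ with the dimensions above; the hidden layer sizes and activation are illustrative assumptions rather than the exact architecture:

```python
import torch
import torch.nn as nn

class ExtrinsicsEncoder(nn.Module):
    """mu: 9-dim privileged object properties e_t -> 8-dim extrinsics z_t."""
    def __init__(self, priv_dim=9, z_dim=8, hidden=32):   # hidden size is illustrative
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(priv_dim, hidden), nn.ELU(),
            nn.Linear(hidden, z_dim),
        )

    def forward(self, e_t):
        return self.net(e_t)

class BasePolicy(nn.Module):
    """pi: observation o_t (96-dim) + extrinsics z_t (8-dim) -> PD target a_t (16-dim)."""
    def __init__(self, obs_dim=96, z_dim=8, act_dim=16, hidden=256):  # hidden size illustrative
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + z_dim, hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.ELU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, o_t, z_t):
        return self.net(torch.cat([o_t, z_t], dim=-1))

# o_t stacks the last three joint positions and the last three commanded actions:
# o_t = concat(q_{t-2:t}, a_{t-3:t-1}), i.e. 6 x 16 = 96 entries.
```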
Reward Function. We jointly optimize the policy $\pi$ and the embedding $\mu$ using PPO [33]. The reward function depends on several quantities: $\omega$ is the object's angular velocity, $\hat{k}$ is the desired rotation axis (we use the $z$-axis in the hand coordinate system), $q_{\text{init}}$ is the starting robot configuration, $\tau$ is the commanded torque at each timestep, and $v$ is the object's linear velocity. The reward $r$ (subscript $t$ omitted for simplicity) to maximize is then
$$r \doteq r_{\text{rot}} + \lambda_{\text{pose}}\, r_{\text{pose}} + \lambda_{\text{linvel}}\, r_{\text{linvel}} + \lambda_{\text{work}}\, r_{\text{work}} + \lambda_{\text{torque}}\, r_{\text{torque}} \quad (1)$$
where $r_{\text{rot}} \doteq \max(\min(\omega \cdot \hat{k},\, r_{\max}),\, r_{\min})$ is the rotation reward, $r_{\text{pose}} \doteq -\|q - q_{\text{init}}\|_2^2$ is the hand pose deviation penalty, $r_{\text{torque}} \doteq -\|\tau\|_2^2$ is the torque penalty, $r_{\text{work}} \doteq \tau^\top \dot{q}$ is the energy consumption penalty, and $r_{\text{linvel}} \doteq -\|v\|_2^2$ is the object linear velocity penalty. Note that, in contrast to [28], which explicitly encourages at least three fingertips to always be in contact with the object, we do not enforce any heuristic finger gaiting behaviour. Instead, a stable finger gaiting behaviour emerges from the energy constraints and the penalty on deviation from the initial pose.
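A direct per-step translation of Equation (1) into code is sketched below; the clipping bounds $r_{\min}, r_{\max}$ and the $\lambda$ coefficients (including their signs) are placeholders to be set from the implementation, not values reported here:

```python
import torch

def rotation_reward(omega, k_hat, q, q_init, tau, qd, v,
                    r_min=-0.5, r_max=0.5,               # clipping bounds (placeholders)
                    lam_pose=0.3, lam_linvel=0.3,         # lambda coefficients (placeholders;
                    lam_work=1.0, lam_torque=0.1):        #  signs must make each term a penalty)
    """Per-step reward of Eq. (1); all inputs are 1-D torch tensors for one environment."""
    r_rot    = torch.clamp(torch.dot(omega, k_hat), r_min, r_max)  # reward rotation about k_hat
    r_pose   = -torch.sum((q - q_init) ** 2)                       # hand pose deviation penalty
    r_torque = -torch.sum(tau ** 2)                                # torque penalty
    r_work   = torch.dot(tau, qd)                                  # energy (work) term as in Eq. (1)
    r_linvel = -torch.sum(v ** 2)                                  # object linear velocity penalty
    return (r_rot + lam_pose * r_pose + lam_linvel * r_linvel
            + lam_work * r_work + lam_torque * r_torque)
```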
Object Initialization and Dynamics Randomization. A good training environment has to provide enough variety in simulation to enable generalization in the real world. In this work, we find that cylinders with different aspect ratios and masses provide such variety: we uniformly sample the diameter and side length of each cylinder. We initialize the object and the fingers in a stable precision grasp. Instead of constructing the fingertip positions as in [28], we simply sample the object position, pose, and robot joint positions randomly around a canonical grasp until a stable grasp is achieved. We also randomize the mass, center of mass, and friction of these objects (see the appendix for details).
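A sketch of this sampling procedure is given below; the numeric ranges are illustrative placeholders (the actual ranges are listed in the appendix), and the simulation wrapper `env` and helper names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_training_object():
    """Sample one randomized cylinder; the numeric ranges are illustrative placeholders."""
    return {
        "diameter_m":   rng.uniform(0.05, 0.08),            # cylinder diameter
        "height_m":     rng.uniform(0.03, 0.10),            # cylinder side length
        "mass_kg":      rng.uniform(0.03, 0.25),            # object mass
        "com_offset_m": rng.uniform(-0.01, 0.01, size=3),   # center-of-mass shift
        "friction":     rng.uniform(0.3, 1.5),              # coefficient of friction
    }

def sample_stable_grasp(env, obj_cfg, q_canonical, max_tries=100):
    """Perturb a canonical grasp until the object rests stably on the fingertips.

    `env` is a hypothetical simulation wrapper; only its intent matters here.
    """
    for _ in range(max_tries):
        env.reset_object(obj_cfg,
                         pos_noise=rng.uniform(-0.01, 0.01, size=3),
                         yaw_noise=rng.uniform(-np.pi, np.pi))
        env.reset_joints(q_canonical + rng.uniform(-0.1, 0.1, size=16))
        env.settle()                          # run a few physics steps
        if env.object_still_grasped():        # object neither dropped nor ejected
            return True
    return False
```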
3.2 Adaptation Module Training
We cannot directly deploy the learned policy $\pi$ to the real world because we do not observe the vector $e_t$ and hence cannot compute the extrinsics $z_t$. Instead, we estimate the extrinsics vector $\hat{z}_t$ from the discrepancy between the proprioception history and the commanded action history via an adaptation module $\phi$. This idea is inspired by recent work in locomotion [5, 6], where the proprioception history is used to estimate terrain properties. We show that this information can also be used to estimate object properties.

To train this network, we first collect trajectories and privileged information by executing the policy $a_t = \pi(o_t, \hat{z}_t)$ with the predicted extrinsics vectors $\hat{z}_t = \phi(q_{t-k:t}, a_{t-k-1:t-1})$. Meanwhile, we also store the ground-truth extrinsics vector $z_t$ and construct a training set
$$\mathcal{B} = \{(q^{(i)}_{t-k:t},\, a^{(i)}_{t-k-1:t-1},\, z^{(i)}_t,\, \hat{z}^{(i)}_t)\}_{i=1}^{N}.$$
Then we optimize $\phi$ by minimizing the $\ell_2$ distance between $z_t$ and $\hat{z}_t$ using Adam [34]. This process is iterated until the loss converges. We apply the same object initialization and dynamics randomization settings as in the section above.
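For concreteness, the supervised phase can be sketched as follows; the network architecture (an MLP over the flattened history), the history length, and the helper `rollout_fn` are illustrative assumptions, not the exact released implementation:

```python
import torch
import torch.nn as nn

HIST_LEN, NUM_JOINTS, Z_DIM = 30, 16, 8      # history length k is an assumption

class AdaptationModule(nn.Module):
    """phi: (q_{t-k:t}, a_{t-k-1:t-1}) history -> estimated extrinsics z_hat_t."""
    def __init__(self, hidden=256):
        super().__init__()
        in_dim = 2 * HIST_LEN * NUM_JOINTS
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.ELU(),
            nn.Linear(hidden, Z_DIM),
        )

    def forward(self, q_hist, a_hist):
        # flatten the (history, joints) dimensions, with or without a leading batch dim
        x = torch.cat([q_hist.flatten(-2), a_hist.flatten(-2)], dim=-1)
        return self.net(x)

def train_adaptation_module(phi, rollout_fn, iters=1000, lr=3e-4):
    """Iteratively roll out pi(o_t, z_hat_t), then regress z_hat_t onto the true z_t."""
    opt = torch.optim.Adam(phi.parameters(), lr=lr)
    for _ in range(iters):
        # rollout_fn (hypothetical) executes the frozen policy with phi's current
        # predictions and returns a batch of (q_hist, a_hist, z_true) from simulation.
        q_hist, a_hist, z_true = rollout_fn(phi)
        loss = torch.nn.functional.mse_loss(phi(q_hist, a_hist), z_true)
        opt.zero_grad(); loss.backward(); opt.step()
    return phi
```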
4 Experimental Setup and Implementation Details
Hardware Setup. We use an Allegro Hand from Wonik Robotics [8]. The Allegro Hand is a dexterous anthropomorphic robot hand with four fingers, each with four degrees of freedom. These 16 joints are controlled using position control at 20 Hz. The target position commands are converted to torques by a PD controller ($K_p = 3.0$, $K_d = 0.1$) running at 300 Hz.
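For reference, the position-to-torque conversion performed by the low-level controller is the standard PD law sketched below; any clamping and the driver interface are not specified here, so this is a schematic of the stated gains and rates rather than the exact firmware behavior:

```python
import numpy as np

KP, KD = 3.0, 0.1          # PD gains from the hardware setup
PD_HZ, CTRL_HZ = 300, 20   # inner PD loop rate vs. policy control rate

def pd_torque(q_target, q, qd):
    """tau = Kp * (q_target - q) - Kd * qd, applied per joint at 300 Hz."""
    return KP * (np.asarray(q_target) - np.asarray(q)) - KD * np.asarray(qd)

# The policy updates q_target (16 values) at 20 Hz; between policy steps the
# PD loop re-evaluates pd_torque 300 / 20 = 15 times with fresh joint readings.
```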
Simulation Setup. We use the IsaacGym [35] simulator. During training, we use 16384 parallel environments to collect samples for training the agent. Each environment contains a simulated Allegro Hand and a cylindrical object with randomized shape and physical properties (the exact parameters are in the supplementary material). The simulation frequency is 120 Hz and the control frequency is 20 Hz. Each episode lasts for 400 control steps (equivalent to 20 s).