In-Hand Object Rotation via Rapid Motor Adaptation
Haozhi Qi,1,2 Ashish Kumar,1 Roberto Calandra,2 Yi Ma,1 Jitendra Malik1,2
1UC Berkeley 2Meta AI
https://haozhi.io/hora/
[Figure 1 image: left panel "Train in Simulation" shows the cylindrical training objects; right panel "Directly Deploy in the Real World" shows the real test objects, annotated with masses from 5 g to 200 g and fingertip-axis diameters from about 4.3 cm to 8.7 cm.]

Figure 1: Left: Our controller is trained only in simulation on simple cylindrical objects of different sizes and weights. Right: Without any real-world fine-tuning, the controller can be deployed on a real robot to manipulate a diverse set of objects with different shapes, sizes, and weights (object mass and the shortest/longest diameter along the fingertips are shown in the figure), using only proprioceptive information. Emergence of natural, stable finger gaits can be observed in the learned control policy (videos on the project website).
Abstract: Generalized in-hand manipulation has long been an unsolved challenge of robotics. As a small step towards this grand goal, we demonstrate how to design and learn a simple adaptive controller to achieve in-hand object rotation using only fingertips. The controller is trained entirely in simulation on only cylindrical objects, which then – without any fine-tuning – can be directly deployed to a real robot hand to rotate dozens of objects with diverse sizes, shapes, and weights over the z-axis. This is achieved via rapid online adaptation of the robot's controller to the object properties using only proprioception history. Furthermore, natural and stable finger gaits automatically emerge from training the control policy via reinforcement learning. Code and more videos are available at our website (https://haozhi.io/hora/).

Keywords: In-Hand Manipulation, Object Rotation, Reinforcement Learning
1 Introduction
Humans are remarkably good at manipulating objects in-hand – they can even adapt to new objects of different shapes, sizes, masses, and materials with no apparent effort. While several works have shown in-hand object rotation with real-world multi-fingered hands for a single or a few objects [1, 2, 3, 4], truly generalizable in-hand manipulation remains an unsolved challenge of robotics.
In this paper, we demonstrate that it is possible to train an adaptive controller capable of rotating diverse objects over the z-axis with the fingertips of a multi-fingered robot hand (Figure 1). This task is a simplification of the general in-hand reorientation task, yet it is still quite challenging for robots, since at all times the fingers must maintain a dynamic or static force closure on the object to prevent it from falling (the object cannot rest on any other supporting surface such as the palm).
Equal Contribution.
6th Conference on Robot Learning (CoRL 2022), Auckland, New Zealand.
arXiv:2210.04887v1 [cs.RO] 10 Oct 2022
Our approach is inspired by recent advances in legged locomotion via reinforcement learning [5, 6]. The core of these works is to learn a compressed representation of different terrain properties (called extrinsics) for walking, which is jointly trained with the control policy. During deployment, the extrinsics vector is estimated online and the controller rapidly adapts to it. Our key insight is that, despite the diversity of real-world objects, for the task of in-hand object rotation the important physical properties – such as local shape, mass, and size as perceived by the fingertips – can be compressed into a compact representation. Once such a compressed representation (extrinsics) of different objects is learned, the controller can estimate it online from proprioception history and use it to adaptively manipulate a diverse set of objects.
Specifically, we encode the object's intrinsic properties (such as mass and size) into an extrinsics vector and train an adaptive policy that takes it as an input. The learned policy can robustly and efficiently rotate different objects in simulation. However, we do not have access to the extrinsics when we deploy the policy in the real world. To tackle this problem, we use rapid motor adaptation [6] to learn an adaptation module that estimates the extrinsics vector from the discrepancy between the observed proprioception history and the commanded actions. This adaptation module can also be trained solely in simulation via supervised learning. The concept of estimating physical properties from proprioceptive history has been widely used in locomotion [5, 6, 7] but has not yet been explored for in-hand manipulation.
Experimental results on a multi-finger Allegro Hand [8] show that our method can successfully rotate over 30 objects of diverse shapes, sizes (from 4.5 cm to 7.5 cm), masses (from 5 g to 200 g), and other physical properties (e.g., deformable or soft objects) in the real world. We also observe that an adaptive and smooth finger gait emerges from the learning process. Our approach shows the surprising effectiveness of using only proprioceptive signals for adaptation to different objects, even without vision or tactile sensing. To further understand the underlying mechanisms of our approach, we study the estimated extrinsics when manipulating different objects. We find interpretable extrinsics values that correlate with changes in mass and scale, and we find that a low-dimensional structure of the embedding does exist; both are critical to our generalization ability.
2 Related Work
Classic Control for In-Hand Manipulation. Dexterous in-hand manipulation has been an active research area for decades [9]. Classic control methods usually need an analytical model of the object and robot geometry to perform motion planning for object manipulation. For example, [10, 11] rely on such a model to plan finger movements that rotate objects. [12] assumes objects are piece-wise smooth and uses finger tracking to rotate them. [13, 14] demonstrate reorientation of different objects in simulation by generating trajectories using optimization. There have also been attempts to deploy such systems in the real world. For example, [15] calculates precise contact locations to plan a sequence of contacts for twirling objects. [16] plans over a set of predefined grasp strategies to achieve object reorientation using two multi-fingered hands. [17, 18] use throwing or external forces to perturb the object in the air and re-grasp it. Recently, works such as [19, 20] perform in-grasp manipulation without breaking contact. [4] demonstrates complex object reorientation skills with a non-anthropomorphic hand by leveraging compliance and an accurate pose tracker. The diversity of objects these systems can manipulate is still limited due to the intrinsic complexity of the physical world. In contrast to traditional control approaches, which may use heuristics or simplified models to solve this task, we use model-free reinforcement learning to train an adaptive policy and rely on adaptation to achieve generalization.
Reinforcement Learning for In-Hand Manipulation. To get around the need for an accurate object model and measured physical properties, there has been growing interest in the last few years in using reinforcement learning directly in the real world for dexterous in-hand manipulation. [21] learns simple in-grasp rolling for cylindrical objects. [22, 23] learn a dynamics model and plan over it to rotate objects on the palm. [24, 25] use human demonstrations to accelerate the learning process. However, since reinforcement learning is very sample-inefficient, the learned skills are rather simple or limited in object diversity. Although complex skills such as reorienting a diverse set of objects [26, 27, 28] and tool use [29, 30] can be obtained in simulation, transferring the results to the real world remains challenging. Instead of directly training a policy in the real world, our approach learns the policy entirely in simulation and aims to transfer it directly to the real world.
Sim-to-Real Transfer via Domain Randomization. Several works aim to train reinforcement learning policies in a simulator and directly deploy them on a real-world system.
[Figure 2 diagram: three panels – Base Policy Training (the object property encoder $\mu$ maps object position, scale, mass, center of mass, and coefficient of friction to $z_t$; the base policy $\pi$ outputs $a_t$ from $o_t$ and $z_t$), Adaptation Module Training (the adaptation module $\phi$ predicts $\hat{z}_t$ from proprioception and action history; $\pi$ is copied and frozen while $\phi$ is trained with the loss $\|z_t - \hat{z}_t\|_2^2$), and Deployment ($\pi$ and $\phi$, both copied and frozen, run at 20 Hz). Trainable modules are shown in red.]
Figure 2: An overview of our approach at different training and deployment stages. In Base Policy Learning, we jointly optimize $\mu$ and $\pi$ using PPO [33]. The observation $o_t$ only contains the three past joint positions and commanded actions. Next, in Adaptation Module Learning, we freeze the policy $\pi$ and use supervised learning to train $\phi$, which uses proprioception and action history to estimate the extrinsics vector $z_t$. During Deployment, the base policy $\pi$ uses the extrinsics $\hat{z}_t$ estimated and updated online by $\phi$.
Domain randomization [31] varies the simulation parameters during training to expose the policy to diverse simulated environments so that it can be robustly deployed in the real world. Representative examples are [1] and [2], which leverage massive computational resources and large-scale reinforcement learning to learn agile object reorientation skills and to solve a Rubik's Cube with a single robot hand. However, they still focus on manipulating a limited number of objects. [3] learns a finger-gaiting behavior efficiently and transfers it to a real robot with the hand facing downwards, but they do not use only the fingertips and the objects considered are all cubes. Our approach focuses on generalization to a diverse set of objects and can be trained within a few hours.
Sim-to-Real via Adaptation. Instead of relying on domain randomization, which is agnostic to the current environment parameters, [32] performs system identification via an initial calibration, or online adaptive control, to estimate the system parameters for sim-to-real transfer. However, learning the exact physical values and aligning simulation with the real world may be sub-optimal because of the intrinsic inaccuracy of physics simulations. An alternative is to learn a low-dimensional embedding that encodes the environment parameters [5, 6] and is then used by the control policy to act. This paradigm has enabled robust and adaptive locomotion policies, but it is not straightforward to apply it to the in-hand manipulation task. Our approach demonstrates how to design the reward and training environment to obtain a natural and stable controller that can transfer to the real world.
3 Rapid Motor Adaptation for In-Hand Object Rotation
An overview of our approach is shown in Figure 2. During deployment (Figure 2, bottom), our policy infers a low-dimensional embedding of the object's properties, such as size and mass, from proprioception and action history; this embedding is then used by our base policy to rotate the object. We first describe how we train the base policy with object properties provided by the simulator, and then we discuss how to train an adaptation module that is capable of inferring these properties.
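To make the deployment flow concrete, the following is a minimal PyTorch-style sketch of the online control loop; the hand-driver API, the history length, and the module interfaces are illustrative assumptions, not the released implementation:

```python
# Minimal sketch of the deployment loop; module classes, the `hand` driver API,
# and the history length are illustrative assumptions, not the released code.
import collections
import torch

HIST_LEN = 30          # assumed proprioception/action history length
CTRL_HZ = 20           # policy and adaptation module both run at 20 Hz

def deploy(base_policy, adaptation_module, hand):
    """Run the frozen base policy pi with extrinsics estimated online by phi."""
    a_prev = hand.read_joint_positions()                  # 16-dim tensor (hypothetical API)
    q_hist = collections.deque([a_prev] * HIST_LEN, maxlen=HIST_LEN)
    a_hist = collections.deque([a_prev] * HIST_LEN, maxlen=HIST_LEN)
    while True:
        q_t = hand.read_joint_positions()                 # current joint positions q_t
        q_hist.append(q_t)
        a_hist.append(a_prev)
        # phi: full (q, a) history -> online extrinsics estimate z_hat_t (8-dim)
        z_hat = adaptation_module(torch.stack(list(q_hist)),
                                  torch.stack(list(a_hist)))
        # pi: last three joint positions and commanded actions + z_hat_t -> PD target a_t
        o_t = torch.cat(list(q_hist)[-3:] + list(a_hist)[-3:])   # 96-dim observation
        a_t = base_policy(o_t, z_hat)
        hand.send_position_target(a_t)                    # inner PD loop runs at 300 Hz
        a_prev = a_t
        hand.sleep_until_next_tick(CTRL_HZ)               # keep the 20 Hz control rate
```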
3.1 Base Policy Training
Privileged Information. In this paper, privileged information refers to object properties such as position, size, mass, coefficient of friction, and center of mass. This information, denoted by a 9-dim vector $e_t \in \mathbb{R}^9$ at timestep $t$, can be accurately measured in simulation. We provide it as an input to the policy, but instead of using $e_t$ directly, we use an 8-dim embedding (called extrinsics in [6]) $z_t = \mu(e_t)$, which gives better generalization behavior as we show in Section 5.
Base Policy. Our control policy $\pi$ takes as input the current robot joint positions $q_t \in \mathbb{R}^{16}$, the action $a_{t-1} \in \mathbb{R}^{16}$ predicted at the last timestep, and the extrinsics vector $z_t \in \mathbb{R}^8$, and outputs the target of the PD controller (denoted $a_t$). We also augment the observation with two additional past timesteps so that velocity and acceleration information is available. Formally, the base policy outputs $a_t = \pi(o_t, z_t)$ where $o_t = (q_{t-2:t}, a_{t-3:t-1}) \in \mathbb{R}^{96}$.
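For concreteness, here is a minimal PyTorch-style sketch of the encoder $\mu$ and the base policy $\pi$ with the dimensions above; the hidden layer sizes and activation are illustrative assumptions rather than the exact architecture:

```python
import torch
import torch.nn as nn

class ExtrinsicsEncoder(nn.Module):
    """mu: 9-dim privileged object properties e_t -> 8-dim extrinsics z_t."""
    def __init__(self, priv_dim=9, z_dim=8, hidden=32):   # hidden size is illustrative
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(priv_dim, hidden), nn.ELU(),
            nn.Linear(hidden, z_dim),
        )

    def forward(self, e_t):
        return self.net(e_t)

class BasePolicy(nn.Module):
    """pi: observation o_t (96-dim) + extrinsics z_t (8-dim) -> PD target a_t (16-dim)."""
    def __init__(self, obs_dim=96, z_dim=8, act_dim=16, hidden=256):  # hidden size illustrative
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + z_dim, hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.ELU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, o_t, z_t):
        return self.net(torch.cat([o_t, z_t], dim=-1))

# o_t stacks the last three joint positions and the last three commanded actions:
# o_t = concat(q_{t-2:t}, a_{t-3:t-1}), i.e. 6 x 16 = 96 entries.
```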
Reward Function. We jointly optimize the policy $\pi$ and the embedding $\mu$ using PPO [33]. The reward function depends on several quantities: $\omega$ is the object's angular velocity, $\hat{k}$ is the desired rotation axis (we use the $z$-axis in the hand coordinate system), $q_{\text{init}}$ is the starting robot configuration, $\tau$ is the commanded torque at each timestep, and $v$ is the object's linear velocity. The reward $r$ (subscript $t$ omitted for simplicity) to maximize is then
$$r \doteq r_{\text{rot}} + \lambda_{\text{pose}}\, r_{\text{pose}} + \lambda_{\text{linvel}}\, r_{\text{linvel}} + \lambda_{\text{work}}\, r_{\text{work}} + \lambda_{\text{torque}}\, r_{\text{torque}} \quad (1)$$
where $r_{\text{rot}} \doteq \max(\min(\omega \cdot \hat{k},\, r_{\max}),\, r_{\min})$ is the rotation reward, $r_{\text{pose}} \doteq -\|q - q_{\text{init}}\|_2^2$ is the hand pose deviation penalty, $r_{\text{torque}} \doteq -\|\tau\|_2^2$ is the torque penalty, $r_{\text{work}} \doteq \tau^\top \dot{q}$ is the energy consumption penalty, and $r_{\text{linvel}} \doteq -\|v\|_2^2$ is the object linear velocity penalty. Note that, in contrast to [28], which explicitly encourages at least three fingertips to always be in contact with the object, we do not enforce any heuristic finger gaiting behaviour. Instead, a stable finger gaiting behaviour emerges from the energy constraints and the penalty on deviation from the initial pose.
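A direct per-step translation of Equation (1) into code is sketched below; the clipping bounds $r_{\min}, r_{\max}$ and the $\lambda$ coefficients (including their signs) are placeholders to be set from the implementation, not values reported here:

```python
import torch

def rotation_reward(omega, k_hat, q, q_init, tau, qd, v,
                    r_min=-0.5, r_max=0.5,               # clipping bounds (placeholders)
                    lam_pose=0.3, lam_linvel=0.3,         # lambda coefficients (placeholders;
                    lam_work=1.0, lam_torque=0.1):        #  signs must make each term a penalty)
    """Per-step reward of Eq. (1); all inputs are 1-D torch tensors for one environment."""
    r_rot    = torch.clamp(torch.dot(omega, k_hat), r_min, r_max)  # reward rotation about k_hat
    r_pose   = -torch.sum((q - q_init) ** 2)                       # hand pose deviation penalty
    r_torque = -torch.sum(tau ** 2)                                # torque penalty
    r_work   = torch.dot(tau, qd)                                  # energy (work) term as in Eq. (1)
    r_linvel = -torch.sum(v ** 2)                                  # object linear velocity penalty
    return (r_rot + lam_pose * r_pose + lam_linvel * r_linvel
            + lam_work * r_work + lam_torque * r_torque)
```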
Object Initialization and Dynamics Randomization. A good training environment has to provide enough variety in simulation to enable generalization in the real world. In this work, we find that cylinders with different aspect ratios and masses provide such variety: we uniformly sample the diameter and side length of each cylinder. We initialize the object and the fingers in a stable precision grasp. Instead of constructing the fingertip positions as in [28], we simply sample the object position, pose, and robot joint positions randomly around a canonical grasp until a stable grasp is achieved. We also randomize the mass, center of mass, and friction of these objects (see the appendix for details).
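A sketch of this sampling procedure is given below; the numeric ranges are illustrative placeholders (the actual ranges are listed in the appendix), and the simulation wrapper `env` and helper names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_training_object():
    """Sample one randomized cylinder; the numeric ranges are illustrative placeholders."""
    return {
        "diameter_m":   rng.uniform(0.05, 0.08),            # cylinder diameter
        "height_m":     rng.uniform(0.03, 0.10),            # cylinder side length
        "mass_kg":      rng.uniform(0.03, 0.25),            # object mass
        "com_offset_m": rng.uniform(-0.01, 0.01, size=3),   # center-of-mass shift
        "friction":     rng.uniform(0.3, 1.5),              # coefficient of friction
    }

def sample_stable_grasp(env, obj_cfg, q_canonical, max_tries=100):
    """Perturb a canonical grasp until the object rests stably on the fingertips.

    `env` is a hypothetical simulation wrapper; only its intent matters here.
    """
    for _ in range(max_tries):
        env.reset_object(obj_cfg,
                         pos_noise=rng.uniform(-0.01, 0.01, size=3),
                         yaw_noise=rng.uniform(-np.pi, np.pi))
        env.reset_joints(q_canonical + rng.uniform(-0.1, 0.1, size=16))
        env.settle()                          # run a few physics steps
        if env.object_still_grasped():        # object neither dropped nor ejected
            return True
    return False
```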
3.2 Adaptation Module Training
We cannot directly deploy the learned policy $\pi$ to the real world because we do not observe the vector $e_t$ and hence cannot compute the extrinsics $z_t$. Instead, we estimate the extrinsics vector $\hat{z}_t$ from the discrepancy between the proprioception history and the commanded action history via an adaptation module $\phi$. This idea is inspired by recent work in locomotion [5, 6], where the proprioception history is used to estimate terrain properties. We show that this information can also be used to estimate object properties.

To train this network, we first collect trajectories and privileged information by executing the policy $a_t = \pi(o_t, \hat{z}_t)$ with the predicted extrinsics vectors $\hat{z}_t = \phi(q_{t-k:t}, a_{t-k-1:t-1})$. Meanwhile, we also store the ground-truth extrinsics vector $z_t$ and construct a training set
$$\mathcal{B} = \{(q^{(i)}_{t-k:t},\, a^{(i)}_{t-k-1:t-1},\, z^{(i)}_t,\, \hat{z}^{(i)}_t)\}_{i=1}^{N}.$$
Then we optimize $\phi$ by minimizing the $\ell_2$ distance between $z_t$ and $\hat{z}_t$ using Adam [34]. This process is iterated until the loss converges. We apply the same object initialization and dynamics randomization settings as in the section above.
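For concreteness, the supervised phase can be sketched as follows; the network architecture (an MLP over the flattened history), the history length, and the helper `rollout_fn` are illustrative assumptions, not the exact released implementation:

```python
import torch
import torch.nn as nn

HIST_LEN, NUM_JOINTS, Z_DIM = 30, 16, 8      # history length k is an assumption

class AdaptationModule(nn.Module):
    """phi: (q_{t-k:t}, a_{t-k-1:t-1}) history -> estimated extrinsics z_hat_t."""
    def __init__(self, hidden=256):
        super().__init__()
        in_dim = 2 * HIST_LEN * NUM_JOINTS
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.ELU(),
            nn.Linear(hidden, Z_DIM),
        )

    def forward(self, q_hist, a_hist):
        # flatten the (history, joints) dimensions, with or without a leading batch dim
        x = torch.cat([q_hist.flatten(-2), a_hist.flatten(-2)], dim=-1)
        return self.net(x)

def train_adaptation_module(phi, rollout_fn, iters=1000, lr=3e-4):
    """Iteratively roll out pi(o_t, z_hat_t), then regress z_hat_t onto the true z_t."""
    opt = torch.optim.Adam(phi.parameters(), lr=lr)
    for _ in range(iters):
        # rollout_fn (hypothetical) executes the frozen policy with phi's current
        # predictions and returns a batch of (q_hist, a_hist, z_true) from simulation.
        q_hist, a_hist, z_true = rollout_fn(phi)
        loss = torch.nn.functional.mse_loss(phi(q_hist, a_hist), z_true)
        opt.zero_grad(); loss.backward(); opt.step()
    return phi
```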
4 Experimental Setup and Implementation Details
Hardware Setup. We use an Allegro Hand from Wonik Robotics [8]. The Allegro Hand is a dexterous anthropomorphic robot hand with four fingers, each with four degrees of freedom. These 16 joints are controlled using position control at 20 Hz. The target position commands are converted to torques by a PD controller ($K_p = 3.0$, $K_d = 0.1$) running at 300 Hz.
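For reference, the position-to-torque conversion performed by the low-level controller is the standard PD law sketched below; any clamping and the driver interface are not specified here, so this is a schematic of the stated gains and rates rather than the exact firmware behavior:

```python
import numpy as np

KP, KD = 3.0, 0.1          # PD gains from the hardware setup
PD_HZ, CTRL_HZ = 300, 20   # inner PD loop rate vs. policy control rate

def pd_torque(q_target, q, qd):
    """tau = Kp * (q_target - q) - Kd * qd, applied per joint at 300 Hz."""
    return KP * (np.asarray(q_target) - np.asarray(q)) - KD * np.asarray(qd)

# The policy updates q_target (16 values) at 20 Hz; between policy steps the
# PD loop re-evaluates pd_torque 300 / 20 = 15 times with fresh joint readings.
```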
Simulation Setup. We use the IsaacGym [35] simulator. During training, we use 16384 parallel environments to collect samples for training the agent. Each environment contains a simulated Allegro Hand and a cylindrical object with randomized shape and physical properties (the exact parameters are in the supplementary material). The simulation frequency is 120 Hz and the control frequency is 20 Hz. Each episode lasts for 400 control steps (equivalent to 20 s).