HOLO-DEX allows humans to seamlessly provide robots with high-quality demonstration data through a low-latency observational feedback system. HOLO-DEX offers three benefits:
(a) Compared to self-supervised data collection methods,
it allows for rapid training without reward specification as
it is built on powerful imitation learning techniques; (b)
Compared to Sim2Real approaches, our learned policies are
directly executable on real robots since they are trained on
real data; (c) Compared to other imitation approaches, it
significantly reduces the need for domain expertise since
even untrained humans can operate VR devices.
We experimentally evaluate HOLO-DEX on six dexterous
manipulation tasks that require performing complex, contact-
rich behavior. These tasks range from in-hand object manip-
ulation to single-handed bottle opening. Across our tasks, we
find that a teacher using HOLO-DEX can provide a demonstration in 60 seconds on average, which is 1.8× faster than prior work in single-image teleoperation [28]. On 4/6 tasks, HOLO-DEX learns policies that achieve >90% success rates. Surprisingly, we find that the dexterous policies learned through HOLO-DEX generalize to new, previously unseen objects.
In summary, this work presents HOLO-DEX, a new frame-
work for dexterous imitation learning with the following
contributions. First, we demonstrate that high-quality tele-
operation can be achieved by immersing human teachers
in mixed reality through inexpensive VR headsets. Second,
we experimentally show that the demonstrations collected by HOLO-DEX can be used to train effective, general-purpose dexterous manipulation behaviors. Third, we analyze and ablate HOLO-DEX over various design decisions, such as the choice of hand tracker and imitation learning method. Finally, we will release the mixed reality API, the collected demonstrations, and the training code associated with HOLO-DEX at https://holo-dex.github.io/.
II. RELATED WORK
Our framework builds upon several important works in robot learning, imitation learning, teleoperation, and dexterous manipulation. In this section, we briefly describe the prior research most relevant to ours.
A. Methodologies for Teaching Robots
There are several approaches one can take to teach robots.
Reinforcement Learning (RL) [30,31,32] can train policies
to maximize rewards while collecting data in an automated
manner. This process often requires a roboticist to specify the reward function and to ensure safety during self-supervised data collection [15,16]. Furthermore, such
approaches are often sample-inefficient and might require
extensive simulation training for optimizing complex skills.
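For reference, the RL methods discussed here optimize a standard expected-return objective; we state it in its generic discounted form, which is an expository choice rather than the exact formulation used in [30,31,32]:

J(\pi) = \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t)\Big],

where \tau denotes a trajectory of states s_t and actions a_t obtained by running the policy \pi, \gamma \in [0, 1) is a discount factor, and r is precisely the reward function that, as noted above, a roboticist must specify by hand.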
Simulation to Real (Sim2Real) approaches focus on train-
ing RL policies in simulation, followed by transferring to the
real robot [22,33,34]. This methodology has seen significant success owing to improvements in modern robot simulators. However, Sim2Real still requires significant human involvement, as every task needs to be carefully modeled in the simulator. Moreover, even during training, special techniques are required to ensure that the resulting policies transfer to the real robot [21,35,36,37].
Imitation learning approaches focus on training poli-
cies from demonstrations provided by an expert. Behavior
Cloning (BC) is an offline technique that trains a pol-
icy to imitate the expert behavior in a supervised man-
ner [24,38,39,40]. Recently, non-parametric imitation
approaches have shown promise in learning from fewer
demonstrations [41,42,28]. Another class of imitation learning methods is Inverse Reinforcement Learning (IRL) [43,25,44]. Here,
a reward function is inferred from demonstrations, followed
by using RL to optimize the inferred reward. While HOLO-
DEX is geared towards offline imitation, the demonstrations
we collect are compatible with IRL approaches as well.
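To make the non-parametric flavor of imitation concrete, the sketch below shows a minimal nearest-neighbor policy in the spirit of [41,42,28]; the identity encoder, array-based dataset format, and Euclidean metric are illustrative assumptions, not the specific design of any cited method.

```python
import numpy as np

class NearestNeighborPolicy:
    """Minimal non-parametric imitation: act with the demonstration
    action whose encoded observation is closest to the current one.
    Illustrative sketch; the encoder and metric are assumptions."""

    def __init__(self, demo_observations, demo_actions, encoder=lambda x: x):
        # demo_observations: (N, obs_dim) demonstration observations
        # demo_actions: (N, act_dim) corresponding expert actions
        self.encoder = encoder
        self.keys = np.stack([self.encoder(o) for o in demo_observations])
        self.actions = np.asarray(demo_actions)

    def act(self, observation):
        query = self.encoder(observation)
        # Euclidean distance from the query embedding to every
        # demonstration embedding; copy the closest expert action.
        dists = np.linalg.norm(self.keys - query, axis=1)
        return self.actions[np.argmin(dists)]
```

In practice, the encoder would typically be a learned visual representation and the lookup amortized with an approximate nearest-neighbor index, but the core decision rule is this simple.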
B. Dexterous Teleoperation Frameworks
To effectively use imitation learning for dexterous manipulation, we need to obtain accurate hand poses from a human teacher. There are several approaches to gathering demonstrations for dexterous tasks. Custom gloves that measure a user’s hand movements, such as the CyberGlove [29,45] or the Shadow Dexterous Glove [46], have been a popular solution. Although such gloves are highly accurate, they can be expensive and require significant calibration effort. Vision-based hand pose detectors have also shown promise
for dexterous tasks. Some examples include using multiple
RGBD [27], single depth [47], RGB [28], and RGBD [37]
images. However, such methods either require custom cal-
ibration procedures [27] or suffer from occlusion-related
issues when using a single camera [28]. Recently, a new generation of VR headsets has enabled advanced multi-camera hand pose detection [48], which has produced promising results [49,50]. This provides a robust solution that is significantly cheaper than the CyberGlove and requires little calibration. While VR tools have been used to collect demonstrations for low-dimensional end-effector control [51,52], HOLO-DEX shows that VR headsets can be used for high-dimensional control in mixed reality. Concurrently with our work, Radosavovic et al. [53] also show that hand tracking from VR can be used to teleoperate robot hands, albeit without mixed reality.
C. Dexterous Manipulation
Due to their high-dimensional action spaces, learning complex skills with dexterous multi-fingered robot hands has been a longstanding challenge [54,55,56,57]. Model-based RL
and control approaches have demonstrated significant success
on tasks such as spinning objects and in-hand manipula-
tion [58,59]. Similarly, model-free RL approaches have
shown that Sim2Real can enable impressive skills such as
in-hand cube rotation and Rubik’s cube face turning [20,21].
However, both classes of learning approaches require hand-designed reward functions along with system identification [58] or task-specific training procedures [21]. Coupled with long training times, often spanning weeks [20,21], this makes dexterous manipulation difficult to scale to general tasks.