Holo-Dex: Teaching Dexterity with Immersive Mixed Reality
Sridhar Pandian Arunachalam
New York University
Irmak Güzey
New York University
Soumith Chintala
Meta AI
Lerrel Pinto
New York University
(a) Demonstration collection in mixed reality (b) Learned dexterous policies
Fig. 1: We present HOLO-DEX, a framework that (a) collects high-quality demonstration data by placing human teachers in an immersive
mixed reality world, and then (b) learns visual policies from a handful of these demonstrations to solve dexterous manipulation tasks.
Abstract: A fundamental challenge in teaching robots is to
provide an effective interface for human teachers to demonstrate
useful skills to a robot. This challenge is exacerbated
in dexterous manipulation, where teaching high-dimensional,
contact-rich behaviors often requires esoteric teleoperation tools.
In this work, we present HOLO-DEX, a framework for dexter-
ous manipulation that places a teacher in an immersive mixed
reality through commodity VR headsets. The high-fidelity hand
pose estimator onboard the headset is used to teleoperate the
robot and collect demonstrations for a variety of general-
purpose dexterous tasks. Given these demonstrations, we use
powerful feature learning combined with non-parametric imi-
tation to train dexterous skills. Our experiments on six common
dexterous tasks, including in-hand rotation, spinning, and bottle
opening, indicate that HOLO-DEX can both collect high-quality
demonstration data and train skills in a matter of hours. Finally,
we find that our trained skills can generalize to
objects not seen in training. Videos of HOLO-DEX are available
at https://holo-dex.github.io/.
I. INTRODUCTION
Learning-based methods have had a transformational effect
in robotics across a wide range of domains, from manipulation [1,2]
and locomotion [3,4,5] to aerial robotics [6,7,8].
Such methods often produce policies that input raw sensory
observations and output robot actions. This circumvents
challenges in developing state-estimation modules, modeling
object properties, and tuning controller gains, all of which require
significant domain expertise. Even with the steep progress
in robot learning, we are still a long way off from dexterous
robots that can solve arbitrary robot tasks, akin to methods
in game play [9,10], text generation [11,12] or few-shot
vision [13,14].
Correspondence to sridhar@nyu.edu.
To understand what might be missing in robot learning, we
need to ask a central question: How do we collect training
data for our robots? One option is to collect data on the robot
through self-supervised data collection strategies. While this
results in robust behaviors [15,16,17,18], it often requires
extensive real-world interaction, on the order of thousands
of hours even for relatively simple manipulation tasks [19].
An alternate option is to train on simulated data and then
transfer to the real robot (Sim2Real). This allows for learning
complex robotic behaviors multiple orders of magnitude
faster than on-robot learning [20,21]. However, setting up
simulated robot environments and specifying simulator parameters
often require extensive domain expertise [22,23].
A third, more practical option is to collect data by
asking human teachers to provide demonstrations [24,25].
Robots can then be trained to quickly imitate the demonstrated
data. Such imitation methods have recently shown
promise in a variety of challenging dexterous manipulation
problems [26,27,28]. However, there lies a fundamental
limitation in most of these works: collecting high-quality
demonstration data for dexterous robots is hard! Existing approaches
either require expensive gloves [29] or extensive calibration [27], or
suffer from monocular occlusions [28].
In this work, we present HOLO-DEX, a new framework
to collect demonstration data and train dexterous robots. It
uses VR headsets (e.g., Quest 2) to put human teachers in an
immersive virtual world. In this virtual world, the teacher can
view a robotic scene from the eyes of a robot, and control
it using their hands through inbuilt pose detectors. HOLO-DEX
allows humans to seamlessly provide robots with high-quality
demonstration data through a low-latency observational
feedback system. HOLO-DEX offers three benefits:
(a) Compared to self-supervised data collection methods,
it allows for rapid training without reward specification as
it is built on powerful imitation learning techniques; (b)
Compared to Sim2Real approaches, our learned policies are
directly executable on real robots since they are trained on
real data; (c) Compared to other imitation approaches, it
significantly reduces the need for domain expertise since
even untrained humans can operate VR devices.
arXiv:2210.06463v1 [cs.RO] 12 Oct 2022
We experimentally evaluate HOLO-DEX on six dexterous
manipulation tasks that require performing complex, contact-rich
behavior. These tasks range from in-hand object manipulation
to single-handed bottle opening. Across our tasks, we
find that a teacher can provide demonstrations at an average
of 60 s per demonstration using HOLO-DEX, which is 1.8×
faster than prior work in single-image teleoperation [28].
On 4/6 tasks, HOLO-DEX can learn policies that achieve
>90% success rates. Surprisingly, we find that the dexterous
policies learned through HOLO-DEX can generalize to new,
previously unseen objects.
In summary, this work presents HOLO-DEX, a new frame-
work for dexterous imitation learning with the following
contributions. First, we demonstrate that high-quality tele-
operation can be achieved by immersing human teachers
in mixed reality through inexpensive VR headsets. Second,
we experimentally show that the demonstrations collected
by HOLO-DEX can be used to train effective and general-purpose
dexterous manipulation behaviors. Third, we analyze
and ablate HOLO-DEX over various decisions such as the
choice of hand tracker and imitation learning methods. Fi-
nally, we will release the mixed reality API, demonstrations
collected, and training code associated with HOLO-DEX on
https://holo-dex.github.io/.
II. RELATED WORK
Our framework builds upon several important works in
robot learning, imitation learning, teleoperation and dexter-
ous manipulation. In this section, we briefly describe prior
research that is most relevant to ours.
A. Methodologies for Teaching Robots
There are several approaches one can take to teach robots.
Reinforcement Learning (RL) [30,31,32] can train policies
to maximize rewards while collecting data in an automated
manner. This process often requires a roboticist to spec-
ify the reward function along with ensuring safety during
self-supervised data collection [16,15]. Furthermore, such
approaches are often sample-inefficient and might require
extensive simulation training for optimizing complex skills.
Simulation to Real (Sim2Real) approaches focus on train-
ing RL policies in simulation, followed by transferring to the
real robot [22,33,34]. This methodology of robot training
has achieved significant success owing to improvements
in modern robot simulators. However, Sim2Real still requires significant
human involvement, as every task needs to be carefully
modeled in the simulator. Moreover, even during training,
special techniques are required to ensure that the resulting
policies can transfer to the real robot [21,35,36,37].
Imitation learning approaches focus on training poli-
cies from demonstrations provided by an expert. Behavior
Cloning (BC) is an offline technique that trains a pol-
icy to imitate the expert behavior in a supervised man-
ner [24,38,39,40]. Recently, non-parametric imitation
approaches have shown promise in learning from fewer
demonstrations [41,42,28]. Another class of imitation learning
methods is Inverse Reinforcement Learning (IRL) [43,25,44]. Here,
a reward function is inferred from demonstrations, followed
by using RL to optimize the inferred reward. While HOLO-
DEX is geared towards offline imitation, the demonstrations
we collect are compatible with IRL approaches as well.
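To make the idea of non-parametric imitation concrete, the following is a minimal sketch of nearest-neighbor action retrieval. It is an illustration under stated assumptions, not the implementation used in this work: the function name `nearest_neighbor_action`, the Euclidean distance metric, the averaging over `k` neighbors, and the toy two-dimensional features are all hypothetical. In practice the features would presumably come from a learned visual encoder, which is not shown here.

```python
import numpy as np


def nearest_neighbor_action(query_feature, demo_features, demo_actions, k=1):
    """Return the mean action of the k demonstration frames closest to the query.

    Hypothetical helper: Euclidean distance and plain averaging are
    illustrative assumptions, not choices confirmed by the paper.
    """
    demo_features = np.asarray(demo_features, dtype=float)
    demo_actions = np.asarray(demo_actions, dtype=float)
    query = np.asarray(query_feature, dtype=float)

    # Distance from the query feature to every stored demonstration feature.
    dists = np.linalg.norm(demo_features - query, axis=1)

    # Indices of the k closest demonstration frames.
    nearest = np.argsort(dists)[:k]

    # Average the actions recorded at those frames.
    return demo_actions[nearest].mean(axis=0)


# Toy demonstration set: three feature vectors with one-dimensional actions.
demo_features = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
demo_actions = np.array([[0.1], [0.5], [0.9]])

# The query is closest to the second demonstration frame.
action = nearest_neighbor_action([0.9, 0.1], demo_features, demo_actions)
```

No parameters are fit in this sketch; the "policy" is simply a lookup into the stored demonstrations, which is one reason such methods can work with only a handful of them.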
B. Dexterous Teleoperation Frameworks
To effectively use imitation learning for dexterous manipulation,
we need to obtain accurate hand poses from
a human teacher. There are several approaches to gather
demonstrations for dexterous tasks. Using a custom glove,
such as CyberGlove [29,45] or Shadow Dexterous Glove [46],
to measure a user’s hand movements has been a popular
solution. However, although such gloves have high accuracy,
they can be expensive and require significant calibration
effort. Vision-based hand pose detectors have shown promise
for dexterous tasks. Some examples include using multiple
RGBD [27], single depth [47], RGB [28], and RGBD [37]
images. However, such methods either require custom cal-
ibration procedures [27] or suffer from occlusion-related
issues when using single cameras [28]. Recently, a new
generation of VR headsets has enabled advanced multi-camera
hand pose detection [48], which has given promising results
in [49,50]. This enhancement provides a robust solution
that is significantly cheaper than CyberGlove and
requires little calibration. While VR tools have been used
to collect demonstrations [51,52] for low-dimensional end-
effector control, HOLO-DEX shows that the VR headsets can
be used for high-dimensional control in augmented reality.
Concurrent to our work, Radosavovic et al. [53] also show
that hand tracking from VR can be used to teleoperate robot
hands albeit without using mixed reality.
C. Dexterous Manipulation
Due to its high-dimensional action space, learning complex
skills with dexterous multi-fingered robot hands has been
a longstanding challenge [54,55,56,57]. Model-based RL
and control approaches have demonstrated significant success
on tasks such as spinning objects and in-hand manipula-
tion [58,59]. Similarly, model-free RL approaches have
shown that Sim2Real can enable impressive skills such as
in-hand cube rotation and Rubik’s cube face turning [20,21].
However, both learning approaches require hand-designing
reward functions along with system identification [58] or
task-specific training procedures [21]. Coupled with long
training times, often requiring weeks [20,21], they make
dexterous manipulation difficult to scale for general tasks.