HOLO-DEX allows humans to seamlessly provide robots with high-quality demonstration data through a low-latency observational feedback system. HOLO-DEX offers three benefits:
(a) Compared to self-supervised data collection methods,
it allows for rapid training without reward specification as
it is built on powerful imitation learning techniques; (b)
Compared to Sim2Real approaches, our learned policies are
directly executable on real robots since they are trained on
real data; (c) Compared to other imitation approaches, it
significantly reduces the need for domain expertise since
even untrained humans can operate VR devices.
We experimentally evaluate HOLO-DEX on six dexterous
manipulation tasks that require performing complex, contact-
rich behavior. These tasks range from in-hand object manip-
ulation to single-handed bottle opening. Across our tasks, we
find that a teacher using HOLO-DEX can provide a demonstration in 60 seconds on average, which is 1.8× faster than prior work in single-image teleoperation [28]. On 4/6 tasks, HOLO-DEX learns policies that achieve >90% success rates. Surprisingly, we find that the dexterous policies learned through HOLO-DEX generalize to new, previously unseen objects.
In summary, this work presents HOLO-DEX, a new frame-
work for dexterous imitation learning with the following
contributions. First, we demonstrate that high-quality tele-
operation can be achieved by immersing human teachers
in mixed reality through inexpensive VR headsets. Second,
we experimentally show that the demonstrations collected by HOLO-DEX can be used to train effective, general-purpose dexterous manipulation behaviors. Third, we analyze and ablate HOLO-DEX over various design decisions, such as the choice of hand tracker and imitation learning method. Finally, we will release the mixed reality API, the collected demonstrations, and the training code associated with HOLO-DEX at https://holo-dex.github.io/.
II. RELATED WORK
Our framework builds upon several important works in robot learning, imitation learning, teleoperation, and dexterous manipulation. In this section, we briefly describe the prior research most relevant to ours.
A. Methodologies for Teaching Robots
There are several approaches one can take to teach robots.
Reinforcement Learning (RL) [30,31,32] can train policies
to maximize rewards while collecting data in an automated
manner. This process often requires a roboticist to specify the reward function and to ensure safety during self-supervised data collection [15,16]. Furthermore, such
approaches are often sample-inefficient and might require
extensive simulation training for optimizing complex skills.
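For reference, the RL methods discussed here optimize a standard expected-return objective; we state it in its generic discounted form, which is an expository choice rather than the exact formulation used in [30,31,32]:

J(\pi) = \mathbb{E}_{\tau \sim \pi}\Big[\sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t)\Big],

where \tau denotes a trajectory of states s_t and actions a_t obtained by running the policy \pi, \gamma \in [0, 1) is a discount factor, and r is precisely the reward function that, as noted above, a roboticist must specify by hand.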
Simulation to Real (Sim2Real) approaches focus on train-
ing RL policies in simulation, followed by transferring to the
real robot [22,33,34]. This methodology has seen significant success owing to improvements in modern robot simulators. However, Sim2Real still requires significant human involvement, as every task needs to be carefully modeled in the simulator. Moreover, even during training, special techniques are required to ensure that the resulting policies transfer to the real robot [21,35,36,37].
Imitation learning approaches focus on training poli-
cies from demonstrations provided by an expert. Behavior
Cloning (BC) is an offline technique that trains a pol-
icy to imitate the expert behavior in a supervised man-
ner [24,38,39,40]. Recently, non-parametric imitation
approaches have shown promise in learning from fewer
demonstrations [41,42,28]. Another class of imitation learning methods is Inverse Reinforcement Learning (IRL) [43,25,44]. Here,
a reward function is inferred from demonstrations, followed
by using RL to optimize the inferred reward. While HOLO-
DEX is geared towards offline imitation, the demonstrations
we collect are compatible with IRL approaches as well.
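To make the non-parametric flavor of imitation concrete, the sketch below shows a minimal nearest-neighbor policy in the spirit of [41,42,28]; the identity encoder, array-based dataset format, and Euclidean metric are illustrative assumptions, not the specific design of any cited method.

```python
import numpy as np

class NearestNeighborPolicy:
    """Minimal non-parametric imitation: act with the demonstration
    action whose encoded observation is closest to the current one.
    Illustrative sketch; the encoder and metric are assumptions."""

    def __init__(self, demo_observations, demo_actions, encoder=lambda x: x):
        # demo_observations: (N, obs_dim) demonstration observations
        # demo_actions: (N, act_dim) corresponding expert actions
        self.encoder = encoder
        self.keys = np.stack([self.encoder(o) for o in demo_observations])
        self.actions = np.asarray(demo_actions)

    def act(self, observation):
        query = self.encoder(observation)
        # Euclidean distance from the query embedding to every
        # demonstration embedding; copy the closest expert action.
        dists = np.linalg.norm(self.keys - query, axis=1)
        return self.actions[np.argmin(dists)]
```

In practice, the encoder would typically be a learned visual representation and the lookup amortized with an approximate nearest-neighbor index, but the core decision rule is this simple.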
B. Dexterous Teleoperation Frameworks
To effectively use imitation learning for dexterous manipulation, we need to obtain accurate hand poses from a human teacher. There are several approaches to gathering demonstrations for dexterous tasks. Custom gloves that measure a user’s hand movements, such as the CyberGlove [29,45] or the Shadow Dexterous Glove [46], have been a popular solution. Although such gloves are highly accurate, they can be expensive and require significant calibration effort. Vision-based hand pose detectors have also shown promise
for dexterous tasks. Some examples include using multiple
RGBD [27], single depth [47], RGB [28], and RGBD [37]
images. However, such methods either require custom cal-
ibration procedures [27] or suffer from occlusion-related
issues when using a single camera [28]. Recently, a new generation of VR headsets has enabled advanced multi-camera hand pose detection [48], which has produced promising results [49,50]. This provides a robust solution that is significantly cheaper than the CyberGlove and requires little calibration. While VR tools have been used to collect demonstrations for low-dimensional end-effector control [51,52], HOLO-DEX shows that VR headsets can be used for high-dimensional control in mixed reality. Concurrently with our work, Radosavovic et al. [53] also show that hand tracking from VR can be used to teleoperate robot hands, albeit without mixed reality.
C. Dexterous Manipulation
Due to their high-dimensional action spaces, learning complex skills with dexterous multi-fingered robot hands has been a longstanding challenge [54,55,56,57]. Model-based RL
and control approaches have demonstrated significant success
on tasks such as spinning objects and in-hand manipula-
tion [58,59]. Similarly, model-free RL approaches have
shown that Sim2Real can enable impressive skills such as
in-hand cube rotation and Rubik’s cube face turning [20,21].
However, both classes of learning approaches require hand-designed reward functions along with system identification [58] or task-specific training procedures [21]. Coupled with long training times, often spanning weeks [20,21], this makes dexterous manipulation difficult to scale to general tasks.