physical robot without involving the complexities of sim-to-
real transfer.
In this work, we ask: what is the simplest way to
obtain goal conditioned control in a dynamic real world set-
ting such as precision table tennis? Can one design effective
alternatives to more intricate RL algorithms that perform well
in this difficult setup? In pursuit of these questions, we examine
the necessity of different components in existing goal condi-
tioned learning pipelines, both RL and IL. Surprisingly, we
find that the synthesis of two existing techniques in iterative
self-supervised imitation learning [11], [12] indeed scales
to this setting. For ease of reference, we refer to this best
performing approach throughout as GoalsEye, a system for
high-precision goal reaching table tennis, trained with goal
conditioned behavior cloning plus self-supervised practice
(GCBC+SSP).
We find that the essential ingredients of success are: 1)
A minimal but non-goal-directed “bootstrap” dataset to
overcome a difficult initial exploration problem [11]. 2) Re-
labeled goal conditioned imitation: GoalsEye uses simple
and sample efficient relabeled behavior cloning [11], [15],
[16], to train a goal-directed policy to reach any goal state
in the dataset without reward information. 3) Iterative self-
supervised goal reaching: The agent improves continuously
by giving itself random goals, then attempting to reach them
using the current policy [12]. All attempts, including failures,
are relabeled into a continuously expanding training set.
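To make these ingredients concrete, the sketch below shows the GCBC+SSP loop on a synthetic toy problem. It is illustrative only: the data, the linear least-squares "policy", and all names (collect_bootstrap_episode, relabel, fit_bc, rollout) are assumptions for exposition, whereas the actual system trains a neural network policy on the physical robot.

```python
# Minimal sketch of goal conditioned behavior cloning plus self-supervised
# practice (GCBC+SSP) on synthetic data. All names and shapes are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def collect_bootstrap_episode():
    """Non-goal-directed demonstration: strokes that merely return the ball."""
    obs = rng.normal(size=(20, 4))         # e.g., ball/robot state at 20 Hz
    actions = rng.normal(size=(20, 2))     # e.g., joint velocity targets
    achieved_goal = rng.uniform(-1, 1, 2)  # e.g., where the ball actually landed
    return obs, actions, achieved_goal

def relabel(obs, actions, achieved_goal):
    """Hindsight relabeling: treat the achieved outcome as the commanded goal."""
    goals = np.tile(achieved_goal, (len(obs), 1))
    return np.concatenate([obs, goals], axis=1), actions

def fit_bc(policy_w, inputs, targets, lr=1e-2, steps=200):
    """Goal conditioned behavior cloning, here plain least-squares regression."""
    for _ in range(steps):
        grad = inputs.T @ (inputs @ policy_w - targets) / len(inputs)
        policy_w = policy_w - lr * grad
    return policy_w

def rollout(policy_w, goal):
    """Attempt a commanded goal with the current policy; return the episode."""
    obs = rng.normal(size=(20, 4))
    actions = np.concatenate([obs, np.tile(goal, (20, 1))], axis=1) @ policy_w
    achieved_goal = goal + rng.normal(scale=0.3, size=2)  # imperfect execution
    return obs, actions, achieved_goal

# Ingredient 1: bootstrap dataset of non-goal-directed demonstrations.
dataset = [collect_bootstrap_episode() for _ in range(50)]
policy_w = np.zeros((6, 2))  # maps (obs, goal) -> action

# Ingredients 2 and 3: relabel everything, clone behavior, practice random goals.
for _ in range(10):
    X, Y = zip(*(relabel(*ep) for ep in dataset))
    policy_w = fit_bc(policy_w, np.vstack(X), np.vstack(Y))
    for _ in range(20):
        goal = rng.uniform(-1, 1, 2)             # self-proposed target
        dataset.append(rollout(policy_w, goal))  # keep attempt even if it fails
```

The key property is that every attempt, successful or not, is relabeled with the outcome it actually achieved and appended to the training set, so the dataset and the policy's goal coverage grow together.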
The main contributions of this work are: 1) We introduce
a setting of high-acceleration goal directed table tennis on a
physical robot. 2) We present GoalsEye, an iterative imitation
learning system that can improve continuously in the real
world to the point where it can execute precise, dynamic goal
reaching behavior at or above amateur human performance.
Our final system is able to control a physical robot at 20Hz
to land 40% of balls to within 20 centimeters of commanded
targets at 6.5 m/s (see https://sites.google.com/
view/goals-eye for videos). 3) We perform a large
empirical study, both in simulation and in the real world,
to determine the important components of success in this
setting. We note that although we present experimental results
in the domain of robotic table tennis, nothing in our recipe is
specific to table tennis; in principle, it can be applied to any
task where a goal state can be specified at test time.
II. RELATED WORK
Robotic table tennis. Table tennis has long served as
a particularly difficult benchmark for robotics. Research in
robotic table tennis began in 1983 with a competition that had
simplified rules and a smaller table [17]. This competition
ran from 1983 to 1993 and several systems were developed
[18], [19], [20]; see [21] for a summary of these approaches.
This problem remains far from solved.
Most approaches are model-based in that they explicitly
model the ball and robot dynamics. The Omron Forpheus
robot [22] is the current exemplar, achieving impressive
results. These methods typically consist of several steps:
identifying virtual hitting points from ball trajectories [23],
[24], [25], [26], [27], [28], [29], [30], predicting ball ve-
locities by learning from data [23], [31], [32], [24] or
through parameterized dynamics models [21], [26], [27],
calculating target paddle orientations and velocities, and
finally generating robot trajectories leading to the desired
paddle targets [21], [33], [8], [34], [35], [23], [31], [32],
[24], [36], [37].
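For orientation, the sketch below summarizes this pipeline on a toy example. All helpers are simplified placeholders (a drag- and bounce-free ballistic ball model, a fixed hitting plane, a stub trajectory planner), and the names are ours, not those of any cited system.

```python
# Illustrative outline of the classical model-based return pipeline.
import numpy as np

def predict_ball_trajectory(ball_obs, horizon=30, dt=1 / 60):
    """Placeholder ballistic rollout (no drag, spin, or bounce) from the last observation."""
    pos, vel = ball_obs[-1, :3], ball_obs[-1, 3:]
    g = np.array([0.0, 0.0, -9.81])
    return np.array([pos + vel * (k * dt) + 0.5 * g * (k * dt) ** 2 for k in range(horizon)])

def select_virtual_hitting_point(traj, hitting_plane_y=-1.2):
    """Pick the predicted ball state closest to the robot's hitting plane."""
    k = int(np.argmin(np.abs(traj[:, 1] - hitting_plane_y)))
    return traj[k], k

def solve_paddle_state(hit_point, target_landing_point):
    """Placeholder: point the paddle normal from the hit point toward the target."""
    direction = np.append(target_landing_point, 0.0) - hit_point
    return direction / np.linalg.norm(direction)

def plan_robot_trajectory(paddle_normal, hit_step):
    """Placeholder for joint-space trajectory generation to the desired paddle state."""
    return {"paddle_normal": paddle_normal, "hit_step": hit_step}

ball_obs = np.array([[0.0, 1.5, 0.3, 0.0, -9.0, 1.0]])  # [x, y, z, vx, vy, vz]
traj = predict_ball_trajectory(ball_obs)
hit_point, hit_step = select_virtual_hitting_point(traj)
paddle_normal = solve_paddle_state(hit_point, target_landing_point=np.array([0.0, 1.0]))
plan = plan_robot_trajectory(paddle_normal, hit_step)
```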
A number of methods do not model the robot dynamics
explicitly. These approaches fall into two broad groups:
those that utilize expert demonstrations [33], [8], [34], [38],
[39] and those that do not [40], [27], [41], [42]. Like our
best performing method, [39] is capable of learning from
sub-optimal demonstrations. However, the approach has no
mechanism to continuously improve beyond the demonstra-
tion data. In [8], the authors demonstrate a system that learns
cooperative table tennis by creating a library of primitive
motions using kinesthetic teaching to constrain learning. In
a similar spirit, we collect an initial dataset of non-goal-
directed demonstration data of how to make contact and
return the ball to bootstrap autonomous learning.
Reinforcement learning (RL) is a common approach for
table tennis methods that do not utilize demonstrations.
Methods range from framing the problem as a single-step
bandit [27] to temporally extended policies controlling the
robot in joint space [40] using on-policy RL, to Hierarchical
RL (HRL) [69]. Of particular interest is [41], which utilizes
muscular soft robots to facilitate safe exploration and learn
RL policies from scratch on a real robot.
Goal conditioned imitation learning. While many of
the above methods have been shown to scale to undirected
table tennis, few have tackled the problem of goal-directed
table tennis. Goal directed control is an active area of robot
learning, with many recent examples in both IL and RL
[56], [11], [15], [47]. Given the complexities of even single
task real world robot learning [57], finding simple methods
that scale to goal-directed real world behavior remains an
open question. While goal-conditioned imitation learning
[11], [15] offers a simple approach to multitask control, no
instances yet have been shown to scale to hard physical
problems like the one studied in this work, being largely
validated in simulation instead. We find, surprisingly, that the
simple combination of two existing IL methods [11], [12]
indeed scales to this setting, while being able to 1) learn
from suboptimal (in the sense of being non-goal-directed)
demonstrations that are less burdensome to collect, 2) learn
goal reaching from relabeled data without reward information,
and 3) continuously self-improve beyond the initial data via
self-supervised goal reaching.
Empirical studies in scaling robot learning. Like many
works in robot learning [58], [59], [60], ours studies em-
pirically whether existing methods scale to new and harder
robotic problems than the ones originally studied. For exam-
ple, studies such as [13] found new evidence that existing
algorithms (e.g., SAC), previously studied only in simula-
tion, indeed scaled to hard problems such as real world
quadrupedal locomotion. Similarly, recent empirical studies
have shown that well-motivated prior ideas did not scale
to more difficult robotic setups [61], [62]. For example, the