GoalsEye: Learning High Speed Precision Table Tennis on a Physical
Robot
Tianli Ding1,2, Laura Graesser1,2, Saminda Abeyruwan1, David B. D’Ambrosio1,
Anish Shankar1, Pierre Sermanet1, Pannag R. Sanketi1,3, Corey Lynch1,2,3
Abstract— Learning goal conditioned control in the real
world is a challenging open problem in robotics. Reinforcement
learning systems have the potential to learn autonomously via
trial-and-error, but in practice the costs of manual reward
design, ensuring safe exploration, and hyperparameter tuning
are often enough to preclude real world deployment. Imitation
learning approaches, on the other hand, offer a simple way to
learn control in the real world, but typically require costly curated demonstration data and lack a mechanism for continuous
improvement. Recently, iterative imitation techniques have been
shown to learn goal directed control from undirected demonstration data, and improve continuously via self-supervised goal
reaching, but results thus far have been limited to simulated
environments. In this work, we present evidence that iterative
imitation learning can scale to goal-directed behavior on a real
robot in a dynamic setting: high speed, precision table tennis
(e.g. “land the ball on this particular target”). We find that
this approach offers a straightforward way to do continuous
on-robot learning, without complexities such as reward design
or sim-to-real transfer. It is also scalable—sample efficient
enough to train on a physical robot in just a few hours. In
real world evaluations, we find that the resulting policy can
perform on par or better than amateur humans (with players
sampled randomly from a robotics lab) at the task of returning
the ball to specific targets on the table. Finally, we analyze
the effect of an initial undirected bootstrap dataset size on
performance, finding that a modest amount of unstructured
demonstration data provided up-front drastically speeds up
the convergence of a general purpose goal-reaching policy.
See https://sites.google.com/view/goals-eye for
videos.
I. INTRODUCTION
Robot learning has been applied to a wide range of
challenging real world tasks, including dexterous manipulation [1], [2], legged locomotion [3], [4], and grasping [5],
[6]. It is less common, however, to see robotic learning
applied to dynamic, high-acceleration, high-frequency tasks
like precision table tennis (Figure 1a). Such settings put
significant demands on a learning algorithm around safe
exploration, accuracy, and sample efficiency. An outstanding
question for robot learning is: can current techniques scale
to meet the hard requirements of this setting?
Consider the setup in Figure 1a: a robot must issue 8-DOF
continuous control commands in joint space at 20Hz to control an arm holding a paddle. The commanded behavior must
precisely position and orient the paddle in time and space in
order to connect with a ball fired at 7 meters per second. The
right follow-through motion must be orchestrated in order to
return the ball to the other side of the table.

1 Robotics at Google, Google Research, Mountain View, United States.
2 Corresponding authors: tding@google.com, lauragraesser@google.com, coreylynch@google.com.
3 Equal advising.

Fig. 1: Precision Table Tennis. (a) Example target location, shown as a
red x, with an error margin defined by the circle (for display purposes
only; no marks are present during training or evaluation). (b) Goal
thresholds. (c) Five evaluation goal locations. At test time, the robot
must hit a ball fired at approximately 7 m/s to a commanded target
location on the opposing side of the table.

Strictly more
difficult is the problem of learning to return the ball to an
arbitrary target location on the table, e.g. “hit the back left
corner” or “land the ball just over the net on the right side”.
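To make the control interface concrete, the sketch below outlines the shape of this problem: a goal conditioned policy maps the current ball and robot state, plus a 2D target landing location, to an 8-dimensional joint command at 20Hz. This is an illustrative sketch only; the class and environment methods (GoalConditionedPolicy, observe, send_joint_command, episode_done) are hypothetical placeholders, not our system's actual interfaces.

```python
import time

import numpy as np

CONTROL_HZ = 20   # joint-space commands are issued at 20Hz
NUM_JOINTS = 8    # 8-DOF continuous control


class GoalConditionedPolicy:
    """Hypothetical placeholder: maps (observation, goal) to a joint command."""

    def act(self, observation: np.ndarray, goal_xy: np.ndarray) -> np.ndarray:
        # A trained network would be queried here; a zero command stands in.
        return np.zeros(NUM_JOINTS)


def run_return_attempt(policy, env, goal_xy):
    """Run one ball-return attempt, commanding the arm at 20Hz toward a target.

    `env` is assumed to expose `observe()`, `send_joint_command()`, and
    `episode_done()`; these are illustrative interfaces, not a real API.
    """
    period = 1.0 / CONTROL_HZ
    while not env.episode_done():
        tic = time.time()
        obs = env.observe()                # e.g. ball position/velocity + joint state
        action = policy.act(obs, goal_xy)  # 8-dimensional joint command
        env.send_joint_command(action)
        time.sleep(max(0.0, period - (time.time() - tic)))
```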
Imitation Learning (IL) [7] provides a simple and stable
approach to learning robot behavior, but requires access to
demonstrations. Collecting expert demonstrations of precise
goal targeting in such a high speed setting, say from teleoperation or kinesthetic teaching [8], is a complex engineering
problem. Attempting to learn precise table tennis by trial
and error using reinforcement learning (RL) is a similarly
difficult proposition given its sample inefficiency and that
the random exploration that is typical at the beginning
stages of RL may damage the robot. High-frequency control
also results in long horizon episodes. These are among the
biggest challenges facing current deep RL techniques [9].
While many recent RL approaches successfully learn in
simulation, then transfer to the real world [10], [1], doing
so in this setting remains difficult, especially considering the
requirement of precise, dynamic control. Here we restrict
our focus to learning a hard dynamic problem directly on a
physical robot without involving the complexities of sim-to-
real transfer.
In this work, we ask: what is the simplest way to obtain goal conditioned control in a dynamic real world setting such as precision table tennis? Can one design effective
alternatives to more intricate RL algorithms that perform well
in this difficult setup? In pursuit of this question, we consider
the necessity of different components in existing goal condi-
tioned learning pipelines, both RL and IL. Surprisingly, we
find that the synthesis of two existing techniques in iterative
self-supervised imitation learning [11], [12] indeed scales
to this setting. For ease of reference, we refer to this best
performing approach throughout as GoalsEye, a system for
high-precision goal reaching table tennis, trained with goal
conditioned behavior cloning plus self-supervised practice
(GCBC+SSP).
We find that the essential ingredients of success are: 1)
A minimal, but non-goal-directed “bootstrap” dataset to
overcome an initial difficult exploration problem [11]. 2) Relabeled goal conditioned imitation: GoalsEye uses simple and sample efficient relabeled behavior cloning [11], [15],
[16], to train a goal-directed policy to reach any goal state
in the dataset without reward information. 3) Iterative self-
supervised goal reaching: The agent improves continuously
by giving itself random goals, then attempting to reach them
using the current policy [12]. All attempts, including failures,
are relabeled into a continuously expanding training set.
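As a concrete illustration of how these ingredients fit together, the sketch below combines hindsight relabeling with a self-supervised practice loop in the spirit of [11], [12]. It is a simplified approximation under assumed interfaces (the trajectory dictionary format, policy.update, and env.collect_episode are hypothetical), not our actual training code.

```python
import random

import numpy as np


def relabel(trajectory):
    """Hindsight relabeling: treat the achieved outcome (where the ball
    actually landed) as the goal for every step of that attempt, so even
    failed attempts become valid training examples."""
    achieved_goal = trajectory["ball_landing_xy"]
    return [
        {"obs": o, "goal": achieved_goal, "action": a}
        for o, a in zip(trajectory["observations"], trajectory["actions"])
    ]


def train_gcbc(policy, dataset, steps=1000, batch_size=256):
    """Goal conditioned behavior cloning: supervised learning of actions
    conditioned on (observation, goal); no reward signal is needed."""
    for _ in range(steps):
        batch = random.sample(dataset, min(batch_size, len(dataset)))
        policy.update(batch)  # e.g. regress predicted actions onto logged actions


def sample_goal_on_table():
    """Sample a random target landing point on the opponent's half of a
    standard 2.74m x 1.525m table (coordinates in meters, net at x=0)."""
    return np.array([np.random.uniform(0.0, 1.37),
                     np.random.uniform(-0.7625, 0.7625)])


def goalseye_training(policy, env, bootstrap_trajectories, practice_rounds=100):
    # 1) Bootstrap: relabel a small set of non-goal-directed demonstrations.
    dataset = [ex for traj in bootstrap_trajectories for ex in relabel(traj)]
    train_gcbc(policy, dataset)

    # 2)-3) Self-supervised practice: sample a random goal, attempt it with
    # the current policy, relabel the attempt (success or failure), and
    # retrain on the continuously expanding dataset.
    for _ in range(practice_rounds):
        goal = sample_goal_on_table()
        attempt = env.collect_episode(policy, goal)  # hypothetical rollout helper
        dataset.extend(relabel(attempt))
        train_gcbc(policy, dataset)
```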
The main contributions of this work are: 1) We introduce
a setting of high-acceleration goal directed table tennis on a
physical robot. 2) We present GoalsEye, an iterative imitation
learning system that can improve continuously in the real
world to the point where it can execute precise, dynamic goal
reaching behavior at or above amateur human performance.
Our final system is able to control a physical robot at 20Hz
to land 40% of balls to within 20 centimeters of commanded
targets at 6.5 m/s (see https://sites.google.com/
view/goals-eye for videos). 3) We perform a large
empirical study, both in simulation and in the real world,
to determine the important components of success in this setting. We note that even though we present experimental results in the domain of robotic table tennis, nothing in our recipe is specific to table tennis; in principle it can be applied to any task where a goal state can be specified at test time.
II. RELATED WORK
Robotic table tennis. Table tennis has long served as
a particularly difficult benchmark for robotics. Research in
robotic table tennis began in 1983 with a competition that had
simplified rules and a smaller table [17]. This competition
ran from 1983 to 1993 and several systems were developed
[18], [19], [20]; see [21] for a summary of these approaches.
This problem remains far from solved.
Most approaches are model-based in that they explicitly
model the ball and robot dynamics. The Omron Forpheus
robot [22] is the current exemplar, achieving impressive
results. These methods typically consist of several steps:
identifying virtual hitting points from ball trajectories [23],
[24], [25], [26], [27], [28], [29], [30], predicting ball velocities by learning from data [23], [31], [32], [24] or through a parameterized dynamics model [21], [26], [27], calculating target paddle orientations and velocities, and
finally generating robot trajectories leading to desired paddle
targets [21], [33], [8], [34], [35], [23], [31], [32], [24], [21],
[33], [36], [37].
A number of methods do not model the robot dynamics
explicitly. These approaches fall into two broad groups,
those that utilize expert demonstrations [33], [8], [34], [38],
[39] and those that do not [40], [27], [41], [42]. Like our
best performing method, [39] is capable of learning from
sub-optimal demonstrations. However, the approach has no
mechanism to continuously improve beyond the demonstration data. In [8], the authors demonstrate a system that learns
cooperative table tennis by creating a library of primitive
motions using kinesthetic teaching to constrain learning. In
a similar spirit, we collect an initial dataset of non-goal-
directed demonstration data of how to make contact and
return the ball to bootstrap autonomous learning.
Reinforcement learning (RL) is a common approach for
table tennis methods that do not utilize demonstrations.
Methods range from framing the problem as a single-step
bandit [27] to temporally extended policies controlling the
robot in joint space [40] using on-policy RL, to Hierarchical
RL (HRL) [69]. Of particular interest is [41], which utilizes
muscular soft robots to facilitate safe exploration and learn
RL policies from scratch on a real robot.
Goal conditioned imitation learning. While many of
the above methods have been shown to scale to undirected
table tennis, few have tackled the problem of goal-directed
table tennis. Goal directed control is an active area of robot
learning, with many recent examples in both IL and RL
[56], [11], [15], [47]. Given the complexities of even single-task real world robot learning [57], finding simple methods
that scale to goal-directed real world behavior remains an
open question. While goal-conditioned imitation learning
[11], [15] offers a simple approach to multitask control, no
instances yet have been shown to scale to hard physical
problems like the one studied in this work, being largely
validated in simulation instead. Surprisingly, we find that the simple combination of two existing IL methods [11], [12]
indeed scales to this setting, while being able to 1) learn
from less burdensome suboptimal (in the sense of being
non goal-directed) demonstrations, 2) use relabeled learning
to learn goal-reaching without rewards, and 3) continuously
self-improve beyond the initial data by using self-supervised
goal reaching.
Empirical studies in scaling robot learning. Like many
works in robot learning [58], [59], [60], ours studies empirically whether existing methods scale to new and harder
robotic problems than the ones originally studied. For example, studies such as [13] found new evidence that existing algorithms (e.g. SAC), previously only studied in simulation, indeed scaled to hard problems such as real world
quadrupedal locomotion. Similarly, recent empirical studies
have shown that well-motivated prior ideas did not scale
to more difficult robotic setups [61], [62]. For example, the