physical robot without involving the complexities of sim-to-
real transfer.
In this work, we ask: what is the simplest way to
obtain goal conditioned control in a dynamic real world set-
ting such as precision table tennis? Can one design effective
alternatives to more intricate RL algorithms that perform well
in this difficult setup? In pursuit of these questions, we examine
the necessity of different components in existing goal condi-
tioned learning pipelines, both RL and IL. Surprisingly, we
find that the synthesis of two existing techniques in iterative
self-supervised imitation learning [11], [12] indeed scales
to this setting. For ease of reference, we refer to this best
performing approach throughout as GoalsEye, a system for
high-precision goal reaching table tennis, trained with goal
conditioned behavior cloning plus self-supervised practice
(GCBC+SSP).
We find that the essential ingredients of success are: 1)
A minimal but non-goal-directed “bootstrap” dataset to
overcome a difficult initial exploration problem [11]. 2) Re-
labeled goal conditioned imitation: GoalsEye uses simple
and sample efficient relabeled behavior cloning [11], [15],
[16], to train a goal-directed policy to reach any goal state
in the dataset without reward information. 3) Iterative self-
supervised goal reaching: The agent improves continuously
by giving itself random goals, then attempting to reach them
using the current policy [12]. All attempts, including failures,
are relabeled into a continuously expanding training set.
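To make these ingredients concrete, the sketch below shows the GCBC+SSP loop on a synthetic toy problem. It is illustrative only: the data, the linear least-squares "policy", and all names (collect_bootstrap_episode, relabel, fit_bc, rollout) are assumptions for exposition, whereas the actual system trains a neural network policy on the physical robot.

```python
# Minimal sketch of goal conditioned behavior cloning plus self-supervised
# practice (GCBC+SSP) on synthetic data. All names and shapes are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def collect_bootstrap_episode():
    """Non-goal-directed demonstration: strokes that merely return the ball."""
    obs = rng.normal(size=(20, 4))         # e.g., ball/robot state at 20 Hz
    actions = rng.normal(size=(20, 2))     # e.g., joint velocity targets
    achieved_goal = rng.uniform(-1, 1, 2)  # e.g., where the ball actually landed
    return obs, actions, achieved_goal

def relabel(obs, actions, achieved_goal):
    """Hindsight relabeling: treat the achieved outcome as the commanded goal."""
    goals = np.tile(achieved_goal, (len(obs), 1))
    return np.concatenate([obs, goals], axis=1), actions

def fit_bc(policy_w, inputs, targets, lr=1e-2, steps=200):
    """Goal conditioned behavior cloning, here plain least-squares regression."""
    for _ in range(steps):
        grad = inputs.T @ (inputs @ policy_w - targets) / len(inputs)
        policy_w = policy_w - lr * grad
    return policy_w

def rollout(policy_w, goal):
    """Attempt a commanded goal with the current policy; return the episode."""
    obs = rng.normal(size=(20, 4))
    actions = np.concatenate([obs, np.tile(goal, (20, 1))], axis=1) @ policy_w
    achieved_goal = goal + rng.normal(scale=0.3, size=2)  # imperfect execution
    return obs, actions, achieved_goal

# Ingredient 1: bootstrap dataset of non-goal-directed demonstrations.
dataset = [collect_bootstrap_episode() for _ in range(50)]
policy_w = np.zeros((6, 2))  # maps (obs, goal) -> action

# Ingredients 2 and 3: relabel everything, clone behavior, practice random goals.
for _ in range(10):
    X, Y = zip(*(relabel(*ep) for ep in dataset))
    policy_w = fit_bc(policy_w, np.vstack(X), np.vstack(Y))
    for _ in range(20):
        goal = rng.uniform(-1, 1, 2)             # self-proposed target
        dataset.append(rollout(policy_w, goal))  # keep attempt even if it fails
```

The key property is that every attempt, successful or not, is relabeled with the outcome it actually achieved and appended to the training set, so the dataset and the policy's goal coverage grow together.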
The main contributions of this work are: 1) We introduce
a setting of high-acceleration goal directed table tennis on a
physical robot. 2) We present GoalsEye, an iterative imitation
learning system that can improve continuously in the real
world to the point where it can execute precise, dynamic goal
reaching behavior at or above amateur human performance.
Our final system is able to control a physical robot at 20Hz
to land 40% of balls to within 20 centimeters of commanded
targets at 6.5 m/s (see https://sites.google.com/
view/goals-eye for videos). 3) We perform a large
empirical study, both in simulation and in the real world,
to determine the important components of success in this
setting. We note that although we present experimental results
in the domain of robotic table tennis, nothing in our recipe is
specific to table tennis; in principle, it can be applied to any
task where a goal state can be specified at test time.
II. RELATED WORK
Robotic table tennis. Table tennis has long served as
a particularly difficult benchmark for robotics. Research in
robotic table tennis began in 1983 with a competition that had
simplified rules and a smaller table [17]. This competition
ran from 1983 to 1993 and several systems were developed
[18], [19], [20]; see [21] for a summary of these approaches.
This problem remains far from solved.
Most approaches are model-based in that they explicitly
model the ball and robot dynamics. The Omron Forpheus
robot [22] is the current exemplar, achieving impressive
results. These methods typically consist of several steps:
identifying virtual hitting points from ball trajectories [23],
[24], [25], [26], [27], [28], [29], [30], predicting ball ve-
locities by learning from data [23], [31], [32], [24] or
through parameterized dynamics models [21], [26], [27],
calculating target paddle orientations and velocities, and
finally generating robot trajectories leading to the desired
paddle targets [21], [33], [8], [34], [35], [23], [31], [32],
[24], [36], [37].
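For orientation, the sketch below summarizes this pipeline on a toy example. All helpers are simplified placeholders (a drag- and bounce-free ballistic ball model, a fixed hitting plane, a stub trajectory planner), and the names are ours, not those of any cited system.

```python
# Illustrative outline of the classical model-based return pipeline.
import numpy as np

def predict_ball_trajectory(ball_obs, horizon=30, dt=1 / 60):
    """Placeholder ballistic rollout (no drag, spin, or bounce) from the last observation."""
    pos, vel = ball_obs[-1, :3], ball_obs[-1, 3:]
    g = np.array([0.0, 0.0, -9.81])
    return np.array([pos + vel * (k * dt) + 0.5 * g * (k * dt) ** 2 for k in range(horizon)])

def select_virtual_hitting_point(traj, hitting_plane_y=-1.2):
    """Pick the predicted ball state closest to the robot's hitting plane."""
    k = int(np.argmin(np.abs(traj[:, 1] - hitting_plane_y)))
    return traj[k], k

def solve_paddle_state(hit_point, target_landing_point):
    """Placeholder: point the paddle normal from the hit point toward the target."""
    direction = np.append(target_landing_point, 0.0) - hit_point
    return direction / np.linalg.norm(direction)

def plan_robot_trajectory(paddle_normal, hit_step):
    """Placeholder for joint-space trajectory generation to the desired paddle state."""
    return {"paddle_normal": paddle_normal, "hit_step": hit_step}

ball_obs = np.array([[0.0, 1.5, 0.3, 0.0, -9.0, 1.0]])  # [x, y, z, vx, vy, vz]
traj = predict_ball_trajectory(ball_obs)
hit_point, hit_step = select_virtual_hitting_point(traj)
paddle_normal = solve_paddle_state(hit_point, target_landing_point=np.array([0.0, 1.0]))
plan = plan_robot_trajectory(paddle_normal, hit_step)
```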
A number of methods do not model the robot dynamics
explicitly. These approaches fall into two broad groups:
those that utilize expert demonstrations [33], [8], [34], [38],
[39] and those that do not [40], [27], [41], [42]. Like our
best performing method, [39] is capable of learning from
sub-optimal demonstrations. However, the approach has no
mechanism to continuously improve beyond the demonstra-
tion data. In [8], the authors demonstrate a system that learns
cooperative table tennis by creating a library of primitive
motions using kinesthetic teaching to constrain learning. In
a similar spirit, we collect an initial dataset of non-goal-
directed demonstration data of how to make contact and
return the ball to bootstrap autonomous learning.
Reinforcement learning (RL) is a common approach for
table tennis methods that do not utilize demonstrations.
Methods range from framing the problem as a single-step
bandit [27] to temporally extended policies controlling the
robot in joint space [40] using on-policy RL, to Hierarchical
RL (HRL) [69]. Of particular interest is [41], which utilizes
muscular soft robots to facilitate safe exploration and learn
RL policies from scratch on a real robot.
Goal conditioned imitation learning. While many of
the above methods have been shown to scale to undirected
table tennis, few have tackled the problem of goal-directed
table tennis. Goal directed control is an active area of robot
learning, with many recent examples in both IL and RL
[56], [11], [15], [47]. Given the complexities of even single
task real world robot learning [57], finding simple methods
that scale to goal-directed real world behavior remains an
open question. While goal-conditioned imitation learning
[11], [15] offers a simple approach to multitask control, no
instances yet have been shown to scale to hard physical
problems like the one studied in this work, being largely
validated in simulation instead. We find, surprisingly, that the
simple combination of two existing IL methods [11], [12]
indeed scales to this setting, while being able to 1) learn
from suboptimal (in the sense of being non-goal-directed)
demonstrations that are less burdensome to collect, 2) learn
goal reaching from relabeled data without reward information,
and 3) continuously self-improve beyond the initial data via
self-supervised goal reaching.
Empirical studies in scaling robot learning. Like many
works in robot learning [58], [59], [60], ours studies em-
pirically whether existing methods scale to new and harder
robotic problems than the ones originally studied. For exam-
ple, studies such as [13] found new evidence that existing
algorithms (e.g., SAC), previously studied only in simula-
tion, indeed scaled to hard problems such as real world
quadrupedal locomotion. Similarly, recent empirical studies
have shown that well-motivated prior ideas did not scale
to more difficult robotic setups [61], [62]. For example, the