nent Analysis (PCA), and random sampling—are employed
to represent the challenging scenarios with a concise training
environment set that is feasible for an RL agent to learn from.
We denote the pipelines with the three environment synthesis
approaches as SES-GAN, SES-PCA, and SES-RS, respectively.
We evaluate the three SES pipelines in Matterport, a dataset
of simulated, realistic household environments that serves as
a surrogate for the real world. We show that SES improves
deployment performance in these environments compared to
policies trained in artificially generated environments [12],
and that the best pipeline, SES-GAN, generates more
representative training environments and enables better
deployment in Matterport than the pipelines with the other
synthesis approaches.
II. RELATED WORK
A. Classical and Learning-Based Navigation
Mobile robot navigation has been investigated by roboti-
cists for decades [2], [3]. Classical approaches can move
robots from one point to another with a reasonable degree
of confidence that they won’t collide with any obstacles.
However, these approaches require extensive engineering to
develop in the first place and to adapt to different envi-
ronments. Moreover, when a robot re-encounters an environment
in which it has previously failed or behaved suboptimally, it
will likely repeat the same mistake unless the system is
re-engineered.
Inspired by the success of machine learning in other
domains, roboticists have also applied machine learning to
autonomous navigation [4]. Most learning approaches to
navigation adopt an end-to-end approach, i.e., learning a
mapping from perceptual input directly to motion commands.
Such approaches require comparatively less engineering ef-
fort, and learn navigation behaviors purely from data [13],
e.g., from expert demonstrations [5], [14] or from trial and
error [6], [15]. However, these approaches often lack the
safety guarantees and explainability provided by their
classical counterparts. Therefore, roboticists have also investigated
more structured ways of integrating learning with classical
navigation, such as learning local planners [16]–[18], terrain-
based cost functions [19], planner parameters [10], [11], driv-
ing styles [20], or social norms [7]. Their success notwith-
standing, learning-based navigation approaches inherit one
drawback from machine learning approaches in general: poor
generalizability when facing out-of-distribution data. When
deployed in the real world, especially at large scale, it is
inevitable that mobile robots will encounter scenarios that
are not included in their training distribution.
SES combats classical navigation’s inability to improve
from experience [21] and learning approaches’ poor gener-
alizability to real-world scenarios. It improves navigation by
synthesizing training environments from real-world deploy-
ment experiences.
B. Sim-to-real Transfer
Due to safety and efficiency constraints in the real world,
learning-based navigation systems are usually trained in
simulation. However, policies trained in simulation
can perform poorly in the real world due to the mismatch
between the simulation and the real world. This phenomenon
is commonly referred to as the sim-to-real gap.
One major source of the sim-to-real gap is the discrepancy
between the sensor input rendered in simulation and that
captured by a real robot's sensors. For example, to bridge the gap
between real-world and synthetic camera images of a robotic
system, prior work has employed techniques such as pixel-
level domain adaptation, which translates synthetic images
to realistic ones at the pixel level [22], [23]. These adapted
pseudo-realistic images narrow the perceptual sim-to-real gap,
so policies learned in simulation can be executed more
successfully on real robots.
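As an illustration, the following is a minimal sketch of how such a pixel-level adapter can be applied to each rendered frame before the policy consumes it; the tiny network, its (untrained) weights, and the random frame are illustrative stand-ins, not the systems of [22], [23].

```python
# A minimal sketch of pixel-level domain adaptation: a generator translates
# a rendered (synthetic) frame into a pseudo-realistic one before the policy
# consumes it. The network and random frame are illustrative stand-ins,
# not the trained models of [22], [23].
import torch
import torch.nn as nn

class SimToRealGenerator(nn.Module):
    """Placeholder image-to-image generator: sim frame -> pseudo-real frame."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, kernel_size=3, padding=1), nn.Tanh(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

G = SimToRealGenerator()               # in practice: trained adversarially
sim_frame = torch.rand(1, 3, 64, 64)   # stand-in for a rendered observation
with torch.no_grad():
    pseudo_real = G(sim_frame)         # adapted at the pixel level
# A policy trained on `pseudo_real` frames faces a smaller perceptual gap
# when deployed on real camera images.
```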
Another source of the sim-to-real gap is a dynamics mismatch
between simulation and the real world, e.g., due to an
imperfect physics engine. A common paradigm for reducing
the dynamics mismatch is Grounded Simulation Learning
(GSL), which either directly modifies (i.e., grounds) the
simulator to better match the real world [24], or learns an
action transformer that induces simulator transitions that more
closely match those of the real world [25], [26].
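A minimal sketch of the action-transformer variant follows, assuming a learned function g(s, a) -> a' whose grounded action, when executed in the simulator, yields transitions closer to real-world dynamics; the network shape and the `sim` interface are illustrative assumptions, not the implementations of [25], [26].

```python
# A minimal sketch of GSL's action-transformer variant. The network shape
# and the `sim.step` interface are illustrative assumptions.
import torch
import torch.nn as nn

class ActionTransformer(nn.Module):
    """g(s, a) -> a': grounds simulator transitions toward real dynamics."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))

def grounded_step(sim, g: ActionTransformer, state, action):
    """During training in simulation, execute g(s, a) instead of a."""
    with torch.no_grad():
        grounded_action = g(state, action)
    return sim.step(grounded_action)
```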
In contrast to the two sources of the sim-to-real gap discussed
above, this work addresses a gap caused by environmental
mismatch (e.g., differences in the configurations and shapes
of obstacles, and start-goal locations). SES can be thought of
as an environmental grounding method that minimizes the
differences in navigation environments between simulation
and the real world based on the navigation experiences
collected during real-world deployment.
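One possible way to formalize this environmental grounding view (an illustrative formulation; the divergence D and task budget N are assumptions here, and the paper's actual objective is developed in Sec. III) is to choose the synthesized domain as

\[
p_{\mathrm{SES}} \;=\; \arg\min_{p \,:\, |\mathrm{supp}(p)| \le N} \; D\big(p_{\mathrm{real}}(T) \,\|\, p(T)\big),
\]

i.e., among simulated domains containing at most N training tasks, pick the one whose task distribution is closest to the real-world task distribution.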
III. APPROACH
In this section, we first formulate large-scale real-world
navigation as a multi-task RL problem in an unknown
navigation domain, which is defined as a distribution of
navigation tasks. Sec. III-A formally defines the navigation
task and describes how a distribution of navigation tasks
forms a navigation domain. Then, Sec. III-B and III-C
discuss the two stages of SES: real-world navigation domain
extraction from real-world deployment data and environment
synthesis that generates a representative set of navigation
tasks. The whole pipeline of SES is summarized in Alg. 1.
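As a reading aid, the following is a schematic Python sketch of the two-stage pipeline described above; it is not the paper's Alg. 1, and every function name and stub body is a hypothetical placeholder.

```python
# Schematic sketch of the two-stage SES pipeline described in this section.
# All names and stub bodies are hypothetical placeholders, not Alg. 1.
import random
from typing import Any, List, NamedTuple

class Task(NamedTuple):
    """A navigation task T = (e, alpha, beta): environment, start, goal."""
    e: Any      # navigation environment
    alpha: Any  # starting pose
    beta: Any   # goal pose

def extract_domain(deployment_logs: List[Task]) -> List[Task]:
    """Stage 1 (stub): recover the real-world navigation domain p_real
    from real-world deployment data."""
    return deployment_logs

def synthesize_tasks(p_real: List[Task], n: int) -> List[Task]:
    """Stage 2 (stub): generate a small, representative task set p_SES
    that models p_real (e.g., via a GAN, PCA, or random sampling)."""
    return random.sample(p_real, min(n, len(p_real)))

def ses_pipeline(deployment_logs: List[Task], num_tasks: int) -> List[Task]:
    p_real = extract_domain(deployment_logs)
    p_ses = synthesize_tasks(p_real, num_tasks)
    return p_ses  # an RL navigation policy is then trained on these tasks
```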
A. Navigation Task and Navigation Domain
We focus on a standard goal-oriented navigation task, in
which a robot navigates in a navigation environment e from
a provided starting pose α to a goal pose β. Each navigation
task T is instantiated as a tuple T = (e, α, β). In real-
world applications, robots are not deployed to navigate in
one single environment or with the same start and goal all the
time. Instead, actual deployments in the real world usually
entail a distribution over multiple environments with many
start and goal poses. In this case, we represent the real-
world deployment as a navigation domain p_real, defined as
a distribution of navigation tasks p_real(T). SES generates a
new navigation domain p_SES in simulation that, with a limited
number of tasks, models the distribution of tasks in p_real so
that the navigation performance of policies trained in p_SES