Learning Real-world Autonomous Navigation by
Self-Supervised Environment Synthesis
Zifan Xu, Anirudh Nair, Xuesu Xiao, and Peter Stone
arXiv:2210.04852v1 [cs.RO] 10 Oct 2022
Abstract—Machine learning approaches have recently enabled autonomous navigation for mobile robots in a data-driven manner. Since most existing learning-based navigation systems are trained with data generated in artificially created training environments, during real-world deployment at scale, it is inevitable that robots will encounter unseen scenarios, which are out of the training distribution and therefore lead to poor real-world performance. On the other hand, directly training in the real world is generally unsafe and inefficient. To address this issue, we introduce Self-supervised Environment Synthesis (SES), in which, after real-world deployment with safety and efficiency requirements, autonomous mobile robots can utilize experience from the real-world deployment, reconstruct navigation scenarios, and synthesize representative training environments in simulation. Training in these synthesized environments leads to improved future performance in the real world. The effectiveness of SES at synthesizing representative simulation environments and improving real-world navigation performance is evaluated via a large-scale deployment in a high-fidelity, realistic simulator^1 and a small-scale deployment on a physical robot.
I. INTRODUCTION
While classical navigation systems have been able to move
mobile robots from one point to another in a collision-free
manner for decades [2], [3], learning-based approaches to
navigation have recently gained traction [4] due to their
ability to learn navigation behaviors purely from data without
extensive engineering effort. For example, learned navigation
systems can learn from human demonstrations [5] or self-
supervised trial and error [6]; they can learn navigation cost
functions that consider social norms and human preferences
[7]. They can also be combined with classical navigation
systems to assure navigation safety and enable adaptive
behaviors in different scenarios [8]–[11].
Due to the expense of trial-and-error training in the real world (e.g., safety concerns and sample efficiency), most navigation behaviors are learned in artificially created environments in simulation, which may not generalize well to the real world (see Fig. 1). Despite efforts to create simulation environments similar to the real world or enable efficient sim-to-real transfer, it is inevitable that robots will encounter unfamiliar scenarios, especially in large-scale real-world deployments.
The goal of this work is to improve real-world autonomous navigation with safety and efficiency requirements based on mobile robots' own navigation experiences during actual deployment. These conservative, potentially suboptimal, real-world experiences (collected without risky real-world exploration) may not be sufficient to directly train an RL agent, but may be sufficient to reconstruct the real-world navigation scenarios which an RL agent can interact with and actively explore in simulation. On the other hand, given the large amount of real-world deployment experience available to many robots deployed in the field (consider 7 million connected iRobot Roombas vacuuming homes day to day), it is infeasible to reconstruct all these deployment environments and train in simulation on a daily basis.

Fig. 1: A navigation policy trained in simulation is expected to be deployed in completely different domains of navigation environments in the real world (e.g., households, factories, and parks). The policy may also face different real-world inter-domain deployments, in which a navigation policy learned in one real-world domain will be deployed in another.

Affiliations: Department of Computer Science, The University of Texas at Austin; Department of Computer Science, George Mason University; Everyday Robots; Sony AI. {zfxu, ani.nair}@utexas.edu, xiao@gmu.edu, pstone@cs.utexas.edu
^1 Due to the lack of access to large-scale real-world deployment data, we use simulated Matterport environments [1] as a surrogate of the real world.
With this motivation in mind, this paper introduces Self-supervised Environment Synthesis (SES), which enables mobile robots deployed in the field to first reconstruct navigation scenarios from experiences and then synthesize a representative set of simulation environments that is feasible for RL training. Training in these simulated environments enables robots to learn to address real-world challenges that they are likely to encounter.
Importantly, the distribution of real-world navigation scenarios is often unbalanced, consisting mostly of trivial open scenarios. Therefore, we use an efficient strategy that filters out the trivial scenarios by a measure of navigation difficulty and focuses learning on the challenging navigation scenarios.
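As a concrete illustration, the difficulty-based filtering step can be sketched as follows. The `navigation_difficulty` proxy (mean obstacle occupancy of a local grid) and the threshold value are hypothetical stand-ins; the paper's actual difficulty measure is not reproduced here.

```python
import numpy as np

def navigation_difficulty(grid: np.ndarray) -> float:
    """Hypothetical difficulty proxy: fraction of occupied cells in a
    2-D occupancy grid around the robot's traversal (1.0 = obstacle)."""
    return float(grid.mean())

def filter_trivial(scenarios, threshold=0.1):
    """Keep only scenarios whose difficulty exceeds the threshold,
    discarding trivial open scenarios before RL training."""
    return [s for s in scenarios if navigation_difficulty(s) > threshold]

open_area = np.zeros((8, 8))                       # trivial: no obstacles
cluttered = np.zeros((8, 8)); cluttered[2:6, 3:5] = 1.0
kept = filter_trivial([open_area, cluttered], threshold=0.1)
```

With this proxy, the open scenario (difficulty 0.0) is dropped and only the cluttered one survives; any scalar measure that separates trivial from challenging scenarios could be substituted.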
To synthesize the training environment set from the reconstructed challenging navigation scenarios, three different environment synthesis approaches—Generative Adversarial Networks (GAN), K-means clustering with Principal Component Analysis (PCA), and random sampling—are employed to represent the challenging scenarios with a concise training environment set that is feasible for an RL agent to learn from. We denote the pipelines with the three environment synthesis approaches as SES-GAN, SES-PCA, and SES-RS, respectively.
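A minimal sketch of the SES-PCA idea, assuming scenarios are represented as flattened occupancy grids: project onto the top principal components, cluster with K-means, and keep the real scenario nearest each centroid. The feature choice, embedding dimension, and cluster count below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def synthesize_representatives(scenarios: np.ndarray, k: int, d: int = 2,
                               iters: int = 20, seed: int = 0) -> np.ndarray:
    """Pick k representative scenarios via PCA + plain K-means."""
    X = scenarios.reshape(len(scenarios), -1).astype(float)
    X = X - X.mean(axis=0)
    # PCA via SVD: rows of Vt are principal directions.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Z = X @ Vt[:d].T                      # low-dimensional embedding
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(iters):                # standard Lloyd iterations
        labels = np.argmin(((Z[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    # Return the actual scenario closest to each cluster center.
    reps = [np.argmin(((Z - c) ** 2).sum(-1)) for c in centers]
    return scenarios[np.array(reps)]

# Usage: 40 synthetic scenarios forming two obstacle layouts.
rng = np.random.default_rng(1)
left = np.zeros((20, 6, 6)); left[:, :, :2] = 1.0
right = np.zeros((20, 6, 6)); right[:, :, 4:] = 1.0
pool = np.concatenate([left, right]) + 0.01 * rng.random((40, 6, 6))
reps = synthesize_representatives(pool, k=2)
```

SES-GAN would replace the clustering step with a generative model trained on the reconstructed scenarios, and SES-RS with uniform random sampling.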
We evaluate the three SES pipelines in Matterport, a dataset of simulated realistic household environments (which serves as a surrogate of the real world), and show that SES improves deployment performance in these environments compared to policies trained in artificially generated environments [12]. The best pipeline, SES-GAN, generates more representative training environments and enables better deployment in Matterport than the pipelines with the other synthesis approaches.
II. RELATED WORK
A. Classical and Learning-Based Navigation
Mobile robot navigation has been investigated by roboticists for decades [2], [3]. Classical approaches can move robots from one point to another with a reasonable degree of confidence that they will not collide with any obstacles. However, these approaches require extensive engineering both to develop in the first place and to adapt to different environments. Moreover, when encountering an environment in which a robot has previously failed or achieved suboptimal behavior, the robot will likely repeat the same mistake unless the system is re-engineered.
Inspired by the success of machine learning in other domains, roboticists have also applied machine learning to autonomous navigation [4]. Most learning approaches to navigation adopt an end-to-end approach, i.e., learning a mapping from perceptual input directly to motion commands. Such approaches require comparatively less engineering effort and learn navigation behaviors purely from data [13], e.g., from expert demonstrations [5], [14] or from trial and
error [6], [15]. However, these approaches often lack the safety guarantees and explainability provided by their classical counterparts. Therefore, roboticists have also investigated
more structured ways of integrating learning with classical navigation, such as learning local planners [16]–[18], terrain-based cost functions [19], planner parameters [10], [11], driving styles [20], or social norms [7]. Their success notwithstanding, learning-based navigation approaches inherit one drawback from machine learning approaches in general: poor generalizability when facing out-of-distribution data. When deployed in the real world, especially at large scale, it is inevitable that mobile robots will encounter scenarios that are not included in their training distribution.
SES combats classical navigation's inability to improve from experience [21] and learning approaches' poor generalizability to real-world scenarios. It improves navigation by synthesizing training environments from real-world deployment experiences.
B. Sim-to-real Transfer
Limited by the safety and efficiency requirements in the
real world, a learning-based navigation system is usually
trained in simulation. However, policies trained in simulation
can perform poorly in the real world due to the mismatch
between the simulation and the real world. This phenomenon
is commonly referred to as the sim-to-real gap.
One major source of the sim-to-real gap is the discrepancy between the sensor input rendered in simulation and the input from the real robot's sensors. For example, to bridge the gap between real-world and synthetic camera images of a robotic system, prior work has employed techniques such as pixel-level domain adaptation, which translates synthetic images to realistic ones at the pixel level [22], [23]. These adapted pseudo-realistic images bridge the sim-to-real gap to some extent, so policies learned in simulation can be executed more successfully on real robots. Another source of the sim-to-real gap is the dynamics mismatch between simulation and the real world, e.g., due to an imperfect physics engine. A common paradigm to reduce the dynamics mismatch is Grounded Simulation Learning (GSL), which either directly modifies (i.e., grounds) the simulator to better match the real world [24], or learns an action transformer that induces simulator transitions that more closely match the real world [25], [26].
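To make the action-transformer idea concrete, here is a toy one-dimensional sketch: the simulator over-responds to velocity commands, and a scalar transformer gain is fit so that transformed actions in simulation reproduce observed real transitions. The dynamics, gains, and least-squares fit are invented purely for illustration and are not taken from [25], [26].

```python
import numpy as np

def sim_step(x, a):
    """Toy simulator dynamics: over-responds to the commanded action."""
    return x + 1.5 * a

def real_step(x, a):
    """Toy 'real robot' dynamics (unknown to the learner in practice)."""
    return x + 1.0 * a

def fit_action_transformer(actions):
    """Fit gain g so that sim_step(x, g * a) matches real_step(x, a),
    via least squares on observed real transitions."""
    deltas_real = np.array([real_step(0.0, a) for a in actions])
    A = 1.5 * np.array(actions)          # sim delta for action g*a is 1.5*g*a
    g = float(np.linalg.lstsq(A[:, None], deltas_real, rcond=None)[0][0])
    return g

g = fit_action_transformer([0.2, 0.5, 1.0])
```

Here the fitted gain is 2/3, exactly cancelling the simulator's exaggerated response; a practical GSL system would learn a state-dependent transformer rather than a single scalar.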
In contrast to the two sim-to-real gaps introduced above,
this work addresses a gap caused by the environmental
mismatch (e.g., differences in the configurations and shapes
of obstacles, and start-goal locations). SES can be thought of
as an environmental grounding method that minimizes the
differences in navigation environments between simulation
and the real world based on the navigation experiences
collected during real-world deployment.
III. APPROACH
In this section, we first formulate large-scale real-world
navigation as a multi-task RL problem in an unknown
navigation domain, which is defined as a distribution of
navigation tasks. Sec. III-A formally defines the navigation
task and describes how a distribution of navigation tasks
forms a navigation domain. Then, Sec. III-B and III-C
discuss the two stages of SES: real-world navigation domain
extraction from real-world deployment data and environment
synthesis that generates a representative set of navigation
tasks. The whole pipeline of SES is summarized in Alg. 1.
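The two-stage structure can be sketched at a high level as follows. Every helper below is a simplified stand-in (a mean-occupancy difficulty proxy and random-sampling synthesis, i.e., the SES-RS variant), not a reproduction of the paper's Alg. 1; SES-GAN and SES-PCA would swap in a generative model or clustering for the synthesis step.

```python
import numpy as np

def reconstruct_scenarios(deployment_logs):
    """Stage 1 stand-in: rebuild local occupancy grids from logged traversals."""
    return [np.asarray(grid, dtype=float) for grid in deployment_logs]

def difficulty(grid):
    """Placeholder difficulty measure: mean obstacle occupancy."""
    return float(grid.mean())

def synthesize(scenarios, budget):
    """Stage 2 stand-in: random sampling (SES-RS) down to a training budget."""
    rng = np.random.default_rng(0)
    idx = rng.choice(len(scenarios), size=min(budget, len(scenarios)),
                     replace=False)
    return [scenarios[i] for i in idx]

def ses_pipeline(deployment_logs, budget=2, threshold=0.05):
    scenarios = reconstruct_scenarios(deployment_logs)
    hard = [s for s in scenarios if difficulty(s) > threshold] or scenarios
    return synthesize(hard, budget)

logs = [np.zeros((4, 4)), np.eye(4), np.ones((4, 4))]
training_envs = ses_pipeline(logs)
```

The returned `training_envs` would then serve as the environment set for RL training in simulation.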
A. Navigation Task and Navigation Domain
We focus on a standard goal-oriented navigation task, in which a robot navigates in a navigation environment e from a provided starting pose α to a goal pose β. Each navigation task T is instantiated as a tuple T = (e, α, β). In real-world applications, robots are not deployed to navigate in one single environment or with the same start and goal all the time. Instead, actual deployments in the real world usually entail a distribution over multiple environments with many start and goal poses. In this case, we represent the real-world deployment as a navigation domain p_real, defined as a distribution of navigation tasks p_real(T). SES generates a new navigation domain p_SES in simulation that, with a limited number of tasks, models the distribution of tasks in p_real so that the navigation performance of policies trained in p_SES
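The task and domain definitions above map directly onto simple data structures. In this sketch, the pose format, environment identifiers, and a finite task list standing in for the distribution p(T) are illustrative assumptions.

```python
import random
from dataclasses import dataclass
from typing import Any, List, Tuple

Pose = Tuple[float, float, float]  # assumed (x, y, heading) format

@dataclass
class NavigationTask:
    """A navigation task T = (e, alpha, beta): environment, start, goal."""
    e: Any          # environment, e.g., an occupancy grid or a map id
    alpha: Pose     # starting pose
    beta: Pose      # goal pose

@dataclass
class NavigationDomain:
    """A navigation domain as a sampler over tasks, standing in for p(T).
    The paper treats p_real as an unknown real-world distribution and
    p_SES as its synthesized surrogate with a limited number of tasks."""
    tasks: List[NavigationTask]

    def sample(self, rng: random.Random) -> NavigationTask:
        return rng.choice(self.tasks)

domain = NavigationDomain(tasks=[
    NavigationTask(e="kitchen", alpha=(0.0, 0.0, 0.0), beta=(3.0, 2.0, 1.57)),
    NavigationTask(e="hallway", alpha=(1.0, 0.0, 0.0), beta=(9.0, 0.0, 0.0)),
])
task = domain.sample(random.Random(0))
```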
(Preview: 2 of 7 pages.)