nent Analysis (PCA), and random sampling—are employed
to represent the challenging scenarios with a concise training
environment set that is feasible for an RL agent to learn from.
We denote the pipelines with the three environment synthesis
approaches as SES-GAN, SES-PCA, and SES-RS, respectively.
We evaluate the three SES pipelines in Matterport, a dataset
of simulated, realistic household environments that serves as
a surrogate for the real world. We show that SES improves
deployment performance in these environments compared to
policies trained in artificially generated environments [12],
and that the best pipeline, SES-GAN, generates more
representative training environments and enables better
deployment in Matterport than the pipelines with the other
synthesis approaches.
II. RELATED WORK
A. Classical and Learning-Based Navigation
Mobile robot navigation has been investigated by roboti-
cists for decades [2], [3]. Classical approaches can move
robots from one point to another with a reasonable degree
of confidence that they won’t collide with any obstacles.
However, these approaches require extensive engineering to
develop in the first place and to adapt to different envi-
ronments. Moreover, when a robot re-encounters an environment
in which it has previously failed or behaved suboptimally, it
will likely repeat the same mistake unless the system is
re-engineered.
Inspired by the success of machine learning in other
domains, roboticists have also applied machine learning to
autonomous navigation [4]. Most learning approaches to
navigation adopt an end-to-end approach, i.e., learning a
mapping from perceptual input directly to motion commands.
Such approaches require comparatively less engineering ef-
fort, and learn navigation behaviors purely from data [13],
e.g., from expert demonstrations [5], [14] or from trial and
error [6], [15]. However, these approaches often lack the
safety guarantees and explainability provided by their
classical counterparts. Therefore, roboticists have also investigated
more structured ways of integrating learning with classical
navigation, such as learning local planners [16]–[18], terrain-
based cost functions [19], planner parameters [10], [11], driv-
ing styles [20], or social norms [7]. Their success notwith-
standing, learning-based navigation approaches inherit one
drawback from machine learning approaches in general: poor
generalizability when facing out-of-distribution data. When
deployed in the real world, especially at large scale, it is
inevitable that mobile robots will encounter scenarios that
are not included in their training distribution.
SES combats classical navigation’s inability to improve
from experience [21] and learning approaches’ poor gener-
alizability to real-world scenarios. It improves navigation by
synthesizing training environments from real-world deploy-
ment experiences.
B. Sim-to-real Transfer
Due to safety and efficiency constraints in the real world,
learning-based navigation systems are usually trained in
simulation. However, policies trained in simulation
can perform poorly in the real world due to the mismatch
between the simulation and the real world. This phenomenon
is commonly referred to as the sim-to-real gap.
One major source of the sim-to-real gap is the discrepancy
between the sensor input rendered in simulation and that
captured by a real robot's sensors. For example, to bridge the gap
between real-world and synthetic camera images of a robotic
system, prior work has employed techniques such as pixel-
level domain adaptation, which translates synthetic images
to realistic ones at the pixel level [22], [23]. These adapted
pseudo-realistic images narrow the perceptual sim-to-real gap,
so policies learned in simulation can be executed more
successfully on real robots.
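As an illustration, the following is a minimal sketch of how such a pixel-level adapter can be applied to each rendered frame before the policy consumes it; the tiny network, its (untrained) weights, and the random frame are illustrative stand-ins, not the systems of [22], [23].

```python
# A minimal sketch of pixel-level domain adaptation: a generator translates
# a rendered (synthetic) frame into a pseudo-realistic one before the policy
# consumes it. The network and random frame are illustrative stand-ins,
# not the trained models of [22], [23].
import torch
import torch.nn as nn

class SimToRealGenerator(nn.Module):
    """Placeholder image-to-image generator: sim frame -> pseudo-real frame."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, kernel_size=3, padding=1), nn.Tanh(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

G = SimToRealGenerator()               # in practice: trained adversarially
sim_frame = torch.rand(1, 3, 64, 64)   # stand-in for a rendered observation
with torch.no_grad():
    pseudo_real = G(sim_frame)         # adapted at the pixel level
# A policy trained on `pseudo_real` frames faces a smaller perceptual gap
# when deployed on real camera images.
```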
Another source of the sim-to-real gap is a dynamics mismatch
between simulation and the real world, e.g., due to an
imperfect physics engine. A common paradigm for reducing
the dynamics mismatch is Grounded Simulation Learning
(GSL), which either directly modifies (i.e., grounds) the
simulator to better match the real world [24], or learns an
action transformer that induces simulator transitions that more
closely match those of the real world [25], [26].
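A minimal sketch of the action-transformer variant follows, assuming a learned function g(s, a) -> a' whose grounded action, when executed in the simulator, yields transitions closer to real-world dynamics; the network shape and the `sim` interface are illustrative assumptions, not the implementations of [25], [26].

```python
# A minimal sketch of GSL's action-transformer variant. The network shape
# and the `sim.step` interface are illustrative assumptions.
import torch
import torch.nn as nn

class ActionTransformer(nn.Module):
    """g(s, a) -> a': grounds simulator transitions toward real dynamics."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))

def grounded_step(sim, g: ActionTransformer, state, action):
    """During training in simulation, execute g(s, a) instead of a."""
    with torch.no_grad():
        grounded_action = g(state, action)
    return sim.step(grounded_action)
```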
In contrast to the two sources of the sim-to-real gap discussed
above, this work addresses a gap caused by environmental
mismatch (e.g., differences in the configurations and shapes
of obstacles, and start-goal locations). SES can be thought of
as an environmental grounding method that minimizes the
differences in navigation environments between simulation
and the real world based on the navigation experiences
collected during real-world deployment.
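One possible way to formalize this environmental grounding view (an illustrative formulation; the divergence D and task budget N are assumptions here, and the paper's actual objective is developed in Sec. III) is to choose the synthesized domain as

\[
p_{\mathrm{SES}} \;=\; \arg\min_{p \,:\, |\mathrm{supp}(p)| \le N} \; D\big(p_{\mathrm{real}}(T) \,\|\, p(T)\big),
\]

i.e., among simulated domains containing at most N training tasks, pick the one whose task distribution is closest to the real-world task distribution.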
III. APPROACH
In this section, we first formulate large-scale real-world
navigation as a multi-task RL problem in an unknown
navigation domain, which is defined as a distribution of
navigation tasks. Sec. III-A formally defines the navigation
task and describes how a distribution of navigation tasks
forms a navigation domain. Then, Sec. III-B and III-C
discuss the two stages of SES: real-world navigation domain
extraction from real-world deployment data and environment
synthesis that generates a representative set of navigation
tasks. The whole pipeline of SES is summarized in Alg. 1.
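As a reading aid, the following is a schematic Python sketch of the two-stage pipeline described above; it is not the paper's Alg. 1, and every function name and stub body is a hypothetical placeholder.

```python
# Schematic sketch of the two-stage SES pipeline described in this section.
# All names and stub bodies are hypothetical placeholders, not Alg. 1.
import random
from typing import Any, List, NamedTuple

class Task(NamedTuple):
    """A navigation task T = (e, alpha, beta): environment, start, goal."""
    e: Any      # navigation environment
    alpha: Any  # starting pose
    beta: Any   # goal pose

def extract_domain(deployment_logs: List[Task]) -> List[Task]:
    """Stage 1 (stub): recover the real-world navigation domain p_real
    from real-world deployment data."""
    return deployment_logs

def synthesize_tasks(p_real: List[Task], n: int) -> List[Task]:
    """Stage 2 (stub): generate a small, representative task set p_SES
    that models p_real (e.g., via a GAN, PCA, or random sampling)."""
    return random.sample(p_real, min(n, len(p_real)))

def ses_pipeline(deployment_logs: List[Task], num_tasks: int) -> List[Task]:
    p_real = extract_domain(deployment_logs)
    p_ses = synthesize_tasks(p_real, num_tasks)
    return p_ses  # an RL navigation policy is then trained on these tasks
```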
A. Navigation Task and Navigation Domain
We focus on a standard goal-oriented navigation task, in
which a robot navigates in a navigation environment e from
a provided starting pose α to a goal pose β. Each navigation
task T is instantiated as a tuple T = (e, α, β). In real-
world applications, robots are not deployed to navigate in
one single environment or with the same start and goal all the
time. Instead, actual deployments in the real world usually
entail a distribution over multiple environments with many
start and goal poses. In this case, we represent the real-
world deployment as a navigation domain p_real, defined as
a distribution of navigation tasks p_real(T). SES generates a
new navigation domain p_SES in simulation that, with a limited
number of tasks, models the distribution of tasks in p_real so
that the navigation performance of policies trained in p_SES