YILDIRIM AND UGUR
Kitani, Ziebart, Bagnell, & Hebert, 2012; Kuderer et al.,
2012; Vasquez et al., 2014). Given perfect expert demonstra-
tions, IRL attempts to identify the underlying reward struc-
ture, which can be used by any Reinforcement Learning (RL)
algorithm to create a human-aware navigation policy. The
advantage of this approach is that the reward function is not
manually determined but is a linear combination of a set of
predefined features. However, Wulfmeier, Ondruska, and
Posner (2015) consider this linearity a strong assumption.
Nonlinear rewards can better describe complex behaviors
in many real-world problems (Levine, Popovic, & Koltun,
2011). Researchers have been using deep learning tech-
niques to leverage this potential in social navigation. In
Chen, Everett, Liu, and How (2017), Deep Reinforcement
Learning was used to obtain a socially plausible navigation
policy. As with other RL approaches, this procedure relies
on a predefined reward. Similarly, Wulfmeier et al. (2015)
extracted nonlinear rewards, assuming that the features shap-
ing the reward function are known. On the other hand, Im-
itation Learning attempts to learn policies directly from the
data, relaxing assumptions about the reward or its features.
In Tai, Zhang, Liu, and Burgard (2018) and Gupta, Johnson,
Fei-Fei, Savarese, and Alahi (2018), Generative Adversarial
Networks were used for direct policy learning from demon-
strations. These approaches provided advanced solutions to
overcome the limitations mentioned above. However, these
models require large amounts of training data (Che, Okamura,
& Sadigh, 2020), whereas learning from a small data set and
generalizing to new configurations are desirable.
We also observe that most studies in the social navigation
domain target only the design or learning of the robots' local
controllers, since these are responsible for producing motion
commands. However, relying solely on a local controller
makes the robot vulnerable to local minima
(Koren & Borenstein, 1991) and may fail to navigate the
robot to its target position. Today, typical robotic navigation
systems adopt a two-layered hierarchical approach for path
planning tasks (Orebäck & Christensen, 2003). Given an
environment, a robot first calculates a trajectory in the
so-called global planning phase; it then follows the computed
trajectory with a controller in the so-called local planning
phase.
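The two-layered scheme can be sketched as follows. This is a minimal illustration only, not the planners used in this work: the occupancy-grid representation, the breadth-first global planner, and all function names are assumptions made for the example.

```python
from collections import deque

def global_plan(grid, start, goal):
    """Global planning phase: breadth-first search on a static
    occupancy grid (0 = free, 1 = obstacle). Returns a list of
    (row, col) cells from start to goal, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}
    frontier = deque([start])
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            path = []
            while cell is not None:     # walk back through predecessors
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in prev):
                prev[(nr, nc)] = cell
                frontier.append((nr, nc))
    return None

def local_follow(path):
    """Local planning phase (trivial stand-in for a controller):
    emit one unit motion command per waypoint transition."""
    return [(r1 - r0, c1 - c0)
            for (r0, c0), (r1, c1) in zip(path, path[1:])]
```

In a real system, the local layer would be a reactive controller consuming sensor data rather than a fixed waypoint follower; the sketch only shows the division of labor between the two phases.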
This paper proposes a novel approach built on top of a state-
of-the-art neural network architecture, namely Conditional
Neural Processes (CNPs) (Garnelo et al., 2018). Given
multiple demonstrations of a task, CNPs can encode complex
multi-modal trajectories. CNPs extract prior knowledge di-
rectly from training data by sampling observations and pre-
dicting a conditional distribution over any other target points.
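As a rough illustration of this scheme, the sketch below implements an untrained CNP-style forward pass with random weights: each context observation is encoded, the encodings are averaged into a permutation-invariant representation, and a decoder predicts a Gaussian over each target point. The layer sizes, single linear layers, and function names are illustrative assumptions, not the architecture used in this work.

```python
import numpy as np

rng = np.random.default_rng(0)
D_REP = 8  # size of the latent representation (illustrative choice)
W_enc = rng.normal(scale=0.1, size=(2, D_REP))      # encoder: (x, y) -> r_i
W_dec = rng.normal(scale=0.1, size=(D_REP + 1, 2))  # decoder: (r, x*) -> (mu, log sigma)

def cnp_forward(ctx_x, ctx_y, tgt_x):
    """Predict a Gaussian (mean, std) over each target input,
    conditioned on the observed context pairs."""
    pairs = np.stack([ctx_x, ctx_y], axis=1)          # (N, 2) context pairs
    r_i = np.tanh(pairs @ W_enc)                      # per-observation encodings
    r = r_i.mean(axis=0)                              # order-invariant aggregate
    dec_in = np.array([np.concatenate([r, [x]]) for x in tgt_x])
    out = dec_in @ W_dec                              # (M, 2) raw outputs
    mu, log_sigma = out[:, 0], out[:, 1]
    return mu, np.exp(log_sigma)                      # std kept positive via exp
```

Training would fit the weights by maximizing the log-likelihood of held-out target points under the predicted Gaussians; the mean aggregation is what lets the model condition on any number of observations.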
Taking advantage of these capabilities, we extended CNPs
along two dimensions: to generate complete navigation
trajectories in the global planning phase, and to generate
goal-directed behaviors while actively avoiding pedestrians
in the local planning phase. At both levels, our approach
produces trajectories that exhibit the characteristics of the
demonstrated ones and can capture complex, nonlinear, and
temporal relationships associated with external parameters
and goals.
Like other neural network-based learning systems, our
system may fail to generate trajectories or control signals
when it faces situations very different from the experienced
ones, i.e., when it is required to extrapolate to novel
situations outside the training range. To detect and react to
conditions that may lead to failure, we propose continuously
monitoring the environment with a failure prediction system,
detecting situations outside the training range, and falling
back on a hand-crafted reactive controller when extrapolation
is detected. We verified our system on a simulated mobile
robot in different environments with static and moving
pedestrians.
In the rest of this paper, we first give a literature review
explaining the concepts used throughout this study. Then, we
introduce our architecture in detail and elaborate on each sys-
tem module. Later, we present our experiments and results.
Finally, we conclude with a summary and future directions.
Related Work
Hybrid Path Planning
Traditionally, approaches to solving the path planning
problem can be divided into two categories based on the en-
vironmental knowledge used: deliberate and reactive plan-
ning (Orebäck & Christensen, 2003). Deliberate planners use
environmental knowledge through static maps and compute
the robot’s trajectory before execution. On the other hand,
reactive planners rely on sensory information to deal with
obstacles in a local frame around the robot. Both approaches
have advantages and disadvantages. Influenced by the hybrid
deliberate/reactive paradigm (Dudek & Jenkin, 2010), a hy-
brid controller combining the two approaches has become a
well-established solution to the path planning problem in
recent years (Murphy, 2019). In the following, we
provide an overview of the two important building blocks of
typical hybrid path planners, global and local planners, and
the concept of social navigation.
Global Path Planning
In the first phase of standard hierarchical path planning,
a global planning procedure is applied. On the static map
of the environment, the task of a global planner is to create
a path from the starting position to the destination. Global
planners use utility functions to assign costs to the possible
navigation trajectories they find. The use of utility functions
allows the global planner to choose the trajectory with the
desired properties, such as optimal length or time.
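Such a utility can be illustrated as a weighted cost over candidate trajectories, with the planner selecting the minimum-cost one. The weights, the speed parameter, and the function names below are hypothetical choices made for the example, not a utility from the literature.

```python
import math

def path_length(path):
    """Euclidean length of a trajectory given as (x, y) waypoints."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))

def utility_cost(path, speed=1.0, w_length=1.0, w_time=0.5):
    """Illustrative utility: weighted sum of path length and
    travel time (length / assumed constant speed)."""
    length = path_length(path)
    return w_length * length + w_time * (length / speed)

def best_path(candidates, **kwargs):
    """Global-planner selection step: the candidate trajectory
    with the lowest cost is chosen for execution."""
    return min(candidates, key=lambda p: utility_cost(p, **kwargs))
```

A socially aware planner would add further terms to this sum, e.g., penalties for passing close to pedestrians, which is precisely the gap the next paragraph points to.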
Prior to advances in social navigation, utility functions
overlooked the social aspects of the navigation trajectories of