Learning Social Navigation from Demonstrations with Conditional
Neural Processes
Yigit Yildirim and Emre Ugur
Computer Engineering Department, Bogazici University, Istanbul, Turkey
Sociability is essential for modern robots to increase their acceptability in human environments.
Traditional techniques use manually engineered utility functions inspired by observing pedes-
trian behaviors to achieve social navigation. However, social aspects of navigation are diverse,
changing across different types of environments, societies, and population densities, making it
unrealistic to use hand-crafted techniques in each domain. This paper presents a data-driven
navigation architecture that uses state-of-the-art neural architectures, namely Conditional Neu-
ral Processes, to learn global and local controllers of the mobile robot from observations.
Additionally, we leverage a state-of-the-art, deep prediction mechanism to detect situations
not similar to the trained ones, where reactive controllers step in to ensure safe navigation.
Our results demonstrate that the proposed framework can successfully carry out navigation
tasks regarding social norms in the data. Further, we showed that our system produces fewer
personal-zone violations, causing less discomfort.
Keywords: social navigation, path planning, conditional neural process, data-driven control,
random network distillation, generative adversarial networks, hybrid navigation architecture
Introduction
Researchers have been studying mobile robot navigation
for decades. Many notable techniques have been proposed in
this area over the years, such as (Burgard et al., 1999; Nour-
bakhsh, Kunz, & Willeke, 2003; Thrun et al., 2000), where
safety and robustness features have been prioritized. In other
words, the principal driving factor behind the development
in this field has been collision avoidance (Fox, Burgard, &
Thrun, 1997). On the other hand, as humans start to share
their environment with robots, new requirements for mobile
robot navigation have emerged.
Physical and mental aspects of safety were separately
evaluated in Nonaka, Inoue, Arai, and Mae (2004). This sep-
aration reveals the need to question the psychological effi-
ciency of the navigation systems of mobile robots. To ensure
the smooth integration of robots into human environments,
these systems must be social and as natural and understand-
able to humans as they are safe.
Kruse, Pandey, Alami, and Kirsch (2013) define social
navigation as navigating the robot in a way that minimizes
the annoyance that its motion produces. Efforts to decrease
the annoyance or anxiety of pedestrians interacting with a
navigating robot therefore fall within the social navigation
domain. To this end, the majority of the studies in the
literature aim to increase the comfort of the interaction.
Fong, Nourbakhsh, and Dautenhahn (2003) assert that people
find it more comforting to interact with machines in the
same way they interact with other people.
Therefore, many researchers have aimed to decrease the dis-
comfort that robot navigation generates by replicating human
navigation with mobile robots.
The studies in the literature that aim to imitate human be-
havior in mobile robot navigation fall into two categories:
manually coded controllers and learning-based ones. Man-
ually coded controllers rely on hand-crafted optimization
functions to resemble the motion of robots to that of hu-
mans. One of the notable studies of this category is the So-
cial Force Model (SFM) (Helbing & Molnar, 1995). Based
on behavioral techniques from the social sciences, SFM sug-
gests that pedestrians move under the effect of specific ab-
stract forces, just like particles in an electric field. While the
navigational goal attracts the pedestrian, obstacles and other
people exert repulsive forces. Despite its wide application
(Farina, Fontanelli, Garulli, Giannitrapani, & Prattichizzo,
2017; Ferrer, Garrell, & Sanfeliu, 2013; Zanlungo, Ikeda,
& Kanda, 2011), some researchers argue that hand-crafted
models have limited applicability in controlled environments
(Vasquez, Okal, & Arras, 2014) and that they are not gen-
eral and applicable to different, varying social environments,
especially during avoidance maneuvers (Kretzschmar, Spies,
Sprunk, & Burgard, 2016). In real-world scenarios, social
compliance of robot navigation requires adaptability. Kud-
erer, Kretzschmar, Sprunk, and Burgard (2012) proposed the
use of data-driven approaches to create such adaptive con-
trollers. Researchers have used numerous machine learning
algorithms to create better adaptive, socially compliant navi-
gation frameworks. One of the most popular algorithms is In-
verse Reinforcement Learning (IRL) (Kim & Pineau, 2016;
arXiv:2210.03582v2 [cs.RO] 26 Dec 2022
Kitani, Ziebart, Bagnell, & Hebert, 2012; Kuderer et al.,
2012; Vasquez et al., 2014). Given perfect expert demonstra-
tions, IRL attempts to identify the underlying reward struc-
ture, which can be used by any Reinforcement Learning (RL)
algorithm to create a human-aware navigation policy. The
advantage of this approach is that the reward function is not
manually determined but is a linear combination of a set of
predefined features. However, the linearity assumption is
considered a strong assumption in Wulfmeier, Ondruska, and
Posner (2015).
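To make the linearity assumption concrete, the reward in such IRL formulations can be written as a weighted sum of features. The symbols below (state $s$, feature vector $\phi$, weights $w$, number of features $K$) are notation assumed here for illustration and are not taken from the cited works.

```latex
R(s) = w^\top \phi(s) = \sum_{k=1}^{K} w_k \, \phi_k(s)
```

Under this form, IRL only has to estimate the weights $w$. Behavior that depends on nonlinear interactions between features cannot be represented unless those interactions are themselves hand-coded as additional features.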
Nonlinear rewards can better describe complex behaviors
in many real-world problems (Levine, Popovic, & Koltun,
2011). Researchers have been using deep learning tech-
niques to leverage this potential in social navigation. In
Chen, Everett, Liu, and How (2017), Deep Reinforcement
Learning was used to obtain a socially plausible navigation
policy. As with other RL approaches, this procedure relies
on a predefined reward. Similarly, Wulfmeier et al. (2015)
extracted nonlinear rewards, assuming that the features shap-
ing the reward function are known. On the other hand, Im-
itation Learning attempts to learn policies directly from the
data, relaxing assumptions about the reward or its features.
In Tai, Zhang, Liu, and Burgard (2018) and Gupta, Johnson,
Fei-Fei, Savarese, and Alahi (2018), Generative Adversarial
Networks were used for direct policy learning from demon-
strations. These approaches provided advanced solutions to
overcome the limitations mentioned above. However, these
models required too much data for training (Che, Okamura,
& Sadigh, 2020). On the other hand, learning from a small
data set and generalizing to new configurations are desirable.
We also observe that most of the studies on the social nav-
igation domain target only designing/learning the local con-
trollers of the robots since they are responsible for producing
motion commands. However, relying only on the local controller
makes the robot vulnerable to local minima (Koren &
Borenstein, 1991), so it might fail to reach its target
position. Today, typical robotic navigation
systems adopt the two-layered hierarchical approach for path
planning tasks (Orebäck & Christensen, 2003). A robot first
calculates a trajectory in the so-called global planning phase
given an environment. Then, the robot follows the computed
trajectory with a controller in the so-called local planning
phase.
This paper proposes a novel approach built on top of a state-
of-the-art neural network architecture, namely Conditional
Neural Processes (CNPs) (Garnelo et al., 2018). Given mul-
tiple demonstrations of a task, CNPs can encode complex
multi-modal trajectories. CNPs extract prior knowledge di-
rectly from training data by sampling observations and pre-
dicting a conditional distribution over any other target points.
Taking advantage of these capabilities, we extended CNPs
in two dimensions: to generate complete navigation trajec-
tories in the global planning phase and to generate goal-
directed behaviors while actively avoiding pedestrians in the
local planning phase. At both levels, our approaches produce
trajectories that show the characteristics of the demonstrated
ones. They can learn complex, nonlinear, and temporal re-
lationships associated with external parameters and goals.
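The conditioning mechanism described above can be sketched in a few lines. This is a minimal, untrained illustration of a generic CNP forward pass (encode each observation, average the encodings, decode a mean and standard deviation at each target point), not the architecture used in this paper. The layer sizes, the choice of time as input and 2-D position as output, and all function names are assumptions made for the example.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Random (untrained) weights for a small fully connected network."""
    return [(rng.normal(scale=0.5, size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp(params, x):
    """Apply tanh hidden layers followed by a linear output layer."""
    for w, b in params[:-1]:
        x = np.tanh(x @ w + b)
    w, b = params[-1]
    return x @ w + b

def cnp_predict(encoder, decoder, x_ctx, y_ctx, x_tgt):
    """Predict a Gaussian over y at each target input, given context pairs."""
    # Encode every (x, y) observation independently.
    r_i = mlp(encoder, np.concatenate([x_ctx, y_ctx], axis=-1))
    # Mean aggregation makes the summary independent of observation order.
    r = r_i.mean(axis=0)
    # Condition the decoder on the summary and on each target input.
    dec_in = np.concatenate([np.tile(r, (len(x_tgt), 1)), x_tgt], axis=-1)
    out = mlp(decoder, dec_in)
    d = out.shape[-1] // 2
    mu = out[:, :d]
    sigma = 0.01 + np.logaddexp(0.0, out[:, d:])  # softplus keeps sigma positive
    return mu, sigma

rng = np.random.default_rng(0)
encoder = init_mlp([1 + 2, 32, 16], rng)      # input: (t, x, y) -> 16-d code
decoder = init_mlp([16 + 1, 32, 2 * 2], rng)  # output: mean and raw std of (x, y)

# Three observed points of a hypothetical demonstration: time -> 2-D position.
t_ctx = np.array([[0.0], [0.5], [1.0]])
xy_ctx = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 0.0]])
t_tgt = np.linspace(0.0, 1.0, 50)[:, None]

mu, sigma = cnp_predict(encoder, decoder, t_ctx, xy_ctx, t_tgt)
```

With trained weights, `mu` would trace a full trajectory consistent with the observed points and `sigma` would express the uncertainty at each time step. Here the weights are random, so only the data flow is meaningful.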
Like other neural network-based learning systems, our sys-
tem may fail to generate trajectories or control signals when
it faces very different situations from the experienced ones,
i.e., when it is required to extrapolate to novel situations out-
side the training range. To detect and react to conditions that
may lead to failure, we propose continuously monitoring the
environment using a failure prediction system, detecting situ-
ations outside the training range, and falling back to a hand-
crafted reactive controller in case extrapolation is detected.
We verified our system on a simulated mobile robot in differ-
ent environments with static and moving pedestrians.
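The fallback logic can be illustrated with a novelty score in the spirit of random network distillation, which the keywords list. This is a simplified sketch under stated assumptions, not the failure prediction module of this paper: the predictor is a least-squares linear model rather than a trained neural network, the 2-D state, the training range, the threshold rule, and every name below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed, randomly initialised target network (never trained).
W1 = rng.normal(size=(2, 16))
W2 = rng.normal(size=(16, 4))

def target(x):
    return np.tanh(x @ W1) @ W2

def with_bias(x):
    return np.hstack([x, np.ones((len(x), 1))])

# Predictor fitted only on states that stand in for the demonstrated ones.
train_states = rng.uniform(-0.2, 0.2, size=(500, 2))
coef, *_ = np.linalg.lstsq(with_bias(train_states), target(train_states), rcond=None)

def novelty(state):
    """Prediction error of the fitted predictor against the fixed target."""
    x = np.atleast_2d(np.asarray(state, dtype=float))
    return float(np.linalg.norm(with_bias(x) @ coef - target(x)))

# Assumed rule: anything 10x worse than the worst training error is novel.
threshold = 10.0 * max(novelty(s) for s in train_states)

def select_controller(state):
    """Hand control to the reactive controller when the state looks novel."""
    return "reactive" if novelty(state) > threshold else "learned"
```

The predictor matches the target only where it has seen data, so a large error signals that the learned controllers would have to extrapolate. In practice the threshold would need to be tuned on held-out data.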
In the rest of this paper, we first give a literature review
explaining the concepts used throughout this study. Then, we
introduce our architecture in detail and elaborate on each sys-
tem module. Later, we present our experiments and results.
Finally, we conclude with a summary and future directions.
Related Work
Hybrid Path Planning
Traditionally, approaches to solving the path planning
problem can be divided into two categories based on the en-
vironmental knowledge used: deliberate and reactive plan-
ning (Orebäck & Christensen, 2003). Deliberate planners use
environmental knowledge through static maps and compute
the robot’s trajectory before execution. On the other hand,
reactive planners rely on sensory information to deal with
obstacles in a local frame around the robot. Both approaches
have advantages and disadvantages. Influenced by the hybrid
deliberate/reactive paradigm (Dudek & Jenkin, 2010), a hy-
brid controller combining the two approaches has become a
well-established solution to the path planning problem in
recent years (Murphy, 2019). In the following, we
provide an overview of the two important building blocks of
typical hybrid path planners, global and local planners, and
the concept of social navigation.
Global Path Planning
In the first phase of standard hierarchical path planning,
a global planning procedure is applied. On the static map
of the environment, the task of a global planner is to create
a path from the starting position to the destination. Global
planners use utility functions to assign costs to the possible
navigation trajectories they find. The use of utility functions
allows the global planner to choose the trajectory with the
desired properties, such as optimal length or time.
Before socially aware extensions, utility functions
overlooked the social aspects of the navigation trajectories of
robots. However, trajectories with optimal
physical properties may not be preferred from a social point
of view. The utility function of a more socially competent
global planner optimizes the trajectories with respect to the
social norms that humans follow (Kruse et al., 2013).
Conventionally, many graph search algorithms have been
applied to compute the trajectory between initial and tar-
get configurations, the most popular being A* explained in
Kambhampati and Davis (1986). For a complete list of
global planning approaches, see Giesbrecht (2004). Global
planning itself is not sufficient to navigate the robot between
two points. Local planning is required to create velocity
commands appropriate for the case of new or dynamic ob-
stacles.
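As an illustration of how a utility function shapes the global plan, the sketch below runs A* on a small occupancy grid and optionally adds a per-cell social cost to the step cost. It is a generic example under assumed conventions (4-connected grid, unit step length, non-negative social cost, hypothetical names), not the planner used in this paper.

```python
import heapq

def astar(grid, start, goal, social_cost=None, w_social=1.0):
    """Return a list of cells from start to goal, or None if unreachable.

    grid[r][c] is truthy for obstacles. social_cost[r][c] >= 0 is an
    optional extra penalty for entering a cell (e.g. near a pedestrian).
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Manhattan distance is admissible because every step costs >= 1.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0.0, start)]
    best_g = {start: 0.0}
    parent = {start: None}
    while open_heap:
        _, g_cur, cur = heapq.heappop(open_heap)
        if cur == goal:
            path = []
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            return path[::-1]
        if g_cur > best_g[cur]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if grid[nxt[0]][nxt[1]]:
                continue
            step = 1.0
            if social_cost is not None:
                step += w_social * social_cost[nxt[0]][nxt[1]]
            new_g = g_cur + step
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                parent[nxt] = cur
                heapq.heappush(open_heap, (new_g + h(nxt), new_g, nxt))
    return None

free = [[0] * 5 for _ in range(3)]
shortest = astar(free, (1, 0), (1, 4))

# Penalise the cell at (1, 2), standing in for a pedestrian's personal space.
social = [[0.0] * 5 for _ in range(3)]
social[1][2] = 10.0
polite = astar(free, (1, 0), (1, 4), social_cost=social)
```

The same search yields a straight path when only length is scored and a detour when the social term is added, which mirrors the difference between a purely physical and a socially competent utility function.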
Local Path Planning
The local planning procedures are used in the second
phase of hierarchical path planning to realize the computed
trajectories. The main objective of the local planner is to
generate velocity commands to allow the robot to move be-
tween the checkpoints of the precomputed trajectory. In ad-
dition, it is the task of the local planner to avoid the obsta-
cles near the robot using sensory information about the en-
vironment of the robot. Avoiding obstacles requires a reac-
tive control paradigm since it is impossible to consciously
plan for dynamic obstacles such as humans or other robots.
There are many local planning algorithms in the literature,
such as Borenstein, Koren, et al. (1991); Fox et al. (1997);
Khatib (1985); Rosmann, Hoffmann, and Bertram (2015);
Vadakkepat, Tan, and Ming-Liang (2000); Zhu, Yan, and
Xing (2006). For a complete list, see Cai, Wang, Cheng,
De Silva, and Meng (2020).
Many traditional local planners are well suited for this
task from a safety point of view. On the other hand, the
traditional controllers do not consider social norms despite
providing physical safety. They view people as obstacles to
be avoided. Even though creating social plans at the global
planning level increases the social aspect of navigation, it is
essential to implement social maneuvers when the robot en-
counters a pedestrian. Recent attempts to create local plan-
ners that take these norms into account paved the way for
more socially compliant local controllers (Ferrer et al., 2013;
Kim & Pineau, 2016; Kretzschmar et al., 2016; Vasquez et
al., 2014).
Social Navigation
Social Navigation implies the exhibition of socially com-
petent behaviors during the navigation of the robot. The non-
verbal interaction caused by the navigation of the robot may
be improved by the integration of social and cultural norms
that people follow. Kruse et al. (2013) describe the benefits
of social navigation as follows: it increases the comfort of
the people around the robot, improves the naturalness of the
robotic platform, and enhances the sociability of the robot.
The concept of social navigation lies in the intersection of
two fields: navigation and human-robot interaction. Figure
1 presents a comparison of the purely safe approach with a
socially compliant version. A robot with a perfectly safe nav-
igation plan might disregard the importance of the comfort
level of the encountered pedestrians, such as the one on the
left. In contrast, although non-optimal, the executed naviga-
tion trajectory on the right is more socially compliant since
it cares about the comfort level of the people around.
Figure 1
Comparison between the regular and social navigation. On
the left, the robot passes between a group of two pedestrians,
taking an energy-efficient trajectory. This behavior has the
disadvantage of disturbing the people encountered on the
way. The navigation trajectory on the right prioritizes the
people’s comfort instead of efficiency. Therefore, it is less
disturbing and expected from socially compliant robots.
The early works in the domain are influenced by studies
in social sciences. The concept of proxemics, introduced in
Hall (1966), defines abstract social zones around the people
and provides a basis for many studies in socially-compliant
robot navigation (Asghari Oskoei, Walters, & Dautenhahn,
2010; Huang, Li, & Fu, 2010; Lam, Chou, Chang, & Fu,
2010; Mead, Atrash, & Matarić, 2011; Syrdal, Koay, Walters,
& Dautenhahn, 2007). Although these pioneer studies relied
on the predefined set of rules to achieve social navigation,
they drew attention to the subject and made it more popular.
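A proxemics-based measure such as the number of personal-zone violations can be computed directly from recorded positions. The sketch below is one possible definition, assumed for illustration: it counts the time steps at which the robot is inside at least one pedestrian's personal zone. The 1.2 m radius is a commonly quoted outer bound of the personal zone and is an assumption here, as are the array layout and the function name.

```python
import numpy as np

PERSONAL_ZONE_RADIUS_M = 1.2  # assumed radius, not a value from this paper

def count_personal_zone_violations(robot_xy, pedestrians_xy,
                                   radius=PERSONAL_ZONE_RADIUS_M):
    """Count time steps where the robot is within `radius` of any pedestrian.

    robot_xy:       array of shape (T, 2), robot position per time step.
    pedestrians_xy: array of shape (T, P, 2), positions of P pedestrians.
    """
    robot_xy = np.asarray(robot_xy, dtype=float)
    pedestrians_xy = np.asarray(pedestrians_xy, dtype=float)
    # Distance from the robot to every pedestrian at every time step: (T, P).
    dists = np.linalg.norm(pedestrians_xy - robot_xy[:, None, :], axis=-1)
    return int((dists < radius).any(axis=1).sum())

# Robot drives along the x-axis past one pedestrian standing at (2.0, 0.5).
robot = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]])
pedestrian = np.tile(np.array([[[2.0, 0.5]]]), (5, 1, 1))
violations = count_personal_zone_violations(robot, pedestrian)
```

In this toy run the robot is inside the zone at the three middle time steps, so `violations` is 3.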
Figure 2
The overview of the proposed navigation pipeline. The Data-Driven Navigation System is composed of four modules:
Data-Driven Global Controller, Data-Driven Local Controller, Failure Prediction, and Hand-Crafted Reactive Controller.
When a navigation task is given to the robot, the Data-Driven Global Controller generates a continuous navigation trajectory.
On this trajectory, via-points are created, which must be reached one by one by the Data-Driven Local Controller. These
modules are data-driven and are prone to extrapolation errors. Such cases are detected by the Failure Prediction Module,
which transfers the control of the mobile robot to the Hand-Crafted Reactive Controller temporarily.
Another important study in social sciences is Helbing
and Molnar (1995), where the researchers introduce the Social
Force Model (SFM) to explain the navigational behaviors of
humans. They define a set of functions to calculate the local
movements of pedestrians to reach a global goal. The sim-
plicity of the SFM has led many researchers in robotics
to adopt the approach to move robots as humans do (Fa-
rina et al., 2017; Ferrer et al., 2013; Zanlungo et al., 2011).
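The idea behind the SFM can be sketched as the sum of a goal-directed term and exponentially decaying repulsion from nearby people. This is a simplified illustration, not the full model of Helbing and Molnar (1995): obstacle forces and anisotropy are omitted, and the parameter values (`v_des`, `tau`, `a`, `b`) and names are assumptions chosen only for the example.

```python
import numpy as np

def social_force(pos, vel, goal, others, v_des=1.3, tau=0.5, a=2.0, b=0.3):
    """Return the net 2-D force acting on one pedestrian.

    pos, vel, goal: arrays of shape (2,).
    others:         iterable of (2,) positions of other pedestrians.
    """
    pos, vel, goal = (np.asarray(v, dtype=float) for v in (pos, vel, goal))
    # Attractive term: relax the velocity toward the desired speed and heading.
    to_goal = goal - pos
    heading = to_goal / (np.linalg.norm(to_goal) + 1e-9)
    force = (v_des * heading - vel) / tau
    # Repulsive terms: push away from each neighbour, decaying with distance.
    for other in others:
        diff = pos - np.asarray(other, dtype=float)
        dist = np.linalg.norm(diff) + 1e-9
        force += a * np.exp(-dist / b) * diff / dist
    return force

f_free = social_force([0.0, 0.0], [0.0, 0.0], [5.0, 0.0], others=[])
f_crowded = social_force([0.0, 0.0], [0.0, 0.0], [5.0, 0.0],
                         others=[[1.0, -0.2]])
```

With no neighbours the force points straight at the goal. A pedestrian slightly to the right of the path adds a component that pushes the agent to the left and slows its approach.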
Recently, data-driven approaches have become more
prevalent in explaining human behavior since they adapt
to a wider range of situations. Many researchers apply In-
verse Reinforcement Learning (IRL) to calculate a reward
function that describes the navigational behavior of the hu-
man (Kim & Pineau, 2016; Kitani et al., 2012; Vasquez et
al., 2014). Another stream of work uses Deep Reinforcement
Learning to generate human-aware robot navigation (Chen
et al., 2017). These studies assume the availability of either
the reward or the features that compose the reward. To re-
lax this assumption, approaches that learn the social norms
purely from the data have become more popular.
In Alahi et al. (2016), researchers use networks with
Long Short-Term Memory (LSTM) cells. Later, an im-
proved version of this study is presented in Gupta et al.
(2018), where researchers use LSTMs inside a generative
adversarial setting. These studies successfully predict hu-
man navigation trajectory in a limited local frame. Despite
being considerably close to the social navigation domain, the
trajectory prediction methods in these studies cannot be di-
rectly applied to mobile robots. A robot placed in a real-
world environment has to meet serious time-complexity con-
siderations. Moreover, these approaches are limited to lo-
cal frames, while path-planning on mobile robots requires a
global navigation goal (Latombe, 1991).
Data-driven approaches generally process navigation tra-
jectories in datasets to realize social navigation. However, as
mentioned in Mavrogiannis et al. (2021), the scarcity of es-
tablished datasets has been a significant issue in the domain.
Until recently, many studies in the literature (e.g. Traut-
man and Krause (2010); Vemula, Muelling, and Oh (2018))