YILDIRIM AND UGUR
Kitani, Ziebart, Bagnell, & Hebert, 2012; Kuderer et al.,
2012; Vasquez et al., 2014). Given perfect expert demonstra-
tions, IRL attempts to identify the underlying reward struc-
ture, which can be used by any Reinforcement Learning (RL)
algorithm to create a human-aware navigation policy. The
advantage of this approach is that the reward function is not
manually determined but is a linear combination of a set of
predefined features. However, Wulfmeier, Ondruska, and
Posner (2015) consider this linearity a strong assumption.
Nonlinear rewards can better describe complex behaviors
in many real-world problems (Levine, Popovic, & Koltun,
2011). Researchers have been using deep learning tech-
niques to leverage this potential in social navigation. In
Chen, Everett, Liu, and How (2017), Deep Reinforcement
Learning was used to obtain a socially plausible navigation
policy. As with other RL approaches, this procedure relies
on a predefined reward. Similarly, Wulfmeier et al. (2015)
extracted nonlinear rewards, assuming that the features shap-
ing the reward function are known. On the other hand, Im-
itation Learning attempts to learn policies directly from the
data, relaxing assumptions about the reward or its features.
In Tai, Zhang, Liu, and Burgard (2018) and Gupta, Johnson,
Fei-Fei, Savarese, and Alahi (2018), Generative Adversarial
Networks were used for direct policy learning from demon-
strations. These approaches provided advanced solutions to
overcome the limitations mentioned above. However, these
models require large amounts of training data (Che, Okamura,
& Sadigh, 2020), whereas learning from a small data set and
generalizing to new configurations are desirable.
We also observe that most studies in the social navigation
domain target only the design or learning of the robots' local
controllers, since these are responsible for producing motion
commands. However, relying solely on a local controller
makes the robot vulnerable to local minima
(Koren & Borenstein, 1991) and may fail to navigate the
robot to its target position. Today, typical robotic navigation
systems adopt a two-layered hierarchical approach for path
planning tasks (Orebäck & Christensen, 2003). Given an
environment, a robot first calculates a trajectory in the
so-called global planning phase; it then follows the computed
trajectory with a controller in the so-called local planning
phase.
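The two-layered scheme can be sketched as follows. This is a minimal illustration only, not the planners used in this work: the occupancy-grid representation, the breadth-first global planner, and all function names are assumptions made for the example.

```python
from collections import deque

def global_plan(grid, start, goal):
    """Global planning phase: breadth-first search on a static
    occupancy grid (0 = free, 1 = obstacle). Returns a list of
    (row, col) cells from start to goal, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    prev = {start: None}
    frontier = deque([start])
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            path = []
            while cell is not None:     # walk back through predecessors
                path.append(cell)
                cell = prev[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in prev):
                prev[(nr, nc)] = cell
                frontier.append((nr, nc))
    return None

def local_follow(path):
    """Local planning phase (trivial stand-in for a controller):
    emit one unit motion command per waypoint transition."""
    return [(r1 - r0, c1 - c0)
            for (r0, c0), (r1, c1) in zip(path, path[1:])]
```

In a real system, the local layer would be a reactive controller consuming sensor data rather than a fixed waypoint follower; the sketch only shows the division of labor between the two phases.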
This paper proposes a novel approach built on top of a state-
of-the-art neural network architecture, namely Conditional
Neural Processes (CNPs) (Garnelo et al., 2018). Given
multiple demonstrations of a task, CNPs can encode complex
multi-modal trajectories. CNPs extract prior knowledge di-
rectly from training data by sampling observations and pre-
dicting a conditional distribution over any other target points.
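As a rough illustration of this scheme, the sketch below implements an untrained CNP-style forward pass with random weights: each context observation is encoded, the encodings are averaged into a permutation-invariant representation, and a decoder predicts a Gaussian over each target point. The layer sizes, single linear layers, and function names are illustrative assumptions, not the architecture used in this work.

```python
import numpy as np

rng = np.random.default_rng(0)
D_REP = 8  # size of the latent representation (illustrative choice)
W_enc = rng.normal(scale=0.1, size=(2, D_REP))      # encoder: (x, y) -> r_i
W_dec = rng.normal(scale=0.1, size=(D_REP + 1, 2))  # decoder: (r, x*) -> (mu, log sigma)

def cnp_forward(ctx_x, ctx_y, tgt_x):
    """Predict a Gaussian (mean, std) over each target input,
    conditioned on the observed context pairs."""
    pairs = np.stack([ctx_x, ctx_y], axis=1)          # (N, 2) context pairs
    r_i = np.tanh(pairs @ W_enc)                      # per-observation encodings
    r = r_i.mean(axis=0)                              # order-invariant aggregate
    dec_in = np.array([np.concatenate([r, [x]]) for x in tgt_x])
    out = dec_in @ W_dec                              # (M, 2) raw outputs
    mu, log_sigma = out[:, 0], out[:, 1]
    return mu, np.exp(log_sigma)                      # std kept positive via exp
```

Training would fit the weights by maximizing the log-likelihood of held-out target points under the predicted Gaussians; the mean aggregation is what lets the model condition on any number of observations.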
Taking advantage of these capabilities, we extended CNPs
along two dimensions: to generate complete navigation
trajectories in the global planning phase, and to generate
goal-directed behaviors while actively avoiding pedestrians
in the local planning phase. At both levels, our approach
produces trajectories that exhibit the characteristics of the
demonstrated ones and can capture complex, nonlinear, and
temporal relationships associated with external parameters
and goals.
Like other neural network-based learning systems, our
system may fail to generate trajectories or control signals
when it faces situations very different from the experienced
ones, i.e., when it is required to extrapolate to novel
situations outside the training range. To detect and react to
conditions that may lead to failure, we propose continuously
monitoring the environment with a failure prediction system,
detecting situations outside the training range, and falling
back on a hand-crafted reactive controller when extrapolation
is detected. We verified our system on a simulated mobile
robot in different environments with static and moving
pedestrians.
In the rest of this paper, we first give a literature review
explaining the concepts used throughout this study. Then, we
introduce our architecture in detail and elaborate on each sys-
tem module. Later, we present our experiments and results.
Finally, we conclude with a summary and future directions.
Related Work
Hybrid Path Planning
Traditionally, approaches to solving the path planning
problem can be divided into two categories based on the en-
vironmental knowledge used: deliberate and reactive plan-
ning (Orebäck & Christensen, 2003). Deliberate planners use
environmental knowledge through static maps and compute
the robot’s trajectory before execution. On the other hand,
reactive planners rely on sensory information to deal with
obstacles in a local frame around the robot. Both approaches
have advantages and disadvantages. Influenced by the hybrid
deliberate/reactive paradigm (Dudek & Jenkin, 2010), a hy-
brid controller combining the two approaches has become a
well-established solution to the path planning problem in
recent years (Murphy, 2019). In the following, we
provide an overview of the two important building blocks of
typical hybrid path planners, global and local planners, and
the concept of social navigation.
Global Path Planning
In the first phase of standard hierarchical path planning,
a global planning procedure is applied. On the static map
of the environment, the task of a global planner is to create
a path from the starting position to the destination. Global
planners use utility functions to assign costs to the possible
navigation trajectories they find. The use of utility functions
allows the global planner to choose the trajectory with the
desired properties, such as optimal length or time.
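Such a utility can be illustrated as a weighted cost over candidate trajectories, with the planner selecting the minimum-cost one. The weights, the speed parameter, and the function names below are hypothetical choices made for the example, not a utility from the literature.

```python
import math

def path_length(path):
    """Euclidean length of a trajectory given as (x, y) waypoints."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))

def utility_cost(path, speed=1.0, w_length=1.0, w_time=0.5):
    """Illustrative utility: weighted sum of path length and
    travel time (length / assumed constant speed)."""
    length = path_length(path)
    return w_length * length + w_time * (length / speed)

def best_path(candidates, **kwargs):
    """Global-planner selection step: the candidate trajectory
    with the lowest cost is chosen for execution."""
    return min(candidates, key=lambda p: utility_cost(p, **kwargs))
```

A socially aware planner would add further terms to this sum, e.g., penalties for passing close to pedestrians, which is precisely the gap the next paragraph points to.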
Prior to advances in social navigation, utility functions
overlooked the social aspects of the navigation trajectories of