with our occlusion inference technique. We demonstrate
successful policy transfer to the real-world Turtlebot2i.
Contributions: (1) We propose a deep RL framework for
map-based crowd navigation that can make occlusion-aware
action plans for a partially observable, cluttered environment.
(2) We integrate a VAE into the deep RL algorithm that is
trained using specialized loss terms to extract features for
occlusion inference. (3) We demonstrate that the joint learning
of the occlusion inference and path planning modules results
in targeted map estimation that can handle temporary and
long-term occlusions, enabling proactive collision avoidance.
II. RELATED WORK
Occlusion Inference: Occlusion inference strategies must
be adapted to the occlusion type (i.e. partial vs. full and
temporary vs. persistent) and the nature of the environment.
Several studies use semantic segmentation to inpaint the
unobserved portions of partially occluded objects [14], [15].
During temporary occlusions, previously observed objects
can be hallucinated from memory using recurrent neural
networks (RNNs) and skip-connections [16], [17]. Wang et
al. [18] hallucinate static objects using a long short-term
memory (LSTM) [19] network and an auxiliary matching
loss. Inspired by this approach, we also incorporate a matching
loss, but our algorithm performs high-level reasoning about
dynamic humans in the presence of long-term occlusions.
A recent line of work proposes reasoning about persistently
fully occluded dynamic agents using the reactive behaviors
of observed human agents [7]–[9], [20]. Amirian et al. [20]
extract statistical patterns from past observations to estimate
the probability of human occupancy in occluded regions of
crowded scenes. Afolabi et al. [9] infer the presence of an
occluded pedestrian in a crosswalk from the reactive behaviors
of an observed driver. Itkina et al. [8] generalize this idea
to multiple drivers as ‘sensors’ by employing sensor fusion
techniques. We also use the social behaviors of human agents
to inform occlusion inference of temporarily and persistently
fully occluded agents. We incorporate these interactive features
into an RL framework to improve navigation.
Planning Under Occlusions: A partially observable
Markov decision process (POMDP) [21] is often used to
explicitly consider hidden states when planning under occlu-
sions [4], [22]. However, these approaches require the num-
ber of occluded agents to be pre-specified, and are intractable
with a large number of agents. Deep RL methods have the
capacity to capture complex features without requiring prior
knowledge of the environment. Liang et al. [23] demonstrate
sim-to-real steering in densely crowded scenes using deep
RL. To handle occlusions, the robot learns to make sharp
turns to avoid pedestrians that suddenly emerge from occluded
regions. We present a means to anticipate such occluded
agents using observed social behaviors in crowds, resulting in
smoother robot trajectories. Wang et al. [24] construct a deep
RL algorithm to achieve 3D map-based robot navigation in
static, occluded environments. Following this line of work,
we propose a map-based deep RL approach that handles
occlusions, while navigating highly dynamic environments.
Crowd Navigation: Classical crowd navigation techniques
like social force models [3] and velocity-based methods [1],
[25], [26] follow predefined reaction rules to avoid collisions
(e.g. taking the right side of the path to avoid other agents).
However, these reaction-based approaches can be short-sighted
and over-simplify pedestrian strategies for collision
avoidance [27], [28]. Other works perform long-horizon
obstacle avoidance by first predicting human agent trajectories
and then finding a feasible path that safely avoids the human
agents [29]–[31]. These trajectory-based methods are known
to suffer from the robot freezing problem in dense crowds
where a feasible path may not be found. Learning-based
approaches have been shown to more closely imitate
human-like behaviors by learning implicit features that encode social
behaviors [27]. Pair-wise interactions between agents are
often learned to reason about a dynamic environment and
perform collision avoidance [2], [32]. In such methods, the
complexity grows with the number of agents in the scene.
Additionally, only visible, fully detected agents are typically
considered. In our algorithm, we employ OGMs to compactly
represent an arbitrary number of agents and learn the mutual
influence between agents simultaneously.
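To illustrate why an OGM representation scales independently of crowd size, the following minimal Python sketch rasterizes an arbitrary list of agent positions into a fixed-size, robot-centered binary grid. The grid dimensions, cell size, and function names here are illustrative assumptions, not the paper's exact parameters.

```python
import numpy as np

def agents_to_ogm(agent_xy: np.ndarray, robot_xy: np.ndarray,
                  H: int = 64, W: int = 64, cell_size: float = 0.1) -> np.ndarray:
    """Rasterize any number of agent positions into a fixed-size,
    robot-centered occupancy grid map (0 = free, 1 = occupied).

    The map size stays constant regardless of how many agents are
    present, unlike pairwise interaction features whose cost grows
    with the number of agents in the scene.
    """
    ogm = np.zeros((H, W), dtype=np.float32)
    rel = agent_xy - robot_xy                        # robot-centric coordinates
    rows = (rel[:, 1] / cell_size + H / 2).astype(int)
    cols = (rel[:, 0] / cell_size + W / 2).astype(int)
    inside = (0 <= rows) & (rows < H) & (0 <= cols) & (cols < W)
    ogm[rows[inside], cols[inside]] = 1.0            # mark occupied cells
    return ogm

# Example: three pedestrians around a robot at the origin.
# ogm = agents_to_ogm(np.array([[1.0, 0.5], [-0.3, 2.0], [4.9, 0.0]]),
#                     np.array([0.0, 0.0]))
```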
III. PROBLEM STATEMENT
We consider a crowd navigation task where a mobile robot
encounters occlusions caused by some agents obstructing
other agents from view or by a limited FOV. The robot’s goal
is to safely avoid all nearby human agents despite limited
visibility and efficiently navigate to its target location.
We formulate the partially observable interactions between
agents as a model-free RL problem with continuous state and
action spaces, $\mathcal{S}$ and $\mathcal{A}$. At each time $t$, the robot in state
$s_t \in \mathcal{S}$ takes an action $a_t \in \mathcal{A}$ given an observation $o_t \in \mathcal{O}$.
The policy $\pi : \mathcal{O} \rightarrow \mathcal{A}$ directly maps the observed state $o_t$
to an action $a_t$ that maximizes the future discounted return:
$$V^{\pi}(s_t) = \sum_{k=t}^{\infty} \gamma^{k} R(s_k, a_k, s'_k), \tag{1}$$
where $R(s, a, s')$ is the reward function and $\gamma$ is the discount
factor. We assume that the human agents' movements are
not influenced by the robot. This assumption is common for
crowd navigation as it prevents the robot from achieving
collision avoidance effortlessly (i.e. the human agents
circumvent the robot while the robot marches straight toward its
goal) [32]. Since our aim in this work is to investigate if the
robot can employ occlusion inference to prevent collisions
in occluded settings, this assumption encourages the robot to
actively reason about the presence of occluded agents.
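As a concrete reading of Eq. (1), the short Python sketch below computes a truncated discounted return for a single rollout. The `env` and `policy` objects are hypothetical stand-ins for any simulator and learned policy with this interface; they are not the paper's actual API.

```python
def discounted_return(rewards: list[float], gamma: float, t: int = 0) -> float:
    """Discounted return as in Eq. (1): sum over k >= t of gamma^k * R_k.

    The infinite horizon is truncated at the episode length, and the
    exponent follows the paper's indexing (gamma^k rather than gamma^(k-t)).
    """
    return sum(gamma**k * r for k, r in enumerate(rewards[t:], start=t))

def rollout_return(env, policy, gamma: float = 0.99, max_steps: int = 500) -> float:
    """Roll out a policy and evaluate its discounted return.

    Assumes a hypothetical interface: env.reset() -> obs, and
    env.step(action) -> (obs, reward, done).
    """
    rewards = []
    obs = env.reset()
    for _ in range(max_steps):
        action = policy(obs)              # pi : O -> A, observation to action
        obs, reward, done = env.step(action)
        rewards.append(reward)
        if done:
            break
    return discounted_return(rewards, gamma)
```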
We employ OGMs to represent the environment map
surrounding the robot from a bird’s-eye view as shown
in Fig. 2. As collisions are unlikely to occur with distant
agents, we consider a local OGM around the robot for policy
learning. We generate two local OGMs centered around the
robot at time t: a ground-truth OGM Gt∈ {0,1}H×Wand
an observation OGM Ot∈ {0,0.5,1}H×W, where Hand
Ware the OGM height and width, respectively. The ground-
truth OGM Gtcaptures the true occupancy information for
all visible and occluded obstacles, as indicated with free (0)