Occlusion-Aware Crowd Navigation Using People as Sensors
Ye-Ji Mun1, Masha Itkina2, Shuijing Liu1, and Katherine Driggs-Campbell1
Abstract: Autonomous navigation in crowded spaces poses a
challenge for mobile robots due to the highly dynamic, partially
observable environment. Occlusions are highly prevalent in such
settings due to a limited sensor field of view and obstruct-
ing human agents. Previous work has shown that observed
interactive behaviors of human agents can be used to estimate
potential obstacles despite occlusions. We propose integrating
such social inference techniques into the planning pipeline.
We use a variational autoencoder with a specially designed
loss function to learn representations that are meaningful for
occlusion inference. This work adopts a deep reinforcement
learning approach to incorporate the learned representation
into occlusion-aware planning. In simulation, our occlusion-
aware policy achieves comparable collision avoidance perfor-
mance to fully observable navigation by estimating agents in
occluded spaces. We demonstrate successful policy transfer
from simulation to the real-world Turtlebot 2i. To the best
of our knowledge, this work is the first to use social occlusion
inference for crowd navigation. Our implementation is available
at https://github.com/yejimun/PaS_CrowdNav.
I. INTRODUCTION
Navigating in a pedestrian-rich environment is an im-
portant yet challenging problem for a mobile robot due
to deficiencies in perception. In cluttered settings, spatial
occlusions are inevitable due to obstructing human agents
and a limited sensor field of view (FOV). Existing crowd
navigation methods often neglect occlusions and assume
complete knowledge of the environment is provided [1],
[2]. When deployed in the real world, these algorithms only
consider the detected or observed human agents for collision
avoidance. As a result, collisions may occur when occluded
human agents suddenly emerge on the robot’s path. However,
under similar limitations, humans can safely navigate as they
instinctively reason about potential risks. Humans are able to
complement their limited sensing capabilities using insights
from their past experiences as well as their understanding
of social norms (e.g. keeping an appropriate distance from
others) [3]. Similar to humans, planning policies should be
able to intelligently make inferences in occluded regions to
safely navigate partially observable environments.
Previous literature in autonomous driving has proposed
successful occlusion-aware planning algorithms [4], [5], but
the setting considered is an inherently structured environment
such as an intersection. In crowd navigation, the mobility
of human agents is unrestricted, resulting in highly diverse
behaviors [6], thus making occlusion reasoning more
challenging.

This project was supported in part by the Ford-Stanford Alliance and a
gift from Mercedes-Benz Research & Development North America, and in
part by the National Science Foundation under Grant No. 2143435.
1Ye-Ji Mun, Shuijing Liu, and Katherine Driggs-Campbell are with
the Electrical and Computer Engineering Department, University of
Illinois at Urbana-Champaign, USA. Email: {yejimun2, sliu105,
krdc}@illinois.edu.
2Masha Itkina is with the Aeronautics and Astronautics Department,
Stanford University, USA. Email: {mitkina}@stanford.edu.

Fig. 1: Turtlebot2i reasoning about a potential occluded human
(red circle) based on interactive behaviors of the observed humans.
Our PaS-inferred OGM in occluded regions is shown on the right.

Prior works [7]–[10] demonstrate that missing
environmental information can be inferred by observing other
people’s interactive behaviors. For example, a human slowing
down or stopping abruptly may imply the presence of an
obstacle in their path, as humans tend to follow the principle
of least action and keep their speed constant [3].
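The slowdown cue described above can be sketched as a simple heuristic. This is not the paper's inference method (which is learned); the threshold and the trajectory speeds are invented purely for illustration.

```python
# Toy heuristic: flag an abrupt speed drop in an observed pedestrian's
# trajectory as a hint of an occluded obstacle ahead. The 0.5 ratio and
# the speed sequences below are illustrative assumptions only.
def sudden_slowdown(speeds, ratio=0.5):
    """Return timesteps where speed falls below `ratio` of the previous speed."""
    return [k for k in range(1, len(speeds))
            if speeds[k] < ratio * speeds[k - 1]]

walking = [1.4, 1.4, 1.35, 1.4]    # roughly constant speed: no cue
reacting = [1.4, 1.4, 0.4, 0.1]    # abrupt slowdown: possible occluded obstacle
print(sudden_slowdown(walking), sudden_slowdown(reacting))  # [] [2, 3]
```

A learned model, as used in this work, would extract such interactive cues implicitly rather than through a fixed threshold.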
Our work is inspired by a growing body of literature on
social inference. Afolabi et al. [9] first coin the term ‘People
as Sensors’ (PaS) and demonstrate how occluded pedestrians
can be inferred by observing human drivers’ reactions. This
work employs occupancy grid maps (OGMs) [11] for rep-
resenting agents and the environment as they do not require
prior environment knowledge and can handle an arbitrary
number of agents in the scene [12]. Itkina et al. [8] scale
the PaS framework to multi-agent social inference in driving
scenes by posing the occlusion inference task as a sensor
fusion problem. In this work, we explore the use of PaS in
unstructured crowd navigation settings to estimate the loca-
tion of occluded, freely traversing human agents. We also go
beyond inference by integrating the social inference features
into planning and analyzing how our enhanced perception
pipeline can improve collision avoidance strategies.
We propose incorporating this social inference mechanism
into a deep reinforcement learning (RL) algorithm for robust
navigation in a partially observable, crowded environment.
We train our policy network end-to-end with an occlusion
inference module to augment the incomplete perception.
For occlusion inference, we employ a variational autoen-
coder (VAE) [13] architecture to encode interactions between
human agents into a low-dimensional latent space using
specialized loss terms. The RL policy network takes the
latent representation as input, which enables the robot to
proactively avoid occluded agents. Simulation results show
significant improvement in partially observable navigation
with our occlusion inference technique. We demonstrate
successful policy transfer to the real-world Turtlebot2i.

arXiv:2210.00552v3 [cs.RO] 28 Apr 2023
Contributions: (1) We propose a deep RL framework for
map-based crowd navigation that can make occlusion-aware
action plans for a partially observable, cluttered environment.
(2) We integrate a VAE into the deep RL algorithm that is
trained using specialized loss terms to extract features for oc-
clusion inference. (3) We demonstrate that the joint learning
of the occlusion inference and path planning modules results
in targeted map estimation that can handle temporary and
long-term occlusions enabling proactive collision avoidance.
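To make the data flow in contributions (1) and (2) concrete, the following is a minimal shape-level sketch of the pipeline: a VAE encoder compresses the observed OGM into a latent vector, and the policy head maps that latent to a continuous action. Randomly initialized numpy weights stand in for the trained networks, and every layer size and activation choice here is an assumption, not the paper's architecture.

```python
# Shape-level sketch of the PaS pipeline: OGM -> VAE latent -> RL action.
# Weights are random stand-ins for trained networks; sizes are assumed.
import numpy as np

rng = np.random.default_rng(0)
H, W, LATENT, ACTION = 24, 24, 32, 2   # assumed OGM and latent dimensions

W_enc = rng.standard_normal((H * W, 2 * LATENT)) * 0.01  # encoder weights
W_pi = rng.standard_normal((LATENT, ACTION)) * 0.01      # policy-head weights

def encode(ogm):
    """VAE encoder: flattened OGM -> (mu, logvar) of the latent distribution."""
    h = ogm.reshape(-1) @ W_enc
    return h[:LATENT], h[LATENT:]

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps so gradients can flow through mu and sigma."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def policy(z):
    """Policy head: latent representation -> bounded continuous action."""
    return np.tanh(z @ W_pi)

observed_ogm = rng.choice([0.0, 0.5, 1.0], size=(H, W))  # 0.5 marks unknown cells
mu, logvar = encode(observed_ogm)
z = reparameterize(mu, logvar)
action = policy(z)
print(z.shape, action.shape)  # (32,) (2,)
```

In the actual system both modules are trained jointly, so the latent space is shaped by the occlusion-inference losses as well as the RL objective.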
II. RELATED WORKS
Occlusion Inference: Occlusion inference strategies must
be adapted to the occlusion type (i.e. partial vs. full and
temporary vs. persistent) and the nature of the environment.
Several studies use semantic segmentation to inpaint the
unobserved portions of partially occluded objects [14], [15].
During temporary occlusions, previously observed objects
can be hallucinated from memory using recurrent neural
networks (RNNs) and skip-connections [16], [17]. Wang et
al. [18] hallucinate static objects using a long short-term
memory (LSTM) [19] network and an auxiliary matching
loss. Inspired by this approach, we also incorporate a match-
ing loss, but our algorithm performs high-level reasoning for
dynamic humans in the presence of long-term occlusions.
A new line of work proposes reasoning about persistently
fully occluded dynamic agents using the reactive behaviors
of observed human agents [7]–[9], [20]. Amirian et al. [20]
extract statistical patterns from past observations to estimate
the probability of human occupancy in occluded regions of
crowded scenes. Afolabi et al. [9] infer the presence of an
occluded pedestrian in a crosswalk from the reactive behav-
iors of an observed driver. Itkina et al. [8] generalize this idea
to multiple drivers as ‘sensors’ by employing sensor fusion
techniques. We also use the social behaviors of human agents
to inform occlusion inference of temporarily and persistently
fully occluded agents. We incorporate the interactive features
in an RL framework to improve navigation.
Planning Under Occlusions: A partially observable
Markov decision process (POMDP) [21] is often used to
explicitly consider hidden states when planning under occlu-
sions [4], [22]. However, these approaches require the num-
ber of occluded agents to be pre-specified, and are intractable
with a large number of agents. Deep RL methods have the
capacity to capture complex features without requiring prior
knowledge of the environment. Liang et al. [23] demonstrate
sim-to-real steering in densely crowded scenes using deep
RL. To handle occlusions, the robot learns to make sharp
turns to avoid suddenly emerging pedestrians from occluded
regions. We present a means to anticipate such occluded
agents using observed social behaviors in crowds, resulting in
smoother robot trajectories. Wang et al. [24] construct a deep
RL algorithm to achieve 3D map-based robot navigation in
static, occluded environments. Following this line of work,
we propose a map-based deep RL approach that handles
occlusions, while navigating highly dynamic environments.
Crowd Navigation: Classical crowd navigation techniques
like social force models [3] and velocity-based methods [1],
[25], [26] follow predefined reaction rules to avoid collisions
(e.g. taking the right side of the path to avoid other agents).
However, these reaction-based approaches can be short-
sighted and over-simplify pedestrian strategies for collision
avoidance [27], [28]. Other works perform long horizon ob-
stacle avoidance by first predicting human agent trajectories
and then finding a feasible path that safely avoids the human
agents [29]–[31]. These trajectory-based methods are known
to suffer from the robot freezing problem in dense crowds
where a feasible path may not be found. Learning-based
approaches have been shown to more closely imitate human-
like behaviors by learning implicit features that encode social
behaviors [27]. Pair-wise interactions between agents are
often learned to reason about a dynamic environment and
perform collision avoidance [2], [32]. In such methods, the
complexity grows with the number of agents in the scene.
Additionally, only visible, fully detected agents are typically
considered. In our algorithm, we employ OGMs to compactly
represent an arbitrary number of agents and learn the mutual
influence between agents simultaneously.
III. PROBLEM STATEMENT
We consider a crowd navigation task where a mobile robot
encounters occlusions caused by some agents obstructing
other agents from view or by a limited FOV. The robot’s goal
is to safely avoid all nearby human agents despite limited
visibility and efficiently navigate to its target location.
We formulate the partially observable interactions between
agents as a model-free RL problem with continuous state and
action spaces, $\mathcal{S}$ and $\mathcal{A}$. At each time $t$, the robot in state
$s_t \in \mathcal{S}$ takes an action $a_t \in \mathcal{A}$ given an observation $o_t \in \mathcal{O}$.
The policy $\pi: \mathcal{O} \rightarrow \mathcal{A}$ directly maps the observed state $o_t$
to an action $a_t$ that maximizes the future discounted return:
$$V^{\pi}(s_t) = \sum_{k=t}^{\infty} \gamma^{k} R(s_k, a_k, s'_k), \qquad (1)$$
where $R(s, a, s')$ is the reward function and $\gamma$ is the discount
factor. We assume that the human agents' movements are
not influenced by the robot. This assumption is common for
crowd navigation as it prevents the robot from achieving
collision avoidance effortlessly (i.e. the human agents cir-
cumvent the robot while the robot marches straight toward its
goal) [32]. Since our aim in this work is to investigate if the
robot can employ occlusion inference to prevent collisions
in occluded settings, this assumption encourages the robot to
actively reason about the presence of occluded agents.
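The discounted return in Eq. (1) can be checked numerically. The sketch below follows the paper's indexing (weight $\gamma^k$ summed from $k = t$); the reward values are made up for illustration and are not the paper's reward function.

```python
# Numeric check of Eq. (1): V^pi(s_t) = sum_{k=t} gamma^k * R(s_k, a_k, s'_k).
# The reward sequence is invented, e.g. a discomfort penalty then a goal reward.
def discounted_return(rewards, gamma, t=0):
    """Sum gamma**k * r_k over k = t .. end of the (finite) episode."""
    return sum(gamma**k * r for k, r in enumerate(rewards) if k >= t)

rewards = [0.0, 0.0, -0.25, 1.0]
print(discounted_return(rewards, gamma=0.9))  # 0.9**2 * -0.25 + 0.9**3 * 1.0 = 0.5265
```

In practice the return is estimated over finite rollouts during RL training rather than computed in closed form.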
We employ OGMs to represent the environment map
surrounding the robot from a bird’s-eye view as shown
in Fig. 2. As collisions are unlikely to occur with distant
agents, we consider a local OGM around the robot for policy
learning. We generate two local OGMs centered around the
robot at time $t$: a ground-truth OGM $G_t \in \{0,1\}^{H \times W}$ and
an observation OGM $O_t \in \{0, 0.5, 1\}^{H \times W}$, where $H$ and
$W$ are the OGM height and width, respectively. The ground-
truth OGM $G_t$ captures the true occupancy information for
all visible and occluded obstacles, as indicated with free (0)
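A toy construction of the two local grids can make the encoding concrete: the ground-truth OGM marks every agent with occupied (1) cells, while the observation OGM replaces occluded cells with an unknown (0.5) value. The grid size, agent positions, and the rectangular "occluded region" below are all illustrative assumptions, not the paper's sensor model.

```python
# Toy ground-truth OGM G (values in {0,1}) vs. observation OGM O
# (values in {0, 0.5, 1}); occluded cells become unknown (0.5).
import numpy as np

H, W = 8, 8
agents = [(1, 2), (5, 6)]            # (row, col) cells occupied by agents
occluded = np.zeros((H, W), dtype=bool)
occluded[4:, 4:] = True              # assume this corner is hidden from the robot

G = np.zeros((H, W))                 # ground-truth OGM
for r, c in agents:
    G[r, c] = 1.0

O = np.where(occluded, 0.5, G)       # observation OGM: unknown where occluded

print(O[1, 2], O[5, 6])  # 1.0 0.5  (visible agent kept, occluded agent masked)
```

Occlusion inference then amounts to recovering the masked occupancy in $O_t$ so that it approaches $G_t$.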