with our occlusion inference technique. We demonstrate
successful policy transfer to the real-world Turtlebot2i.
Contributions: (1) We propose a deep RL framework for
map-based crowd navigation that can make occlusion-aware
action plans for a partially observable, cluttered environment.
(2) We integrate a VAE into the deep RL algorithm that is
trained using specialized loss terms to extract features for
occlusion inference. (3) We demonstrate that the joint learning
of the occlusion inference and path planning modules results
in targeted map estimation that can handle temporary and
long-term occlusions, enabling proactive collision avoidance.
II. RELATED WORK
Occlusion Inference: Occlusion inference strategies must
be adapted to the occlusion type (i.e. partial vs. full and
temporary vs. persistent) and the nature of the environment.
Several studies use semantic segmentation to inpaint the
unobserved portions of partially occluded objects [14], [15].
During temporary occlusions, previously observed objects
can be hallucinated from memory using recurrent neural
networks (RNNs) and skip-connections [16], [17]. Wang et
al. [18] hallucinate static objects using a long short-term
memory (LSTM) [19] network and an auxiliary matching
loss. Inspired by this approach, we also incorporate a matching
loss, but our algorithm performs high-level reasoning about
dynamic humans in the presence of long-term occlusions.
A recent line of work proposes reasoning about persistently
fully occluded dynamic agents using the reactive behaviors
of observed human agents [7]–[9], [20]. Amirian et al. [20]
extract statistical patterns from past observations to estimate
the probability of human occupancy in occluded regions of
crowded scenes. Afolabi et al. [9] infer the presence of an
occluded pedestrian in a crosswalk from the reactive behaviors
of an observed driver. Itkina et al. [8] generalize this idea
to multiple drivers as ‘sensors’ by employing sensor fusion
techniques. We also use the social behaviors of human agents
to inform occlusion inference of temporarily and persistently
fully occluded agents. We incorporate these interactive features
into an RL framework to improve navigation.
Planning Under Occlusions: A partially observable
Markov decision process (POMDP) [21] is often used to
explicitly consider hidden states when planning under occlu-
sions [4], [22]. However, these approaches require the num-
ber of occluded agents to be pre-specified, and are intractable
with a large number of agents. Deep RL methods have the
capacity to capture complex features without requiring prior
knowledge of the environment. Liang et al. [23] demonstrate
sim-to-real steering in densely crowded scenes using deep
RL. To handle occlusions, the robot learns to make sharp
turns to avoid pedestrians that suddenly emerge from occluded
regions. We present a means to anticipate such occluded
agents using observed social behaviors in crowds, resulting in
smoother robot trajectories. Wang et al. [24] construct a deep
RL algorithm to achieve 3D map-based robot navigation in
static, occluded environments. Following this line of work,
we propose a map-based deep RL approach that handles
occlusions, while navigating highly dynamic environments.
Crowd Navigation: Classical crowd navigation techniques
like social force models [3] and velocity-based methods [1],
[25], [26] follow predefined reaction rules to avoid collisions
(e.g. taking the right side of the path to avoid other agents).
However, these reaction-based approaches can be short-sighted
and over-simplify pedestrian strategies for collision
avoidance [27], [28]. Other works perform long-horizon
obstacle avoidance by first predicting human agent trajectories
and then finding a feasible path that safely avoids the human
agents [29]–[31]. These trajectory-based methods are known
to suffer from the robot freezing problem in dense crowds
where a feasible path may not be found. Learning-based
approaches have been shown to more closely imitate
human-like behaviors by learning implicit features that encode social
behaviors [27]. Pair-wise interactions between agents are
often learned to reason about a dynamic environment and
perform collision avoidance [2], [32]. In such methods, the
complexity grows with the number of agents in the scene.
Additionally, only visible, fully detected agents are typically
considered. In our algorithm, we employ OGMs to compactly
represent an arbitrary number of agents and learn the mutual
influence between agents simultaneously.
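To illustrate why an OGM representation scales independently of crowd size, the following minimal Python sketch rasterizes an arbitrary list of agent positions into a fixed-size, robot-centered binary grid. The grid dimensions, cell size, and function names here are illustrative assumptions, not the paper's exact parameters.

```python
import numpy as np

def agents_to_ogm(agent_xy: np.ndarray, robot_xy: np.ndarray,
                  H: int = 64, W: int = 64, cell_size: float = 0.1) -> np.ndarray:
    """Rasterize any number of agent positions into a fixed-size,
    robot-centered occupancy grid map (0 = free, 1 = occupied).

    The map size stays constant regardless of how many agents are
    present, unlike pairwise interaction features whose cost grows
    with the number of agents in the scene.
    """
    ogm = np.zeros((H, W), dtype=np.float32)
    rel = agent_xy - robot_xy                        # robot-centric coordinates
    rows = (rel[:, 1] / cell_size + H / 2).astype(int)
    cols = (rel[:, 0] / cell_size + W / 2).astype(int)
    inside = (0 <= rows) & (rows < H) & (0 <= cols) & (cols < W)
    ogm[rows[inside], cols[inside]] = 1.0            # mark occupied cells
    return ogm

# Example: three pedestrians around a robot at the origin.
# ogm = agents_to_ogm(np.array([[1.0, 0.5], [-0.3, 2.0], [4.9, 0.0]]),
#                     np.array([0.0, 0.0]))
```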
III. PROBLEM STATEMENT
We consider a crowd navigation task where a mobile robot
encounters occlusions caused by some agents obstructing
other agents from view or by a limited FOV. The robot’s goal
is to safely avoid all nearby human agents despite limited
visibility and efficiently navigate to its target location.
We formulate the partially observable interactions between
agents as a model-free RL problem with continuous state and
action spaces, $\mathcal{S}$ and $\mathcal{A}$. At each time $t$, the robot in state
$s_t \in \mathcal{S}$ takes an action $a_t \in \mathcal{A}$ given an observation $o_t \in \mathcal{O}$.
The policy $\pi : \mathcal{O} \rightarrow \mathcal{A}$ directly maps the observed state $o_t$
to an action $a_t$ that maximizes the future discounted return:
$$V^{\pi}(s_t) = \sum_{k=t}^{\infty} \gamma^{k} R(s_k, a_k, s'_k), \tag{1}$$
where $R(s, a, s')$ is the reward function and $\gamma$ is the discount
factor. We assume that the human agents' movements are
not influenced by the robot. This assumption is common for
crowd navigation as it prevents the robot from achieving
collision avoidance effortlessly (i.e. the human agents
circumvent the robot while the robot marches straight toward its
goal) [32]. Since our aim in this work is to investigate if the
robot can employ occlusion inference to prevent collisions
in occluded settings, this assumption encourages the robot to
actively reason about the presence of occluded agents.
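As a concrete reading of Eq. (1), the short Python sketch below computes a truncated discounted return for a single rollout. The `env` and `policy` objects are hypothetical stand-ins for any simulator and learned policy with this interface; they are not the paper's actual API.

```python
def discounted_return(rewards: list[float], gamma: float, t: int = 0) -> float:
    """Discounted return as in Eq. (1): sum over k >= t of gamma^k * R_k.

    The infinite horizon is truncated at the episode length, and the
    exponent follows the paper's indexing (gamma^k rather than gamma^(k-t)).
    """
    return sum(gamma**k * r for k, r in enumerate(rewards[t:], start=t))

def rollout_return(env, policy, gamma: float = 0.99, max_steps: int = 500) -> float:
    """Roll out a policy and evaluate its discounted return.

    Assumes a hypothetical interface: env.reset() -> obs, and
    env.step(action) -> (obs, reward, done).
    """
    rewards = []
    obs = env.reset()
    for _ in range(max_steps):
        action = policy(obs)              # pi : O -> A, observation to action
        obs, reward, done = env.step(action)
        rewards.append(reward)
        if done:
            break
    return discounted_return(rewards, gamma)
```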
We employ OGMs to represent the environment map
surrounding the robot from a bird’s-eye view as shown
in Fig. 2. As collisions are unlikely to occur with distant
agents, we consider a local OGM around the robot for policy
learning. We generate two local OGMs centered around the
robot at time t: a ground-truth OGM Gt∈ {0,1}H×Wand
an observation OGM Ot∈ {0,0.5,1}H×W, where Hand
Ware the OGM height and width, respectively. The ground-
truth OGM Gtcaptures the true occupancy information for
all visible and occluded obstacles, as indicated with free (0)