
Deep reinforcement learning oriented for real world dynamic scenarios
Diego Martínez, Luis Riazuelo and Luis Montano1
Abstract— Autonomous navigation in dynamic environments is a complex but essential task for autonomous robots. Recent deep reinforcement learning approaches show promising results to solve the problem, but it is not solved yet, as they typically assume no robot kinodynamic restrictions, holonomic movement or perfect environment knowledge. Moreover, most algorithms fail in the real world due to the inability to generate real-world training data for the huge variability of possible scenarios. In this work, we present a novel planner, DQN-DOVS, that uses deep reinforcement learning on a descriptive robocentric velocity space model to navigate in highly dynamic environments. It is trained using a smart curriculum learning approach on a simulator that faithfully reproduces the real world, reducing the gap between reality and simulation. We test the resulting algorithm in scenarios with different numbers of obstacles and compare it with many state-of-the-art approaches, obtaining better performance. Finally, we run the algorithm on a ground robot, using the same setup as in the simulation experiments.
I. INTRODUCTION
Motion planning and navigation in dynamic scenarios is a complex problem with no definitive solution. Traditional planners fail in environments where the map is mutable or obstacles are dynamic, leading to suboptimal trajectories or collisions. These planners typically consider only the current obstacle positions measured by the sensors, without accounting for the future trajectories the obstacles may follow.
Promising learning-based methods have been proposed to address this issue. Nevertheless, they do not work properly in the real world: they ignore robot kinodynamic constraints, consider only dynamic obstacles, or assume perfect knowledge of the environment. Moreover, they would require huge amounts of real-world training data, which is infeasible to generate.
We propose a planner that is able to navigate through dynamic and hybrid real-world environments. The planner is based on the Dynamic Object Velocity Space (DOVS) model, presented in [1], which captures the dynamism of the scenario. In that work, the kinodynamics of the robot and the obstacles of the environment are used to establish the feasible robot velocities that do not lead to a collision. In our approach, the DOVS model is used in a new planner called DQN-DOVS, which utilizes deep reinforcement learning techniques. The planner takes the rich information provided by the DOVS as its input, gaining an advantage over other approaches that use raw sensor measurements and are not able to generalize. Once the agent
1The authors are with the Robotics, Perception and Real
Time Group, Aragon Institute of Engineering Research
(I3A), Universidad de Zaragoza, 50018 Zaragoza, Spain.
{diegomartinez, riazuelo, montano}@unizar.es
Fig. 1. Scenario of a robot with static and dynamic obstacles, and the RVIZ visualization of the sensed scenario.
learns how to interpret the DOVS, it is able to navigate in any scenario (it does not need real-world data from a huge variety of scenarios), and the weights learned in the simulated world work equally well in real-world environments without fine-tuning. In addition, it uses a dynamic window in the robot velocity space to define the set of available actions while respecting the robot kinodynamics.
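As an illustration of this last point, the following sketch discretizes a dynamic window of admissible (v, w) commands for a differential-drive robot. The velocity and acceleration limits and the control period are hypothetical values chosen for the example, not parameters taken from the paper.

import numpy as np

# Illustrative kinodynamic limits (not the paper's actual parameters).
V_MAX, W_MAX = 0.7, 1.5   # max linear (m/s) and angular (rad/s) velocity
A_V, A_W = 0.5, 1.0       # max linear and angular acceleration
DT = 0.2                  # control period (s)

def dynamic_window(v, w, n=5):
    """Discretize the velocities reachable from the current command
    (v, w) within one control period, clipped to the absolute limits."""
    v_lo, v_hi = max(0.0, v - A_V * DT), min(V_MAX, v + A_V * DT)
    w_lo, w_hi = max(-W_MAX, w - A_W * DT), min(W_MAX, w + A_W * DT)
    return [(vi, wi)
            for vi in np.linspace(v_lo, v_hi, n)
            for wi in np.linspace(w_lo, w_hi, n)]

# Example: the n*n actions available at the current command (0.4, 0.1).
actions = dynamic_window(0.4, 0.1)

Every velocity in the window is reachable within one control period, so any action the agent selects automatically respects the robot kinodynamics.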
The DQN-DOVS algorithm is trained and tested in a simulator that reproduces the real world, where all information is extracted from sensor measurements, including the robot's own localization. A comparison of the model with other state-of-the-art planners is also provided, as well as experiments with a real robot, as shown in Figure 1.
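To make the learning component concrete, below is a minimal sketch of a Q-network that maps a DOVS-style robocentric velocity-space representation to Q-values over a discretized dynamic-window action set. The paper does not specify its architecture at this point, so the input encoding, layer sizes and action count are assumptions for illustration only.

import torch
import torch.nn as nn

class DovsQNetwork(nn.Module):
    """Hypothetical Q-network: a flattened velocity-space grid in,
    one Q-value per discretized (v, w) action out."""
    def __init__(self, grid_cells=40 * 40, n_actions=25):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(grid_cells, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, dovs_grid):
        return self.net(dovs_grid)

# Greedy action selection over the dynamic-window action set.
q_net = DovsQNetwork()
state = torch.rand(1, 40 * 40)          # placeholder DOVS grid
action_index = q_net(state).argmax(dim=1)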
II. BACKGROUND
A. Related work
Motion planners designed for static and continuous environments cannot be used to deal with dynamic obstacles, as they lead to collisions or suboptimal trajectories. Some traditional approaches for dynamic environments include artificial potential fields [2], probability-based methods [3] or reciprocal collision avoidance (ORCA) [4].
A large group of works is based on the velocity space. The Velocity Obstacle (VO), introduced in [5], is the set of robot velocities that would lead to a collision in the near future with an obstacle moving at a certain velocity, and which should therefore not be chosen. Based on the VO concept, the Dynamic Object Velocity Space (DOVS) is defined in [1] as the velocity-time space for non-holonomic robots, which includes the unsafe robot velocities for all the obstacles and the time to collision, used for computing safe robot velocities within a time horizon. In that work, a strategy-based planner, S-DOVS, is also defined. A planner based on basic reinforcement learning on top of the DOVS is proposed in [6], making decisions based on Q-values stored in tables.
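To ground the VO concept, the following sketch implements the classical membership test for a circular robot and obstacle: a candidate robot velocity is unsafe if the relative velocity falls inside the collision cone subtended by the obstacle enlarged by the sum of the two radii. Positions, velocities and radii are hypothetical example values, and the sketch omits the time-to-collision and non-holonomic aspects that DOVS [1] adds on top of the VO.

import numpy as np

def in_velocity_obstacle(p_r, p_o, v_r, v_o, r_sum):
    """Return True if robot velocity v_r lies inside the VO of an
    obstacle at p_o moving with v_o (robot at p_r, radii sum r_sum)."""
    rel_p = p_o - p_r              # obstacle position relative to robot
    rel_v = v_r - v_o              # robot velocity relative to obstacle
    dist = np.linalg.norm(rel_p)
    if dist <= r_sum:              # already overlapping
        return True
    if np.linalg.norm(rel_v) == 0.0:
        return False               # no relative motion, no collision
    half_angle = np.arcsin(r_sum / dist)   # collision-cone half-angle
    angle_to_obstacle = np.arctan2(rel_p[1], rel_p[0])
    angle_of_motion = np.arctan2(rel_v[1], rel_v[0])
    diff = np.arctan2(np.sin(angle_of_motion - angle_to_obstacle),
                      np.cos(angle_of_motion - angle_to_obstacle))
    return abs(diff) <= half_angle

# Example: obstacle 3 m ahead drifting slightly upward; driving straight
# at it at 1 m/s falls inside the VO, so that velocity should be avoided.
p_r, v_r = np.array([0.0, 0.0]), np.array([1.0, 0.0])
p_o, v_o = np.array([3.0, 0.0]), np.array([0.0, 0.2])
print(in_velocity_obstacle(p_r, p_o, v_r, v_o, r_sum=0.8))  # True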