Deep reinforcement learning oriented for real world dynamic scenarios
Diego Martínez, Luis Riazuelo and Luis Montano1
Abstract— Autonomous navigation in dynamic environments is a complex but essential task for autonomous robots. Recent deep reinforcement learning approaches show promising results to solve the problem, but it is not solved yet, as they typically assume no robot kinodynamic restrictions, holonomic movement or perfect environment knowledge. Moreover, most algorithms fail in the real world due to the inability to generate real-world training data for the huge variability of possible scenarios. In this work, we present a novel planner, DQN-DOVS, that uses deep reinforcement learning on a descriptive robocentric velocity space model to navigate in highly dynamic environments. It is trained using a smart curriculum learning approach on a simulator that faithfully reproduces the real world, reducing the gap between reality and simulation. We test the resulting algorithm in scenarios with different numbers of obstacles and compare it with several state-of-the-art approaches, obtaining a better performance. Finally, we run the algorithm on a ground robot, using the same setup as in the simulation experiments.
I. INTRODUCTION
Motion planning and navigation in dynamic scenarios is a complex problem that does not yet have a definitive solution. Traditional planners fail in environments where the map is mutable or obstacles are dynamic, leading to suboptimal trajectories or collisions. Those planners typically consider only the obstacles' current positions as measured by the sensors, without taking into account the trajectories they may follow in the future.
New approaches that try to solve this issue include promising learning-based methods. Nevertheless, they do not work properly in the real world: they do not consider robot kinodynamic constraints, only consider dynamic obstacles, or assume perfect knowledge of the environment. Moreover, they would need huge amounts of real-world data to train the algorithms for the real world, and generating such data is not feasible.
We propose a planner that is able to navigate through dynamic and hybrid real-world environments. The planner is based on the Dynamic Object Velocity Space (DOVS) model, presented in [1], which encodes the dynamism of the scenario. In that work, the kinodynamics of the robot and the obstacles of the environment are used to establish the feasible robot velocities that do not lead to a collision. In our approach, the DOVS model is used in a new planner called DQN-DOVS, which utilizes deep reinforcement learning techniques. The planner uses the rich information provided by the DOVS as its input, gaining an advantage over other approaches that use raw sensor measurements and are not able to generalize. Once the agent
1The authors are with the Robotics, Perception and Real
Time Group, Aragon Institute of Engineering Research
(I3A), Universidad de Zaragoza, 50018 Zaragoza, Spain.
diegomartinez,riazuelo,montano@unizar.es
Fig. 1. Scenario of a robot with static and dynamic obstacles and the RVIZ visualization of the sensed scenario.
learns how to interpret the DOVS, it is able to navigate in any scenario (it does not need real-world data from a huge variety of scenarios), and the training weights learned in the simulated world also work in real-world environments without fine-tuning. In addition, it uses a dynamic window in the robot velocity space to define the set of available actions while respecting robot kinodynamics.
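As an illustration of this idea, the following minimal sketch discretizes the commands reachable within one control period while respecting acceleration and velocity limits; the number of discrete actions, the limit values and the function name are assumptions made for illustration, not the exact configuration of DQN-DOVS.

import numpy as np

def dynamic_window_actions(v, w, v_max, w_max, acc_v, acc_w, dt, n_v=5, n_w=9):
    # Discrete (v, w) commands reachable in one control period dt, given the
    # current velocities and the acceleration/velocity limits of the robot.
    v_lo, v_hi = max(0.0, v - acc_v * dt), min(v_max, v + acc_v * dt)
    w_lo, w_hi = max(-w_max, w - acc_w * dt), min(w_max, w + acc_w * dt)
    return [(float(vc), float(wc))
            for vc in np.linspace(v_lo, v_hi, n_v)
            for wc in np.linspace(w_lo, w_hi, n_w)]

The diamond-shaped coupling between linear and angular velocity of a differential-drive robot (see Section II-B) would further prune this candidate set.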
The DQN-DOVS algorithm is trained and tested in a realistic simulator, where all the information is extracted from the sensor measurements, including the robot's own localization. A comparison of the model with other state-of-the-art planners is also provided, as well as experiments with a real robot, as shown in Figure 1.
II. BACKGROUND
A. Related work
Motion planners designed for static environments cannot be used to deal with dynamic obstacles, as they lead to collisions and suboptimal trajectories. Traditional approaches for dynamic environments include artificial potential fields [2], probability-based methods [3] and reciprocal collision avoidance (ORCA) [4].
A large group of works is based on the velocity space. The Velocity Obstacle (VO), introduced in [5], is the set of robot velocities that would lead to a collision in the near future with an obstacle moving at a certain velocity, and which should therefore not be chosen. Based on the VO concept, the Dynamic Object Velocity Space (DOVS) is defined in [1] as the velocity-time space for non-holonomic robots, which includes the unsafe robot velocities for all the obstacles and the time to collision, used to compute safe robot velocities within a time horizon. In that work, a strategy-based planner, S-DOVS, is also defined. A planner based on basic reinforcement learning on top of the DOVS is also proposed in [6], making decisions based on Q-values stored in tables.
Reinforcement learning is a method used to estimate the optimal policy, the one that maximizes the cumulative reward obtained in an episode. In [7], the Q-values are estimated with a deep neural network, defining the first Deep Q-Network (DQN). Many extensions of this original algorithm have been proposed. Some works have achieved the best performances of the state of the art, including DQN with multiple modifications [8], distributed reinforcement learning [9] and actor-critic methods [10]. The study presented in [11] shows that combining reinforcement learning with curriculum learning can give useful results, especially for problems that would be too difficult to learn from scratch.
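For reference, a minimal sketch of the temporal-difference update behind DQN [7] is shown below, written with PyTorch; the network architecture, the Huber loss and the function names are generic assumptions for illustration, not the configuration used in this work.

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    # Maps a state vector to one Q-value per discrete action.
    def __init__(self, state_dim, n_actions, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions))

    def forward(self, state):
        return self.net(state)

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    # One gradient step on the standard DQN temporal-difference loss.
    states, actions, rewards, next_states, dones = batch
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrapped target: r + gamma * max_a' Q_target(s', a') for non-terminal s'.
        q_next = target_net(next_states).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * q_next
    loss = nn.functional.smooth_l1_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

A separate target network and an experience replay buffer, as in the original DQN, would be used around this update to stabilize training.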
Some works analyze the importance of reinforcement learning in robot motion planning and the limitations of traditional planners in dynamic environments. Defining strategies for every situation that may be found in the real world is intractable, and reinforcement learning may be used to solve the decision-making problem, which is complex and has many degrees of freedom. The work in [12] proposes a deep reinforcement learning model that takes LIDAR measurements and the goal position as input, obtaining better results than conventional planners.
The work described in [13] (SARL) simulates a crowd and tries to make the robot anticipate the interactions of the crowd with the robot and among its members, comparing the method with ORCA [4] and two other deep reinforcement learning methods: CADRL [14] and LSTM-RL [15]. ORCA fails in this crowded environment because its reciprocity assumption does not hold, and CADRL fails because it does not take into account the whole crowd, only a single pairwise interaction. In the environment presented, SARL and LSTM-RL achieve the best performance. In all of these approaches, the simulators used are non-realistic and no kinodynamic restrictions are considered.
An example of an approach that does consider these restrictions is [16], which combines deep reinforcement learning with DWA [17], but it only achieves a success rate of 0.54 in sparse dynamic scenarios.
B. Dynamic Object Velocity Space (DOVS)
The DOVS model presented in [1], which models the
dynamism of the environment, is used as a basis of this
work. To build the model, the robot size is reduced to a
point and the obstacles are enlarged with the robot radius
(the final collision areas are the same). The area swept by
each moving obstacle (collision band) is computed using
the trajectory of the obstacles, which is assumed to be
known or estimated from the sensor information. Then, the
maximum and minimum velocities that can avoid a collision
with that obstacle are calculated, repeating the process for
every obstacle. Using that information with maximum and
minimum velocities of the robot the limits of the Dynamic
Object Velocity (DOV) are obtained. The key of this model is
that the set of free velocities is recomputed in every time step,
so the obstacles’ trajectory estimation needs to be precise
only for the next few time steps.
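To make the construction concrete, the following simplified sketch marks the unsafe (ω, v) pairs for a single moving obstacle by forward-simulating circular arcs over a time horizon. The actual DOVS model in [1] derives these velocity limits analytically from the collision band, so this sampling-based approximation, the constant-velocity obstacle prediction and all parameter names are only illustrative assumptions.

import numpy as np

def unsafe_velocity_mask(w_grid, v_grid, robot_pose, obstacle, robot_radius,
                         horizon=3.0, dt=0.1):
    # True where a constant (w, v) command collides within `horizon` seconds.
    # obstacle: dict with 'pos' (x, y), 'vel' (vx, vy) and 'radius'.
    x0, y0, th0 = robot_pose
    inflated = obstacle["radius"] + robot_radius       # point robot, enlarged obstacle
    times = np.arange(dt, horizon + dt, dt)
    obs_xy = np.asarray(obstacle["pos"]) + np.outer(times, obstacle["vel"])
    unsafe = np.zeros((len(v_grid), len(w_grid)), dtype=bool)
    for i, v in enumerate(v_grid):
        for j, w in enumerate(w_grid):
            th = th0 + w * times                       # unicycle rollout, command held constant
            if abs(w) > 1e-6:
                x = x0 + (v / w) * (np.sin(th) - np.sin(th0))
                y = y0 - (v / w) * (np.cos(th) - np.cos(th0))
            else:
                x = x0 + v * times * np.cos(th0)
                y = y0 + v * times * np.sin(th0)
            dist = np.hypot(x - obs_xy[:, 0], y - obs_xy[:, 1])
            unsafe[i, j] = bool(dist.min() < inflated)
    return unsafe

Repeating the computation for every obstacle and taking the union of the masks yields an approximation of the DOV; the remaining velocities are the free velocities mentioned above.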
Fig. 2. Representation of the state of the agent. The DOVS is in the red square and the extracted velocity grid below it, concatenated with 8 other robot variables (linear velocity, angular velocity, goal distance, goal angle, obstacle distance, obstacle angle, obstacle linear velocity, obstacle motion angle) to construct the 408-element input vector of the learning system.
The velocities in the DOV are unsafe, as they lead to collisions within a time horizon, while the rest of the velocities are available for navigation. Velocities inside the DOV may still be chosen if the subsequent commands lead to a free velocity. Navigation in that space is achieved using a dynamic window that considers the robot kinodynamics. The DOVS gathers all this information in the robot velocity space, and it may be represented as shown in Figure 2, inside the red square. Linear velocities v are on the Y-axis and angular velocities ω on the X-axis; the DOV is shown in black; the green rhombus is the dynamic window centered around the current robot velocity; the green line represents the velocities (ω, v) that lead to the goal following a circular trajectory (radius = v/ω); and the big black triangle represents the differential-drive kinematic restriction (the robot cannot move at maximum linear and angular velocity at the same time). In this way, all the information about the dynamism of the environment and about the robot itself that is needed for motion planning is modeled.
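A minimal sketch of how the 408-element input of Figure 2 could be assembled is shown below: a flattened robocentric velocity grid concatenated with the 8 scalar variables listed in the figure. The 20x20 grid resolution is an assumption chosen only because it matches the 400 + 8 = 408 total, and the function name is hypothetical.

import numpy as np

def build_state(unsafe_grid, v, w, goal_dist, goal_angle,
                obs_dist, obs_angle, obs_speed, obs_motion_angle):
    # Concatenate the robocentric velocity grid with the scalar state variables.
    assert unsafe_grid.shape == (20, 20)               # assumed resolution: 400 cells
    grid = unsafe_grid.astype(np.float32).ravel()      # 1.0 = velocity inside the DOV
    scalars = np.array([v, w, goal_dist, goal_angle,
                        obs_dist, obs_angle, obs_speed, obs_motion_angle],
                       dtype=np.float32)
    return np.concatenate([grid, scalars])             # shape: (408,)

A vector of this form would be the state fed to the learning system described above.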
C. Contribution
The works presented in the state of the art have some
limitations when they are to be applied in real environments.
Some of them use only the raw sensor measurements as the
input or use some processed information, like obstacle posi-
tion and velocities. The main problem with those approaches
is the impossibility to generate appropriate real-world train-
ing data. They are trained in non-realistic simulators, and
in different real world scenarios they would not do what to
do. Furthermore, only few approaches do not use holonomic
robots, and even less consider robot kinodynamic restrictions.
The contribution of this work is a deep reinforcement
learning motion planner that: