Deep reinforcement learning oriented for real world dynamic scenarios
Diego Martínez, Luis Riazuelo and Luis Montano1
Abstract— Autonomous navigation in dynamic environments is a complex but essential task for autonomous robots. Recent deep reinforcement learning approaches show promising results to solve the problem, but it is not solved yet, as they typically assume no robot kinodynamic restrictions, holonomic movement or perfect environment knowledge. Moreover, most algorithms fail in the real world due to the inability to generate real-world training data for the huge variability of possible scenarios. In this work, we present a novel planner, DQN-DOVS, that uses deep reinforcement learning on a descriptive robocentric velocity space model to navigate in highly dynamic environments. It is trained using a smart curriculum learning approach on a simulator that faithfully reproduces the real world, reducing the gap between reality and simulation. We test the resulting algorithm in scenarios with different numbers of obstacles and compare it with several state-of-the-art approaches, obtaining a better performance. Finally, we run the algorithm on a ground robot, using the same setup as in the simulation experiments.
I. INTRODUCTION
Motion planning and navigation in dynamic scenarios is a complex problem that does not yet have a definitive solution. Traditional planners fail in environments where the map is mutable or obstacles are dynamic, leading to suboptimal trajectories or collisions. Those planners typically consider only the obstacles' current positions as measured by the sensors, without taking into account the trajectories they may follow in the future.
New approaches that try to solve this issue include promising learning-based methods. Nevertheless, they do not work properly in the real world: they do not consider robot kinodynamic constraints, only consider dynamic obstacles, or assume perfect knowledge of the environment. Moreover, they would need huge amounts of real-world data to train the algorithms for the real world, and generating such data is not feasible.
We propose a planner that is able to navigate through dynamic and hybrid real-world environments. The planner is based on the Dynamic Object Velocity Space (DOVS) model, presented in [1], which encodes the dynamism of the scenario. In that work, the kinodynamics of the robot and the obstacles of the environment are used to establish the feasible robot velocities that do not lead to a collision. In our approach, the DOVS model is used in a new planner called DQN-DOVS, which utilizes deep reinforcement learning techniques. The planner uses the rich information provided by the DOVS as its input, gaining an advantage over other approaches that use raw sensor measurements and are not able to generalize. Once the agent
1The authors are with the Robotics, Perception and Real
Time Group, Aragon Institute of Engineering Research
(I3A), Universidad de Zaragoza, 50018 Zaragoza, Spain.
diegomartinez,riazuelo,montano@unizar.es
Fig. 1. Scenario of a robot with static and dynamic obstacles and the RVIZ visualization of the sensed scenario.
learns how to interpret the DOVS, it is able to navigate in any scenario (it does not need real-world data from a huge variety of scenarios), and the training weights learned in the simulated world also work in real-world environments without fine-tuning. In addition, it uses a dynamic window in the robot velocity space to define the set of available actions while respecting robot kinodynamics.
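As an illustration of this idea, the following minimal sketch discretizes the commands reachable within one control period while respecting acceleration and velocity limits; the number of discrete actions, the limit values and the function name are assumptions made for illustration, not the exact configuration of DQN-DOVS.

import numpy as np

def dynamic_window_actions(v, w, v_max, w_max, acc_v, acc_w, dt, n_v=5, n_w=9):
    # Discrete (v, w) commands reachable in one control period dt, given the
    # current velocities and the acceleration/velocity limits of the robot.
    v_lo, v_hi = max(0.0, v - acc_v * dt), min(v_max, v + acc_v * dt)
    w_lo, w_hi = max(-w_max, w - acc_w * dt), min(w_max, w + acc_w * dt)
    return [(float(vc), float(wc))
            for vc in np.linspace(v_lo, v_hi, n_v)
            for wc in np.linspace(w_lo, w_hi, n_w)]

The diamond-shaped coupling between linear and angular velocity of a differential-drive robot (see Section II-B) would further prune this candidate set.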
The DQN-DOVS algorithm is trained and tested in a realistic simulator, where all the information is extracted from the sensor measurements, including the robot's own localization. A comparison of the model with other state-of-the-art planners is also provided, as well as experiments with a real robot, as shown in Figure 1.
II. BACKGROUND
A. Related work
Motion planners designed for static environments cannot be used to deal with dynamic obstacles, as they lead to collisions and suboptimal trajectories. Traditional approaches for dynamic environments include artificial potential fields [2], probability-based methods [3] and reciprocal collision avoidance (ORCA) [4].
A large group of works is based on the velocity space. The Velocity Obstacle (VO), introduced in [5], is the set of robot velocities that would lead to a collision in the near future with an obstacle moving at a certain velocity, and which should therefore not be chosen. Based on the VO concept, the Dynamic Object Velocity Space (DOVS) is defined in [1] as the velocity-time space for non-holonomic robots, which includes the unsafe robot velocities for all the obstacles and the time to collision, used to compute safe robot velocities within a time horizon. In that work, a strategy-based planner, S-DOVS, is also defined. A planner based on basic reinforcement learning on top of the DOVS is also proposed in [6], making decisions based on Q-values stored in tables.
Reinforcement learning is a method used to estimate the optimal policy, the one that maximizes the cumulative reward obtained in an episode. In [7], the Q-values are estimated with a deep neural network, defining the first Deep Q-Network (DQN). Many extensions of this original algorithm have been proposed. Some works have achieved the best performances of the state of the art, including DQN with multiple modifications [8], distributed reinforcement learning [9] and actor-critic methods [10]. The study presented in [11] shows that combining reinforcement learning with curriculum learning can give useful results, especially for problems that would be too difficult to learn from scratch.
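For reference, a minimal sketch of the temporal-difference update behind DQN [7] is shown below, written with PyTorch; the network architecture, the Huber loss and the function names are generic assumptions for illustration, not the configuration used in this work.

import torch
import torch.nn as nn

class QNetwork(nn.Module):
    # Maps a state vector to one Q-value per discrete action.
    def __init__(self, state_dim, n_actions, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions))

    def forward(self, state):
        return self.net(state)

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    # One gradient step on the standard DQN temporal-difference loss.
    states, actions, rewards, next_states, dones = batch
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrapped target: r + gamma * max_a' Q_target(s', a') for non-terminal s'.
        q_next = target_net(next_states).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * q_next
    loss = nn.functional.smooth_l1_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

A separate target network and an experience replay buffer, as in the original DQN, would be used around this update to stabilize training.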
Some works analyze the importance of reinforcement learning in robot motion planning and the limitations of traditional planners in dynamic environments. Defining strategies for every situation that may be found in the real world is intractable, and reinforcement learning may be used to solve the decision-making problem, which is complex and has many degrees of freedom. The work in [12] proposes a deep reinforcement learning model that takes LIDAR measurements and the goal position as input, obtaining better results than conventional planners.
The work described in [13] (SARL) simulates a crowd and tries to make the robot anticipate the interactions of the crowd with the robot and among its members, comparing the method with ORCA [4] and two other deep reinforcement learning methods: CADRL [14] and LSTM-RL [15]. ORCA fails in this crowded environment because its reciprocity assumption does not hold, and CADRL fails because it does not take into account the whole crowd, only a single pairwise interaction. In the environment presented, SARL and LSTM-RL achieve the best performance. In all of these approaches, the simulators used are non-realistic and no kinodynamic restrictions are considered.
An example of an approach that does consider these restrictions is [16], which combines deep reinforcement learning with DWA [17], but it only achieves a success rate of 0.54 in sparse dynamic scenarios.
B. Dynamic Object Velocity Space (DOVS)
The DOVS model presented in [1], which models the
dynamism of the environment, is used as a basis of this
work. To build the model, the robot size is reduced to a
point and the obstacles are enlarged with the robot radius
(the final collision areas are the same). The area swept by
each moving obstacle (collision band) is computed using
the trajectory of the obstacles, which is assumed to be
known or estimated from the sensor information. Then, the
maximum and minimum velocities that can avoid a collision
with that obstacle are calculated, repeating the process for
every obstacle. Using that information with maximum and
minimum velocities of the robot the limits of the Dynamic
Object Velocity (DOV) are obtained. The key of this model is
that the set of free velocities is recomputed in every time step,
so the obstacles’ trajectory estimation needs to be precise
only for the next few time steps.
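To make the construction concrete, the following simplified sketch marks the unsafe (ω, v) pairs for a single moving obstacle by forward-simulating circular arcs over a time horizon. The actual DOVS model in [1] derives these velocity limits analytically from the collision band, so this sampling-based approximation, the constant-velocity obstacle prediction and all parameter names are only illustrative assumptions.

import numpy as np

def unsafe_velocity_mask(w_grid, v_grid, robot_pose, obstacle, robot_radius,
                         horizon=3.0, dt=0.1):
    # True where a constant (w, v) command collides within `horizon` seconds.
    # obstacle: dict with 'pos' (x, y), 'vel' (vx, vy) and 'radius'.
    x0, y0, th0 = robot_pose
    inflated = obstacle["radius"] + robot_radius       # point robot, enlarged obstacle
    times = np.arange(dt, horizon + dt, dt)
    obs_xy = np.asarray(obstacle["pos"]) + np.outer(times, obstacle["vel"])
    unsafe = np.zeros((len(v_grid), len(w_grid)), dtype=bool)
    for i, v in enumerate(v_grid):
        for j, w in enumerate(w_grid):
            th = th0 + w * times                       # unicycle rollout, command held constant
            if abs(w) > 1e-6:
                x = x0 + (v / w) * (np.sin(th) - np.sin(th0))
                y = y0 - (v / w) * (np.cos(th) - np.cos(th0))
            else:
                x = x0 + v * times * np.cos(th0)
                y = y0 + v * times * np.sin(th0)
            dist = np.hypot(x - obs_xy[:, 0], y - obs_xy[:, 1])
            unsafe[i, j] = bool(dist.min() < inflated)
    return unsafe

Repeating the computation for every obstacle and taking the union of the masks yields an approximation of the DOV; the remaining velocities are the free velocities mentioned above.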
Fig. 2. Representation of the state of the agent. The DOVS is in the red square and the extracted velocity grid below it, concatenated with 8 other robot variables (linear velocity, angular velocity, goal distance, goal angle, obstacle distance, obstacle angle, obstacle linear velocity, obstacle motion angle) to construct the 408-element input vector of the learning system.
The velocities in the DOV are unsafe, as they lead to collisions within a time horizon, while the rest of the velocities are available for navigation. Velocities inside the DOV may still be chosen if the subsequent commands lead to a free velocity. Navigation in that space is achieved using a dynamic window that considers the robot kinodynamics. The DOVS gathers all this information in the robot velocity space, and it may be represented as shown in Figure 2, inside the red square. Linear velocities v are on the Y-axis and angular velocities ω on the X-axis; the DOV is shown in black; the green rhombus is the dynamic window centered around the current robot velocity; the green line represents the velocities (ω, v) that lead to the goal following a circular trajectory (radius = v/ω); and the big black triangle represents the differential-drive kinematic restriction (the robot cannot move at maximum linear and angular velocity at the same time). In this way, all the information about the dynamism of the environment and about the robot itself that is needed for motion planning is modeled.
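A minimal sketch of how the 408-element input of Figure 2 could be assembled is shown below: a flattened robocentric velocity grid concatenated with the 8 scalar variables listed in the figure. The 20x20 grid resolution is an assumption chosen only because it matches the 400 + 8 = 408 total, and the function name is hypothetical.

import numpy as np

def build_state(unsafe_grid, v, w, goal_dist, goal_angle,
                obs_dist, obs_angle, obs_speed, obs_motion_angle):
    # Concatenate the robocentric velocity grid with the scalar state variables.
    assert unsafe_grid.shape == (20, 20)               # assumed resolution: 400 cells
    grid = unsafe_grid.astype(np.float32).ravel()      # 1.0 = velocity inside the DOV
    scalars = np.array([v, w, goal_dist, goal_angle,
                        obs_dist, obs_angle, obs_speed, obs_motion_angle],
                       dtype=np.float32)
    return np.concatenate([grid, scalars])             # shape: (408,)

A vector of this form would be the state fed to the learning system described above.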
C. Contribution
The works presented in the state of the art have some
limitations when they are to be applied in real environments.
Some of them use only the raw sensor measurements as the
input or use some processed information, like obstacle posi-
tion and velocities. The main problem with those approaches
is the impossibility to generate appropriate real-world train-
ing data. They are trained in non-realistic simulators, and
in different real world scenarios they would not do what to
do. Furthermore, only few approaches do not use holonomic
robots, and even less consider robot kinodynamic restrictions.
The contribution of this work is a deep reinforcement
learning motion planner that: