Scenario-based Evaluation of Prediction Models for Automated Vehicles
Manuel Muñoz Sánchez1, Jos Elfring2, Emilia Silvas3, and René van de Molengraft1
Abstract—To operate safely, an automated vehicle (AV) must
anticipate how the environment around it will evolve. For that
purpose, it is important to know which prediction models are
most appropriate for every situation. Currently, assessment of
prediction models is often performed over a set of trajectories
without distinction of the type of movement they capture, result-
ing in the inability to determine the suitability of each model for
different situations. In this work we illustrate how standardized
evaluation methods result in wrong conclusions regarding a
model’s predictive capabilities, preventing a clear assessment
of prediction models and potentially leading to dangerous on-
road situations. We argue that following evaluation practices
in safety assessment for AVs, assessment of prediction models
should be performed in a scenario-based fashion. To encourage
scenario-based assessment of prediction models and illustrate
the dangers of improper assessment, we categorize trajectories
of the Waymo Open Motion dataset according to the type of
movement they capture. Next, three different models are thor-
oughly evaluated for different trajectory types and prediction
horizons. Results show that common evaluation methods are
insufficient and the assessment should be performed depending
on the application in which the model will operate.
I. INTRODUCTION
Automated vehicles (AVs) have become popular in recent
years since they have the potential to increase road safety,
efficiency and comfort [1]–[3]. To operate safely, an AV must
accurately anticipate the future motion of other road users
(RUs) in its surroundings [4]. To build trajectory prediction
models, deep learning (DL) techniques are gaining attention
[5], since they can effectively learn complex interactions
between different RUs [6], [7] and the road infrastructure
[8], [9] from past observations to produce more accurate
predictions. Traditionally, training these models effectively
was problematic, since the required amounts of data were
not easily available. However, this issue has been alleviated
in recent years with the release of several large public
datasets [10]–[14]. A common practice to assess a model’s
predictive accuracy is to consider a fraction of the dataset
reserved for this purpose (commonly referred to as test
data), and to compare the model’s predictions with the real
trajectories. The output of prediction models may vary, hence
different metrics exist to quantify the disparity between the
real and predicted trajectories [4]. For example, some models
This work was supported by SAFE-UP under EU’s Horizon 2020 research
and innovation programme, grant agreement 861570.
1Manuel Muñoz Sánchez, Emilia Silvas, Jos Elfring and René van de
Molengraft are with the Department of Mechanical Engineering, Eindhoven
University of Technology, Eindhoven, The Netherlands.
2Jos Elfring is also with the Product Unit Autonomous Driving, TomTom,
Amsterdam, The Netherlands.
3Emilia Silvas is also with the Department of Integrated Vehicle Safety,
TNO, Helmond, The Netherlands.
[Fig. 1 graphic: legend — true trajectory, accurate prediction, inaccurate prediction; the AV is labeled A and the surrounding road users are labeled B–K.]
Fig. 1. Example where a model that is accurate on average fails to predict
a pedestrian trajectory, leading to a dangerous situation.
produce a single prediction, while others produce a set of
feasible trajectories and associated confidence for each.
Despite the existence of various evaluation metrics for
prediction models, several challenges remain unaddressed in
current evaluation practices, such as the inability of these
metrics to capture a model’s robustness or generalization
capabilities [5]. Perhaps the most severe shortcoming is that
all trajectories are considered equal for error computation
despite capturing significantly different behaviors, which
can lead to dangerous situations due to misjudgement of
a model’s suitability for specific situations. For instance,
consider the situation shown in Fig. 1, where an AV (A)
predicts the future trajectory of surrounding RUs (B-K) in a
crowded urban scenario. Current evaluation practices would
deem this model suitable for RU trajectory prediction in
crowded urban scenarios, since its predictions are highly
accurate on average. It accurately predicts pedestrians on the
sidewalk (B-D), crossing at designated crossings (E,F), and
lane-following cyclists and vehicles (G-I). However, in this
example only a few of these RUs are relevant to the AV (I,
J). Additionally, failure cases like the pedestrians crossing at
non-designated crossings (J, K) can go unnoticed since all
trajectories are considered equally for error computation.
The importance of a thorough evaluation for different
types of trajectories has been recognized previously [15].
However, current efforts to improve evaluation of prediction
models focus mainly on interactions between pedestrians
(e.g. collision-avoidance [15]), and disregard interactions of
RUs with the road infrastructure (e.g. a pedestrian stopping
at a red traffic light). Additionally, the evaluation procedure
should provide a transparent assessment of a model’s suit-
ability for the intended application. For instance, for AVs, an
inaccurate prediction for a pedestrian walking in front of the
vehicle should be considered more important or severe than
one of a pedestrian that is walking behind the vehicle or far
from the road.

arXiv:2210.06553v1 [cs.AI] 11 Oct 2022

[Fig. 2 graphic: pipeline from real driving datasets through labelling & road-user trajectories, event detection, and scenario parametrization & categories into a scenario database for assessment; prediction models (CV, CNN, LSTM, ...) are evaluated with prediction performance metrics and vehicle-level performance metrics; surrounding blocks cover sensor, fusion and world models, risk estimation, planning, control, actuation, automated driving functions, and the application ODD.]
Fig. 2. Overview of a scenario-based assessment pipeline. The orange arrow indicates the standard approach to evaluate prediction models. Green blocks concern prediction-specific activities. Blue blocks are currently lacking but required steps for thorough evaluation of prediction models. White blocks are relevant for vehicle level assessment, but out of the scope of this work.

As a second example, consider the purpose of
vehicle fuel and energy optimization, where accurate long-
term predictions are required for optimal path planning. On
the contrary, for the development of emergency advanced
driver-assistance system features (e.g. emergency braking or
emergency steering) accurate short-term predictions become
more relevant. Thus, it is important to assess the suitability
of models with respect to the functional applications in which
they will be used, that is, their operational design domain
(ODD), and within that domain to test various scenarios and
their overall impact using vehicle-level performance metrics,
not only prediction metrics (Fig. 2).
Current evaluation practices, which report errors averaged
over all predicted trajectories, ease comparison between
different models. If a model achieves a lower error
over all the trajectories in the dataset, one can confidently say
such a model is more accurate, at least on average. However,
it remains unclear under which circumstances this model is
preferred over others. To show that an improved assessment
of prediction models is needed, and working towards that
goal, the contribution of our work is as follows:
1) We illustrate the extent to which common evaluation
methods, which only report average errors over all tra-
jectories, result in misleading conclusions of a model’s
predictive capabilities, and argue that a scenario-based
assessment is a more suitable approach.
2) We facilitate scenario-based evaluation of prediction
models by providing an open-source framework1, which
will allow for a transparent evaluation of a model’s
capabilities for different situations, leading to an optimal
choice of prediction model depending on the applica-
tion.
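The averaging pitfall motivating this work can be illustrated numerically. In the following sketch, the per-trajectory errors and category sizes are toy numbers chosen for illustration, not results from the dataset: a rare but safety-critical category (pedestrians crossing at non-designated locations) has large errors that all but vanish in the dataset-wide average.

```python
import numpy as np

# Hypothetical per-trajectory ADE values (meters), grouped by movement type.
# The non-designated-crossing category is rare, so its large errors are
# drowned out when errors are averaged over the whole dataset.
errors = {
    "lane-following": np.full(900, 0.3),
    "designated crossing": np.full(90, 0.5),
    "non-designated crossing": np.full(10, 5.0),
}

all_errors = np.concatenate(list(errors.values()))
print(f"dataset-wide ADE: {all_errors.mean():.2f} m")  # looks acceptable
for category, e in errors.items():
    print(f"{category:>24}: {e.mean():.2f} m")  # reveals the failure mode
```

A scenario-based report (the per-category lines) exposes the ten-fold error on the rare category that the single averaged number hides.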
The remainder of this article is structured as follows.
Section II introduces common trajectory prediction metrics
and datasets, and presents related work on scenario-based
evaluation. Section III introduces the prediction models
compared and outlines how the comparison will be done
using standard evaluation practices. Section IV presents an
analysis of the results, and Section V concludes the work
and highlights future improvements.
1Code available at https://github.com/manolotis/SBEP
II. PRELIMINARIES
This section summarizes the most commonly used perfor-
mance metrics, recent datasets used to develop AV applica-
tions, and related work on scenario-based evaluation.
A. Common Trajectory Prediction Metrics
A plethora of performance indicators exist to evaluate
trajectory prediction models [4], with average displacement
error (ADE) and final displacement error (FDE) being the
most popular [5]. ADE measures the difference between
the predicted and ground truth trajectories, averaged over
all prediction horizons. FDE measures this difference at
a specific horizon. To allow comparison of deterministic
models that produce a single trajectory with probabilistic
models that produce multiple feasible trajectories, variants
of these metrics are used which report the errors of the
trajectory that achieved the best accuracy. These variants are
commonly referred to as minADE and minFDE. Although
these metrics have several limitations [4] and new metrics
have been introduced recently to address some of these
limitations [10], we use them in this work since they remain
the most common performance indicators at the moment.
To formally define these metrics, let $\hat{S}$ denote a set of trajectory predictions for a set of road users $N$ at future prediction horizons $T$. The minADE of the predictions for prediction horizon $t$ is given by
$$\mathrm{minADE}(\hat{S}, t) = \frac{\sum_{n \in N} \min_{\hat{s} \in \hat{S}^n} \sum_{t' \in T,\, t' \leq t} \lVert \hat{s}_{t'} - s^n_{t'} \rVert_2}{|N| \times |T|}, \tag{1}$$
where $\hat{S}^n$ denotes the set of predictions for a road user $n$, $\hat{s}_{t'}$ denotes the predicted position at time $t'$, and $s^n_{t'}$ denotes the true position of road user $n$ at time $t'$. Additionally, $|\cdot|$ denotes the size of a set and $\lVert \cdot \rVert_2$ denotes the L2-norm of a vector. Similarly, the minFDE at a given prediction horizon $t$ is defined as
$$\mathrm{minFDE}(\hat{S}, t) = \frac{\sum_{n \in N} \min_{\hat{s} \in \hat{S}^n} \lVert \hat{s}_t - s^n_t \rVert_2}{|N|}. \tag{2}$$
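Assuming predictions are stored as dense NumPy arrays with K candidate trajectories per road user, Eqs. (1) and (2) can be sketched as follows; the array layout and function names are illustrative and not taken from the paper's released code:

```python
import numpy as np

def min_ade(preds, gt, t_idx):
    """minADE up to horizon index t_idx (inclusive), following Eq. (1).

    preds: (N, K, T, 2) array, K candidate trajectories per road user.
    gt:    (N, T, 2) array of ground-truth trajectories.
    """
    # L2 displacement of every predicted point w.r.t. the ground truth
    disp = np.linalg.norm(preds - gt[:, None], axis=-1)      # (N, K, T)
    # Sum displacements over horizons up to t, keep the best candidate
    best = disp[:, :, : t_idx + 1].sum(axis=-1).min(axis=1)  # (N,)
    n, _, T, _ = preds.shape
    return best.sum() / (n * T)

def min_fde(preds, gt, t_idx):
    """minFDE at horizon index t_idx, following Eq. (2)."""
    disp = np.linalg.norm(preds[:, :, t_idx] - gt[:, t_idx][:, None], axis=-1)
    return disp.min(axis=1).mean()
```

Note that, as in Eq. (1), the inner minimum is taken over whole candidate trajectories (summed displacement), not per time step, so a single candidate must be good over the entire horizon to achieve a low minADE.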
B. Recent Datasets & Waymo’s Motion Prediction Challenge
Several large datasets are publicly available for the development and evaluation of prediction models; some of the most recent and frequently used ones are summarized in Table I.