
Scenario-based Evaluation of Prediction Models for Automated Vehicles
Manuel Muñoz Sánchez1, Jos Elfring2, Emilia Silvas3 and René van de Molengraft1
Abstract— To operate safely, an automated vehicle (AV) must
anticipate how the environment around it will evolve. For that
purpose, it is important to know which prediction models are
most appropriate for every situation. Currently, assessment of
prediction models is often performed over a set of trajectories
without distinction of the type of movement they capture, resulting
in the inability to determine the suitability of each model for
different situations. In this work we illustrate how standardized
evaluation methods result in wrong conclusions regarding a
model’s predictive capabilities, preventing a clear assessment
of prediction models and potentially leading to dangerous on-
road situations. We argue that, in line with evaluation practices
in safety assessment for AVs, prediction models should be
assessed in a scenario-based fashion. To encourage
scenario-based assessment of prediction models and illustrate
the dangers of improper assessment, we categorize trajectories
of the Waymo Open Motion dataset according to the type of
movement they capture. Next, three different models are
thoroughly evaluated for different trajectory types and prediction
horizons. Results show that common evaluation methods are
insufficient and that assessment should be tailored to the
application in which the model will operate.
I. INTRODUCTION
Automated vehicles (AVs) have become popular in recent
years since they have the potential to increase road safety,
efficiency and comfort [1]–[3]. To operate safely, an AV must
accurately anticipate the future motion of other road users
(RUs) in its surroundings [4]. To build trajectory prediction
models, deep learning (DL) techniques are gaining attention
[5], since they can effectively learn complex interactions
between different RUs [6], [7] and the road infrastructure
[8], [9] from past observations to produce more accurate
predictions. Traditionally, training these models effectively
was challenging, since the large amounts of data required were
not readily available. However, this issue has been alleviated
in recent years with the release of several large public
datasets [10]–[14]. A common practice to assess a model’s
predictive accuracy is to consider a fraction of the dataset
reserved for this purpose (commonly referred to as test
data), and to compare the model’s predictions with the real
trajectories. The output of prediction models may vary, hence
different metrics exist to quantify the disparity between the
real and predicted trajectories [4]. For example, some models
This work was supported by SAFE-UP under EU’s Horizon 2020 research
and innovation programme, grant agreement 861570.
1Manuel Muñoz Sánchez, Emilia Silvas, Jos Elfring and René van de
Molengraft are with the Department of Mechanical Engineering, Eindhoven
University of Technology, Eindhoven, The Netherlands.
2Jos Elfring is also with the Product Unit Autonomous Driving, TomTom,
Amsterdam, The Netherlands.
3Emilia Silvas is also with the Department of Integrated Vehicle Safety,
TNO, Helmond, The Netherlands.
[Fig. 1 image: an AV (A) among surrounding road users (B-K); legend: true trajectory, accurate prediction, inaccurate prediction.]
Fig. 1. Example where a model that is accurate on average fails to predict
a pedestrian trajectory, leading to a dangerous situation.
produce a single prediction, while others produce a set of
feasible trajectories with an associated confidence for each.
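For concreteness, the sketch below implements the displacement metrics most commonly used for this comparison in the literature surveyed in [4]: average and final displacement error (ADE/FDE) for single-mode models, and a minADE variant for multi-modal ones. The metrics are not defined in this section, so this is an illustrative sketch assuming their standard definitions; the function names and toy trajectory are ours.

```python
import numpy as np

def ade(pred, true):
    """Average Displacement Error: mean Euclidean distance between
    predicted and true positions over the prediction horizon."""
    return float(np.linalg.norm(pred - true, axis=-1).mean())

def fde(pred, true):
    """Final Displacement Error: Euclidean distance at the last step."""
    return float(np.linalg.norm(pred[-1] - true[-1]))

def min_ade(preds, true):
    """For multi-modal models: ADE of the best of K predicted
    trajectories, each of shape (T, 2)."""
    return min(ade(p, true) for p in preds)

# Toy example: a 3-step horizon of 2D positions.
true = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
pred = np.array([[0.0, 0.1], [1.1, 0.0], [2.0, 0.3]])
print(f"ADE: {ade(pred, true):.2f} m, FDE: {fde(pred, true):.2f} m")
```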
Despite the existence of various evaluation metrics for
prediction models, several challenges remain unaddressed in
current evaluation practices, such as the inability of these
metrics to capture a model’s robustness or generalization
capabilities [5]. Perhaps the most severe shortcoming is that
all trajectories are considered equal for error computation
despite capturing significantly different behaviors, which
can lead to dangerous situations due to misjudgment of
a model’s suitability for specific situations. For instance,
consider the situation shown in Fig. 1, where an AV (A)
predicts the future trajectory of surrounding RUs (B-K) in a
crowded urban scenario. Current evaluation practices would
deem this model suitable for RU trajectory prediction in
crowded urban scenarios, since its predictions are highly
accurate on average. It accurately predicts pedestrians on the
sidewalk (B-D), crossing at designated crossings (E,F), and
lane-following cyclists and vehicles (G-I). However, in this
example only a few of these RUs are relevant to the AV (I,
J). Additionally, failure cases like the pedestrians crossing at
non-designated crossings (J, K) can go unnoticed since all
trajectories are considered equally for error computation.
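As a minimal sketch of the scenario-based aggregation advocated here, the snippet below reports errors per movement category instead of a single dataset-wide mean. The category labels and error values are hypothetical, chosen to mirror Fig. 1, where the two poorly predicted crossings (J, K) vanish in the average:

```python
import numpy as np
from collections import defaultdict

def evaluate_by_category(errors, categories):
    """Aggregate per-trajectory errors per movement category
    instead of reporting a single dataset-wide average."""
    per_cat = defaultdict(list)
    for err, cat in zip(errors, categories):
        per_cat[cat].append(err)
    return {cat: float(np.mean(errs)) for cat, errs in per_cat.items()}

# Hypothetical errors (m): the dataset-wide mean (0.80 m)
# hides the large errors for J and K.
errors = [0.2, 0.3, 0.2, 0.2, 0.3, 0.4, 0.3, 0.2, 3.1, 2.8]
categories = (["sidewalk"] * 3 + ["designated_crossing"] * 2
              + ["lane_following"] * 3 + ["undesignated_crossing"] * 2)
print(evaluate_by_category(errors, categories))
# ≈ {'sidewalk': 0.23, 'designated_crossing': 0.25,
#    'lane_following': 0.30, 'undesignated_crossing': 2.95}
```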
The importance of a thorough evaluation for different
types of trajectories has been recognized previously [15].
However, current efforts to improve evaluation of prediction
models focus mainly on interactions between pedestrians
(e.g. collision-avoidance [15]), and disregard interactions of
RUs with the road infrastructure (e.g. pedestrian stops at
a red traffic light). Additionally, the evaluation procedure
should provide a transparent assessment of a model's suitability
for the intended application. For instance, for AVs, an
inaccurate prediction for a pedestrian walking in front of the
vehicle should be considered more important or severe than one
for a pedestrian walking behind the vehicle or far away from it.
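One possible way to make such relevance explicit in the evaluation is to weight per-RU errors by proximity and by whether the RU is ahead of the AV. The heuristic below is purely illustrative and not a method proposed in this paper; the weighting scheme and all names are ours:

```python
import numpy as np

def relevance_weighted_error(errors, ru_positions, av_position, av_heading):
    """Weight per-RU prediction errors by a toy relevance heuristic:
    RUs ahead of the AV and close to it count more than RUs
    behind it or far away."""
    errors = np.asarray(errors, dtype=float)
    av_position = np.asarray(av_position, dtype=float)
    heading = np.array([np.cos(av_heading), np.sin(av_heading)])
    weights = []
    for pos in ru_positions:
        offset = np.asarray(pos, dtype=float) - av_position
        ahead = 1.0 if offset @ heading > 0 else 0.2  # behind the AV matters less
        weights.append(ahead / (1.0 + np.linalg.norm(offset)))  # nearby matters more
    weights = np.asarray(weights)
    return float((weights * errors).sum() / weights.sum())

# A 2 m error on a pedestrian 5 m ahead dominates a 0.1 m error
# on one 5 m behind: weighted score ≈ 1.68 vs a plain mean of 1.05.
print(relevance_weighted_error(
    errors=[2.0, 0.1],
    ru_positions=[(5.0, 0.0), (-5.0, 0.0)],
    av_position=(0.0, 0.0),
    av_heading=0.0))
```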