
DiPA: Probabilistic Multi-Modal Interactive Prediction for Autonomous Driving

Anthony Knittel¹, Majd Hawasly¹, Stefano V. Albrecht¹,², John Redford¹, Subramanian Ramamoorthy¹,²

Abstract: Accurate prediction is important for operating an autonomous vehicle in interactive scenarios. Prediction must be fast, to support multiple requests from a planner exploring a range of possible futures. The generated predictions must accurately represent the probabilities of predicted trajectories, while also capturing different modes of behaviour (such as turning left vs. continuing straight at a junction). To this end, we present DiPA, an interactive predictor that addresses these challenging requirements. Previous interactive prediction methods use an encoding of k-mode-samples, which under-represents the full distribution. Other methods optimise closest-mode evaluations, which test whether one of the predictions is similar to the ground-truth, but allow additional unlikely predictions to occur, over-representing unlikely predictions. DiPA addresses these limitations by using a Gaussian-Mixture-Model to encode the full distribution, and optimising predictions using both probabilistic and closest-mode measures. These objectives respectively optimise probabilistic accuracy and the ability to capture distinct behaviours, and there is a challenging trade-off between them. We are able to solve both together using a novel training regime. DiPA achieves new state-of-the-art performance on the INTERACTION and NGSIM datasets, and improves over the baseline (MFP) when both closest-mode and probabilistic evaluations are used. This demonstrates effective prediction for supporting a planner on interactive scenarios.

I. INTRODUCTION

Prediction of the future motion of surrounding road users is essential for the safe operation of an autonomous vehicle (AV). Road scenarios such as intersections, merges and roundabouts require significant interaction between agents in the scene, where agent behaviour is influenced by the presence of nearby agents, as well as reactions to actions that other agents take. In order to support planning, a predictor needs to estimate the future states of the surrounding road users based on observations of their recent history, and to estimate the risk of conflict for possible ego actions.

A planning system used in interactive scenarios needs to consider different possible actions that other vehicles may take, and the futures that result from different actions. In order to explore these futures, a supporting predictor needs to be computationally fast, and to provide accurate predictions that represent the expected distribution of future states of each agent. Many combinations of actions may be possible, so an interactive predictor needs to be fast in order to allow different futures to be explored.

Existing predictors addressing this task have encoded predictions using a fixed number of mode samples, for example using 6 predicted trajectories encoded as center positions [1], [2], [3]. These are evaluated using minimum average- or final-displacement error (minADE/FDE) and miss-rate (MR) (see Section IV-B). These measures compare the closest predicted mode with the ground-truth, and are important for demonstrating that predictions closely capture distinct modes of behaviour observed in the data.

¹ Five AI Ltd, UK. anthony.knittel@five.ai
² School of Informatics, University of Edinburgh, Edinburgh, UK

Fig. 1. Top: Use of k-mode samples (red, k=2) under-represents the distribution of future positions (black). This prevents effective planning by underestimating states which are reasonably likely to occur. Bottom: A GMM encoding, with associated mode weights, provides a more accurate representation of the full distribution by covering a wider range of samples.

Fig. 2. Top: Optimising for closest-mode evaluations can allow unrealistic predictions to be over-represented. For an instance of data (black dot), the closest predicted mode (red) is evaluated while additional modes (blue) can predict unrealistic behaviours without penalty. Unlikely predicted modes interfere with planning, for example causing an emergency brake to avoid a predicted collision that is unlikely. Bottom: Optimising for both closest-mode and probabilistic evaluations penalises unlikely predictions, while minimising over- and under-representation.

A limitation of this sample-based encoding is that it does not represent the full distribution of expected future positions, and as such many variations are under-represented (Fig. 1). A further limitation is that probabilities of predicted modes are not considered. When training a model based on closest-mode evaluations, additional predicted modes (other than the closest) do not affect scoring, which allows the predictor to predict behaviour modes that are unlikely to occur. Each predicted mode has equal weight, which results in over-representation of unlikely predictions (Fig. 2).
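The closest-mode measures discussed above can be illustrated with a minimal sketch. The function name `closest_mode_metrics`, the array shapes, and the 2 m miss threshold are assumptions made for illustration; the paper's exact definitions are given in its Section IV-B, which is not reproduced here, and benchmark miss-rate definitions are often more detailed than a single threshold on final displacement.

```python
import numpy as np

def closest_mode_metrics(pred, gt, miss_threshold=2.0):
    """Closest-mode scores for one agent.

    pred: (K, T, 2) array of K predicted trajectories over T time steps.
    gt:   (T, 2) ground-truth trajectory.
    miss_threshold: illustrative value in metres, not taken from the paper.
    """
    dist = np.linalg.norm(pred - gt[None], axis=-1)  # (K, T) displacement per step
    min_ade = dist.mean(axis=1).min()                # best mode by average displacement
    min_fde = dist[:, -1].min()                      # best mode by final displacement
    missed = bool(min_fde > miss_threshold)          # no mode ends near the ground truth
    return min_ade, min_fde, missed

# Ground truth drives straight along x; mode 0 tracks it, mode 1 veers away.
t = np.arange(1, 6, dtype=float)
gt = np.stack([t, np.zeros_like(t)], axis=1)
pred = np.stack([
    np.stack([t, np.full_like(t, 0.5)], axis=1),  # constant 0.5 m lateral offset
    np.stack([t, t], axis=1),                     # diverges sideways
])
min_ade, min_fde, missed = closest_mode_metrics(pred, gt)

# Making the non-closest mode far less plausible leaves every score unchanged.
pred_bad = pred.copy()
pred_bad[1] = pred_bad[1] * 10.0
unchanged = closest_mode_metrics(pred_bad, gt) == (min_ade, min_fde, missed)
```

In this toy case only mode 0 determines the scores, so `unchanged` is true: nothing in these measures penalises the second mode, however implausible it becomes. This is the over-representation effect described for Fig. 2.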

arXiv:2210.06106v2 [cs.RO] 8 Mar 2023

Fig. 3. A merge scenario produces a bi-modal distribution (black samples). Optimising closest-mode (minADE/FDE) evaluations favours diverse predictions (green), while probabilistic (predRMS) evaluations favour predictions close to the mean (red), that minimise the penalty of incorrect mode estimates. Solving both requires diverse predictions with the ability to accurately estimate mode probabilities.

These limitations can be addressed using a Gaussian Mixture Model (GMM), which represents the full predicted distribution, along with probability estimates of each mode. This is preferred over increasing the number of samples, as GMMs provide a compact encoding of the distribution and a practical means of evaluating the probability distribution. Previous methods [4], [5] have used GMMs on the NGSIM dataset, which are evaluated using negative-log-likelihood (NLL) evaluations. Further methods have used mode probability estimates [6], [7] which are evaluated using predicted-mode RMS (predRMS) evaluations (see Section IV-B).
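As a rough illustration of the probabilistic evaluations mentioned above, the sketch below computes the standard negative log-likelihood of an observed position under a 2D GMM, and the displacement of the most probable mode, which is one plausible reading of a predicted-mode error. The function name `gmm_nll` and the example weights, means and covariances are hypothetical; this is the textbook NLL, not the revised measure the paper proposes, and the paper's own definitions are in its Section IV-B.

```python
import numpy as np

def gmm_nll(weights, means, covs, point):
    """Standard negative log-likelihood of a 2D point under a Gaussian mixture."""
    log_terms = []
    for w, mu, cov in zip(weights, means, covs):
        diff = point - mu
        maha = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
        log_det = np.linalg.slogdet(cov)[1]       # log-determinant of the covariance
        # log(w) + log N(point | mu, cov) for a 2-dimensional Gaussian
        log_terms.append(np.log(w) - 0.5 * (maha + log_det + 2 * np.log(2 * np.pi)))
    return -np.logaddexp.reduce(log_terms)        # stable log-sum-exp over modes

# Two-mode prediction for a single future time step (illustrative numbers).
weights = np.array([0.8, 0.2])
means = np.array([[10.0, 0.0], [8.0, 3.0]])
covs = np.array([np.eye(2) * 0.5, np.eye(2) * 0.5])
observed = np.array([10.0, 0.0])

nll = gmm_nll(weights, means, covs, observed)

# Predicted-mode error: displacement of the highest-probability mode only.
top_mode = means[np.argmax(weights)]
pred_error = np.linalg.norm(observed - top_mode)
```

Unlike the closest-mode measures, both quantities depend on the mode weights: moving probability mass onto the second mode would raise the NLL of this observation and, once it became the top mode, the predicted-mode error as well.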

Probabilistic and closest-mode evaluations provide complementary measures that are more informative than either alone, and are analogous to precision and recall in binary classification. We argue that an effective predictor for interactive scenarios needs to optimise both measures, to demonstrate that it is able to closely capture distinct behaviour modes, while also accurately representing probabilities. This is a challenging task as different evaluation measures are supported by contradictory prediction strategies. Closest-mode evaluations (minADE/FDE/MR) favour diverse predictions, while probabilistic evaluations (predRMS, NLL) favour conservative predictions close to the mean of expected behaviours, where the cost of incorrect mode estimates is minimised (Figure 3). Optimising both evaluation approaches together demonstrates accurate multi-modal prediction, and reduces the over-representation of unlikely predictions seen in Figure 2.
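The tension between the two families of measures can be seen in a small worked example. The sketch below uses a one-dimensional stand-in for a bi-modal scenario, with outcomes at -1 and +1 occurring with probabilities 0.6 and 0.4; these numbers, the helper `expected_scores`, and the simplification that the predictor cannot tell from its input which outcome will occur are all assumptions made for illustration.

```python
import numpy as np

# Bi-modal ground truth in 1D (illustrative stand-in for a merge scenario).
outcomes = np.array([-1.0, 1.0])
probs = np.array([0.6, 0.4])

def expected_scores(modes, mode_probs):
    """Expected closest-mode error and predicted-mode RMS error over the outcomes."""
    # Closest-mode: distance from each outcome to its nearest predicted mode.
    closest = np.abs(outcomes[:, None] - modes[None, :]).min(axis=1)
    min_err = float(probs @ closest)
    # Predicted-mode: RMS error of the single most probable predicted mode.
    top = modes[np.argmax(mode_probs)]
    pred_rms = float(np.sqrt(probs @ (outcomes - top) ** 2))
    return min_err, pred_rms

# Diverse strategy: one mode on each outcome, weighted by its frequency.
diverse = expected_scores(np.array([-1.0, 1.0]), np.array([0.6, 0.4]))

# Conservative strategy: both modes placed at the mean of the outcomes.
mean = float(probs @ outcomes)
conservative = expected_scores(np.array([mean, mean]), np.array([0.5, 0.5]))
```

Under these assumptions the diverse strategy has zero closest-mode error but a predicted-mode RMS of about 1.26, while the conservative strategy has a closest-mode error of 0.96 and a predicted-mode RMS of about 0.98. The diverse strategy only wins on both measures if the mode probabilities can be estimated accurately for each instance, so that the top mode is usually the one that occurs.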

To that end, we present DiPA (Diverse and Probabilistically Accurate), a fast method for predicting in interactive scenarios using a GMM encoding, that is able to optimise both objectives together, by producing a diverse set of predictions with accurate probability estimates. This allows distinct behaviours to be accurately modelled, while producing an accurate representation of the full trajectory distribution. This improves over previous methods [1], [3] using closest-mode evaluations on the INTERACTION dataset [8], and improves over previous methods [7], [4] using probabilistic evaluations on NGSIM [9]. DiPA also improves over a baseline method (Multiple-Futures Prediction (MFP)) [5] when comparing both closest-mode and probabilistic measures together. This demonstrates a predictor that is suitable for supporting an AV planner in interactive scenarios.

Beyond highlighting the importance of evaluating predictors with both closest-mode and probabilistic evaluations, the key contributions are: 1) a fast prediction architecture with a flexible representation that processes agent interactions in wide-ranging road layouts and produces high-accuracy predictions on interactive scenarios, 2) a training regime that supports a diverse set of predicted modes using a GMM-based spatial distribution, with accurate probability estimates, and 3) a revision to the NLL measure for evaluating GMM predictions, to correct for an important limitation.

II. RELATED WORK

A number of different structures have been used for prediction of agents in road scenes, including graph-, goal- and regression-based methods.

StarNet [1] represents the scene and agents using vector-based graphs, and uses a combined representation of agents within their own reference frame and from the points of view of other agents. Further graph-based methods such as [10], [3], [11] combine map information and agent positions into a common representation, commonly processed with a Graph Neural Network [12] in an encoder-decoder framework. These methods allow encoding the static layout of the scene and various agents in a generalisable way, and have shown good results on closest-mode prediction.

Goal-based methods [13], [14], [15], [16], [17] identify a number of potential future targets that each agent may head towards, determine likelihoods of each, and produce predicted trajectories towards those goals. Flash [7] uses a combination of Bayesian inverse-planning and mixture-density networks to produce accurate predictions of trajectories in highway driving scenarios. Goal-based methods use the map to inform trajectory generation, and can use kinematically-sound trajectory generators. However, this can lead to limited diversity on other factors such as motion profile and path variations compared to data-driven methods.

Regression-based methods use representations that directly map observations to predicted outputs. SAMMP [4] produces joint predictions of the spatial distribution of vehicles, using a multi-head self-attention function to capture interactions between agents. Multiple-Futures Prediction (MFP) [5] models the joint futures of a number of interacting agents, using learnt latent variables for generating predicted future modes. Mersch et al. [18] present a temporal-convolution method for predicting interacting vehicles in a highway scenario where neighbouring agents are assigned specific roles based on relative positions to a central agent. These regression-based methods can be fast and accurate, but may have limited generalisability to different layouts when role-based representation of inputs is used.

Existing interactive prediction methods using the INTERACTION dataset have demonstrated good results based on closest-mode evaluations (minADE/FDE/MR) [1], [2], [3]. These have typically used a prediction encoding with a fixed number of modes, each represented as a trajectory sample. Optimising closest-mode evaluations produces diverse predictions, which closely capture distinct modes of behaviour.
