
DiPA: Probabilistic Multi-Modal Interactive Prediction for Autonomous
Driving
Anthony Knittel1, Majd Hawasly1, Stefano V. Albrecht1,2, John Redford1, Subramanian Ramamoorthy1,2
Abstract: Accurate prediction is important for operating an
autonomous vehicle in interactive scenarios. Prediction must
be fast, to support multiple requests from a planner exploring
a range of possible futures. The generated predictions must
accurately represent the probabilities of predicted trajectories,
while also capturing different modes of behaviour (such as
turning left vs continuing straight at a junction). To this
end, we present DiPA, an interactive predictor that addresses
these challenging requirements. Previous interactive prediction
methods use an encoding of k-mode-samples, which under-
represents the full distribution. Other methods optimise closest-
mode evaluations, which test whether one of the predictions
is similar to the ground-truth, but allow additional unlikely
predictions to occur, over-representing unlikely predictions.
DiPA addresses these limitations by using a Gaussian-Mixture-
Model to encode the full distribution, and optimising predictions
using both probabilistic and closest-mode measures. These
objectives respectively optimise probabilistic accuracy and the
ability to capture distinct behaviours, and there is a challenging
trade-off between them. We are able to solve both together using
a novel training regime. DiPA achieves new state-of-the-art
performance on the INTERACTION and NGSIM datasets, and
improves over the baseline (MFP) when both closest-mode and
probabilistic evaluations are used. This demonstrates effective
prediction for supporting a planner on interactive scenarios.
I. INTRODUCTION
Prediction of the future motion of surrounding road users
is essential for the safe operation of an autonomous vehicle
(AV). Road scenarios such as intersections, merges and
roundabouts require significant interaction between agents
in the scene, where agent behaviour is influenced by the
presence of nearby agents, as well as reactions to actions that
other agents take. In order to support planning, a predictor
needs to estimate the future states of the surrounding road
users based on observations of their recent history, and to
estimate the risk of conflict for possible ego actions.
A planning system used in interactive scenarios needs to
consider different possible actions that other vehicles may
take, and the futures that result from different actions. In
order to explore these futures, a supporting predictor needs to
be computationally fast, and to provide accurate predictions
that represent the expected distribution of future states of
each agent. Many combinations of actions may be possible,
so an interactive predictor needs to be fast in order to allow
different futures to be explored.
1Five AI Ltd, UK. anthony.knittel@five.ai
2School of Informatics, University of Edinburgh, Edinburgh, UK
Fig. 1 (mode weights shown in the figure: 0.8 and 0.2). Top: Use of
k-mode samples (red, k=2) under-represents the
distribution of future positions (black). This prevents effective planning
by underestimating states which are reasonably likely to occur. Bottom: A
GMM encoding, with associated mode weights, provides a more accurate
representation of the full distribution by covering a wider range of samples.
Fig. 2 (mode weights shown in the figure: 0.9 and 0.1). Top: Optimising
for closest-mode evaluations can allow unrealistic
predictions to be over-represented. For an instance of data (black dot), the
closest predicted mode (red) is evaluated while additional modes (blue) can
predict unrealistic behaviours without penalty. Unlikely predicted modes
interfere with planning, for example causing an emergency brake to avoid a
predicted collision that is unlikely. Bottom: Optimising for both
closest-mode and probabilistic evaluations penalises unlikely predictions, while
minimising over- and under-representation.
Existing predictors addressing this task have encoded
predictions using a fixed number of mode samples, for example
using 6 predicted trajectories encoded as center positions [1],
[2], [3]. These are evaluated using minimum average- or
final-displacement error (minADE/FDE) and miss-rate (MR)
(see Section IV-B). These measures compare the closest
predicted mode with the ground-truth, and are important for
demonstrating that predictions closely capture distinct modes
of behaviour observed in the data.
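As a rough illustration of these closest-mode measures, the sketch below computes minADE, minFDE and a miss indicator for a single agent. The function name `closest_mode_metrics`, the array shapes and the 2 m miss threshold are assumptions made for this example; the paper's exact definitions are in its Section IV-B, which is not reproduced here.

```python
import numpy as np

def closest_mode_metrics(pred, gt, miss_threshold=2.0):
    """Closest-mode metrics for one agent (illustrative sketch).

    pred: (K, T, 2) array of K predicted trajectories over T timesteps.
    gt:   (T, 2) array with the ground-truth trajectory.
    miss_threshold: assumed final-displacement threshold in metres.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    # Euclidean distance of every mode to the ground truth at every step: (K, T)
    dists = np.linalg.norm(pred - gt[None], axis=-1)
    ade = dists.mean(axis=1)   # average displacement per mode
    fde = dists[:, -1]         # final displacement per mode
    return {
        "minADE": float(ade.min()),
        "minFDE": float(fde.min()),
        # A miss is counted when even the best mode ends too far away.
        "miss": bool(fde.min() > miss_threshold),
    }

# Two predicted modes for a vehicle driving along the x axis.
gt = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
pred = np.stack([gt + [0.0, 1.0], gt + [0.0, 3.0]])
metrics = closest_mode_metrics(pred, gt)
```

Only the best-matching mode contributes to each value; the miss rate over a dataset would be the mean of the `miss` flag across instances.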
A limitation of this sample-based encoding is that it
does not represent the full distribution of expected future
positions, and as such many variations are under-represented
(Fig. 1). A further limitation is that probabilities of predicted
modes are not considered. When training a model based on
closest-mode evaluations, additional predicted modes (other
than the closest) do not affect scoring, which allows the
predictor to predict behaviour modes that are unlikely to
occur. Each predicted mode has equal weight, which results
in over-representation of unlikely predictions (Fig. 2).
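The point that extra modes go unpenalised can be checked numerically. In this minimal sketch (the helper `min_ade` and the toy trajectories are hypothetical, not taken from the paper), adding an implausible mode to a good prediction leaves the closest-mode score unchanged.

```python
import numpy as np

def min_ade(modes, gt):
    """Minimum average displacement error over a set of predicted modes."""
    d = np.linalg.norm(np.asarray(modes, dtype=float) - np.asarray(gt)[None], axis=-1)
    return float(d.mean(axis=1).min())

# Ground truth: straight motion along x for 5 steps.
gt = np.stack([np.arange(1.0, 6.0), np.zeros(5)], axis=1)

good = gt + np.array([0.0, 0.2])      # close to what actually happened
unlikely = gt + np.array([0.0, 8.0])  # implausible lateral offset

score_one = min_ade([good], gt)
score_two = min_ade([good, unlikely], gt)  # same score: the extra mode is free
```

A planner consuming both modes with equal weight would still have to react to the implausible one, which is the over-representation problem described above.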
arXiv:2210.06106v2 [cs.RO] 8 Mar 2023
Fig. 3 (mode weights shown in the figure: 0.6 and 0.4). A merge scenario
produces a bi-modal distribution (black samples).
Optimising closest-mode (minADE/FDE) evaluations favours diverse
predictions (green), while probabilistic (predRMS) evaluations favour
predictions close to the mean (red), that minimise the penalty of incorrect
mode estimates. Solving both requires diverse predictions with the ability
to accurately estimate mode probabilities.
These limitations can be addressed using a Gaussian
Mixture Model (GMM), which represents the full predicted
distribution, along with probability estimates of each mode.
This is preferred over increasing the number of samples, as
GMMs provide a compact encoding of the distribution and
a practical means of evaluating the probability distribution.
Previous methods [4], [5] have used GMMs on the NGSIM
dataset, which are evaluated using negative-log-likelihood
(NLL) evaluations. Further methods have used mode proba-
bility estimates [6], [7] which are evaluated using predicted-
mode RMS (predRMS) evaluations (see Section IV-B).
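To make the two probabilistic measures concrete, here is a minimal sketch under stated assumptions: the GMM uses diagonal covariances over a single 2D position, and predRMS is taken as the per-timestep RMS error of the highest-probability mode. The names `gmm_nll` and `pred_rms` and the array layouts are illustrative; the paper's own definitions (including its revision to NLL) are in Section IV-B and are not reproduced here.

```python
import numpy as np

def gmm_nll(point, weights, means, sigmas):
    """Negative log-likelihood of one position under a diagonal-covariance GMM.

    point: (D,), weights: (K,), means: (K, D), sigmas: (K, D) standard deviations.
    """
    point = np.asarray(point, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    dim = point.shape[0]
    z = (point - means) / sigmas
    # Log density of each Gaussian component at the point.
    log_comp = (-0.5 * np.sum(z ** 2, axis=1)
                - np.sum(np.log(sigmas), axis=1)
                - 0.5 * dim * np.log(2.0 * np.pi))
    log_mix = np.log(weights) + log_comp
    m = log_mix.max()  # log-sum-exp for numerical stability
    return float(-(m + np.log(np.exp(log_mix - m).sum())))

def pred_rms(preds, probs, gts):
    """RMS error of the most probable mode at each timestep.

    preds: (N, K, T, 2), probs: (N, K), gts: (N, T, 2).
    """
    preds, probs, gts = (np.asarray(a, dtype=float) for a in (preds, probs, gts))
    best = probs.argmax(axis=1)                    # (N,)
    chosen = preds[np.arange(len(preds)), best]    # (N, T, 2)
    sq_err = ((chosen - gts) ** 2).sum(axis=-1)    # (N, T)
    return np.sqrt(sq_err.mean(axis=0))            # (T,)

# Two modes weighted 0.8 / 0.2: a point near the heavier mode is more likely.
means = [[0.0, 0.0], [10.0, 0.0]]
sigmas = [[1.0, 1.0], [1.0, 1.0]]
nll_likely = gmm_nll([0.0, 0.0], [0.8, 0.2], means, sigmas)
nll_unlikely = gmm_nll([10.0, 0.0], [0.8, 0.2], means, sigmas)
```

Unlike the closest-mode measures, both functions depend on the mode weights, so an over-weighted unlikely mode is penalised.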
Probabilistic and closest-mode evaluations provide complementary
measures that are more informative than either
alone, and are analogous to precision and recall in binary
classification. We argue that an effective predictor for inter-
active scenarios needs to optimise both measures, to demon-
strate that it is able to closely capture distinct behaviour
modes, while also accurately representing probabilities. This
is a challenging task as different evaluation measures are
supported by contradictory prediction strategies. Closest-
mode evaluations (minADE/FDE/MR) favour diverse pre-
dictions, while probabilistic evaluations (predRMS, NLL)
favour conservative predictions close to the mean of expected
behaviours, where the cost of incorrect mode estimates is
minimised (Figure 3). Optimising both evaluation approaches
together demonstrates accurate multi-modal prediction, and
reduces the over-representation of unlikely predictions seen
in Figure 2.
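The tension between the two families of measures can be seen in a one-dimensional toy version of the merge scenario. The numbers below are invented for illustration (a 60/40 split between outcomes at +1 and -1), and the helper names are hypothetical: a diverse predictor wins on the closest-mode error, while a conservative predictor that outputs the mean wins on the most-probable-mode RMS.

```python
import numpy as np

# Toy bi-modal outcomes: 60% of instances end at +1, 40% at -1.
outcomes = np.array([1.0] * 6 + [-1.0] * 4)

def min_error(modes, outcomes):
    """Closest-mode error: distance to the nearest predicted mode, averaged."""
    return float(np.abs(outcomes[:, None] - modes[None, :]).min(axis=1).mean())

def top_mode_rms(modes, probs, outcomes):
    """Probabilistic error: RMS of the single most probable mode."""
    top = modes[np.argmax(probs)]
    return float(np.sqrt(np.mean((outcomes - top) ** 2)))

# Diverse strategy: one mode per behaviour, with correct mode weights.
diverse_modes, diverse_probs = np.array([1.0, -1.0]), np.array([0.6, 0.4])
# Conservative strategy: a single mode at the mean outcome (0.2).
cons_modes, cons_probs = np.array([outcomes.mean()]), np.array([1.0])

results = {
    "diverse": (min_error(diverse_modes, outcomes),
                top_mode_rms(diverse_modes, diverse_probs, outcomes)),
    "conservative": (min_error(cons_modes, outcomes),
                     top_mode_rms(cons_modes, cons_probs, outcomes)),
}
```

In this toy setting the predictor cannot tell the instances apart, so the best fixed weights are 0.6 and 0.4; a predictor that estimates per-instance mode probabilities accurately from context could keep the diverse modes while also lowering the probabilistic error.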
To that end, we present DiPA (Diverse and Probabilisti-
cally Accurate) – a fast method for predicting in interactive
scenarios using a GMM encoding, that is able to optimise
both objectives together, by producing a diverse set of predic-
tions with accurate probability estimates. This allows distinct
behaviours to be accurately modelled, while producing an
accurate representation of the full trajectory distribution. DiPA
improves over previous methods [1], [3] using closest-mode
evaluations on the INTERACTION dataset [8], and improves
over previous methods [7], [4] using probabilistic evaluations
on NGSIM [9]. DiPA also improves over a baseline method
(Multiple-Futures Prediction (MFP)) [5] when comparing
both closest-mode and probabilistic measures together. This
demonstrates a predictor that is suitable for supporting an
AV planner in interactive scenarios.
Beyond highlighting the importance of evaluating predic-
tors with both closest-mode and probabilistic evaluations, the
key contributions are: 1) a fast prediction architecture with
a flexible representation that processes agent interactions
in wide-ranging road layouts, that produces high accuracy
predictions on interactive scenarios, 2) a training regime that
supports a diverse set of predicted modes using a GMM-
based spatial distribution, with accurate probability esti-
mates, and 3) a revision to the NLL measure for evaluating
GMM predictions, to correct for an important limitation.
II. RELATED WORK
A number of different structures have been used for
prediction of agents in road scenes, including graph-, goal-
and regression-based methods.
StarNet [1] represents the scene and agents using vector-
based graphs, and uses a combined representation of agents
within their own reference frame and from the points of view
of other agents. Further graph-based methods such as [10],
[3], [11] combine map information and agent positions into a
common representation, commonly processed with a Graph
Neural Network [12] in an encoder-decoder framework.
These methods allow encoding the static layout of the scene
and various agents in a generalisable way, and have shown
good results on closest-mode prediction.
Goal-based methods [13], [14], [15], [16], [17] identify
a number of potential future targets that each agent may
head towards, determine likelihoods of each, and produce
predicted trajectories towards those goals. Flash [7] uses
a combination of Bayesian inverse-planning and mixture-
density networks to produce accurate predictions of trajec-
tories in highway driving scenarios. Goal-based methods
use the map to inform trajectory generation, and can use
kinematically-sound trajectory generators. However, this can
lead to limited diversity on other factors such as motion
profile and path variations compared to data-driven methods.
Regression-based methods use representations that directly
map observations to predicted outputs. SAMMP [4] produces
joint predictions of the spatial distribution of vehicles, using
a multi-head self-attention function to capture interactions
between agents. Multiple-Futures Prediction (MFP) [5] mod-
els the joint futures of a number of interacting agents, using
learnt latent variables for generating predicted future modes.
Mersch et al. [18] present a temporal-convolution method
for predicting interacting vehicles in a highway scenario
where neighbouring agents are assigned specific roles based
on relative positions to a central agent. These regression-
based methods can be fast and accurate, but may have
limited generalisability to different layouts when role-based
representation of inputs is used.
Existing interactive prediction methods using the INTERACTION
dataset have demonstrated good results based on closest-
mode evaluations (minADE / FDE / MR) [1], [2], [3]. These
have typically used a prediction encoding using a fixed
number of modes, each represented as a trajectory sample.
Optimising closest-mode evaluations produces diverse pre-
dictions, which closely capture distinct modes of behaviour.