Exploiting map information for self-supervised learning in motion forecasting
Caio Azevedo1,2, Thomas Gilles1,3, Stefano Sabatini1, Dzmitry Tsishkou1

Abstract: Inspired by recent developments regarding the application of self-supervised learning (SSL), we devise an auxiliary task for trajectory prediction that takes advantage of map-only information such as graph connectivity, with the intent of improving map comprehension and generalization. We apply this auxiliary task through two frameworks - multitasking and pretraining. In either framework we observe significant improvements over our baseline in metrics such as minFDE6 (as much as 20.3%) and MissRate6 (as much as 33.3%), as well as a richer comprehension of map features demonstrated by different training configurations. The results obtained were consistent across all three data sets used for experiments: Argoverse, Interaction and NuScenes. We also submit our new pretrained model's results to the Interaction challenge and achieve 1st place with respect to minFDE6 and minADE6.
I. INTRODUCTION
SSL has seen widespread application in areas such as
natural language processing and computer vision, where
large-scale unlabeled data sets are freely available. However,
due to the lack of such large-scale data sets in motion
forecasting and its difficulties in terms of data collection,
only recently has SSL received attention as a means of
improving performance in the prediction and planning field
[1], [2]. There are still, therefore, many paths to explore in
regards to new auxiliary tasks, the different frameworks in
which these can be coupled with the trajectory prediction task
and the optimal performance that can be achieved through
the tuning of the resulting new hyperparameters.
We therefore analyze the impact of one specially designed auxiliary task on our current models, to gain a deeper understanding of the effects of SSL in motion forecasting, in particular through the usage of freely available map information. More specifically, we show that this auxiliary task is able to improve metrics in either a multitasking framework or a pretraining-then-fine-tuning framework. Furthermore, we remark that using map features such as graph connectivity in SSL helps the model better understand the road geometry and thus restricts the predictions to more probable paths.

1IoV team, Paris Research Center, Huawei Technologies France
2Ecole Polytechnique, Computer Science Department
3MINES ParisTech, PSL University, Center for robotics
II. RELATED WORK
Learning-based methods are very effective for motion
forecasting, whether they extend traditional physics-based
techniques [3], [4] or rely on fully data-driven pipelines
[5]. Their efficiency comes from their ease of modelling
cross-data relationships, mostly for agent interaction [6], [7]
or map understanding [8], [9]. These relationships can be
synthesized by a wide array of techniques, namely graphs
[10]–[12], attention [13]–[16] or other pooling methods
[17]–[19].
However, due to hidden driver variables such as destination and driving style, the future trajectory can have multiple possibilities. Trajectory prediction output is therefore required to be multi-modal, and some methods leverage priors from either stored trajectory sets [20], [21] or the map graph [22], [23] to yield multiple proposals. Many works use the Winner-Take-All loss [8], [19], [24], but as multi-modal supervision is impossible, this loss is inefficient and can only train one modality at a time. More recently, many works have tried to improve their multi-modal prediction by using model ensembles for clustering [25], [26] or for transfer learning [27], but these methods remain very expensive in training time, as they require training N times more models, and in some cases inferring all of them.
Pretraining, and more specifically self-supervision, has been prevalent in many widely explored learning fields such as NLP [28] and vision [29], [30]. In other tasks of autonomous driving such as control, some methods design synthetic ground truths [31] in order to help the learning of their driving agents. Concurrent to this work, PreTraM [32] applies contrastive learning between local map rasters and trajectories to reinforce the learned relationship between both, while SSL-Lanes [1] uses a multitasking framework to analyze the performance improvement of four auxiliary tasks.

Fig. 1: Design of Map Trajectories pretraining task.

arXiv:2210.04672v1 [cs.CV] 10 Oct 2022
III. METHOD
We proceed as follows. First, we describe our base prediction model, on top of which we will apply the auxiliary task in different frameworks. We then describe each element of this task, and finally detail its application in each of the frameworks, either pretraining or multitasking.
A. Base trajectory prediction model
Our encoder is the same as in the GOHOME model [33], which uses attention and graph convolutions to update agent features with map information and to model agent interactions. However, instead of decoding a heatmap of the probability density of the final position of the tracked agent, we, like most of the state of the art, decode the k full trajectory predictions directly through a multi-layer perceptron, and the probability logits of each prediction separately through a simple linear layer.
The trajectory loss function in this case selects the prediction closest to the ground-truth by the usual winner-takes-all method according to minFDEk. Once the closest prediction is selected, we apply a smooth L1 loss [34] between the N points in it and the N points in the ground-truth, and finally average the results to get the main trajectory loss Ltraj. There is also, however, a loss associated with the prediction probabilities, Lprob, which in our case is the same as the one used in TPCN [19]. In the end we combine these losses to have Lmain = Ltraj + Lprob.
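The winner-takes-all trajectory loss described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's actual training code; the function names, the smooth L1 threshold beta and the array shapes are our own assumptions.

```python
import numpy as np

def smooth_l1(x, beta=1.0):
    # Elementwise smooth L1 (Huber-like) loss: quadratic near 0, linear beyond beta.
    ax = np.abs(x)
    return np.where(ax < beta, 0.5 * ax**2 / beta, ax - 0.5 * beta)

def wta_trajectory_loss(preds, gt):
    """Winner-takes-all trajectory loss sketch.

    preds: (k, N, 2) array of k predicted trajectories of N points each
    gt:    (N, 2) ground-truth trajectory
    Selects the mode with the smallest final-displacement error (minFDE_k),
    then averages a smooth L1 loss over that mode's N points.
    """
    fde = np.linalg.norm(preds[:, -1] - gt[-1], axis=-1)  # (k,) endpoint errors
    best = int(np.argmin(fde))                            # winning modality
    return smooth_l1(preds[best] - gt).mean(), best
```

Only the winning modality receives a gradient under this scheme, which is why, as noted in Section II, WTA training updates one modality at a time.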
B. Design of the Map Trajectories auxiliary task
Our aim is to design an auxiliary task that uses map information as a way of improving map comprehension. However, such a task needs to share important features with the main goal of trajectory prediction, in order to avoid forgetting the information learned [35] or having conflicting information from both tasks. One essential feature of motion forecasting is multi-modality, and a natural approach to incorporate it into the SSL task while exploiting the map is to make the network predict, given a starting position, all trajectories that an agent in that position could take in a given time horizon. Each trajectory can be split into two parts - the past and the future. All trajectories generated for the same starting position share the same past, but their possible futures vary.
1) Map exploration: Each data set provides its HD-map as a graph that consists of lanelets (nodes) and edges. The lanelets represent sections of the roads, 10 to 20 meters long on average. Each contains the succession of coordinates of the center line of its associated lane segment, which also gives its direction. The edges of the graph represent the connectivity of the lanelets. For clarity, in the following we differentiate a "path" – a sequence of lanelets that may be travelled by an agent – from a "trajectory" – a precise sequence of coordinates taken at regular time intervals and of definite length.
The pretraining data consists of a list of such lanelets along with the local graph each is a part of, allowing the easy choice of a starting position from which all the possible paths are built, using the connectivity of the graph, every time a sample is taken.
From the starting lanelet, we do a depth-first search to
find all possible paths. We stop the search once a maximum
distance is reached, or when there are no more successor
nodes. We store the paths found as lists of lanelet IDs and
for each of these lists we concatenate their respective center
lines to create a "guide-line" for the possible trajectories.
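The path enumeration above can be sketched as a recursive depth-first search over a toy lanelet graph. The dict-based graph representation and the per-lanelet length bookkeeping are assumptions made for illustration, not the paper's data format.

```python
def enumerate_paths(start, successors, lengths, max_dist):
    """Depth-first enumeration of all lanelet paths from `start`.

    successors: dict mapping lanelet id -> list of successor lanelet ids
    lengths:    dict mapping lanelet id -> center-line length in meters
    A branch terminates once the accumulated distance reaches max_dist
    or the current lanelet has no successors. Returns a list of paths,
    each a list of lanelet ids.
    """
    paths = []

    def dfs(node, path, dist):
        path = path + [node]
        dist += lengths[node]
        succ = successors.get(node, [])
        if dist >= max_dist or not succ:
            paths.append(path)   # terminal: store the completed path
            return
        for nxt in succ:
            dfs(nxt, path, dist)

    dfs(start, [], 0.0)
    return paths
```

Each returned ID list would then have its lanelets' center lines concatenated to form the "guide-line" for that path, e.g. `np.concatenate([centerlines[i] for i in path])` with a hypothetical `centerlines` lookup.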
2) Synthetic speeds and accelerations: In order to bring the network input as close as possible to that of the main trajectory prediction task, we create the ground-truths of the possible trajectories and a synthetic past history from sampled values of speed and acceleration.

We sample the agent's initial speed from a uniform distribution, which allows the model to adapt more easily to the different speed distributions found in each data set. In a certain percentage of samples, we also take a random acceleration
percentage of samples, we also take a random acceleration
from a Laplace distribution, centered on 0 and with a scale
chosen to fit the empirical data. This acceleration is kept
constant throughout the past, but to create diversity in the
future modalities, we add a random noise to the acceleration
in the future part of the trajectory, also taken from a Laplace
distribution of smaller scale.
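A minimal sketch of this sampling scheme follows. All numeric values (speed bound, acceleration probability, Laplace scales) are illustrative placeholders, not the paper's fitted parameters.

```python
import numpy as np

def sample_kinematics(n_futures, rng, v_max=20.0, accel_prob=0.5,
                      accel_scale=1.0, noise_scale=0.2):
    """Sample a synthetic initial speed and accelerations (illustrative values).

    Returns the initial speed, the constant past acceleration, and one
    perturbed acceleration per future modality.
    """
    v0 = rng.uniform(0.0, v_max)  # uniform initial speed
    # with some probability, draw a zero-centered Laplace acceleration
    a_past = rng.laplace(0.0, accel_scale) if rng.random() < accel_prob else 0.0
    # future modalities: past acceleration plus smaller-scale Laplace noise
    a_futures = a_past + rng.laplace(0.0, noise_scale, size=n_futures)
    return v0, a_past, a_futures
```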
3) Ground-truth and past generation: Using the guide-line generated by the node paths, we interpolate its coordinates to create a ground-truth trajectory that has constant acceleration, consistent with the sampled speed and acceleration. We take care to extrapolate the trajectory if there are not enough points, or to cut it off early once the desired number of points is reached, so that every possible trajectory is an array of equal size. The past is generated in the same way, except that we simply traverse the starting lanelet's predecessors instead of its successors for the node ID list, and only generate one trajectory.
Finally, to simulate perception noise encountered in the
real data, we add an independent Gaussian noise to each
step of the past trajectory.
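The generation step above can be sketched with arc-length interpolation along the guide-line. This is a simplified reconstruction under stated assumptions (2D polyline guide-line, fixed timestep, linear extrapolation along the final segment); the function name and signature are ours, not the paper's.

```python
import numpy as np

def trajectory_from_guideline(guide, v0, a, n_points, dt=0.1,
                              noise_std=0.0, rng=None):
    """Resample a guide-line into a constant-acceleration trajectory.

    guide: (M, 2) polyline of concatenated lanelet center-line coordinates
    Distances travelled follow s(t) = v0*t + 0.5*a*t^2; positions are
    linearly interpolated along the guide-line's arc length, extrapolating
    along the last segment if the guide-line is too short, so the output
    always has exactly n_points points. Optional Gaussian noise simulates
    perception noise (used for the synthetic past).
    """
    t = np.arange(1, n_points + 1) * dt
    s = np.maximum(v0 * t + 0.5 * a * t**2, 0.0)       # distance along path
    seg = np.linalg.norm(np.diff(guide, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(seg)])       # arc length per vertex
    x = np.interp(s, arc, guide[:, 0])
    y = np.interp(s, arc, guide[:, 1])
    traj = np.stack([x, y], axis=1)
    # extrapolate beyond the end of the guide-line along its final direction
    over = s > arc[-1]
    if over.any():
        d = guide[-1] - guide[-2]
        d = d / np.linalg.norm(d)
        traj[over] = guide[-1] + np.outer(s[over] - arc[-1], d)
    if noise_std > 0:
        gen = rng if rng is not None else np.random.default_rng()
        traj += gen.normal(0.0, noise_std, size=traj.shape)
    return traj
```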
4) Network output: The network outputs npred predictions of map trajectories. The loss function is based upon the matching of predictions and ground-truths.
Most often, the number of actual ground-truths ngt is less than npred, in which case some predictions are left unmatched to any ground-truth. We then run into an assignment problem of which predictions should be matched to a certain