such as NLP [28] and vision [29], [30]. In other autonomous driving tasks, such as control, some methods design synthetic ground truths [31] to help the learning of their driving agents. Concurrently with this work, PreTraM [32] applies contrastive learning between local map rasters
and trajectories to reinforce the learned relationship between
both, while SSL-Lanes [1] uses a multitasking framework
to analyze the performance improvement of four auxiliary
tasks.
III. METHOD
We proceed as follows. First, we describe our base prediction model, on top of which we apply the auxiliary task in different frameworks. We then describe each element of this task, and finally detail its application in each framework, either pretraining or multitask.
A. Base trajectory prediction model
Our encoder is the same as in the GOHOME model
[33], which uses attention and graph convolutions to update
agent features with map information and to model agent
interactions. However, instead of decoding a heatmap of the
probability density of the final position of the tracked agent,
we, like most of the state of the art, decode the $k$ full trajectory predictions directly through a multi-layer perceptron, and the probability logits of each prediction separately through a simple linear layer.
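As an illustration, the decoding heads described above could be sketched as follows; the embedding dimension, layer sizes, and variable names are placeholders chosen here, not the exact implementation:

import torch
import torch.nn as nn

class TrajectoryDecoder(nn.Module):
    # Sketch of the decoding heads: an MLP regresses the k full trajectories,
    # and a simple linear layer produces one probability logit per prediction.
    def __init__(self, d_model=128, k=6, horizon=30):
        super().__init__()
        self.k, self.horizon = k, horizon
        self.traj_head = nn.Sequential(
            nn.Linear(d_model, d_model),
            nn.ReLU(),
            nn.Linear(d_model, k * horizon * 2),  # k modes, horizon (x, y) points each
        )
        self.prob_head = nn.Linear(d_model, k)    # one logit per predicted mode

    def forward(self, agent_feat):
        # agent_feat: (batch, d_model) agent embedding from the GOHOME-style encoder
        trajs = self.traj_head(agent_feat).view(-1, self.k, self.horizon, 2)
        logits = self.prob_head(agent_feat)
        return trajs, logits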
The trajectory loss function in this case selects the prediction closest to the ground-truth by the usual winner-takes-all method according to minFDE$_k$. Once the closest prediction is selected, we apply a smooth L1 loss [34] between its $N$ points and the $N$ points of the ground-truth, and finally average the results to get the main trajectory loss $L_{traj}$. There is also a loss associated with the prediction probabilities, $L_{prob}$, which in our case is the same as the one used in TPCN [19]. In the end we combine these losses into $L_{main} = L_{traj} + L_{prob}$.
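For concreteness, a minimal sketch of this combined loss is given below, assuming the tensor shapes indicated in the comments; the cross-entropy term is only a stand-in, since the actual probability loss follows TPCN [19]:

import torch
import torch.nn.functional as F

def main_loss(trajs, logits, gt):
    # trajs:  (batch, k, N, 2) predicted trajectories
    # logits: (batch, k)       probability logits, one per prediction
    # gt:     (batch, N, 2)    ground-truth future trajectory
    # Winner-takes-all: pick the prediction with the smallest final displacement error.
    fde = torch.norm(trajs[:, :, -1] - gt[:, None, -1], dim=-1)   # (batch, k)
    winner = fde.argmin(dim=1)
    best = trajs[torch.arange(trajs.size(0)), winner]             # (batch, N, 2)
    # Smooth L1 between the N predicted points and the N ground-truth points.
    l_traj = F.smooth_l1_loss(best, gt)
    # Stand-in probability loss (the paper uses the TPCN formulation instead).
    l_prob = F.cross_entropy(logits, winner)
    return l_traj + l_prob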
B. Design of the Map Trajectories auxiliary task
Our aim is to design an auxiliary task that uses map
information as a way of improving map comprehension.
However, such a task needs to share important features with
the main goal of trajectory prediction in order to avoid
forgetting the information learned [35] or having conflicting
information from both tasks. One essential feature of motion
forecasting is multi-modality, and a natural approach to incorporate it into the SSL task while exploiting the map is to make the network predict, given a starting position, all the trajectories that an agent at that position could take within a given time horizon. Each trajectory can be split into two parts: the past and the future. All trajectories generated for the same starting position share the same past, but their possible futures vary.
1) Map exploration: Each data set provides its HD-map
as a graph that consists of lanelets (nodes) and edges. The
lanelets represent road sections 10 to 20 meters long on average; each contains the succession of coordinates of the center line of its associated lane segment, which also gives its direction. The edges of the graph represent the connectivity of the lanelets. For clarity, in the following we differentiate a "path", a sequence of lanelets that may be travelled by an agent, from a "trajectory", a precise sequence of coordinates taken at regular time intervals and of definite length.
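For illustration, such a lanelet graph could be represented as follows; the field names are ours and do not correspond to any specific dataset API:

from dataclasses import dataclass, field
import numpy as np

@dataclass
class Lanelet:
    # A lanelet node: a short lane section with its ordered center-line.
    lanelet_id: int
    centerline: np.ndarray                             # (P, 2) coordinates in driving direction
    successors: list = field(default_factory=list)     # IDs of lanelets reachable next
    predecessors: list = field(default_factory=list)   # IDs of lanelets leading into this one

# The local HD-map graph is then a simple lookup table from ID to lanelet.
lane_graph: dict = {}  # {lanelet_id: Lanelet}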
The pretraining data consists of a list of such lanelets, each paired with the local graph it belongs to. This allows a starting position to be easily chosen, from which all the possible paths are built using the connectivity of the graph every time a sample is taken.
From the starting lanelet, we do a depth-first search to
find all possible paths. We stop the search once a maximum
distance is reached, or when there are no more successor
nodes. We store the paths found as lists of lanelet IDs, and for each list we concatenate the corresponding center lines to create a "guide-line" for the possible trajectories.
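A possible sketch of this search and of the guide-line construction, with the connectivity, center-line lengths, and coordinates passed as plain dictionaries (names are illustrative):

import numpy as np

def enumerate_paths(start_id, successors, lengths, max_dist):
    # successors: {lanelet_id: [successor ids]}  -- graph connectivity
    # lengths:    {lanelet_id: center-line length in meters}
    # Depth-first search that stops once max_dist is exceeded or no successor remains.
    paths = []

    def dfs(node, path, dist):
        path = path + [node]
        dist += lengths[node]
        succ = successors.get(node, [])
        if dist >= max_dist or not succ:
            paths.append(path)
            return
        for nxt in succ:
            dfs(nxt, path, dist)

    dfs(start_id, [], 0.0)
    return paths

def guide_line(path, centerlines):
    # Concatenate the center-lines of the lanelets in a path into one guide-line.
    return np.concatenate([centerlines[i] for i in path], axis=0)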
2) Synthetic speeds and accelerations: In order to bring
the network input as close as possible to that of the main
trajectory prediction task, we create the ground-truths of the possible trajectories and a synthetic past history from sampled values of speed and acceleration.
We sample the agent's initial speed from a uniform distribution, which allows the model to adapt more easily to the different speed distributions found in each data set. In a certain percentage of samples, we also draw a random acceleration from a Laplace distribution centered on 0, with a scale chosen to fit the empirical data. This acceleration is kept constant throughout the past, but to create diversity in the future modalities, we add random noise to the acceleration in the future part of the trajectory, also drawn from a Laplace distribution of smaller scale.
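A minimal sketch of this sampling step is given below; every numerical parameter is an illustrative placeholder rather than a value used in our experiments:

import numpy as np

rng = np.random.default_rng()

def sample_kinematics(v_max=20.0, p_accel=0.5, accel_scale=1.0, noise_scale=0.3):
    # Initial speed: uniform, so the model is not tied to one dataset's speed distribution.
    v0 = rng.uniform(0.0, v_max)
    # Constant past acceleration, drawn from a zero-centered Laplace in a fraction of samples.
    a_past = rng.laplace(0.0, accel_scale) if rng.random() < p_accel else 0.0
    # Future acceleration: past value plus smaller-scale Laplace noise, to diversify modalities.
    a_future = a_past + rng.laplace(0.0, noise_scale)
    return v0, a_past, a_future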
3) Ground-truth and past generation: Using the guide-
line generated by the node paths, we interpolate its coor-
dinates to create a ground-truth trajectory that has constant
acceleration, consistent with the speed and acceleration sam-
pled. We take care to extrapolate the trajectory if there are not enough points, or to cut it off early once the desired number of points is reached, so that every possible trajectory
is an array of equal size. The past is generated in the same
way, except that we simply traverse the starting lanelet’s
predecessors instead of its successors for the node ID list,
and only generate one trajectory.
Finally, to simulate perception noise encountered in the real data, we add independent Gaussian noise to each step of the past trajectory.
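A possible sketch of this generation step, interpolating points at constant acceleration along a guide-line and perturbing the past with Gaussian noise (all parameters are illustrative):

import numpy as np

def trajectory_along(guide, v0, accel, n_points, dt=0.1):
    # guide: (P, 2) concatenated center-line coordinates of a path.
    # Arc-length travelled at each step: s(t) = v0*t + 0.5*accel*t^2.
    seg = np.linalg.norm(np.diff(guide, axis=0), axis=1)
    cum = np.concatenate([[0.0], np.cumsum(seg)])          # arc-length at each guide vertex
    t = dt * np.arange(1, n_points + 1)
    s = np.clip(v0 * t + 0.5 * accel * t**2, 0.0, None)
    # Interpolate along the guide-line; points beyond its end are extrapolated below.
    traj = np.stack([np.interp(s, cum, guide[:, 0]),
                     np.interp(s, cum, guide[:, 1])], axis=1)
    over = s > cum[-1]
    if over.any():
        direction = guide[-1] - guide[-2]
        direction = direction / (np.linalg.norm(direction) + 1e-9)
        traj[over] = guide[-1] + (s[over] - cum[-1])[:, None] * direction
    return traj  # always exactly n_points points

def noisy_past(past, sigma=0.2, rng=np.random.default_rng()):
    # Independent Gaussian noise on every past step, simulating perception noise.
    return past + rng.normal(0.0, sigma, size=past.shape)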
4) Network output: The network outputs $n_{pred}$ predictions of map trajectories. The loss function is based upon
the matching of predictions and ground-truths.
Most often, the number of actual ground-truths $n_{gt}$ is less than $n_{pred}$, in which case some predictions are left unmatched to any ground-truth. We then run into an assignment problem of deciding which predictions should be matched to a certain