
Can Transformer Attention Spread Give Insights Into Uncertainty of
Detected and Tracked Objects?
Felicia Ruppel1,2, Florian Faion1, Claudius Gläser1 and Klaus Dietmayer2
Abstract— Transformers have recently been utilized to per-
form object detection and tracking in the context of autonomous
driving. One unique characteristic of these models is that
attention weights are computed in each forward pass, giving
insights into the model’s interior, in particular, which part
of the input data it deemed interesting for the given task.
Such an attention matrix over the input grid is available for
each detected (or tracked) object in every transformer decoder
layer. In this work, we investigate the distribution of these
attention weights: How do they change through the decoder
layers and through the lifetime of a track? Can they be used
to infer additional information about an object, such as a
detection uncertainty? Especially in unstructured environments,
or environments that were not common during training, a
reliable measure of detection uncertainty is crucial to decide
whether the system can still be trusted or not.
I. INTRODUCTION
Object detection and tracking are essential tasks in a
perception pipeline for autonomous and automated driving.
Only with knowledge about surrounding objects are downstream
tasks, such as prediction and planning, possible. In such
a system, where the cascading effects of perception errors
can be detrimental, it is very important to be able to quan-
tify the reliability of the detection and tracking output. In
object detection, uncertainty can stem from two sources [1]:
Epistemic uncertainty stems from the model itself,
e.g. when an observation is made that was not present in
the training dataset. Unstructured and dynamic environments
can also cause such an uncertainty, as their versatility can
hardly be captured in a training dataset. Second, aleatoric
uncertainty stems from sensor noise, and also encompasses
uncertainty caused by low visibility and increased distance
from the sensor [1].
While state-of-the-art object detection methods have been
based on deep learning for many years, both with image
input [2] as well as on point clouds [3], [4], it is a recent
phenomenon that deep learning based models are also used
for joint tracking and detection [5], [6], [7]. Such trackers
aim to utilize the detector’s latent space to infer additional in-
formation about a tracked object, rather than relying on low-
dimensional bounding boxes as input. However, they have
the drawback that they do not output an uncertainty, as a
conventional method, e.g. tracking based on a Kalman filter [8],
would. While deep learning based detectors and
trackers usually output a confidence score or class probability
score per estimated object, these generally cannot be used
as a reliable uncertainty measure; additional measures are
necessary to capture uncertainty [9].

1Robert Bosch GmbH, Corporate Research, 71272 Renningen, Germany, {firstname.lastname}@de.bosch.com
2Institute of Measurement, Control and Microtechnology, Ulm University, Germany, {firstname.lastname}@uni-ulm.de

Fig. 1. Example of estimated bounding boxes with their respective attention
covariance matrices, pictured as ellipses. Ground truth boxes are denoted by
dotted grey lines, while estimated boxes, attention weights, attention means
and ellipses are colored. Excerpt from the bird's-eye-view grid at a distance
of 30 to 50 meters from the ego vehicle.
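To illustrate the attention spread pictured in Fig. 1, the attention weights of one object over the bird's-eye-view grid can be treated as a discrete distribution over grid cell positions, whose weighted mean and covariance yield the plotted ellipse. The following Python sketch shows one plausible way to compute these quantities; the grid layout, cell size and function name are illustrative assumptions, not the paper's implementation.

import numpy as np

def attention_mean_cov(attn_map, cell_size=0.5):
    """Weighted mean and covariance of one object's attention map.

    attn_map:  2D array of attention weights over the BEV grid
               (one row of a decoder attention matrix, reshaped to the grid).
    cell_size: metric edge length of one grid cell (assumed value, in meters).
    """
    h, w = attn_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel(), ys.ravel()], axis=1) * cell_size
    weights = attn_map.ravel() / attn_map.sum()

    mean = weights @ coords                            # attention mean, shape (2,)
    centered = coords - mean
    cov = (weights[:, None] * centered).T @ centered   # attention covariance, (2, 2)
    return mean, cov

# The ellipse axes correspond to the eigenvectors of cov, scaled by the
# square roots of its eigenvalues:
# eigvals, eigvecs = np.linalg.eigh(cov)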
One approach towards joint object detection and tracking
is the usage of transformer models [10], which were able
to achieve state-of-the-art results in some domains [11],
[6]. Transformers are based on attention, i.e. the interaction
between input tokens, which is why these models allow for
a unique insight into their reasoning: One can visualize the
attention matrices that are computed in each model forward
pass and investigate which part of the input data was used
to generate a certain output. In previous work, we developed
a transformer based model for detection and tracking [7] in
the context of autonomous driving that operates on (lidar)
point clouds. An example of visualized attention weights
from the tracking model is pictured in Figure 1. In empirical
observations, a more focused attention tends to lead to a more
accurate detection. In this paper, we therefore investigate whether
the attention weight distribution can give insights into detection
uncertainty. An uncertainty indicator would be
very valuable towards the ability to use transformer based