Can Transformer Attention Spread Give Insights Into Uncertainty of
Detected and Tracked Objects?
Felicia Ruppel1,2, Florian Faion1, Claudius Gläser1 and Klaus Dietmayer2
Abstract— Transformers have recently been utilized to perform object detection and tracking in the context of autonomous driving. One unique characteristic of these models is that attention weights are computed in each forward pass, giving insights into the model's interior, in particular, which part of the input data it deemed interesting for the given task. Such an attention matrix with the input grid is available for each detected (or tracked) object in every transformer decoder layer. In this work, we investigate the distribution of these attention weights: How do they change through the decoder layers and through the lifetime of a track? Can they be used to infer additional information about an object, such as a detection uncertainty? Especially in unstructured environments, or environments that were not common during training, a reliable measure of detection uncertainty is crucial to decide whether the system can still be trusted or not.
I. INTRODUCTION
Object detection and tracking are essential tasks in a perception pipeline for autonomous and automated driving. Only with knowledge about surrounding objects are downstream tasks, such as prediction and planning, possible. In such a system, where the cascading effects of perception errors can be detrimental, it is very important to be able to quantify the reliability of the detection and tracking output. In object detection, uncertainty can stem from two sources [1]: First, epistemic uncertainty is caused by uncertainty of the model, e.g. when an observation is made that was not present in the training dataset. Unstructured and dynamic environments can also cause such an uncertainty, as their versatility can hardly be captured in a training dataset. Second, aleatoric uncertainty stems from sensor noise, and also encompasses uncertainty caused by low visibility and increased distance from the sensor [1].
While state-of-the-art object detection methods have been based on deep learning for many years, both with image input [2] as well as on point clouds [3], [4], it is a recent phenomenon that deep learning based models are also used for joint tracking and detection [5], [6], [7]. Such trackers aim to utilize the detector's latent space to infer additional information about a tracked object, rather than relying on low-dimensional bounding boxes as input. However, they have the drawback that they are unable to output an uncertainty estimate in the way a conventional method, e.g. tracking based on a Kalman filter [8], would. While deep learning based detectors and trackers usually output a confidence score or class probability score per estimated object, these scores generally cannot be used as a reliable uncertainty measure; additional measures are necessary to capture uncertainty [9].

1Robert Bosch GmbH, Corporate Research, 71272 Renningen, Germany, {firstname.lastname}@de.bosch.com
2Institute of Measurement, Control and Microtechnology, Ulm University, Germany, {firstname.lastname}@uni-ulm.de

Fig. 1. Example of estimated bounding boxes with their respective attention covariance matrices, pictured as ellipses. Ground truth boxes are denoted by dotted grey lines, while estimated boxes, attention weights, attention means and ellipses are colored. Excerpt from the bird's-eye-view grid at a distance of 30 to 50 meters from the ego vehicle.
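For illustration, the mean and covariance ellipse of an object's attention, as drawn in Figure 1, can be obtained by treating the normalized attention map over the bird's-eye-view grid as a discrete 2D probability distribution. The following is a minimal sketch of such a computation; the function names, the assumption that the weights are available as a non-negative H x W array, the grid cell size and the two-sigma contour are illustrative and not taken from [7].

import numpy as np

def attention_mean_and_covariance(attn, cell_size=1.0):
    # attn: (H, W) non-negative attention weights of a single object for one
    # decoder layer; normalized here so it sums to 1 over the BEV grid.
    # cell_size: metric edge length of one grid cell (assumed value).
    w = attn / attn.sum()
    ys, xs = np.mgrid[0:attn.shape[0], 0:attn.shape[1]]
    coords = np.stack([xs, ys], axis=-1).reshape(-1, 2) * cell_size
    w = w.reshape(-1)
    mean = (w[:, None] * coords).sum(axis=0)                       # weighted attention mean (x, y)
    centered = coords - mean
    cov = (w[:, None, None] * centered[:, :, None] * centered[:, None, :]).sum(axis=0)
    return mean, cov                                               # 2-vector, 2x2 matrix

def ellipse_from_covariance(cov, n_std=2.0):
    # Half-axis lengths and orientation of the n-sigma ellipse of the covariance.
    eigvals, eigvecs = np.linalg.eigh(cov)
    half_axes = n_std * np.sqrt(np.maximum(eigvals, 0.0))
    angle = np.arctan2(eigvecs[1, -1], eigvecs[0, -1])             # orientation of the major axis
    return half_axes, angle

Under this summary, a small ellipse corresponds to attention focused on the object itself, while a large ellipse indicates attention spread over the surroundings.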
One approach towards joint object detection and tracking is the usage of transformer models [10], which were able to achieve state-of-the-art results in some domains [11], [6]. Transformers are based on attention, i.e. the interaction between input tokens, which is why these models allow for a unique insight into their reasoning: One can visualize the attention matrices that are computed in each model forward pass and investigate which part of the input data was used to generate a certain output. In previous work, we developed a transformer based model for detection and tracking [7] in the context of autonomous driving that operates on (lidar) point clouds. An example of visualized attention weights from the tracking model is pictured in Figure 1. In empirical observations, a more focused attention tends to lead to a more accurate detection. Therefore, we investigate in this paper whether the attention weight distribution can give insights into a detection uncertainty. An uncertainty indicator would be very valuable towards the ability to use transformer based
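To make the investigated quantity concrete, the sketch below shows one way per-object attention maps over the input grid could be obtained and condensed into a scalar spread measure. It uses a plain PyTorch multi-head cross-attention layer between object queries and flattened BEV feature tokens and computes the entropy of each query's attention distribution; the layer, the tensor sizes and the choice of entropy as a spread measure are assumptions for illustration, not the architecture of the model from [7].

import torch
import torch.nn as nn

# Illustrative sizes, not taken from the paper.
num_queries, grid_h, grid_w, d_model = 100, 64, 64, 256

cross_attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)

queries = torch.randn(1, num_queries, d_model)          # one object query per detection/track
bev_tokens = torch.randn(1, grid_h * grid_w, d_model)   # flattened BEV feature-map tokens

# need_weights=True returns the attention matrix of this forward pass; averaging
# over heads yields one (num_queries, grid_h * grid_w) distribution per object.
_, attn = cross_attn(queries, bev_tokens, bev_tokens,
                     need_weights=True, average_attn_weights=True)

attn_maps = attn.reshape(num_queries, grid_h, grid_w)   # per-object map over the BEV grid

# Entropy as one possible scalar "spread" measure: low entropy means focused
# attention, high entropy means spread-out attention (a candidate uncertainty cue).
eps = 1e-12
flat = attn.reshape(num_queries, -1)
entropy = -(flat * (flat + eps).log()).sum(dim=-1)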