ENSEMBLEMOT: A STEP TOWARDS ENSEMBLE LEARNING OF MULTIPLE OBJECT
TRACKING
Yunhao Du1, Zihang Liu1, Fei Su1,2
1Beijing University of Posts and Telecommunications
2Beijing Key Laboratory of Network System and Network Culture, China
{dyh_bupt,henry0820,sufei}@bupt.edu.cn
ABSTRACT
Multiple Object Tracking (MOT) has progressed rapidly in
recent years. Existing works tend to design a single tracking
algorithm to perform both detection and association. Though
ensemble learning has been exploited in many tasks, e.g.,
classification and object detection, it has not been studied in
the MOT task, mainly because of the complexity of the task and
the nature of its evaluation metrics. In this paper, we propose
a simple but effective ensemble method for MOT, called
EnsembleMOT, which merges multiple tracking results from
various trackers with spatio-temporal constraints. Meanwhile,
several post-processing procedures are applied to filter out
abnormal results. Our method is model-independent and requires
no learning procedure. Moreover, it can easily work in
conjunction with other algorithms, e.g., tracklet interpolation.
Experiments on the MOT17 dataset demonstrate the effectiveness
of the proposed method. Code is available at
https://github.com/dyhBUPT/EnsembleMOT.
Index Terms: Multiple Object Tracking, Ensemble Learning
1. INTRODUCTION
Multiple Object Tracking (MOT) aims to detect and track all
objects of specific classes frame by frame, and plays an
essential role in video analysis and understanding. In the past
few years, the MOT task has been dominated by the
tracking-by-detection (TBD) paradigm [3,4], which performs
detection per frame and formulates the MOT problem as a data
association task. Recently, some works integrate the detector
and the embedding model (i.e., appearance or motion embedding)
into a unified framework, which can benefit from multi-task
learning and tends to achieve a better speed-accuracy trade-off
[1,5].
Ensemble learning [6] generally refers to training and/or
combining multiple models, and is widely used in machine
learning [7,8,9,10] and computer vision [11,12,13,14].
For example, for image classification, Wortsman et al. propose
Model Soups, which averages the weights of multiple models to
improve classification accuracy [11]. To estimate more stable
and accurate pseudo labels for semi-supervised image
classification, Temporal Ensembling [12] aggregates the
predictions of multiple previous network evaluations into an
ensemble prediction. For the object detection task, Soft-NMS
[13] and WBF [14] are widely used to combine results from
multiple detectors.
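As a minimal sketch of the weight-averaging idea behind Model Soups, the snippet below averages each named parameter element-wise across several models. It assumes all models share the same architecture and covers only plain uniform averaging. The flat `Weights` format and the name `uniform_soup` are hypothetical choices made for this illustration and are not taken from [11].

```python
from typing import Dict, List

# Hypothetical flat format: parameter name -> list of scalar weights.
Weights = Dict[str, List[float]]


def uniform_soup(models: List[Weights]) -> Weights:
    """Average each named parameter element-wise across all models."""
    if not models:
        raise ValueError("need at least one model")
    soup: Weights = {}
    for name in models[0]:
        # zip(*...) groups the i-th element of this parameter from every model.
        columns = zip(*(m[name] for m in models))
        soup[name] = [sum(col) / len(models) for col in columns]
    return soup
```

For instance, averaging `{"w": [1.0, 3.0]}` and `{"w": [3.0, 5.0]}` yields `{"w": [2.0, 4.0]}`. Averaging in weight space produces a single model, so inference cost does not grow with the number of ensemble members.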
Ensemble methods are also used in several MOT works.
Peng et al. propose the Layer-wise Aggregation Discriminative
Model (LADM) [15], which uses the weighted average of
predictions from three softmax layers to judge whether a
detection box represents a person or not. However, it operates
in the detection stage and is essentially not an ensemble of
tracking algorithms. Inspired by Soft-NMS, TrackNMS is designed
in GIAOTracker [16] to fuse multiple tracking results. It first
sorts trajectories by their average confidence scores, and then
performs non-maximum suppression (NMS) based on the temporal
IoU. Though it is designed for combining multiple trackers, it
is evaluated with the score-based metric mAP [17], under which
redundant low-score results can benefit performance. In
contrast, the instance-based metrics, i.e., MOTA [18],
IDF1 [19] and HOTA [20], are more common and reasonable
evaluation metrics for the MOT task.
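The two steps described for TrackNMS (sort trajectories by average confidence, then suppress by temporal IoU) can be sketched roughly as follows. This is an illustrative approximation, not the GIAOTracker implementation: the track format, the hard suppression rule, the threshold value, and in particular the definition of temporal IoU used here (per-frame box IoU summed over shared frames, divided by the number of frames covered by either track) are all assumptions.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Track = Dict[int, Tuple[Box, float]]      # frame index -> (box, score)


def box_iou(a: Box, b: Box) -> float:
    """Standard IoU of two axis-aligned boxes."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def temporal_iou(t1: Track, t2: Track) -> float:
    """Assumed definition: box IoU summed over shared frames,
    normalised by the number of frames covered by either track."""
    union = t1.keys() | t2.keys()
    if not union:
        return 0.0
    shared = t1.keys() & t2.keys()
    return sum(box_iou(t1[f][0], t2[f][0]) for f in shared) / len(union)


def mean_score(track: Track) -> float:
    return sum(score for _, score in track.values()) / len(track)


def track_nms(tracks: List[Track], thresh: float = 0.5) -> List[Track]:
    """Keep high-confidence tracks; drop tracks that overlap a kept one."""
    candidates = [t for t in tracks if t]  # ignore empty tracks
    kept: List[Track] = []
    for cand in sorted(candidates, key=mean_score, reverse=True):
        if all(temporal_iou(cand, k) <= thresh for k in kept):
            kept.append(cand)
    return kept
```

Given two tracks that cover the same boxes on the same frames, only the one with the higher average score survives, while a spatially distant track is kept. A soft variant would lower the scores of overlapping tracks instead of discarding them, which tends to suit score-based metrics better than instance-based ones.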
To sum up, ensemble methods for the MOT task remain
underexplored. We summarize the reasons as follows:
• MOT is a complex downstream task. The diversity and
  complexity of various tracking algorithms make it difficult
  to design a general and effective ensemble algorithm.
• The tracking results are temporal sequences, not just
  classification scores or detection bounding boxes (bboxes).
  Therefore, intuitive methods like voting cannot be directly
  applied.
• The widely used metrics are instance-based. Compared with
  score-based metrics (e.g., mAP) in image classification and
  object detection, the instance-based metrics have no
  tolerance for redundant results, which introduces greater
  risk to ensemble methods.
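To make the last point concrete, the sketch below computes MOTA in its standard CLEAR MOT form, 1 - (FN + FP + IDSW) / GT, and shows how redundant outputs from a naive merge can lower the score even when fewer objects are missed. The function name and all counts in the usage note are made up purely for illustration and are not results from this paper.

```python
def mota(fn: int, fp: int, idsw: int, num_gt: int) -> float:
    """MOTA = 1 - (misses + false positives + identity switches) / GT boxes."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (fn + fp + idsw) / num_gt


# Hypothetical counts for a single tracker on 1000 ground-truth boxes.
single_tracker = mota(fn=200, fp=50, idsw=10, num_gt=1000)

# Hypothetical naive merge of two trackers: 50 misses are recovered,
# but 120 duplicated boxes are counted as false positives.
naive_merge = mota(fn=150, fp=170, idsw=10, num_gt=1000)
```

With these illustrative counts the single tracker scores about 0.74 and the naive merge about 0.67: every duplicate is penalized as a false positive, whereas under a score-based metric such as mAP low-score duplicates are ranked last and cost little.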
arXiv:2210.05278v2 [cs.CV] 17 Feb 2023