
ENSEMBLEMOT: A STEP TOWARDS ENSEMBLE LEARNING OF MULTIPLE OBJECT TRACKING

Yunhao Du1, Zihang Liu1, Fei Su1,2

1Beijing University of Posts and Telecommunications

2Beijing Key Laboratory of Network System and Network Culture, China

{dyh bupt,henry0820,sufei}@bupt.edu.cn

ABSTRACT

Multiple Object Tracking (MOT) has progressed rapidly in recent years. Existing works tend to design a single tracking algorithm that performs both detection and association. Although ensemble learning has been exploited in many tasks, e.g., classification and object detection, it has not been studied for MOT, mainly because of the task's complexity and its evaluation metrics. In this paper, we propose a simple but effective ensemble method for MOT, called EnsembleMOT, which merges multiple tracking results from various trackers under spatio-temporal constraints. In addition, several post-processing procedures are applied to filter out abnormal results. Our method is model-independent and requires no training. Moreover, it can easily work in conjunction with other algorithms, e.g., tracklet interpolation. Experiments on the MOT17 dataset demonstrate the effectiveness of the proposed method. Code is available at https://github.com/dyhBUPT/EnsembleMOT.

Index Terms: Multiple Object Tracking, Ensemble Learning

1. INTRODUCTION

Multiple Object Tracking (MOT) aims to detect and track all objects of specific classes frame by frame, and it plays an essential role in video analysis and understanding. In the past few years, the MOT task has been dominated by the tracking-by-detection (TBD) paradigm [3,4], which performs detection in each frame and formulates MOT as a data association problem. Recently, some works have integrated the detector and the embedding model (i.e., appearance or motion embedding) into a unified framework, which benefits from multi-task learning and tends to achieve a better speed-accuracy trade-off [1,5].
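The data association step of the TBD paradigm can be illustrated with a minimal sketch. It assumes axis-aligned boxes in (x1, y1, x2, y2) form, uses box IoU as the only affinity, and matches greedily in order of decreasing IoU. The names `iou` and `associate` and the 0.3 threshold are hypothetical; many TBD trackers instead solve the assignment with the Hungarian algorithm and add appearance or motion cues.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(tracks: List[Box], dets: List[Box], thresh: float = 0.3) -> List[Tuple[int, int]]:
    """Greedily match each track's last box to a new detection by IoU.

    Returns (track_index, detection_index) pairs; unmatched detections
    would typically start new tracks (not shown).
    """
    # Score every track/detection pair, highest IoU first.
    pairs = sorted(
        ((iou(t, d), ti, di) for ti, t in enumerate(tracks) for di, d in enumerate(dets)),
        reverse=True,
    )
    used_t, used_d, matches = set(), set(), []
    for score, ti, di in pairs:
        if score < thresh:
            break  # all remaining pairs overlap too little
        if ti in used_t or di in used_d:
            continue  # each track and detection is used at most once
        used_t.add(ti)
        used_d.add(di)
        matches.append((ti, di))
    return matches


if __name__ == "__main__":
    prev_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
    new_dets = [(21, 21, 31, 31), (1, 1, 11, 11)]
    print(associate(prev_boxes, new_dets))
```

Greedy matching keeps the sketch short and dependency-free; it can be suboptimal when several pairs have similar IoU, which is one reason an optimal assignment solver is often preferred.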

Ensemble learning [6] generally refers to training and/or combining multiple models, and it is widely used in machine learning [7,8,9,10] and computer vision [11,12,13,14]. For example, for image classification, Wortsman et al. propose Model Soups [11], which average the weights of multiple models to improve classification accuracy. To estimate more stable and accurate pseudo labels for semi-supervised image classification, Temporal Ensembling [12] aggregates the predictions of multiple previous network evaluations into an ensemble prediction. For object detection, Soft-NMS [13] and WBF [14] are widely used to combine results from multiple detectors.
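Soft-NMS [13] differs from standard NMS by decaying, rather than deleting, the scores of boxes that overlap a higher-scoring box. The sketch below shows the Gaussian variant, where each remaining score is multiplied by exp(-IoU^2 / sigma). The sigma = 0.5 and pruning-threshold values and all function names are illustrative assumptions, and WBF [14] is not shown.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes: List[Box], scores: List[float],
             sigma: float = 0.5, score_thresh: float = 0.001) -> List[Tuple[Box, float]]:
    """Gaussian Soft-NMS over (possibly pooled) detections from several detectors."""
    remaining = list(zip(boxes, scores))
    kept: List[Tuple[Box, float]] = []
    while remaining:
        # Select the currently highest-scoring box and keep it.
        best_idx = max(range(len(remaining)), key=lambda i: remaining[i][1])
        best_box, best_score = remaining.pop(best_idx)
        kept.append((best_box, best_score))
        # Decay the other scores smoothly according to their overlap with it.
        remaining = [(b, s * math.exp(-iou(best_box, b) ** 2 / sigma)) for b, s in remaining]
        # Drop boxes whose decayed score has become negligible.
        remaining = [(b, s) for b, s in remaining if s >= score_thresh]
    return kept
```

Because overlapping boxes are down-weighted instead of removed, a second object that genuinely overlaps the top box can still survive with a reduced score.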

Ensemble methods are also used in several MOT works. Peng et al. propose the Layer-wise Aggregation Discriminative Model (LADM) [15], which uses the weighted average of predictions from three softmax layers to judge whether a detection box contains a person. However, it operates in the detection stage and is essentially not a tracking algorithm. Inspired by Soft-NMS, TrackNMS is designed in GIAOTracker [16] to fuse multiple tracking results. It first sorts trajectories by their average confidence scores and then performs non-maximum suppression (NMS) based on the temporal IoU. Although it is designed to combine multiple trackers, it is evaluated with the score-based metric mAP [17], under which redundant low-score results can benefit performance. In contrast, instance-based metrics such as MOTA [18], IDF1 [19] and HOTA [20] are the more common and reasonable evaluation metrics for the MOT task.
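The two TrackNMS steps described above (sorting by average confidence, then suppression by temporal IoU) can be sketched as follows. Since temporal IoU is not defined here, the sketch assumes one: the per-frame box IoU summed over frames shared by both tracks, divided by the number of frames covered by either track. The track representation, function names and 0.5 threshold are hypothetical and are not taken from GIAOTracker [16].

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format
Track = Dict[int, Tuple[Box, float]]     # frame id -> (box, confidence); assumed non-empty


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def temporal_iou(a: Track, b: Track) -> float:
    """ASSUMED definition: summed per-frame IoU over shared frames / union of frames."""
    union_frames = a.keys() | b.keys()
    if not union_frames:
        return 0.0
    shared = a.keys() & b.keys()
    return sum(iou(a[f][0], b[f][0]) for f in shared) / len(union_frames)


def mean_conf(t: Track) -> float:
    """Average detection confidence along a trajectory."""
    return sum(c for _, c in t.values()) / len(t)


def track_nms(tracks: List[Track], thresh: float = 0.5) -> List[Track]:
    """Visit trajectories by mean confidence and drop any whose temporal IoU
    with an already-kept trajectory exceeds `thresh`."""
    kept: List[Track] = []
    for t in sorted(tracks, key=mean_conf, reverse=True):
        if all(temporal_iou(t, k) <= thresh for k in kept):
            kept.append(t)
    return kept
```

Because suppression is greedy over a confidence-sorted list, a near-duplicate trajectory from a weaker tracker is dropped, while a trajectory of a different object survives even if its confidence is low.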

To sum up, ensemble methods for the MOT task are still underexplored. We summarize the reasons as follows:

• MOT is a complex downstream task. The diversity and complexity of tracking algorithms make it difficult to design a general and effective ensemble algorithm.

• Tracking results are temporal sequences, not just classification scores or detection bounding boxes (bboxes). Therefore, intuitive methods such as voting cannot be applied directly.

• The widely used metrics are instance-based. Compared with the score-based metrics (e.g., mAP) used in image classification and object detection, instance-based metrics have no tolerance for redundant results, which poses a greater risk to ensemble methods.
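The risk described in the last point can be made concrete with the standard CLEAR MOT definition of MOTA [18], where FN_t, FP_t, IDSW_t and GT_t are the false negatives, false positives, identity switches and ground-truth objects in frame t. The counts in the worked example are hypothetical and chosen only to illustrate the arithmetic; they assume the added duplicates leave FN and IDSW unchanged.

```latex
\begin{align*}
\mathrm{MOTA} &= 1 - \frac{\sum_t \left(\mathrm{FN}_t + \mathrm{FP}_t + \mathrm{IDSW}_t\right)}{\sum_t \mathrm{GT}_t} \\
\text{baseline (hypothetical): } \mathrm{MOTA} &= 1 - \frac{200 + 100 + 20}{1000} = 0.68 \\
\text{plus 150 unmatched duplicates: } \mathrm{MOTA} &= 1 - \frac{200 + 250 + 20}{1000} = 0.53
\end{align*}
```

Every redundant box that is not matched to a ground-truth object adds one false positive, and MOTA has no confidence ranking that could discount it, so a naive union of trackers can lower the score even when no true positive is lost.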

arXiv:2210.05278v2 [cs.CV] 17 Feb 2023
