
CD-FSOD: A BENCHMARK FOR CROSS-DOMAIN FEW-SHOT OBJECT DETECTION
Wuti Xiong
Center for Machine Vision and Signal Analysis, University of Oulu, Finland
wuti.xiong@oulu.fi
ABSTRACT
In this paper, we propose a study of the cross-domain few-shot object detection (CD-FSOD) benchmark, which consists of image data from diverse domains. On the proposed benchmark, we evaluate state-of-the-art FSOD approaches, including meta-learning and fine-tuning FSOD approaches. The results show that these methods tend to fail, and even underperform the naive fine-tuning model. We analyze the reasons for their failure and introduce a strong baseline that trains a teacher and a student in a mutually beneficial manner to alleviate the overfitting problem. Our approach outperforms existing approaches by a significant margin (2.0% on average) on the proposed benchmark. Our code is available at https://github.com/FSOD/CD-FSOD.
Index Terms—Few-shot Object Detection, Cross-domain.
1. INTRODUCTION
Few-shot object detection (FSOD) aims to detect novel classes of objects with a few annotated instances. In the previous FSOD setting [1,2], a detector is pre-trained on a source dataset consisting of base classes and then transferred to a target dataset consisting of novel classes with few instances, where the base and novel classes are disjoint but share a similar data domain. However, this underlying assumption does not apply to some real-world scenarios, because it is difficult or impossible to collect a sufficient amount of data in those domains. This leads to a new FSOD problem, where the detector must resort to pre-training on base classes from a different domain. In these cases, even humans have trouble recognizing new categories that vary too greatly between examples or differ from prior experience [3,4]. Thus, finding new approaches to tackle this problem remains a challenging but desirable goal.
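To make the standard setting concrete, the sketch below illustrates the usual two-stage transfer protocol (pre-train on base classes, then fine-tune on a few shots of novel classes) with a generic torchvision detector. This is an illustrative setup under our assumptions, not the exact pipeline of any method evaluated here; `base_loader` and `few_shot_loader` are hypothetical data loaders.

```python
# Illustrative two-stage FSOD transfer protocol (assumed torchvision setup,
# not the exact pipeline of the methods evaluated in this paper).
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

def build_detector(num_classes):
    # Generic Faster R-CNN detector; the box classifier head is sized to
    # the current label space (background + object classes).
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None)
    in_feats = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_feats, num_classes)
    return model

def train(model, loader, epochs, lr):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for images, targets in loader:
            loss_dict = model(images, targets)   # detection losses (train mode)
            loss = sum(loss_dict.values())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

# Stage 1: pre-train on abundant base-class data (e.g., MS COCO, 80 classes).
detector = build_detector(num_classes=81)
# train(detector, base_loader, epochs=12, lr=0.02)        # hypothetical loader

# Stage 2: swap the classifier head for the novel classes and fine-tune on
# only K annotated instances per class from the target dataset.
# in_feats = detector.roi_heads.box_predictor.cls_score.in_features
# detector.roi_heads.box_predictor = FastRCNNPredictor(in_feats, num_novel + 1)
# train(detector, few_shot_loader, epochs=50, lr=0.001)   # hypothetical loader
```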
Although conventional FSOD benchmarks [1,2] are well established, no prior work studies FSOD across different domains. To fill this gap, in this paper we introduce the Cross-Domain Few-Shot Object Detection (CD-FSOD) benchmark (shown in Fig. 1), which covers three target datasets: ArTaxOr [6], UODD [7], and DIOR [8]. On the proposed benchmark, we conduct extensive experiments to evaluate existing FSOD approaches, including meta-learning approaches
[2,9,10] and fine-tuning approaches [11,12,13].

Fig. 1: The CD-FSOD benchmark. MS COCO [5] is used for source training, and target domains of decreasing similarity to MS COCO (ArTaxOr, UODD, DIOR, with label spaces disjoint from the source) are used for evaluation.

The results
show that existing FSOD approaches cannot achieve satisfactory performance and even underperform the naive fine-tuning model due to parameter freezing. Even without freezing parameters, fine-tuning methods struggle to outperform the naive transfer model, while meta-learning methods still
fail. This finding shows that existing FSOD methods cannot
work for CD-FSOD, and there is an urgent need to develop
new methods.
Besides, we introduce a novel distillation-based baseline, which enables a “flywheel effect”: the student and teacher mutually reinforce each other, so both get better and better as the training goes on. Specifically, an exponential moving average (EMA) enables the teacher model to ensemble the student models from different time steps, and the student’s weights are optimized by a distillation loss between the pseudo-labels generated by the teacher and the student’s predictions on the same image (see the sketch after the contribution list below). Our approach outperforms existing FSOD approaches by a large margin on the proposed benchmark. In summary, our main contributions are as follows:
1) we establish the CD-FSOD benchmark, where there is a very large domain difference between the base and target datasets; 2) on the proposed benchmark, we evaluate existing FSOD approaches and analyze the reasons for their failure; 3) we introduce a strong baseline that achieves state-of-the-art performance on the proposed benchmark.
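To make the baseline concrete, here is a minimal sketch of the teacher-student update described above, based on our reading of the description; the score threshold and helper names are illustrative assumptions, not the paper’s exact implementation.

```python
# Minimal sketch of the distillation-based baseline (our reading of the
# description above; details such as the threshold are assumptions).
import copy
import torch

def build_teacher(student):
    # The teacher starts as a frozen copy of the student detector.
    teacher = copy.deepcopy(student)
    for p in teacher.parameters():
        p.requires_grad_(False)
    return teacher

@torch.no_grad()
def ema_update(teacher, student, decay=0.999):
    # EMA lets the teacher ensemble student weights from different time steps.
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(decay).add_(s, alpha=1.0 - decay)

def filter_pseudo_labels(pred, score_thr=0.7):
    # Keep only confident teacher detections as pseudo ground truth
    # (the threshold is an illustrative choice).
    keep = pred["scores"] > score_thr
    return {"boxes": pred["boxes"][keep], "labels": pred["labels"][keep]}

def train_step(student, teacher, images, optimizer):
    # 1) The teacher generates pseudo-labels on the images.
    teacher.eval()
    with torch.no_grad():
        pseudo = [filter_pseudo_labels(p) for p in teacher(images)]

    # 2) The student is optimized with a distillation loss between its
    #    predictions and the teacher's pseudo-labels on the same images
    #    (torchvision-style detectors return a loss dict in train mode).
    student.train()
    loss = sum(student(images, pseudo).values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # 3) The teacher tracks the student, closing the mutually
    #    reinforcing "flywheel" loop.
    ema_update(teacher, student)
```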