be intrinsically rare, difficult to localize and to
identify accurately. This makes it impractical to acquire the thousands of examples typically required for classic supervised deep learning methods to generalize. Consequently, a major challenge
is the development of label-efficient approaches, i.e.
models that are able to learn with few annotated
examples.
To reduce the number of training samples required for difficult vision tasks such as object detection, transfer learning from pretrained neural networks is used
extensively. The idea is to reuse a network trained
upstream on a large, diverse source dataset. Ima-
geNet [18] has become the de facto standard for
pretraining: thanks to its large scale and genericity, ImageNet-pretrained models have proven adaptable beyond their source domain, including remote sensing imagery [16]. Nonetheless, the domain gap between ImageNet and remote sensing raises questions about the limitations of this transfer when very few samples are available for the task at hand, e.g. the detection of rare observables in satellite
images. To fit the distributions of downstream tasks
with maximum efficiency, one would ideally use
generic in-domain representations, obtained by pre-
training on large amounts of remote sensing data.
This is infeasible in the remote sensing domain due
to the difficulty of curating and labeling these data
at the scale of ImageNet. However, imaging satel-
lites provide an ever-growing amount of unlabeled
data, which makes it highly relevant for learning
visual representations in an unsupervised way.
Self-supervised learning (SSL) has recently
emerged as an effective paradigm for learning repre-
sentations on unlabeled data. It derives a supervision signal from the data itself, by solving a pretext task on the inputs, in order to learn semantic representations. A model trained in a self-supervised fashion
can then be transferred using the same methods as a network pretrained on a supervised task. In the last two years, SSL has shown impressive results, closing the gap with, or even outperforming, supervised learning on multiple benchmarks [2], [3], [7], [8]. Recently, SSL has been applied in the
remote sensing domain to exploit readily-available
unlabeled data, and has been shown to reduce or even
close the gap with transfer from ImageNet [1], [15],
Nonetheless, the capacity of these methods to generalize from few labels has not yet been explored on the important problem of object detection in VHR satellite images.
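To illustrate the pretext-task principle, consider rotation prediction (a classic pretext task from the SSL literature, shown here only as an example; it is not the contrastive method studied in this paper, and the helper below is a hypothetical sketch): each image is labeled by the rotation applied to it, so supervision comes for free from the unlabeled data.

```python
import numpy as np

def rotation_pretext_batch(images, rng):
    """Build a self-labeled batch: each image is rotated by a random
    multiple of 90 degrees, and the rotation index is the target."""
    labels = rng.integers(0, 4, size=len(images))  # 0, 90, 180, 270 degrees
    rotated = np.stack([np.rot90(img, k) for img, k in zip(images, labels)])
    return rotated, labels
```

A network trained to predict `labels` from `rotated` must extract orientation-sensitive semantic features, without any human annotation.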
In this paper, we explore in-domain self-
supervised representation learning for the task of
object detection on VHR optical satellite imagery.
We use the large land use classification dataset
Functional Map of the World (fMoW) [5] to pretrain
representations using the unsupervised framework
of MoCo [8]. We then investigate transferability to a difficult real-world task of fine-grained vehicle
detection on proprietary data, which is designed
to be representative of an operational use case of
strategic site surveillance. Our contributions are:
• We apply a method based on MoCo with temporal positives [1] to learn self-supervised representations of remote sensing images, which we improve using (i) additional augmentations for rotational invariance; (ii) a corrected loss function that removes false temporal negatives from the learning process.
• We investigate the benefit of in-domain self-
supervised pretraining as a function of the
annotation effort, using different budgets of
annotated instances for detecting vehicles.
• We show that our method is better than or
at least competitive with supervised ImageNet
pretraining, despite using no upstream labels
and 3× less upstream data.
Furthermore, our in-domain SSL model is more
label-efficient than ImageNet: when using very
limited annotation budgets (∼20 images totalling ∼12k observables), we outperform ImageNet pretraining by 4 points of AP on vehicle detection and 0.5 point of mAP on joint detection and classification.
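To make the loss correction concrete, the sketch below shows, for a single query, a contrastive InfoNCE-style loss in which queued keys flagged as temporal views of the same scene are excluded from the denominator, so they no longer act as false negatives. This is a simplified single-query illustration under our own naming assumptions (`false_neg_mask` and the function itself are hypothetical), not the exact implementation used in our experiments.

```python
import numpy as np

def masked_info_nce(query, key_pos, queue, false_neg_mask, tau=0.07):
    """InfoNCE for one L2-normalized query vector: the positive is a
    temporal view of the same scene; queued keys flagged in
    `false_neg_mask` (other views of that scene) are dropped from the
    denominator instead of being treated as negatives."""
    l_pos = query @ key_pos / tau        # similarity to the positive key
    l_neg = (queue @ query) / tau        # similarities to all queued keys
    l_neg = l_neg[~false_neg_mask]       # remove false temporal negatives
    logits = np.concatenate(([l_pos], l_neg))
    # numerically stable cross-entropy with the positive at index 0
    m = logits.max()
    return float(-l_pos + m + np.log(np.exp(logits - m).sum()))
```

With the mask applied, a queued temporal view of the query's scene no longer pushes the loss up, so the model is not penalized for mapping two views of the same location to nearby representations.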
II. RELATED WORK
A. Self-supervised representation learning
SSL methods use unlabeled data to learn repre-
sentations that are transferable to downstream tasks
(e.g. image classification or object detection) for
which annotated data samples are insufficient. In
recent years, these methods have been successfully
applied to computer vision with impressive results