Self-Supervised Pretraining on Satellite
Imagery: a Case Study on Label-Efficient
Vehicle Detection
1st Jules Bourcier
Preligens (ex-Earthcube)
Paris, France
Inria, Univ. Grenoble Alpes,
CNRS, Grenoble INP, LJK
Grenoble, France
jules.bourcier@preligens.com
2nd Thomas Floquet
Preligens (ex-Earthcube)
Paris, France
MINES Paris - PSL University
Paris, France
thomas.floquet@preligens.com
3rd Gohar Dashyan
Preligens (ex-Earthcube)
Paris, France
gohar.dashyan@preligens.com
4th Tugdual Ceillier
Preligens (ex-Earthcube)
Paris, France
tugdual.ceillier@preligens.com
5th Karteek Alahari
Inria, Univ. Grenoble Alpes,
CNRS, Grenoble INP, LJK
Grenoble, France
6th Jocelyn Chanussot
Inria, Univ. Grenoble Alpes,
CNRS, Grenoble INP, LJK
Grenoble, France
Abstract—In defense-related remote sensing applications, such as vehicle detection on satellite imagery, supervised learning requires a huge number of labeled examples to reach operational performance. Such data are challenging to obtain as they require military experts, and some observables are intrinsically rare. This limited labeling capability, together with the large number of unlabeled images made available by the growing number of sensors, makes object detection on remote sensing imagery highly relevant for self-supervised learning. We study in-domain self-supervised representation learning for object detection on very high resolution optical satellite imagery, a setting that is still poorly explored. To the best of our knowledge, we are the first to study the problem of label efficiency on this task. We use the large land use classification dataset Functional Map of the World to pretrain representations with an extension of the Momentum Contrast framework. We then investigate this model's transferability to a real-world task of fine-grained vehicle detection and classification on Preligens proprietary data, which is designed to be representative of an operational use case of strategic site surveillance. We show that our in-domain self-supervised learning model is competitive with ImageNet pretraining, and outperforms it in the low-label regime.
Index Terms—deep learning, computer vision, remote sensing, self-supervised learning, object detection, land use classification, label-efficient learning
I. INTRODUCTION
Very high resolution (VHR) satellite imagery is one of the key sources of data from which geospatial intelligence can be gathered. It is an essential tool to detect and identify a wide range of objects, over very large areas and on a very frequent basis. The recent multiplication of available sensors has led to a large increase in the volume of available data, making it very challenging for human analysts to exploit these data without resorting to automatic solutions. Deep learning techniques have proven highly effective at such tasks. However, training these models requires very large labeled datasets. Annotating objects of interest in VHR images can prove very costly, being both difficult and time-consuming, and requiring fine domain expertise. In specific contexts such as geospatial intelligence, the targets can
be intrinsically rare and difficult to localize and identify accurately. This makes it impractical to acquire the thousands of examples typically required for classic supervised deep learning methods to generalize. Consequently, a major challenge is the development of label-efficient approaches, i.e. models that are able to learn from few annotated examples.
To reduce the number of training samples needed for difficult vision tasks such as object detection, transfer learning from pretrained neural networks is used extensively. The idea is to reuse a network trained upstream on a large, diverse source dataset. ImageNet [18] has become the de facto standard for pretraining: owing to its large scale and genericity, ImageNet-pretrained models have been shown to be adaptable beyond their source domain, including to remote sensing imagery [16]. Nonetheless, the domain gap between ImageNet and remote sensing raises questions about the limitations of this transfer when very few samples are available for the task at hand, e.g. the detection of rare observables in satellite images. To fit the distributions of downstream tasks with maximum efficiency, one would ideally use generic in-domain representations, obtained by pretraining on large amounts of remote sensing data. This is infeasible in the remote sensing domain due to the difficulty of curating and labeling such data at the scale of ImageNet. However, imaging satellites provide an ever-growing amount of unlabeled data, which makes them highly relevant for learning visual representations in an unsupervised way.
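Regardless of whether the upstream weights come from supervised ImageNet training or from in-domain pretraining, the transfer step itself follows the same standard pattern: load the pretrained encoder weights into the backbone of the downstream model and fine-tune. The PyTorch sketch below illustrates this; the checkpoint filename, the ResNet-50 backbone choice, and the key names are assumptions made for illustration, not the authors' released code.

```python
# Minimal sketch (assumed setup): reusing an in-domain pretrained encoder as a
# detection backbone. The checkpoint path and key names are hypothetical.
import torch
import torchvision

# Randomly initialized ResNet-50; the pretrained weights are loaded next.
backbone = torchvision.models.resnet50()

# Load encoder weights produced by an in-domain self-supervised pretraining run.
# Depending on how the checkpoint was saved, key prefixes (e.g. "encoder_q.")
# may need to be stripped first. strict=False tolerates missing keys such as
# the classification head, which is discarded when the encoder is plugged into
# a detector.
state_dict = torch.load("fmow_moco_resnet50.pth", map_location="cpu")
missing, unexpected = backbone.load_state_dict(state_dict, strict=False)
print(f"missing keys: {missing}, unexpected keys: {unexpected}")

# The backbone is then fine-tuned end-to-end inside a detection model
# (e.g. with a region-proposal head) on the labeled vehicle-detection set.
```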
Self-supervised learning (SSL) has recently emerged as an effective paradigm for learning representations from unlabeled data. It derives a supervision signal from the unlabeled data themselves, by solving a pretext task on the input data in order to learn semantic representations. A model trained in a self-supervised fashion can then be transferred using the same methods as a network pretrained on a supervised task. In the last two years, SSL has shown impressive results that closed the gap with, or even outperformed, supervised learning on multiple benchmarks [2], [3], [7], [8]. Recently, SSL has been applied in the remote sensing domain to exploit readily available unlabeled data, and was shown to reduce or even close the gap with transfer from ImageNet [1], [15], [25]. Nonetheless, the capacity of these methods to generalize from few labels has not been explored for the important problem of object detection in VHR satellite images.
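To make the contrastive pretext task concrete, the sketch below shows the InfoNCE objective at the core of MoCo-style methods: two augmented views of the same image form a positive pair, while a queue of embeddings from other images provides negatives. The tensor shapes and temperature value are illustrative assumptions, not the exact settings used in this paper.

```python
# Minimal sketch of the InfoNCE objective used by MoCo-style SSL methods.
# Shapes and the temperature value are illustrative, not the paper's settings.
import torch
import torch.nn.functional as F

def info_nce_loss(query, key, queue, temperature=0.2):
    """query, key: (N, D) embeddings of two augmented views of the same images.
    queue: (K, D) embeddings of other images, acting as negatives."""
    query = F.normalize(query, dim=1)
    key = F.normalize(key, dim=1)
    queue = F.normalize(queue, dim=1)

    # Positive logits: similarity between the two views of each image, (N, 1).
    l_pos = torch.einsum("nd,nd->n", query, key).unsqueeze(1)
    # Negative logits: similarity against every queued embedding, (N, K).
    l_neg = torch.einsum("nd,kd->nk", query, queue)

    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    # The positive always sits at index 0, so the target class is 0 everywhere.
    labels = torch.zeros(query.size(0), dtype=torch.long, device=query.device)
    return F.cross_entropy(logits, labels)
```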
In this paper, we explore in-domain self-supervised representation learning for the task of object detection on VHR optical satellite imagery. We use the large land use classification dataset Functional Map of the World (fMoW) [5] to pretrain representations using the unsupervised MoCo framework [8]. We then investigate their transferability to a difficult real-world task of fine-grained vehicle detection on proprietary data, which is designed to be representative of an operational use case of strategic site surveillance. Our contributions are:
• We apply a method based on MoCo with temporal positives [1] to learn self-supervised representations of remote sensing images, which we improve using (i) additional augmentations for rotational invariance; and (ii) a corrected loss function that removes false temporal negatives from the learning process (a sketch of this correction is given at the end of this section).
• We investigate the benefit of in-domain self-supervised pretraining as a function of the annotation effort, using different budgets of annotated instances for detecting vehicles.
• We show that our method is better than, or at least competitive with, supervised ImageNet pretraining, despite using no upstream labels and 3× less upstream data.
Furthermore, our in-domain SSL model is more label-efficient than ImageNet: with very limited annotation budgets (≈20 images totalling ≈12k observables), we outperform ImageNet pretraining by 4 points of AP on vehicle detection and 0.5 points of mAP on joint detection and classification.
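The corrected loss mentioned in the first contribution removes false temporal negatives, i.e. queue entries that depict the same location as the query at a different time and therefore should not be contrasted against it. Building on the InfoNCE sketch above, the hypothetical snippet below shows one way such a mask could be applied; the location-id bookkeeping is an assumption made for illustration, not necessarily how the authors implement it.

```python
# Hypothetical sketch: excluding false temporal negatives by masking queue
# entries that share the query's location. The id-based bookkeeping is an
# illustrative assumption, not the authors' implementation.
import torch

def mask_false_temporal_negatives(l_neg, query_ids, queue_ids):
    """l_neg: (N, K) query-vs-queue similarity logits (as in the sketch above).
    query_ids: (N,) location identifiers of the query images.
    queue_ids: (K,) location identifiers of the queued embeddings.
    Entries with matching ids are temporal views of the same place, so they are
    dropped from the softmax denominator instead of being treated as negatives."""
    false_neg = query_ids.unsqueeze(1) == queue_ids.unsqueeze(0)  # (N, K) bool
    return l_neg.masked_fill(false_neg, float("-inf"))
```

The rotational-invariance augmentations of the first contribution would live in the view-generation pipeline (e.g. random rotations and flips), independently of this loss correction.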
II. RELATED WORK
A. Self-supervised representation learning
SSL methods use unlabeled data to learn representations that are transferable to downstream tasks (e.g. image classification or object detection) for which annotated data samples are insufficient. In recent years, these methods have been successfully applied to computer vision with impressive results