Heterogeneous Feature Distillation Network for
SAR Image Semantic Segmentation
Mengyu Gao and Qiulei Dong
Abstract—Semantic segmentation for SAR (Synthetic Aperture
Radar) images has attracted increasing attention in the remote
sensing community recently, due to SAR’s all-time and all-
weather imaging capability. However, SAR images are generally
more difficult to segment than their EO (Electro-Optical)
counterparts, since speckle noise and layovers are inevitably
present in SAR images. To address this problem, we investigate
how to introduce EO features to assist the training of a
SAR-segmentation model, and propose a heterogeneous feature
distillation network for segmenting SAR images, called HFD-
Net, where a SAR-segmentation student model gains knowledge
from a pre-trained EO-segmentation teacher model. In the
proposed HFD-Net, both the student and teacher models employ
an identical architecture but different parameter configurations,
and a heterogeneous feature distillation model is explored for
transferring latent EO features from the teacher model to the
student model and then enhancing the ability of the student
model for SAR image segmentation. In addition, a heterogeneous
feature alignment module is explored to aggregate multi-scale
features for segmentation in each of the student model and
teacher model. Extensive experimental results on two public
datasets demonstrate that the proposed HFD-Net outperforms
seven state-of-the-art SAR image semantic segmentation methods.
Index Terms—SAR image semantic segmentation, heteroge-
neous feature distillation, heterogeneous feature alignment.
I. INTRODUCTION
SAR (Synthetic Aperture Radar) image semantic segmenta-
tion, which aims at automatically assigning a label to each
pixel, is a challenging topic in the remote sensing community.
In recent years, it has played an important role in various tasks,
such as oil spill detection [5], glacial melting detection [12],
geological hazard assessment [23], etc.
According to whether deep neural networks (DNNs) are
used, the existing semantic segmentation methods for
SAR images could be divided into two categories: traditional
segmentation methods and DNN-based segmentation methods.
Traditional SAR segmentation methods usually utilize some
traditional machine learning techniques to segment SAR
images, such as Markov random fields [2] and probabilistic models
[37], while DNN-based SAR segmentation methods generally
employ various neural network architectures for handling the
SAR segmentation task [3], [9], [22], [33], [44].
Corresponding author: Qiulei Dong
Mengyu Gao and Qiulei Dong are with the National Laboratory of
Pattern Recognition, Institute of Automation, Chinese Academy of Sciences,
Beijing 100190, China, also with the School of Artificial Intelligence,
University of Chinese Academy of Sciences, Beijing 100049, China, and
also with the Center for Excellence in Brain Science and Intelligence
Technology, Chinese Academy of Sciences, Beijing 100190, China (e-mail:
gaomengyu2021@ia.ac.cn; qldong@nlpr.ia.ac.cn).
Fig. 1. EO images’ influence on the performance of the typical multi-modal
method MCANet [28] for segmenting buildings, which is trained with 2401
pairs of SAR and EO images from the SpaceNet6 dataset [47]. MCANet is
tested by utilizing two kinds of input image pairs respectively: (i)
a pair of a SAR image and its corresponding EO image (denoted by the red dotted
line); (ii) a pair of a SAR image and an all-black image, which indicates that no
EO scene image is available (denoted by the blue dotted line): (a) A testing SAR
image; (b) The corresponding EO image to (a); (c) Predicted segmentation
map with the pair of SAR (a) and EO (b) images; (d) An all-black image;
(e) Predicted segmentation map with the pair of SAR (a) and all-black (d)
images; (f) Ground truth segmentation map. As seen, once the EO image is
not used, the segmentation performance of MCANet degrades significantly.
Recently, due to the rapid development of deep learning,
DNN-based SAR segmentation methods have attracted in-
creasing attention. A straightforward strategy is to fine-tune
the existing EO (Electro-Optical) segmentation networks with
SAR images [3], [44], inspired by the DNN’s success in
segmenting EO images [14], [42]. However, such a strategy
only has limited effectiveness, due to the fact that SAR and
EO images have different imaging mechanisms. To allevi-
ate this problem, some works [26], [48] for SAR semantic
segmentation have been proposed by utilizing the intrinsic
characteristics in SAR images. But it is still difficult for
these methods to achieve competitive segmentation results in
comparison to the EO image segmentation results by some
state-of-the-art EO segmentation methods [6], [29] because of
two main reasons: (i) SAR images generally contain speckle
noise and layovers, and (ii) EO images have more abundant
textures than SAR images.
The above issue further encourages researchers to design
multi-modal methods for simultaneously segmenting both
SAR and EO images [33], [43]. These multi-modal methods
use pairs of SAR and EO images not only for training but also
for testing. However, once they segment SAR images singly
without EO images, their segmentation performances would be
degraded significantly. Fig. 1 illustrates this issue by evaluating
arXiv:2210.08988v1 [cs.CV] 17 Oct 2022
a typical multi-modal method MCANet [28] for segmenting
buildings from the public dataset SpaceNet6 [47] (more results
by the other multi-modal methods [4], [20] in Fig. 5 are also
consistent with those by MCANet [28] in this figure). As seen
from this figure, MCANet [28] is firstly trained with 2401 pairs
of SAR and EO images from the SpaceNet6 dataset [47]. Then,
it is not only tested by utilizing a pair of SAR and EO images
as inputs, but also tested by utilizing a pair of SAR and all-
black images as inputs, indicating that only SAR images are
available for testing in this case. The segmentation result by
utilizing the input SAR image without its corresponding EO
image in Fig. 1(e) is significantly worse than that by utilizing
a pair of SAR and EO images in Fig. 1(c). It should be
further pointed out that it is not technically hard to collect
pairs of SAR and EO images for off-line training, but in many
real testing scenarios (e.g., night, cloud cover, etc.), it is often
only possible to capture SAR images, and generally impossible to
capture SAR images and their corresponding clear EO images
simultaneously. This motivates us to investigate the following
problem: How to jointly use SAR and EO images to train such
a SAR segmentation network that could segment SAR images
singly without EO images more effectively?
To address this problem, in this paper, we propose a
Heterogeneous Feature Distillation Network for single SAR
semantic segmentation, called HFD-Net, which is inspired by
the knowledge transfer ability of the knowledge distillation
technique in other visual tasks, such as object classification
[55], object detection [21], etc. The proposed HFD-Net, which
aims to learn heterogeneous features for segmenting SAR
images, consists of a pre-trained teacher model for EO
image segmentation, a student model for SAR image segmentation,
and a designed heterogeneous feature distillation model
for knowledge transfer. The student model has an identical
architecture as its teacher model but a different parameter
configuration, and the heterogeneous feature distillation model
is designed for transferring latent EO features from the teacher
model to the student model so that the performance of the
student model on SAR image segmentation could be boosted.
Moreover, it is noted that various multi-scale feature aggre-
gation techniques have demonstrated their effectiveness for
learning more discriminative features in some other tasks (such
as monocular depth estimation [57] and visual localization
[31]). However, all these feature aggregation techniques could
only aggregate homogeneous features, and fail to deal with
heterogeneous features. Unlike them, a heterogeneous feature
alignment module is further designed in this work to aggregate
multi-scale heterogeneous features in the student model, which
is also used to aggregate multi-scale homogeneous features in
the teacher model.
In sum, our main contributions include:
• We propose the HFD-Net for SAR image semantic segmentation, which could learn heterogeneous EO/SAR features through knowledge distillation. To our best knowledge, this work is the first attempt to segment SAR images with distilled heterogeneous features in the remote sensing field.
• We explore the heterogeneous feature distillation model in the proposed HFD-Net, which could automatically transfer knowledge from the EO-segmentation teacher model to the SAR-segmentation student model.
• We explore the heterogeneous feature alignment module. Unlike the existing feature aggregation techniques [17], [18], this module could not only aggregate multi-scale homogeneous features, but also multi-scale heterogeneous features.
The rest of this paper is organized as follows: Section
II gives a review of the related work on DNN-based SAR
semantic segmentation. Section III describes the proposed
HFD-Net in detail. Extensive experimental results are reported
in Section IV. We conclude the paper in Section V.
II. RELATED WORK
In this section, we review the single-modal and multi-modal
DNN-based methods for SAR image semantic segmentation
respectively.
A. Single-modal Methods for SAR Semantic Segmentation
Single-modal methods for SAR semantic segmentation use
SAR images singly for both training and testing. Many early
DNN-based single-modal methods focused on fine-tuning the
existing EO segmentation networks with SAR images [3],
[10], [27], [36], [39] and [50]. Bianchi et al. [3] adopted
a Fully Convolutional Network (FCN) [46] to detect the
presence of avalanches by segmenting SAR images with snow
avalanche. Pham et al. [39] utilized SegNet [1] for pixel-
wise classification over very high resolution airborne PolSAR
images. Tom et al. [50] addressed lake ice detection using
Sentinel-1 SAR data by DeepLabv3+ [7]. Holzmann et al. [16]
introduced an attention mechanism into the typical U-Net [42]
for segmenting SAR images and Davari et al. [8] employed a
distance map in U-Net [42] to add contextual information.
Unlike the above methods that directly utilized the existing
EO segmentation networks, some methods investigated new
network architectures by introducing intrinsic characteristics
in SAR images [25], [32], [40], [51], [53] and [54]. Wang
et al. [53] proposed HR-SAR-Net, which adopts a pyramid
structure with atrous convolution to extract magnitude information and
phase information separately from SAR images. Liu et al. [32]
developed a dark spot detection method based on superpixel
deeper graph convolutional networks (SGDCN) to smooth
SAR image noise. Ristea et al. [40] applied the sub-aperture
decomposition (SD) algorithm as a preprocessing stage for
an unsupervised oceanic SAR segmentation model to bring
additional information over the original vignette.
B. Multi-modal Methods for SAR Semantic Segmentation
The existing multi-modal methods for SAR semantic seg-
mentation use multi-modal data (e.g., pairs of EO and SAR
images) together for both training and testing [4], [11], [20],
[28], [49] and [52], considering that multi-modal data gener-
ally has more abundant information than single-modal data.
Sun et al. [49] employed building footprints to learn multi-
level visual features and normalize the features for predicting
building masks in SAR images. Li et al. [28] designed a multi-
modal cross attention network (MCANet) to extract multi-
scale attention maps by fusing SAR and EO images. Cha
et al. [4] formulated multi-modal representation learning in
contrastive multi-view coding by considering three modalities
(i.e., EO image, SAR image, and label mask) as different
data augmentation techniques. Jain et al. [20] proposed a
self-supervised method to learn invariant feature embeddings
between SAR images and multi-spectral images.
It is worth noting that it is not hard to collect pairs
of SAR and EO images for training these multi-modal
segmentation methods offline. However, when these multi-modal
methods are used for testing in many real cases (e.g., night,
cloud cover, etc.) where no clear EO images but only SAR
images can be captured, their performance becomes
significantly worse, as illustrated in Fig. 1 and discussed in
Section I. Unlike these multi-modal segmentation methods in
literature, the proposed HFD-Net in this work focuses on a
novel segmentation configuration, where pairs of SAR and EO
images are used for network training, but only SAR images
without EO images are used for testing.
III. METHODOLOGY
In this section, we propose the Heterogeneous Feature
Distillation Network (HFD-Net) for SAR image semantic
segmentation, where the heterogeneous feature distillation
model is explored for heterogeneous feature transfer and the
heterogeneous feature alignment module is explored for multi-
scale feature aggregation. Firstly, the architecture of the HFD-
Net is described. Then, we present the heterogeneous feature
distillation model and the heterogeneous feature alignment
module respectively in detail. Finally, the model training and
total loss function are introduced.
A. Architecture
The HFD-Net, whose architecture is shown in Fig. 2,
consists of a pre-trained teacher model for segmenting EO
images, a student model for segmenting SAR images, and
a designed heterogeneous feature distillation model (HFDM)
for transferring latent EO features from the teacher model
to the student model. The teacher model takes EO images
as its inputs, while the student model takes SAR images as
its inputs. The two models have an identical architecture,
which is composed of a backbone segmentation network and
a designed heterogeneous feature alignment module (HFAM)
for multi-scale feature aggregation, but different parameter
configurations. Here, we simply use DeepLabv3+ [7] as the
backbone segmentation network, which consists of a typical
ResNet-101 [13] encoder and a three-block decoder.
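For concreteness, the shared-architecture, separate-parameter arrangement of the teacher and student can be sketched as follows. This is a minimal PyTorch sketch with a toy encoder and decoder standing in for the DeepLabv3+ backbone and the HFAM, which are not reproduced here; all class and layer names are illustrative:

```python
import copy
import torch
import torch.nn as nn

class SegNetSketch(nn.Module):
    """Toy stand-in for the backbone + HFAM used by each HFD-Net branch.

    The real model uses a ResNet-101 encoder and a three-block decoder;
    a few conv layers here only illustrate the shared architecture.
    """
    def __init__(self, in_ch: int = 3, num_classes: int = 2, width: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
        )
        # Stand-in for the D3 decoder block whose features are distilled.
        self.d3 = nn.Conv2d(width, width, 3, padding=1)
        self.head = nn.Conv2d(width, num_classes, 1)

    def forward(self, x):
        f = self.encoder(x)
        q = self.d3(f)      # feature maps q (C_q x H x W) used by the HFDM
        return self.head(q), q

# Teacher (EO) and student (SAR): identical architecture, independent
# parameters. They start as copies here; their configurations diverge
# once each is trained on its own modality.
teacher = SegNetSketch()
student = copy.deepcopy(teacher)
for p in teacher.parameters():      # teacher is pre-trained and frozen
    p.requires_grad_(False)
```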
At the training stage, the teacher model, whose inputs are
EO scene images, is firstly trained on an EO image
segmentation task, and it is expected to extract EO features
that preserve some semantic information of the input scene
images. The parameters of this teacher model are fixed after
it has been trained. Then, the student model with SAR images
as inputs is trained on a SAR image segmentation task. In this
training process of the student model, both the ground truth
segmentation maps and the learned EO features from the teacher
model are utilized jointly as supervision information to guide
the student model to learn heterogeneous features, by
simultaneously minimizing the basic segmentation loss $L_S$ and
the designed heterogeneous distillation loss $L_D$.
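This two-stage scheme can be sketched as a single student-update step. This is a hedged sketch: the exact form of $L_D$ and the weight `lam` balancing the two losses are assumptions here (the distillation term is written as a KL divergence, a common choice), and each model is assumed to return its segmentation logits together with its $D_3$ feature maps:

```python
import torch
import torch.nn.functional as F

def train_student_step(student, teacher, sar, eo, labels, optimizer,
                       T: float = 4.0, lam: float = 1.0):
    """One training step of the SAR-segmentation student.

    Supervision: ground-truth maps (L_S, cross-entropy) plus distillation
    from the frozen EO teacher (L_D). L_D is a KL divergence between the
    channel-wise temperature-softmax probability maps of the D3 features;
    the paper's exact L_D and the weight `lam` are assumptions here.
    """
    with torch.no_grad():            # teacher parameters stay fixed
        _, q_t = teacher(eo)         # latent EO features
    logits, q_s = student(sar)       # SAR logits and latent SAR features

    # Sigmoid-normalize, then channel softmax at temperature T (Eq. (1)-style).
    p_t = torch.softmax(torch.sigmoid(q_t) / T, dim=1)
    log_p_s = torch.log_softmax(torch.sigmoid(q_s) / T, dim=1)

    loss_seg = F.cross_entropy(logits, labels)                 # L_S
    loss_dist = F.kl_div(log_p_s, p_t, reduction="batchmean")  # L_D
    loss = loss_seg + lam * loss_dist

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Freezing the teacher inside `torch.no_grad()` mirrors the text: only the student's parameters are updated while the EO features serve purely as a supervision signal.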
At the testing stage, only the student model is used for
segmenting an arbitrary input SAR image without its corresponding
EO image. In the following subsections, the HFDM
and HFAM are introduced respectively.
B. Heterogeneous Feature Distillation Model
The heterogeneous feature distillation model (HFDM) is
designed to transfer latent EO features which reserve semantic
information from the teacher model to the student model for
segmenting SAR images. It is noted that both the teacher
and student models in the existing knowledge distillation
techniques [15] and [41] generally deal with an identical task
with homogeneous images. Unlike these works, the teacher
and student models in the HFD-Net focus on two similar but
different tasks (one is EO image segmentation, while the other
one is SAR image segmentation), hence, we design a special
architecture with a heterogeneous distillation loss term for the
HFDM so that the EO knowledge from the teacher model
could be distilled to the SAR-segmentation student model, as
shown in Fig. 3.
As seen from Fig. 3, the HFDM consists of two Sigmoid
operators, two Tsoftmax operators and a designed heterogeneous
distillation loss $L_D$. Given an input pair of EO and
SAR images, we denote the extracted set of EO feature maps
from the $D_3$ block in the teacher model as
$q^t = \{q^t_c \mid q^t_c \in \mathbb{R}^{H \times W}, c = 1, 2, \ldots, C_q\}$,
where $C_q$ is the number of channels in the third decoder block $D_3$
and $\{H, W\}$ is the size of each feature map $q^t_c$. Similarly,
we also denote the extracted set of SAR feature maps from the
$D_3$ block in the student model as
$q^s = \{q^s_c \mid q^s_c \in \mathbb{R}^{H \times W}, c = 1, 2, \ldots, C_q\}$.
The HFDM is used to enforce the student model to output SAR features
$q^s$ that are as similar as possible to the extracted EO features
$q^t$ from the teacher model.
Firstly, two identical Sigmoid operators are implemented
to normalize the elements of each feature map $q_c$ (indicating
both $q^t_c$ and $q^s_c$) into $(0, 1)$ respectively. Then, each normalized
feature map $\bar{q}_c$ (indicating both $\bar{q}^t_c$ in $\bar{q}^t$ and $\bar{q}^s_c$ in $\bar{q}^s$) is
transformed into a pixel-level probability map $\hat{q}_c$ (indicating
both $\hat{q}^t_c$ in $\hat{q}^t$ and $\hat{q}^s_c$ in $\hat{q}^s$) respectively by implementing the
following Tsoftmax operator:

$$\hat{q}_c = \mathrm{Tsoftmax}(\bar{q}_c; T) = \frac{\exp(\bar{q}_c / T)}{\sum_{j=1}^{C_q} \exp(\bar{q}_j / T)} \qquad (1)$$
where $T$ is a preset temperature constant (here, the Tsoftmax
function degrades to the commonly used Softmax function
when $T$ is set to 1, as illustrated in [15]). Finally, the
heterogeneous distillation loss $L_D$, which measures the difference
between the obtained pixel-level probability maps, is
designed to distill latent features from the EO-segmentation
teacher model to the SAR-segmentation student model.
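The Sigmoid normalization and the Tsoftmax operator of Eq. (1) can be checked with a short PyTorch sketch, where the channel dimension of the feature stack plays the role of the index $c$:

```python
import torch

def tsoftmax(q: torch.Tensor, T: float) -> torch.Tensor:
    """Eq. (1): sigmoid-normalize a stack of feature maps q (C_q x H x W),
    then apply a softmax with temperature T across the C_q channels,
    yielding a pixel-level probability map per channel."""
    q_bar = torch.sigmoid(q)                # normalize elements into (0, 1)
    return torch.softmax(q_bar / T, dim=0)  # exp(q̄_c/T) / Σ_j exp(q̄_j/T)

# Toy feature stack: C_q = 4 channels of 8x8 feature maps.
q = torch.randn(4, 8, 8)
p = tsoftmax(q, T=4.0)
# At every pixel the probabilities over the C_q channels sum to 1,
# and with T = 1 the operator degrades to the ordinary softmax.
```

A larger $T$ flattens the channel-wise distributions, which is the usual reason a temperature is used in distillation: it exposes more of the teacher's soft structure to the student.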