
building masks in SAR images. Li et al. [28] designed a multi-
modal cross attention network (MCANet) to extract multi-
scale attention maps by fusing SAR and EO images. Cha
et al. [4] formulated multi-modal representation learning in
contrastive multi-view coding by considering three modalities
(i.e., EO image, SAR image, and label mask) as different
data augmentation techniques. Jain et al. [20] proposed a
self-supervised method to learn invariant feature embeddings
between SAR images and multi-spectral images.
It is worth noting that collecting pairs of SAR and EO images for training these multi-modal segmentation methods offline is not difficult. However, when these multi-modal methods are tested in many real cases (e.g., night, cloud cover, etc.) where only SAR images but no clear EO images can be captured, their performance degrades significantly, as illustrated in Fig. 1 and discussed in Section I. Unlike these multi-modal segmentation methods in the literature, the proposed HFD-Net in this work focuses on a novel segmentation configuration, where pairs of SAR and EO images are used for network training, but only SAR images without EO images are used for testing.
III. METHODOLOGY
In this section, we propose the Heterogeneous Feature
Distillation Network (HFD-Net) for SAR image semantic
segmentation, where the heterogeneous feature distillation
model is explored for heterogeneous feature transfer and the
heterogeneous feature alignment module is explored for multi-
scale feature aggregation. Firstly, the architecture of the HFD-
Net is described. Then, we present the heterogeneous feature
distillation model and the heterogeneous feature alignment
module respectively in detail. Finally, the model training and
total loss function are introduced.
A. Architecture
The HFD-Net, whose architecture is shown in Fig. 2,
consists of a pre-trained teacher model for segmenting EO
images, a student model for segmenting SAR images, and
a designed heterogeneous feature distillation model (HFDM)
for transferring latent EO features from the teacher model
to the student model. The teacher model takes EO images
as its inputs, while the student model takes SAR images as
its inputs. The two models share an identical architecture, composed of a backbone segmentation network and a designed heterogeneous feature alignment module (HFAM) for multi-scale feature aggregation, but have different parameter configurations. Here, we simply use DeepLabv3+ [7] as the
backbone segmentation network, which consists of a typical
ResNet-101 [13] encoder and a three-block decoder.
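This two-branch setup can be sketched schematically as follows. The code is a toy stand-in, not the actual DeepLabv3+/HFAM implementation: one architecture definition is instantiated twice, so the teacher and student share structure but keep independent parameters.

```python
import numpy as np

def make_segmenter(rng, n_classes=4, feat_ch=16):
    """Stand-in for the shared architecture (DeepLabv3+ with a ResNet-101
    encoder and a three-block decoder in the paper); every call returns a
    fresh, independent parameter set."""
    return {
        "encoder": rng.normal(size=(3, feat_ch)),          # toy encoder weights
        "decoder": rng.normal(size=(feat_ch, n_classes)),  # toy decoder weights
    }

rng = np.random.default_rng(0)
teacher = make_segmenter(rng)  # EO branch; frozen after pre-training
student = make_segmenter(rng)  # SAR branch; trained with distillation
```

The key design point is that the branches are structurally identical but parametrically independent, so the teacher can be frozen while the student is trained.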
At the training stage, the teacher model, whose inputs are EO scene images, is first trained on an EO image segmentation task, and it is expected to extract EO features that preserve the semantic information of the input scene images. The parameters of the teacher model are fixed after it has been trained. Then, the student model, with SAR images as inputs, is trained on a SAR image segmentation task. During the training of the student model, both the ground-truth segmentation maps and the learned EO features from the teacher model are jointly utilized as supervision to guide the student model to learn heterogeneous features, by simultaneously minimizing the basic segmentation loss $L_S$ and the designed heterogeneous distillation loss $L_D$.
At the testing stage, only the student model is used to segment an arbitrary input SAR image without its corresponding EO image. In the following subsections, the HFDM and the HFAM are introduced respectively.
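The joint supervision of the student can be sketched in NumPy. This is only an illustration: the exact forms of $L_S$ and $L_D$ are defined later in the paper, so here pixel-wise cross-entropy and a KL-style divergence over temperature-softened feature maps are assumed, and the weighting factor is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

def segmentation_loss(logits, labels):
    """Assumed L_S: pixel-wise cross-entropy; logits (K, H, W), labels (H, W)."""
    p = softmax(logits, axis=0)
    h, w = labels.shape
    picked = p[labels, np.arange(h)[:, None], np.arange(w)]  # prob. of true class
    return -np.log(picked + 1e-12).mean()

def distillation_loss(q_s, q_t, T=2.0):
    """Assumed L_D: KL divergence between temperature-softmaxed teacher and
    student feature maps (an illustrative choice, not the paper's exact form)."""
    ps = softmax(q_s / T, axis=0)
    pt = softmax(q_t / T, axis=0)
    return (pt * (np.log(pt + 1e-12) - np.log(ps + 1e-12))).sum(axis=0).mean()

# Toy tensors standing in for network outputs during student training.
seg_logits = rng.normal(size=(4, 8, 8))      # student segmentation logits
labels     = rng.integers(0, 4, size=(8, 8)) # ground-truth segmentation map
q_s        = rng.normal(size=(16, 8, 8))     # student D3 features
q_t        = rng.normal(size=(16, 8, 8))     # frozen-teacher D3 features

lam = 1.0  # hypothetical balancing weight
total = segmentation_loss(seg_logits, labels) + lam * distillation_loss(q_s, q_t)
```

Only `total` would be minimized with respect to the student parameters; the teacher features `q_t` act purely as fixed supervision.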
B. Heterogeneous Feature Distillation Model
The heterogeneous feature distillation model (HFDM) is designed to transfer latent EO features, which preserve semantic information, from the teacher model to the student model for segmenting SAR images. It is noted that the teacher and student models in existing knowledge distillation techniques [15], [41] generally deal with an identical task on homogeneous images. Unlike these works, the teacher and student models in the HFD-Net focus on two similar but different tasks (one is EO image segmentation, while the other is SAR image segmentation). Hence, we design a special architecture with a heterogeneous distillation loss term for the HFDM so that the EO knowledge from the teacher model can be distilled to the SAR-segmentation student model, as shown in Fig. 3.
As seen from Fig. 3, the HFDM consists of two Sigmoid operators, two Tsoftmax operators and a designed heterogeneous distillation loss $L_D$. Given an input pair of EO and SAR images, we denote the extracted set of EO feature maps from the $D_3$ block in the teacher model as $q^t = \{q^t_c \mid q^t_c \in \mathbb{R}^{H \times W},\, c = 1, 2, \ldots, C_q\}$, where $C_q$ is the number of channels in the third decoder block $D_3$ and $\{H, W\}$ is the size of each feature map $q^t_c$. Similarly, we also denote the extracted set of SAR feature maps from the $D_3$ block in the student model as $q^s = \{q^s_c \mid q^s_c \in \mathbb{R}^{H \times W},\, c = 1, 2, \ldots, C_q\}$. The HFDM is used to enforce the student model to output SAR features $q^s$ that are as similar as possible to the extracted EO features $q^t$ from the teacher model.
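The channel-wise matching that the HFDM performs on these feature sets (Sigmoid normalization followed by a temperature-scaled softmax across the $C_q$ channels, as detailed below) can be sketched in NumPy; the shapes are illustrative:

```python
import numpy as np

def tsoftmax(feats, T):
    """Channel-wise temperature softmax: feats has shape (C, H, W).
    At every pixel, the C channel responses are turned into a probability
    distribution; T = 1 recovers the plain softmax."""
    z = feats / T
    z = z - z.max(axis=0, keepdims=True)  # numerical stability (softmax-invariant)
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)

def hfdm_probability_maps(q, T=2.0):
    """Sigmoid-normalize each feature map into (0, 1), then apply the
    temperature softmax across channels to get pixel-level probability maps.
    Applied identically to teacher features q^t and student features q^s."""
    q_bar = 1.0 / (1.0 + np.exp(-q))  # the Sigmoid operator
    return tsoftmax(q_bar, T)
```

Because the same two operators are applied to both branches, the resulting probability maps are directly comparable by the distillation loss.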
Firstly, two identical Sigmoid operators are implemented to normalize the elements of each feature map $q_c$ (indicating both $q^t_c$ and $q^s_c$) into $(0, 1)$ respectively. Then, each normalized feature map $\bar{q}_c$ (indicating both $\bar{q}^t_c$ in $\bar{q}^t$ and $\bar{q}^s_c$ in $\bar{q}^s$) is transformed into a pixel-level probability map $\hat{q}_c$ (indicating both $\hat{q}^t_c$ in $\hat{q}^t$ and $\hat{q}^s_c$ in $\hat{q}^s$) respectively by implementing the following Tsoftmax operator:

$$\hat{q}_c = \mathrm{Tsoftmax}(\bar{q}_c; T) = \frac{\exp(\bar{q}_c / T)}{\sum_{j=1}^{C_q} \exp(\bar{q}_j / T)} \qquad (1)$$
where $T$ is a preset temperature constant (the Tsoftmax function degrades to the commonly used Softmax function when $T$ is set to 1, as illustrated in [15]). Finally, the heterogeneous distillation loss $L_D$, which measures the difference between the obtained pixel-level probability maps, is designed to distill latent features from the EO-segmentation teacher model to the SAR-segmentation student model. The