additional time at the inference stage and thus may slow down real-time SOD models.
To address the aforementioned limitations, this paper proposes a learnable noise to defend SOD networks against adversarial attacks. The learnable noise consists of a shallow noise and a noise estimation. Unlike SWS, which injects noise directly into the input image, the shallow noise inserts a noisy layer between the stem and stage 1 of the backbone. Because the noise is introduced at the feature level, it is learnable and can balance between learning from clean images and from adversarial images. Inspired by the image denoising method [14], we propose a lightweight noise estimation component to refine the features of adversarial images. The shallow noise and the noise estimation are embedded in the encoder and decoder, respectively, allowing parallel computation. Furthermore, the noise estimation affects only one channel of the decoder. Consequently, our defense method introduces much less extra time and performs better than ROSA, see Fig. 1(b) and (c).
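To make the idea concrete, below is a minimal PyTorch-style sketch of a feature-level learnable noise layer inserted between the stem and stage 1 of a backbone. The module name, tensor shapes, and initialization are illustrative assumptions, not the exact LeNo design.

```python
import torch
import torch.nn as nn

class ShallowNoise(nn.Module):
    """Illustrative sketch of a learnable feature-level noise layer.

    The noise tensor is a trainable parameter added to the stem features,
    so gradients from both clean and adversarial images can shape it.
    Shapes and initialization are assumptions for this example.
    """
    def __init__(self, channels, height, width, init_std=0.1):
        super().__init__()
        # Learnable noise with the same spatial size as the stem output.
        self.noise = nn.Parameter(init_std * torch.randn(1, channels, height, width))

    def forward(self, feat):
        return feat + self.noise


# Usage (assuming a ResNet-like backbone with a separable stem/stage1):
# stem_out = backbone.stem(x)          # e.g., conv + bn + relu
# noisy = shallow_noise(stem_out)      # perturb features, not the input image
# out = backbone.stage1(noisy)
```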
Our main contributions can be summarized as follows:
1) We successfully launch adversarial attacks on both state-of-the-art RGB and RGB-D SOD models. Experimental results verify that a wide range of existing SOD models are sensitive to adversarial perturbations.
2) We propose a simple but efficient learnable noise (LeNo) that hardly modifies the original SOD network structure. It consists of a plug-and-play shallow noise and a noise estimation, supports parallel computation, and hardly influences the inference speed.
3) With the deeply-supervised noise-decoupled training scheme, the proposed defense method improves the adversarial robustness of a wide range of RGB and RGB-D SOD networks. Experimental results show that our defense method outperforms previous works not only on adversarial images but also on clean images.
II. RELATED WORKS
A. Salient Object Detection
An impressive mechanism of the human visual system is the internal process that quickly scans the global scene to locate regions of interest. In the field of computer vision, this task
is referred to as Salient Object Detection. It plays a key role
in a range of real-world applications, such as medical image
segmentation [8], [39], camouflaged object detection [7], etc.
Although significant progress has been made in the past several
years [23], [35], [44], there is still room for improvement
when faced with challenging factors, such as complicated
backgrounds or varying lighting conditions in the scenes. One
way to overcome such challenges is to employ depth maps,
which provide complementary spatial information and have
become easier to capture due to the ready availability of depth
sensors. Recently, RGB-D based salient object detection has
gained increasing attention, and various methods have been
developed [3], [9]. Early RGB-D based salient object detection
models tended to extract handcrafted features and then fuse
the RGB image and depth map. Despite the effectiveness
of traditional methods using handcrafted features, their low-
level features tend to limit generalization ability, and they
lack the high-level reasoning required for complex scenes. To
address these limitations, several deep learning based RGB-
D salient object detection methods [10] have been developed,
with improved performance.
B. Adversarial Attacks
Existing adversarial attacks fall into several groups: one-step gradient-based methods; iterative methods [6]; optimization-based methods [42]; and methods based on generative networks [34], [46].
1) FGSM: The Fast Gradient Sign Method (FGSM) [13] is an efficient single-step adversarial attack. Given a vectorized input $x$ and the corresponding target label $y$, FGSM alters each element of $x$ along the direction of its gradient w.r.t. the inference loss, $\partial L / \partial x$. The generation of the adversarial example $\hat{x}$ (i.e., the perturbed input) can be described as:
$$\hat{x} = x + \epsilon \cdot \mathrm{sgn}\left(\nabla_x L(g(x;\theta), y)\right), \quad (1)$$
where $\epsilon$ is the perturbation constraint that determines the attack strength, $g(x;\theta)$ computes the output of the DNN parameterized by $\theta$, and $\mathrm{sgn}(\cdot)$ is the sign function.
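For illustration, a minimal PyTorch sketch of the FGSM update in Eq. (1) is given below; the function name, the default $\epsilon = 8/255$, and the clamping to a valid image range are assumptions made for this example.

```python
import torch

def fgsm_attack(model, loss_fn, x, y, epsilon=8/255):
    """Single-step FGSM following Eq. (1); names and defaults are illustrative."""
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)          # L(g(x; theta), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()  # x_hat = x + eps * sgn(grad)
    return x_adv.clamp(0, 1).detach()    # keep a valid image range
```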
2) PGD: Projected Gradient Descent (PGD) [29] is a multi-step variant of FGSM and one of the strongest $L_\infty$ adversarial example generation algorithms. With $\hat{x}_{t=1} = x$ as the initialization, the iterative update of the perturbed data $\hat{x}$ at iteration $t$ can be expressed as:
$$\hat{x}_t = \Pi_{P(x)}\left(\hat{x}_{t-1} + a \cdot \mathrm{sgn}\left(\nabla_x L(g(\hat{x}_{t-1};\theta), y)\right)\right), \quad (2)$$
where $P(x)$ is the projection space bounded by $x \pm \epsilon$, and $a$ is the step size. [29] also shows that PGD is a universal adversary among all first-order adversaries.
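A minimal PyTorch sketch of the PGD iteration in Eq. (2) follows; the step size, number of iterations, and clamping to a valid image range are illustrative choices, not prescribed by [29].

```python
import torch

def pgd_attack(model, loss_fn, x, y, epsilon=8/255, alpha=2/255, steps=10):
    """Multi-step PGD following Eq. (2); hyperparameters are illustrative."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()               # FGSM-like step
        # Projection onto P(x): clip the perturbation to [-epsilon, epsilon].
        x_adv = x + (x_adv - x).clamp(-epsilon, epsilon)
        x_adv = x_adv.clamp(0, 1)                         # valid image range
    return x_adv.detach()
```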
3) ROSA: ROSA [21] is an iterative gradient-based pipeline and the first adversarial attack on state-of-the-art salient object detection models. It tries to make the predictions of all pixels in $x$ go wrong. In each iteration $t$, given the adversarial sample $\hat{x}$ from the previous time step (or from the initialization), the adversarial sample is updated as:
$$\hat{x}_0 = x, \qquad \hat{x}_{t+1} = \hat{x}_t + p_t, \quad (3)$$
$$p'_t = \sum_{i \in S_t} \left[\nabla_{\hat{x}_t} g_{i,\,1-y_i}(\hat{x}_t;\theta) - \nabla_{\hat{x}_t} g_{i,\,y_i}(\hat{x}_t;\theta)\right]. \quad (4)$$
Here, $p_t$ denotes the adversarial perturbation computed at the $t$-th step, obtained by normalization as $\alpha \cdot p'_t / \|p'_t\|_\infty$, where $\alpha$ is a fixed step length, $i$ denotes one pixel in $x$, $S_t$ denotes the set of pixels that $g$ can still classify correctly, and $y_i$ denotes the two categories: salient and non-salient.
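The following is a rough PyTorch sketch of one ROSA-style update according to Eqs. (3) and (4). It assumes the model outputs per-pixel scores for the two classes (non-salient and salient); the pre- and post-processing components of the full ROSA pipeline [21] are omitted.

```python
import torch

def rosa_step(model, x_adv, y, alpha=1.0):
    """One illustrative update following Eqs. (3)-(4).

    Assumes model(x) has shape (B, 2, H, W) and y is the binary
    ground-truth map of shape (B, H, W); names are hypothetical.
    """
    x_adv = x_adv.clone().detach().requires_grad_(True)
    scores = model(x_adv)                       # per-pixel class scores
    pred = scores.argmax(dim=1)                 # current prediction
    still_correct = (pred == y).float()         # indicator of the set S_t
    wrong = scores.gather(1, (1 - y).unsqueeze(1)).squeeze(1)  # g_{i, 1-y_i}
    true = scores.gather(1, y.unsqueeze(1)).squeeze(1)         # g_{i, y_i}
    objective = ((wrong - true) * still_correct).sum()
    p_raw = torch.autograd.grad(objective, x_adv)[0]           # p'_t
    # Normalize per sample by the L-infinity norm, then scale by alpha.
    norm = p_raw.abs().flatten(1).max(dim=1).values.view(-1, 1, 1, 1)
    p = alpha * p_raw / (norm + 1e-12)
    return (x_adv + p).detach()                 # x_{t+1} = x_t + p_t
```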
C. Defenses Against Adversarial Attacks
Many researchers resort to randomization schemes [5], [27]
for mitigating the effects of adversarial perturbations in the
input/feature domain. The intuition behind this type of defense
is that DNNs are always robust to random perturbations.
A randomization based defense attempts to randomize the
adversarial effects into random effects, which are not a concern
for most DNNs. Some of these defenses, such as ROSA, also add noise to the network as we do, but their noise is random rather than learnable.
Previous work on feature denoising [25] attempts to alleviate the effects of adversarial perturbations on high-level