LeNo: Adversarial Robust Salient Object Detection
Networks with Learnable Noise
He Wang1,2, Lin Wan1, He Tang1
1School of Software Engineering, Huazhong University of Science and Technology
2School of Cyber Science and Engineering, Huazhong University of Science and Technology
Abstract—Pixel-wise prediction with deep neural networks has become an effective paradigm for salient object detection (SOD) and achieved remarkable performance. However, very few SOD models are robust against adversarial attacks, which are visually imperceptible to human visual attention. The previous work robust saliency (ROSA) shuffles the pre-segmented superpixels and then refines the coarse saliency map with a densely connected conditional random field (CRF). Different from ROSA, which relies on various pre- and post-processing steps, this paper proposes a lightweight Learnable Noise (LeNo) to defend SOD models against adversarial attacks. LeNo preserves the accuracy of SOD models on both adversarial and clean images, as well as their inference speed. In general, LeNo consists of a simple shallow noise and a noise estimation embedded in the encoder and decoder of arbitrary SOD networks, respectively. Inspired by the center prior of the human visual attention mechanism, we initialize the shallow noise with a cross-shaped Gaussian distribution for better defense against adversarial attacks. Instead of adding additional network components for post-processing, the proposed noise estimation modifies only one channel of the decoder. With deeply-supervised noise-decoupled training on state-of-the-art RGB and RGB-D SOD networks, LeNo outperforms previous works not only on adversarial images but also on clean images, contributing stronger robustness for SOD. Our code is available at https://github.com/ssecv/LeNo.
I. INTRODUCTION
The progress of deep neural networks (DNNs) has significantly promoted the development of downstream computer vision tasks such as image recognition [15], semantic segmentation [4], object detection [36], and salient object detection [45]. These data-driven models are usually trained with extensive input images and their annotations. Previous studies [2] and [21] have shown that saliency detection networks are fragile to adversarial attacks and that their performance decreases significantly under even imperceptible perturbations. As shown in Fig. 1(a), the SOD network fails to detect the salient object of an input image with adversarial perturbation, even though the salient object in the adversarial image is still obvious and can easily be distinguished from the background. As shown in Table 1, without modifying the ground truth annotations, GateNet [45] obtains only .2741 Fβ on the ECSSD dataset under adversarial attacks and BBSNet [12] obtains only .3397 Fβ on the NJU2K dataset; the Fβ drops by .6697 and .5677, respectively. This phenomenon indicates that state-of-the-art SOD models are easily fooled by adversarial attacks, even with a clean auxiliary depth map. The robustness of SOD models thus raises serious concerns about security and about the perception gap between humans and neural networks, because the human visual system can hardly recognize the adversarial perturbations in the input image.
The corresponding author is He Tang.
Fig. 1. SOD adversarial defense comparison. (a) Original GateNet without any defense. (b) ROSA defense, which adds three components to the front, middle, and back of the network, respectively. (c) The proposed LeNo defense, which embeds a lightweight learnable shallow noise and noise estimation and balances performance on clean images and adversarial images.
As shown in Fig. 1(b), the recent study ROSA [21] proposed a robust salient object detection framework that incorporates a segment-wise shielding (SWS) component in front of the backbone, a context-aware restoration (CAR) component after the decoder, and a bilateral filter between the input image and a densely connected CRF. It performs well on 4 RGB SOD benchmark datasets with 3 SOD networks. The core idea of ROSA is to introduce another generic noise to destroy the subtle curve-like patterns of adversarial perturbations. The SWS component divides an image into superpixels by SLIC [1] and randomly shuffles the pixels within each superpixel, breaking the adversarial perturbation by introducing random noise. The CAR component refines the saliency detection with a densely connected CRFasRNN [47]. However, the noise introduced by SWS is random and not learnable, causing a noticeable accuracy drop of the SOD model on clean images, e.g., Fig. 1(b). Moreover, ROSA is heavy and requires over 1 second of
additional time at the inference stage, which may slow down some real-time SOD models.
To address the aforementioned limitations, this paper proposes a learnable noise to defend SOD networks against adversarial attacks. The learnable noise consists of a shallow noise and a noise estimation. Different from SWS, which introduces noise directly to the input image, the shallow noise inserts a noise layer between the stem and stage 1 of the backbone. It introduces noise at the feature level, so the noise is learnable and able to balance between learning clean images and adversarial images. Inspired by the image denoising method [14], we propose a lightweight noise estimation component to refine the features of adversarial images. Our shallow noise and noise estimation are embedded in the encoder and decoder respectively, allowing parallel computation. Furthermore, the noise estimation only affects one channel of the decoder. Consequently, our defense method introduces much less extra time and performs better than ROSA; see Fig. 1(b) and (c).
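For illustration only, the following PyTorch-style sketch shows how a learnable, feature-level noise layer could be inserted between the backbone stem and stage 1. The module name `ShallowNoise`, the cross-shaped Gaussian initialization details, and the resizing step are assumptions made for this sketch, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShallowNoise(nn.Module):
    """Sketch of a learnable additive noise layer (hypothetical implementation)."""
    def __init__(self, channels, height, width, sigma=4.0):
        super().__init__()
        # Cross-shaped Gaussian prior: larger values along the central row and
        # column, echoing the center prior of human visual attention.
        ys = torch.arange(height, dtype=torch.float32) - (height - 1) / 2
        xs = torch.arange(width, dtype=torch.float32) - (width - 1) / 2
        row = torch.exp(-ys ** 2 / (2 * sigma ** 2)).unsqueeze(1)  # H x 1
        col = torch.exp(-xs ** 2 / (2 * sigma ** 2)).unsqueeze(0)  # 1 x W
        cross = torch.clamp(row + col, max=1.0)                    # H x W
        init = cross.expand(channels, height, width).clone()
        self.noise = nn.Parameter(init)  # learnable; updated by back-propagation

    def forward(self, x):
        # Resize the learned noise map to the feature resolution and add it.
        n = F.interpolate(self.noise.unsqueeze(0), size=x.shape[-2:],
                          mode="bilinear", align_corners=False)
        return x + n

# Usage sketch (backbone layout is an assumption):
#   feat = backbone.stem(img)      # e.g., a 64-channel feature map
#   feat = shallow_noise(feat)     # inject learnable, feature-level noise
#   feat = backbone.stage1(feat)
```

Because the noise is a parameter rather than a random draw, it can be optimized jointly with the detector to trade off clean-image accuracy against adversarial robustness.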
Our main contributions can be summarized as follows:
1) We successfully launch adversarial attacks on both state-of-the-art RGB and RGB-D SOD models. Experimental results verify that a wide range of existing SOD models are sensitive to adversarial perturbations.
2) We propose a simple but efficient learnable noise (LeNo) that hardly modifies the original SOD network structure. It consists of a plug-and-play shallow noise and a noise estimation component, can be computed in parallel, and hardly influences the inference speed.
3) With the deeply-supervised noise-decoupled training scheme, the proposed defense method improves the adversarial robustness of a wide range of RGB and RGB-D SOD networks. The experimental results show that our proposed defense method outperforms previous works not only on adversarial images but also on clean images.
II. RELATED WORKS
A. Salient Object Detection
An impressive mechanism of the human visual system is the internal process that quickly scans the entire image to locate regions of interest. In the field of computer vision, this task is referred to as Salient Object Detection. It plays a key role
in a range of real-world applications, such as medical image
segmentation [8], [39], camouflaged object detection [7], etc.
Although significant progress has been made in the past several
years [23], [35], [44], there is still room for improvement
when faced with challenging factors, such as complicated
backgrounds or varying lighting conditions in the scenes. One
way to overcome such challenges is to employ depth maps,
which provides complementary spatial information and have
become easier to capture due to the ready availability of depth
sensors. Recently, RGB-D based salient object detection has
gained increasing attention, and various methods have been
developed [3], [9]. Early RGB-D based salient object detection
models tended to extract handcrafted features and then fuse
the RGB image and depth map. Despite the effectiveness
of traditional methods using handcrafted features, their low-
level features tend to limit generalization ability, and they
lack the high-level reasoning required for complex scenes. To
address these limitations, several deep learning based RGB-
D salient object detection methods [10] have been developed,
with improved performance.
B. Adversarial Attacks
Existing adversarial attacks fall into several groups: one-step gradient-based methods; iterative methods [6]; optimization-based methods [42]; and methods based on generative networks [34], [46].
1) FGSM: Fast Gradient Sign Method (FGSM) [13] is an efficient single-step adversarial attack method. Given a vectorized input $x$ and the corresponding target label $y$, FGSM alters each element of $x$ along the direction of its gradient w.r.t. the inference loss, $\partial L / \partial x$. The generation of the adversarial example $\hat{x}$ (i.e., the perturbed input) can be described as:
$$\hat{x} = x + \epsilon \cdot \mathrm{sgn}\left(\nabla_x L(g(x;\theta), y)\right), \qquad (1)$$
where $\epsilon$ is the perturbation constraint that determines the attack strength, $g(x;\theta)$ computes the output of the DNN parameterized by $\theta$, and $\mathrm{sgn}(\cdot)$ is the sign function.
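To make Eq. (1) concrete, the sketch below shows a PyTorch-style FGSM step against a dense saliency predictor. The names `fgsm_attack` and `model`, the BCE loss, and the [0, 1] image range are assumptions for illustration, not details taken from [13] or from this paper.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=8 / 255):
    """Single-step FGSM following Eq. (1): x_adv = x + eps * sign(grad_x L)."""
    x_adv = x.clone().detach().requires_grad_(True)
    pred = model(x_adv)                          # saliency logits, B x 1 x H x W
    loss = F.binary_cross_entropy_with_logits(pred, y)
    loss.backward()
    with torch.no_grad():
        x_adv = x_adv + eps * x_adv.grad.sign()  # step along the gradient sign
        x_adv = x_adv.clamp(0, 1)                # keep a valid image range
    return x_adv.detach()
```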
2) PGD: Projected Gradient Descent (PGD) [29] is a multi-step variant of FGSM and one of the strongest $L_\infty$ adversarial example generation algorithms. With $\hat{x}_{t=1} = x$ as the initialization, the iterative update of the perturbed data $\hat{x}$ at iteration $t$ can be expressed as:
$$\hat{x}_t = \Pi_{P(x)}\left(\hat{x}_{t-1} + a \cdot \mathrm{sgn}\left(\nabla_x L(g(\hat{x}_{t-1};\theta), y)\right)\right), \qquad (2)$$
where $P(x)$ is the projection space, bounded by $x \pm \epsilon$, and $a$ is the step size. The authors of [29] also propose that PGD is a universal adversary among all first-order adversaries.
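In the same illustrative spirit, a minimal PGD loop for Eq. (2) might look as follows; the $L_\infty$ projection onto $P(x)$ is implemented by element-wise clamping to $x \pm \epsilon$, and the default `eps`, `step`, and `iters` values are arbitrary choices, not values from the paper.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, step=2 / 255, iters=10):
    """Multi-step PGD following Eq. (2): FGSM-like steps projected back into x +/- eps."""
    x_adv = x.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = F.binary_cross_entropy_with_logits(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + step * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project onto P(x)
            x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()
```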
3) ROSA: ROSA [21] is an iterative gradient-based pipeline and the first adversarial attack on state-of-the-art salient object detection models. It tries to make the predictions of all pixels in $x$ go wrong. In each iteration $t$, supposing that the adversarial sample $\hat{x}$ from the previous time step or from initialization is prepared, the adversarial sample is updated as:
$$\hat{x}_0 = x, \qquad \hat{x}_{t+1} = \hat{x}_t + p_t, \qquad (3)$$
$$p'_t = \sum_{i \in S_t} \left[\nabla_{\hat{x}_t} g_{i,\,1-y_i}(\hat{x}_t;\theta) - \nabla_{\hat{x}_t} g_{i,\,y_i}(\hat{x}_t;\theta)\right]. \qquad (4)$$
Here, $p_t$ denotes the adversarial perturbation computed at the $t$-th step; it is obtained by normalization as $\alpha \cdot p'_t / \|p'_t\|$, where $\alpha$ is a fixed step length, $i$ denotes one pixel in $x$, $S_t$ denotes the set of pixels that $g$ can still classify correctly, and $y_i$ denotes one of two categories: salient and non-salient.
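As a rough, assumption-laden illustration of Eqs. (3)-(4), and not the authors' released attack code, the sketch below perturbs the input toward flipping every pixel the model still classifies correctly. The two-class scores are taken from a sigmoid saliency output, and `alpha`, `iters`, and the stopping rule are placeholders.

```python
import torch

def rosa_style_attack(model, x, y, alpha=1.0, iters=20):
    """Iterative per-pixel attack sketch loosely following Eqs. (3)-(4)."""
    x_adv = x.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        prob = model(x_adv).sigmoid()          # B x 1 x H x W saliency probability
        pred = (prob > 0.5).float()
        correct = (pred == y).float()          # S_t: pixels still classified correctly
        if correct.sum() == 0:
            break
        # Score of the wrong class minus score of the correct class, summed over S_t;
        # its gradient w.r.t. x_adv plays the role of p'_t in Eq. (4).
        wrong_score = y * (1 - prob) + (1 - y) * prob
        right_score = y * prob + (1 - y) * (1 - prob)
        objective = (correct * (wrong_score - right_score)).sum()
        grad = torch.autograd.grad(objective, x_adv)[0]
        with torch.no_grad():
            p = alpha * grad / (grad.norm() + 1e-12)   # normalize to a fixed step length
            x_adv = (x_adv + p).clamp(0, 1)            # Eq. (3) update
    return x_adv.detach()
```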
C. Defenses Against Adversarial Attacks
Many researchers resort to randomization schemes [5], [27]
for mitigating the effects of adversarial perturbations in the
input/feature domain. The intuition behind this type of defense
is that DNNs are always robust to random perturbations.
A randomization based defense attempts to randomize the
adversarial effects into random effects, which are not a concern
for most DNNs. Some of them also add noise to the network as we do, but, like ROSA, their noise is random rather than learnable.
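To make the contrast concrete, here is a minimal, hypothetical sketch of such a randomization-based defense that injects non-learnable Gaussian noise into intermediate features at inference time; the module name and noise scale are assumptions, and this fixed random noise is exactly what LeNo replaces with a learnable one.

```python
import torch
import torch.nn as nn

class RandomFeatureNoise(nn.Module):
    """Non-learnable randomization defense: add fresh Gaussian noise on every forward pass."""
    def __init__(self, std=0.1):
        super().__init__()
        self.std = std  # fixed, hand-tuned noise scale (not learned)

    def forward(self, x):
        # A new random perturbation is drawn on each call, so the adversarial
        # perturbation is drowned in random effects instead of being learned against.
        return x + self.std * torch.randn_like(x)
```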
Previous works in feature denoising [25] attempt to alleviate the effects of adversarial perturbations on high-level features.