LP-BFGS ATTACK: AN ADVERSARIAL ATTACK BASED ON THE HESSIAN WITH
LIMITED PIXELS
Jiebao Zhang1, Wenhua Qian1, Rencan Nie1, Jinde Cao2, Dan Xu1
1School of Information Science and Engineering, Yunnan University, Kunming 650500, China
2School of Mathematics, Southeast University, Nanjing 210096, China
ABSTRACT
Deep neural networks are vulnerable to adversarial attacks.
Most L0-norm-based white-box attacks craft perturbations using the gradient of the model with respect to the input. Owing to the computation cost and memory limitations of calculating the Hessian matrix, the use of the Hessian or an approximate Hessian in white-box attacks has gradually been shelved. In this work, we note that the sparsity requirement on perturbations naturally lends itself to the use of Hessian information. We study the attack performance and computation cost of an attack method based on the Hessian with a limited number of perturbation pixels. Specifically, we propose the Limited Pixel BFGS (LP-BFGS) attack method by incorporating a perturbation-pixel selection strategy and the BFGS algorithm. Pixels with the top-k attribution scores calculated by the Integrated Gradient method are regarded as the optimization variables of the LP-BFGS attack. Experimental results across different networks and datasets demonstrate that our approach achieves attack ability comparable to existing solutions with reasonable computation cost across different numbers of perturbation pixels.
Index Terms— Adversarial examples, adversarial attacks, deep neural networks, BFGS method
1. INTRODUCTION
Deep Neural Networks (DNNs) achieve outstanding performance on image classification tasks [1]. However, researchers have found that DNNs are highly susceptible to small malicious perturbations crafted by adversaries [2, 3]. Specifically, malicious perturbations added to original examples can significantly harm the performance of DNNs. DNNs are therefore untrustworthy for security-sensitive tasks. Many adversarial attack methods have been proposed to seek perturbations according to the unique properties of DNNs and optimization techniques.

* Corresponding author: Wenhua Qian. Email: whqian@ynu.edu.cn. This work was supported by the Research Foundation of Yunnan Province (No. 202002AD08001, 202001BB050043, 2019FA044), the National Natural Science Foundation of China (Grant No. 62162065), the Provincial Foundation for Leaders of Disciplines in Science and Technology (No. 2019HB121), the Postgraduate Research and Innovation Foundation of Yunnan University (No. 2021Y281, No. 2021Z078), and the Postgraduate Practice and Innovation Foundation of Yunnan University (No. 2021Y179, No. 2021Y171).
Depending on the attacker's knowledge of the target model, adversarial attacks can be divided into two categories: white-box attacks and black-box attacks. White-box attacks assume that attackers have detailed information about the target model (e.g., the training data, model structure, and model weights); they can be further classified into optimization-based attacks [2, 3, 4], single-step attacks [5, 6], and iterative attacks [7, 8, 9, 10, 11, 12, 13]. Optimization-based attacks formulate finding the optimal perturbation as a box-constrained optimization problem. Szegedy et al. use a quasi-Newton method, the limited-memory BFGS method [14, 15], to solve the box-constrained problem; this is known as the L-BFGS attack [3]. Compared with the L-BFGS attack, the C&W attack [4] uses variable substitution to bypass the box constraint and uses a more efficient objective function; furthermore, it uses the Adam optimizer [16] to find the optimal perturbation. Single-step attacks are simple and efficient and can alleviate the high computation cost incurred by optimization-based attacks. Since the model is assumed to be locally linear, perturbations in single-step attacks are added directly along the gradient [5, 6]. Iterative attacks add perturbations in multiple steps, achieving a tradeoff between computation and attack performance. Black-box attacks assume that attackers have little information about the architecture and parameters of the target model. Compared with white-box attacks, they can still achieve comparable attacks by querying the output (e.g., the confidence score or the final decision) of the model [17, 18, 19, 20].
Most existing white-box and black-box attacks have in common the tendency to indiscriminately perturb all pixels of an image. However, some research has shown that attackers can achieve strong attack effects by perturbing only certain regions or pixels of the original image. JSMA [12] selects one or more pixels that play an important role in the model's prediction for modification at each iteration. C&W [4] iteratively executes the L2-distance attack to obtain perturbations with minimal L0 distance. SparseFool [13] exploits the low mean curvature of the decision boundary to control the sparsity of the perturbations.
Fig. 1: An illustration that pixels at different locations play different roles in the loss.
Fig. 2: The attack framework of LP-BFGS. Pixels with top-k attribution scores calculated by the Integrated Gradient method are
selected as optimization variables of LP-BFGS. The adversarial image combines immutable pixels with the optimal perturbation
obtained by LP-BFGS.
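As a concrete illustration of the selection step described in the Fig. 2 caption, below is a minimal PyTorch sketch of computing Integrated Gradient attributions and keeping the k pixels with the largest scores. The classifier `model`, the input `x` of shape (1, C, H, W), the label `y`, the zero baseline, the number of interpolation steps, the use of the cross-entropy loss as the attribution target, and the helper names `integrated_gradients` and `topk_pixel_mask` are all illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def integrated_gradients(model, x, y, steps=50):
    """Approximate Integrated Gradients of the true-label loss w.r.t. x,
    using a zero baseline and a Riemann-sum approximation of the path integral."""
    baseline = torch.zeros_like(x)
    total_grad = torch.zeros_like(x)
    for alpha in torch.linspace(1.0 / steps, 1.0, steps):
        xi = (baseline + alpha * (x - baseline)).detach().requires_grad_(True)
        loss = F.cross_entropy(model(xi), y)
        grad, = torch.autograd.grad(loss, xi)
        total_grad += grad
    return (x - baseline) * total_grad / steps   # per-element attribution

def topk_pixel_mask(attributions, k):
    """Boolean mask of shape (1, 1, H, W) marking the k spatial locations with
    the largest channel-summed attribution magnitude."""
    scores = attributions.abs().sum(dim=1).flatten(1)            # (1, H*W)
    idx = scores.topk(k, dim=1).indices
    mask = torch.zeros_like(scores, dtype=torch.bool)
    mask.scatter_(1, idx, torch.ones_like(idx, dtype=torch.bool))
    return mask.view(1, 1, *attributions.shape[-2:])
```

The resulting mask fixes the immutable pixels and exposes only the selected pixels as optimization variables, which is the role it plays in the framework of Fig. 2.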
OPA [17] is a score-based attack that uses the differential evolution algorithm to generate adversarial perturbations in the black-box scenario. Sparse-RS [20] is also a score-based black-box attack, based on random search. Furthermore, some sparse perturbations can be deployed in the physical world [21, 22, 23].
Sparse perturbations are indeed visible, but they do not alter the semantic content of the image [20]. Moreover, the demand for sparsity paves the way for the use of second-order gradient information. A sizeable image commonly introduces a high-dimensional optimization variable into the attack process. The second-order gradient information of the loss function w.r.t. the original image, namely the Hessian matrix, requires an expensive computational cost and memory budget. For example, for an image of size 3 × 256 × 256, the Hessian matrix of the loss function with respect to it has size (3 · 2^16) × (3 · 2^16). Assuming each element of the matrix is a 4-byte floating-point number, storing this matrix alone would require about 144 gigabytes of memory. This makes it difficult for most attack algorithms to use second-order gradient information, and the high dimensionality of the optimization variables is a key factor hindering the use of the Hessian or an approximate Hessian in attacks.
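For reference, the 144-gigabyte figure follows from a back-of-envelope calculation over a dense float32 matrix:

```python
# Dense Hessian storage for a 3 x 256 x 256 input, assuming 4-byte (float32) entries.
n = 3 * 256 * 256                 # 196,608 input variables
hessian_bytes = n * n * 4         # an n x n matrix of 4-byte floats
print(f"{hessian_bytes / 2**30:.1f} GiB")   # -> 144.0 GiB
```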
However, when the number of perturbed pixels is limited, deciding which pixels are most conducive to the attack requires careful consideration. Perturbing pixels at different locations may produce different attack effects. Fig. 1 shows that pixels modified at different locations have different impacts on the model's loss value.
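The observation behind Fig. 1 can be reproduced with a small probe that perturbs a single pixel at different locations and records the resulting change in the loss. This is a minimal sketch, not the paper's procedure; `model`, `x`, and `y` are assumed as before, and the perturbation size `delta` is an arbitrary illustrative value.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def single_pixel_loss_change(model, x, y, locations, delta=0.5):
    """Loss change from adding `delta` to one pixel (all channels) at each
    (row, col) location -- an illustrative probe of location sensitivity."""
    base_loss = F.cross_entropy(model(x), y).item()
    changes = {}
    for i, j in locations:
        xp = x.clone()
        xp[:, :, i, j] = (xp[:, :, i, j] + delta).clamp(0.0, 1.0)
        changes[(i, j)] = F.cross_entropy(model(xp), y).item() - base_loss
    return changes
```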
In this work, we propose the limited-pixel BFGS (LP-BFGS) attack method, which incorporates the attribution scores of pixels and approximate Hessian information. Specifically, for an image, we use the Integrated Gradient algorithm to compute the attribution score of each pixel with respect to the model's decision about the true label. We select pixels with high attribution scores as the perturbation pixels, which reduces the dimensionality of the optimization variables, and then find adversarial examples under the guidance of approximate Hessian information, as shown in Fig. 2 (a minimal code sketch of this pipeline is given after the contribution list below). The main contributions of this work are as follows:

• We propose the LP-BFGS attack method by incorporating a perturbation-pixel selection strategy and the BFGS algorithm. LP-BFGS perturbs only a subset of pixels under the guidance of the Hessian.

• We investigate the effect of the loss function and the number of perturbation pixels on the performance of the LP-BFGS attack, and study the time cost of the LP-BFGS attack family.

• We conduct experiments across various datasets and models to verify that the LP-BFGS attack achieves comparable attack performance with an acceptable computation cost.
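To make the pipeline concrete, the sketch below optimizes only the pixels selected by a boolean mask (for example, the top-k Integrated-Gradient pixels from the earlier sketch). SciPy's bounded quasi-Newton routine L-BFGS-B is used here as a stand-in for the BFGS variant described in the paper, and the untargeted cross-entropy objective, the function name `lp_bfgs_like_attack`, and all hyperparameters are illustrative assumptions.

```python
import numpy as np
import torch
import torch.nn.functional as F
from scipy.optimize import minimize

def lp_bfgs_like_attack(model, x, y, pixel_mask, max_iter=100):
    """Optimize only the pixels selected by `pixel_mask` ((1, 1, H, W), bool),
    using SciPy's L-BFGS-B as a stand-in quasi-Newton solver.
    Untargeted variant: ascend the cross-entropy of the true label y.
    The model's parameters are assumed frozen."""
    idx = pixel_mask.expand_as(x).flatten().nonzero().squeeze(1)  # free variables
    x_flat = x.detach().flatten()

    def objective(z):
        z_t = torch.tensor(z, dtype=x.dtype, requires_grad=True)
        adv = x_flat.index_put((idx,), z_t)          # only selected entries vary
        loss = -F.cross_entropy(model(adv.reshape(x.shape)), y)
        loss.backward()
        return loss.item(), z_t.grad.numpy().astype(np.float64)

    z0 = x_flat[idx].numpy().astype(np.float64)
    res = minimize(objective, z0, jac=True, method="L-BFGS-B",
                   bounds=[(0.0, 1.0)] * len(z0),    # box constraint on pixel values
                   options={"maxiter": max_iter})
    return x_flat.index_put((idx,), torch.tensor(res.x, dtype=x.dtype)).reshape(x.shape)
```

Because only k variables are optimized, the approximate Hessian maintained by the quasi-Newton routine is k × k rather than n × n, which is what makes second-order information affordable in this setting.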
2. RELATED WORK
2.1. Adversarial attacks
Various adversarial attacks have been proposed to seek perturbations according to the exclusive attributes of DNNs and optimization techniques. Some of them investigate the robustness of models against sparse perturbations. OPA selects candidate solutions generated by differential evolution as the adversarial examples [17]. Sparse-RS searches for the components to be perturbed and the corresponding perturbation values to form the adversarial input [20]. Several white-box attacks have also been proposed, which we summarize below.
L-BFGS attack. The L-BFGS attack formulates finding an optimal perturbation as a box-constrained optimization problem:

$$\min_{r} \; c\,\|r\|_p + f(x+r,\, t) \quad \text{s.t.} \quad x+r \in [0,1]^n, \tag{1}$$

where r denotes the perturbation vector, x is the original input, t is the target label, and n is the size of the image x. f is a loss function customized by the attacker, e.g., the cross-entropy. The constant c balances the perturbation magnitude against the attack performance. The L-BFGS attack uses a second-order quasi-Newton method [14, 15] to solve this problem.
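Written as code, the objective in Eq. (1) is simply a weighted sum of a p-norm penalty and an attacker-chosen loss. A minimal PyTorch rendering is sketched below; choosing the cross-entropy toward the target label t as f and p = 2 are illustrative assumptions, and enforcing the box constraint is left to the outer solver.

```python
import torch
import torch.nn.functional as F

def lbfgs_attack_objective(model, x, r, t, c, p=2):
    """Objective of Eq. (1): c * ||r||_p + f(x + r, t), with f chosen here as the
    cross-entropy toward the target label t. The box constraint x + r in [0, 1]^n
    is handled by the outer optimizer, not inside this function."""
    return c * torch.linalg.vector_norm(r, ord=p) + F.cross_entropy(model(x + r), t)
```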
FGSM. FGSM directly adds a perturbation to the original example along the gradient direction of the model, since the model is assumed to be locally linear. It thereby avoids the high computation cost of an optimization procedure in a simple and effective manner:

$$\hat{x} = x + \epsilon \cdot \mathrm{sign}\left[\nabla_x J(\theta, x, y)\right], \tag{2}$$

where x̂ denotes the adversarial example, ε is the magnitude of the perturbation, J is the loss function, and θ is the parameter of the target model.
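Eq. (2) translates directly into a few lines of PyTorch; a minimal sketch follows, assuming a classifier `model`, an input `x`, its label `y`, and a step size `eps`. The final clamp to [0, 1] is a common practical addition rather than part of Eq. (2).

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Single-step FGSM of Eq. (2): x_adv = x + eps * sign(grad_x J(theta, x, y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)          # J(theta, x, y)
    grad, = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).clamp(0, 1).detach()  # clamp to the valid image range
```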
JSMA. JSMA is an L0-norm-based white-box targeted attack method [12]. The attack follows a greedy strategy: at each iteration, one or more pixels with a high impact on the prediction are selected for modification. Concretely, the JSMA attack computes an attack saliency map from the Jacobian matrix of the model, which reflects the importance of each pixel to the model's prediction. The saliency map S is calculated as follows:
$$S(x, t)_i = \begin{cases} 0, & \text{if } \dfrac{\partial Z(x)_t}{\partial x_i} < 0 \ \text{ or } \ \displaystyle\sum_{j \neq t} \dfrac{\partial Z(x)_j}{\partial x_i} > 0, \\[2ex] \dfrac{\partial Z(x)_t}{\partial x_i} \left| \displaystyle\sum_{j \neq t} \dfrac{\partial Z(x)_j}{\partial x_i} \right|, & \text{otherwise,} \end{cases} \tag{3}$$

where x denotes the input, t is the target label, and Z(x) denotes the logits.
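A direct rendering of Eq. (3) is sketched below, computing each component's saliency score from the Jacobian of the logits. A dense Jacobian is used purely for clarity, and `logit_fn` (a function mapping a flat input of shape (n,) to logits Z(x) of shape (num_classes,)) is an assumption of this sketch.

```python
import torch
from torch.autograd.functional import jacobian

def jsma_saliency(logit_fn, x, t):
    """Saliency map of Eq. (3) for target class t, from the dense Jacobian of
    the logits w.r.t. a flat input x."""
    J = jacobian(logit_fn, x)                 # J[j, i] = dZ(x)_j / dx_i
    d_target = J[t]                           # dZ(x)_t / dx_i
    d_others = J.sum(dim=0) - J[t]            # sum over j != t of dZ(x)_j / dx_i
    S = d_target * d_others.abs()
    S[(d_target < 0) | (d_others > 0)] = 0.0  # zero out per the condition in Eq. (3)
    return S
```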
C&W. Compared with the L-BFGS attack, the C&W attack uses variable substitution to bypass the box constraint and uses a more efficient objective function, employing the Adam optimizer [16] to find the optimal perturbation.
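The variable substitution in question is the tanh change of variables from the C&W paper, x_adv = ½(tanh(w) + 1), which maps any unconstrained w into [0, 1]^n so that the box constraint is satisfied by construction. A minimal sketch of the forward and inverse maps:

```python
import torch

def to_image(w):
    """Change of variables x_adv = 0.5 * (tanh(w) + 1): any unconstrained w maps
    into [0, 1]^n, so the box constraint never has to be enforced explicitly."""
    return 0.5 * (torch.tanh(w) + 1.0)

def from_image(x, eps=1e-6):
    """Inverse map (arctanh), used to initialize w from the original image x."""
    return torch.atanh((2.0 * x - 1.0).clamp(-1 + eps, 1 - eps))
```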