LP-BFGS ATTACK: AN ADVERSARIAL ATTACK BASED ON THE HESSIAN WITH
LIMITED PIXELS
Jiebao Zhang1, Wenhua Qian1, Rencan Nie1, Jinde Cao2, Dan Xu1
1School of Information Science and Engineering, Yunnan University, Kunming 650500, China
2School of Mathematics, Southeast University, Nanjing 210096, China
ABSTRACT
Deep neural networks are vulnerable to adversarial attacks.
Most L0-norm-based white-box attacks craft perturbations using the gradient of the model with respect to the input. Owing to the computation cost and memory limitations of calculating the Hessian matrix, the use of the Hessian or an approximate Hessian in white-box attacks has gradually been shelved. In this work, we note that the sparsity requirement on perturbations naturally lends itself to the use of Hessian information. We study the attack performance and computation cost of an attack method based on the Hessian with a limited number of perturbation pixels. Specifically, we propose the Limited Pixel BFGS (LP-BFGS) attack method by incorporating a perturbation-pixel selection strategy and the BFGS algorithm. Pixels with the top-k attribution scores calculated by the Integrated Gradient method are regarded as the optimization variables of the LP-BFGS attack. Experimental results across different networks and datasets demonstrate that our approach achieves attack ability comparable to existing solutions with reasonable computation cost across different numbers of perturbation pixels.
Index Terms— Adversarial examples, adversarial attacks, deep neural networks, BFGS method
1. INTRODUCTION
Deep Neural Networks (DNNs) achieve outstanding performance on image classification tasks [1]. However, researchers have found that DNNs are highly susceptible to small malicious perturbations crafted by adversaries [2, 3]. Specifically, malicious perturbations added to original examples can significantly harm the performance of DNNs. DNNs are therefore untrustworthy for security-sensitive tasks. Many adversarial attack methods have been proposed to seek perturbations according to the unique properties of DNNs and optimization techniques.

* Corresponding author: Wenhua Qian. Email: whqian@ynu.edu.cn. This work was supported by the Research Foundation of Yunnan Province (No. 202002AD08001, 202001BB050043, 2019FA044), the National Natural Science Foundation of China (Grant No. 62162065), the Provincial Foundation for Leaders of Disciplines in Science and Technology (No. 2019HB121), the Postgraduate Research and Innovation Foundation of Yunnan University (No. 2021Y281, No. 2021Z078), and the Postgraduate Practice and Innovation Foundation of Yunnan University (No. 2021Y179, No. 2021Y171).
Depending on the attacker's knowledge of the target model, adversarial attacks can be divided into two categories: white-box attacks and black-box attacks. White-box attacks assume that attackers have detailed information about the target model (e.g., the training data, model structure, and model weights); they can be further classified into optimization-based attacks [2, 3, 4], single-step attacks [5, 6], and iterative attacks [7, 8, 9, 10, 11, 12, 13]. Optimization-based attacks formulate finding the optimal perturbation as a box-constrained optimization problem. Szegedy et al. use a quasi-Newton method, the limited-memory BFGS method [14, 15], to solve the box-constrained problem; this is known as the L-BFGS attack [3]. Compared with the L-BFGS attack, the C&W attack [4] uses variable substitution to bypass the box constraint and uses a more efficient objective function; furthermore, it uses the Adam optimizer [16] to find the optimal perturbation. Single-step attacks are simple and efficient and can alleviate the high computation cost incurred by optimization-based attacks. Since the model is assumed to be locally linear, perturbations in single-step attacks are added directly along the gradient [5, 6]. Iterative attacks add perturbations in multiple steps, achieving a tradeoff between computation and attack performance. Black-box attacks assume that attackers have little information about the architecture and parameters of the target model. Compared with white-box attacks, they can still achieve comparable attacks by querying the output (e.g., the confidence score or the final decision) of the model [17, 18, 19, 20].
Most existing white-box and black-box attacks have in common the tendency to indiscriminately perturb all pixels of an image. However, some research has shown that attackers can achieve strong attack effects by perturbing only certain regions or pixels of the original image. JSMA [12] selects one or more pixels that play an important role in the model's prediction for modification at each iteration. C&W [4] iteratively executes the L2-distance attack to obtain perturbations with minimal L0 distance. SparseFool [13] exploits the low mean curvature of the decision boundary to control the sparsity of the perturbations.
Fig. 1: An illustration that pixels at different locations play different roles in the loss.
Fig. 2: The attack framework of LP-BFGS. Pixels with top-k attribution scores calculated by the Integrated Gradient method are
selected as optimization variables of LP-BFGS. The adversarial image combines immutable pixels with the optimal perturbation
obtained by LP-BFGS.
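As a concrete illustration of the selection step described in the Fig. 2 caption, below is a minimal PyTorch sketch of computing Integrated Gradient attributions and keeping the k pixels with the largest scores. The classifier `model`, the input `x` of shape (1, C, H, W), the label `y`, the zero baseline, the number of interpolation steps, the use of the cross-entropy loss as the attribution target, and the helper names `integrated_gradients` and `topk_pixel_mask` are all illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def integrated_gradients(model, x, y, steps=50):
    """Approximate Integrated Gradients of the true-label loss w.r.t. x,
    using a zero baseline and a Riemann-sum approximation of the path integral."""
    baseline = torch.zeros_like(x)
    total_grad = torch.zeros_like(x)
    for alpha in torch.linspace(1.0 / steps, 1.0, steps):
        xi = (baseline + alpha * (x - baseline)).detach().requires_grad_(True)
        loss = F.cross_entropy(model(xi), y)
        grad, = torch.autograd.grad(loss, xi)
        total_grad += grad
    return (x - baseline) * total_grad / steps   # per-element attribution

def topk_pixel_mask(attributions, k):
    """Boolean mask of shape (1, 1, H, W) marking the k spatial locations with
    the largest channel-summed attribution magnitude."""
    scores = attributions.abs().sum(dim=1).flatten(1)            # (1, H*W)
    idx = scores.topk(k, dim=1).indices
    mask = torch.zeros_like(scores, dtype=torch.bool)
    mask.scatter_(1, idx, torch.ones_like(idx, dtype=torch.bool))
    return mask.view(1, 1, *attributions.shape[-2:])
```

The resulting mask fixes the immutable pixels and exposes only the selected pixels as optimization variables, which is the role it plays in the framework of Fig. 2.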
OPA [17] is a score-based attack that uses the differential evolution algorithm to generate adversarial perturbations in the black-box scenario. Sparse-RS [20] is also a score-based black-box attack, based on random search. Furthermore, some sparse perturbations can be deployed in the physical world [21, 22, 23].
Sparse perturbations are indeed visible, but they do not alter the semantic content of the image [20]. Moreover, the demand for sparsity paves the way for the use of second-order gradient information. A sizeable image commonly introduces a high-dimensional optimization variable into the attack process. The second-order gradient information of the loss function w.r.t. the original image, namely the Hessian matrix, requires an expensive computational cost and memory budget. For example, for an image of size 3 × 256 × 256, the Hessian matrix of the loss function with respect to it has size (3 · 2^16) × (3 · 2^16). Assuming each element of the matrix is a 4-byte floating-point number, storing this matrix alone would require about 144 gigabytes of memory. This makes it difficult for most attack algorithms to use second-order gradient information, and the high dimensionality of the optimization variables is a key factor hindering the use of the Hessian or an approximate Hessian in attacks.
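For reference, the 144-gigabyte figure follows from a back-of-envelope calculation over a dense float32 matrix:

```python
# Dense Hessian storage for a 3 x 256 x 256 input, assuming 4-byte (float32) entries.
n = 3 * 256 * 256                 # 196,608 input variables
hessian_bytes = n * n * 4         # an n x n matrix of 4-byte floats
print(f"{hessian_bytes / 2**30:.1f} GiB")   # -> 144.0 GiB
```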
However, when the number of perturbed pixels is limited, deciding which pixels are most conducive to the attack requires careful consideration. Perturbing pixels at different locations may produce different attack effects. Fig. 1 shows that pixels modified at different locations have different impacts on the model's loss value.
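The observation behind Fig. 1 can be reproduced with a small probe that perturbs a single pixel at different locations and records the resulting change in the loss. This is a minimal sketch, not the paper's procedure; `model`, `x`, and `y` are assumed as before, and the perturbation size `delta` is an arbitrary illustrative value.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def single_pixel_loss_change(model, x, y, locations, delta=0.5):
    """Loss change from adding `delta` to one pixel (all channels) at each
    (row, col) location -- an illustrative probe of location sensitivity."""
    base_loss = F.cross_entropy(model(x), y).item()
    changes = {}
    for i, j in locations:
        xp = x.clone()
        xp[:, :, i, j] = (xp[:, :, i, j] + delta).clamp(0.0, 1.0)
        changes[(i, j)] = F.cross_entropy(model(xp), y).item() - base_loss
    return changes
```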
In this work, we propose the limited-pixel BFGS (LP-BFGS) attack method, which incorporates the attribution scores of pixels and approximate Hessian information. Specifically, for an image, we use the Integrated Gradient algorithm to compute the attribution score of each pixel with respect to the model's decision about the true label. We select pixels with high attribution scores as the perturbation pixels, which reduces the dimensionality of the optimization variables, and then find adversarial examples under the guidance of approximate Hessian information, as shown in Fig. 2 (a minimal code sketch of this pipeline is given after the contribution list below). The main contributions of this work are as follows:

• We propose the LP-BFGS attack method by incorporating a perturbation-pixel selection strategy and the BFGS algorithm. LP-BFGS perturbs only a subset of pixels under the guidance of the Hessian.

• We investigate the effect of the loss function and the number of perturbation pixels on the performance of the LP-BFGS attack, and study the time cost of the LP-BFGS attack family.

• We conduct experiments across various datasets and models to verify that the LP-BFGS attack achieves comparable attack performance with an acceptable computation cost.
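To make the pipeline concrete, the sketch below optimizes only the pixels selected by a boolean mask (for example, the top-k Integrated-Gradient pixels from the earlier sketch). SciPy's bounded quasi-Newton routine L-BFGS-B is used here as a stand-in for the BFGS variant described in the paper, and the untargeted cross-entropy objective, the function name `lp_bfgs_like_attack`, and all hyperparameters are illustrative assumptions.

```python
import numpy as np
import torch
import torch.nn.functional as F
from scipy.optimize import minimize

def lp_bfgs_like_attack(model, x, y, pixel_mask, max_iter=100):
    """Optimize only the pixels selected by `pixel_mask` ((1, 1, H, W), bool),
    using SciPy's L-BFGS-B as a stand-in quasi-Newton solver.
    Untargeted variant: ascend the cross-entropy of the true label y.
    The model's parameters are assumed frozen."""
    idx = pixel_mask.expand_as(x).flatten().nonzero().squeeze(1)  # free variables
    x_flat = x.detach().flatten()

    def objective(z):
        z_t = torch.tensor(z, dtype=x.dtype, requires_grad=True)
        adv = x_flat.index_put((idx,), z_t)          # only selected entries vary
        loss = -F.cross_entropy(model(adv.reshape(x.shape)), y)
        loss.backward()
        return loss.item(), z_t.grad.numpy().astype(np.float64)

    z0 = x_flat[idx].numpy().astype(np.float64)
    res = minimize(objective, z0, jac=True, method="L-BFGS-B",
                   bounds=[(0.0, 1.0)] * len(z0),    # box constraint on pixel values
                   options={"maxiter": max_iter})
    return x_flat.index_put((idx,), torch.tensor(res.x, dtype=x.dtype)).reshape(x.shape)
```

Because only k variables are optimized, the approximate Hessian maintained by the quasi-Newton routine is k × k rather than n × n, which is what makes second-order information affordable in this setting.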
2. RELATED WORK
2.1. Adversarial attacks
Various adversarial attacks have been proposed to seek perturbations according to the exclusive attributes of DNNs and optimization techniques. Some of them investigate the robustness of models against sparse perturbations. OPA selects candidate solutions generated by differential evolution as the adversarial examples [17]. Sparse-RS searches for the components to be perturbed and the corresponding perturbation values to form the adversarial input [20]. Several white-box attacks have also been proposed, which we summarize below.
L-BFGS attack. The L-BFGS attack formulates finding an optimal perturbation as a box-constrained optimization problem:

$$\min_{r} \; c\,\|r\|_p + f(x+r,\, t) \quad \text{s.t.} \quad x+r \in [0,1]^n, \tag{1}$$

where r denotes the perturbation vector, x is the original input, t is the target label, and n is the size of the image x. f is a loss function customized by the attacker, e.g., the cross-entropy. The constant c balances the perturbation magnitude against the attack performance. The L-BFGS attack uses a second-order quasi-Newton method [14, 15] to solve this problem.
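Written as code, the objective in Eq. (1) is simply a weighted sum of a p-norm penalty and an attacker-chosen loss. A minimal PyTorch rendering is sketched below; choosing the cross-entropy toward the target label t as f and p = 2 are illustrative assumptions, and enforcing the box constraint is left to the outer solver.

```python
import torch
import torch.nn.functional as F

def lbfgs_attack_objective(model, x, r, t, c, p=2):
    """Objective of Eq. (1): c * ||r||_p + f(x + r, t), with f chosen here as the
    cross-entropy toward the target label t. The box constraint x + r in [0, 1]^n
    is handled by the outer optimizer, not inside this function."""
    return c * torch.linalg.vector_norm(r, ord=p) + F.cross_entropy(model(x + r), t)
```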
FGSM. FGSM directly adds a perturbation to the original example along the gradient direction of the model, since the model is assumed to be locally linear. It thereby avoids the high computation cost of an optimization procedure in a simple and effective manner:

$$\hat{x} = x + \epsilon \cdot \mathrm{sign}\left[\nabla_x J(\theta, x, y)\right], \tag{2}$$

where x̂ denotes the adversarial example, ε is the magnitude of the perturbation, J is the loss function, and θ is the parameter of the target model.
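Eq. (2) translates directly into a few lines of PyTorch; a minimal sketch follows, assuming a classifier `model`, an input `x`, its label `y`, and a step size `eps`. The final clamp to [0, 1] is a common practical addition rather than part of Eq. (2).

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Single-step FGSM of Eq. (2): x_adv = x + eps * sign(grad_x J(theta, x, y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)          # J(theta, x, y)
    grad, = torch.autograd.grad(loss, x)
    return (x + eps * grad.sign()).clamp(0, 1).detach()  # clamp to the valid image range
```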
JSMA. JSMA is an L0-norm-based white-box targeted attack method [12]. The attack follows a greedy strategy: at each iteration, one or more pixels with a high impact on the prediction are selected for modification. Concretely, the JSMA attack computes an attack saliency map from the Jacobian matrix of the model, which reflects the importance of each pixel to the model's prediction. The saliency map S is calculated as follows:
$$S(x, t)_i = \begin{cases} 0, & \text{if } \dfrac{\partial Z(x)_t}{\partial x_i} < 0 \ \text{ or } \ \displaystyle\sum_{j \neq t} \dfrac{\partial Z(x)_j}{\partial x_i} > 0, \\[2ex] \dfrac{\partial Z(x)_t}{\partial x_i} \left| \displaystyle\sum_{j \neq t} \dfrac{\partial Z(x)_j}{\partial x_i} \right|, & \text{otherwise,} \end{cases} \tag{3}$$

where x denotes the input, t is the target label, and Z(x) denotes the logits.
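A direct rendering of Eq. (3) is sketched below, computing each component's saliency score from the Jacobian of the logits. A dense Jacobian is used purely for clarity, and `logit_fn` (a function mapping a flat input of shape (n,) to logits Z(x) of shape (num_classes,)) is an assumption of this sketch.

```python
import torch
from torch.autograd.functional import jacobian

def jsma_saliency(logit_fn, x, t):
    """Saliency map of Eq. (3) for target class t, from the dense Jacobian of
    the logits w.r.t. a flat input x."""
    J = jacobian(logit_fn, x)                 # J[j, i] = dZ(x)_j / dx_i
    d_target = J[t]                           # dZ(x)_t / dx_i
    d_others = J.sum(dim=0) - J[t]            # sum over j != t of dZ(x)_j / dx_i
    S = d_target * d_others.abs()
    S[(d_target < 0) | (d_others > 0)] = 0.0  # zero out per the condition in Eq. (3)
    return S
```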
C&W. Compared with the L-BFGS attack, the C&W attack uses variable substitution to bypass the box constraint and uses a more efficient objective function, employing the Adam optimizer [16] to find the optimal perturbation.
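The variable substitution in question is the tanh change of variables from the C&W paper, x_adv = ½(tanh(w) + 1), which maps any unconstrained w into [0, 1]^n so that the box constraint is satisfied by construction. A minimal sketch of the forward and inverse maps:

```python
import torch

def to_image(w):
    """Change of variables x_adv = 0.5 * (tanh(w) + 1): any unconstrained w maps
    into [0, 1]^n, so the box constraint never has to be enforced explicitly."""
    return 0.5 * (torch.tanh(w) + 1.0)

def from_image(x, eps=1e-6):
    """Inverse map (arctanh), used to initialize w from the original image x."""
    return torch.atanh((2.0 * x - 1.0).clamp(-1 + eps, 1 - eps))
```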