
LP-BFGS ATTACK: AN ADVERSARIAL ATTACK BASED ON THE HESSIAN WITH
LIMITED PIXELS
Jiebao Zhang1, Wenhua Qian1∗, Rencan Nie1, Jinde Cao2, Dan Xu1
1School of Information Science and Engineering, Yunnan University, Kunming 650500, China
2School of Mathematics, Southeast University, Nanjing 210096, China
ABSTRACT
Deep neural networks are vulnerable to adversarial attacks.
Most L0-norm-based white-box attacks craft perturbations using
the gradient of the model with respect to the input. Owing to the
computational cost and memory limitations of calculating the
Hessian matrix, the application of the Hessian or an approximate
Hessian in white-box attacks has gradually been shelved. In this
work, we note that the sparsity requirement on perturbations
naturally lends itself to the use of Hessian information. We study
the attack performance and computation cost of an attack method
based on the Hessian with a limited number of perturbation pixels.
Specifically, we propose the Limited Pixel BFGS (LP-BFGS)
attack method, which incorporates a perturbation-pixel selection
strategy and the BFGS algorithm. Pixels with the top-k attribution
scores computed by the Integrated Gradient method are taken as
the optimization variables of the LP-BFGS attack. Experimental
results across different networks and datasets demonstrate that our
approach achieves attack ability comparable to existing solutions,
with reasonable computation cost, across different numbers of
perturbation pixels.
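To make the pipeline concrete, the following minimal sketch shows how such an attack could be assembled in PyTorch. It is illustrative only: Captum's IntegratedGradients stands in for the attribution step, SciPy's BFGS routine drives the optimization over the selected pixels, and the distortion-plus-misclassification objective is a generic placeholder rather than the exact loss used in the paper.

import numpy as np
import torch
import torch.nn.functional as F
from captum.attr import IntegratedGradients
from scipy.optimize import minimize

def lp_bfgs_sketch(model, x, label, k=20, c=1.0):
    # Illustrative sketch (CPU tensors, pixel values in [0, 1]):
    # perturb only the top-k pixels ranked by Integrated Gradients,
    # optimizing them with SciPy's BFGS implementation.
    model.eval()
    # 1) Rank input elements by attribution magnitude and keep the top k.
    ig = IntegratedGradients(model)
    attr = ig.attribute(x, target=label).detach().abs().flatten()
    idx = torch.topk(attr, k).indices.numpy()

    x_flat = x.detach().flatten().numpy().astype(np.float64)

    def loss_and_grad(v):
        # 2) Rebuild the image: original pixels everywhere except the k
        #    selected positions, which hold the optimization variables v.
        z = torch.tensor(x_flat, dtype=torch.float32)
        z[idx] = torch.tensor(v, dtype=torch.float32)
        z = z.view_as(x).clamp(0.0, 1.0).requires_grad_(True)
        # 3) Generic objective: distortion minus classification loss, so
        #    minimizing it pushes the prediction away from the true label.
        logits = model(z)
        obj = ((z - x) ** 2).sum() - c * F.cross_entropy(logits, torch.tensor([label]))
        obj.backward()
        grad = z.grad.flatten()[idx].double().numpy()
        return obj.item(), grad

    # 4) BFGS builds its own approximation of the (k x k) Hessian, which
    #    stays cheap because only k variables are optimized.
    res = minimize(loss_and_grad, x_flat[idx], jac=True, method="BFGS",
                   options={"maxiter": 50})
    x_adv = x_flat.copy()
    x_adv[idx] = res.x
    return torch.tensor(x_adv, dtype=torch.float32).view_as(x).clamp(0.0, 1.0)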
Index Terms— Adversarial examples, adversarial attacks, deep neural networks, BFGS method
1. INTRODUCTION
Deep Neural Networks (DNNs) have achieved outstanding performance
on image classification tasks [1]. However, researchers
have found that DNNs are highly susceptible to small malicious
perturbations crafted by adversaries [2, 3]. Specifically,
malicious perturbations added to original examples can significantly
harm the performance of DNNs. DNNs are therefore
untrustworthy for security-sensitive tasks.
* Corresponding author: Wenhua Qian. Email: whqian@ynu.edu.cn.
This work was supported by the Research Foundation of Yunnan
Province No.202002AD08001, 202001BB050043, 2019FA044, National
Natural Science Foundation of China under Grants No.62162065, Provin-
cial Foundation for Leaders of Disciplines in Science and Technology
No.2019HB121, in part by the Postgraduate Research and Innovation Foun-
dation of Yunnan University (No.2021Y281, No.2021Z078), and in part by
the Postgraduate Practice and Innovation Foundation of Yunnan University
(No.2021Y179, No.2021Y171).
Many adversarial attack methods have been proposed to seek
perturbations by exploiting the unique properties of DNNs and
various optimization techniques.
Depending on the attacker's knowledge of the target
model, adversarial attacks can be divided into two categories:
white-box attacks and black-box attacks. White-box
attacks assume that attackers have detailed information about
the target model (e.g., the training data, model structure,
and model weights); they can be further classified into
optimization-based attacks [2, 3, 4], single-step attacks [5, 6],
and iterative attacks [7, 8, 9, 10, 11, 12, 13]. Optimization-based
attacks formulate finding the optimal perturbation as
a box-constrained optimization problem. Szegedy et al. use
a quasi-Newton method, the limited-memory BFGS (L-BFGS)
method [14, 15], to solve this box-constrained problem; the
result is known as the L-BFGS attack [3].
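Concretely, the targeted formulation solved there can be written (up to the choice of norm) as

    minimize    c * ||r|| + loss_f(x + r, l)
    subject to  x + r in [0, 1]^m,

where x is the clean image, r the perturbation, l the target label, loss_f the classification loss of the model f, and the weight c > 0 is searched over so that the minimizer of the penalized problem is indeed classified as l.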
Compared with the L-BFGS attack, the
C&W attack [4] uses a variable substitution to bypass the
box constraint and adopts a more effective objective function;
furthermore, it uses the Adam optimizer [16] to find the
optimal perturbation.
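As an illustration of that substitution (standard in presentations of the C&W L2 attack), the adversarial example is parameterized through an unconstrained variable w as

    x' = (tanh(w) + 1) / 2,

which guarantees x' in [0, 1]^n by construction; the attack then minimizes ||x' - x||_2^2 + c * f(x') over w with an unconstrained optimizer such as Adam, where f is a margin-style term that becomes non-positive exactly when x' is classified as the attacker's target.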
Single-step attacks are simple and efficient,
and they alleviate the high computation cost incurred
by optimization-based attacks. Since the model is assumed
to be locally linear, perturbations in single-step attacks are
added directly along the gradient [5, 6].
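The fast gradient sign method [5] is the canonical example. A minimal PyTorch-style sketch is given below; model, loss_fn, x, and y stand for a generic classifier, loss, input batch in [0, 1], and labels, and are not fixed by the paper.

import torch

def fgsm(model, loss_fn, x, y, eps=8 / 255):
    # Single step: move each pixel by eps along the sign of the input
    # gradient of the loss, then clip back to the valid image range.
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()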
Iterative attacks instead add
perturbations over multiple steps, achieving a tradeoff between
computation cost and attack performance. Black-box attacks
assume that attackers have little information about the
architecture and parameters of the target model. Compared
with white-box attacks, they can still achieve comparable
attacks by querying the output (e.g., the confidence scores or
the final decision) of the model [17, 18, 19, 20].
Most existing white-box and black-box attacks share the
tendency to indiscriminately perturb all pixels of an image.
However, research has shown that attackers can achieve strong
attack effects by perturbing only certain regions or pixels of
the original image. At each iteration, JSMA [12] selects one or
more pixels that play an important role in the model's prediction
and modifies them. C&W [4] iteratively executes the L2-distance
attack to obtain perturbations with minimal L0 distance.
SparseFool [13] exploits the low mean curvature of the decision
boundary to control the sparsity of the perturbations. OPA [17]
is a score-based attack that