Strength-Adaptive Adversarial Training
STRENGTH-ADAPTIVE ADVERSARIAL TRAINING
Chaojian Yu1, Dawei Zhou2, Li Shen3, Jun Yu4,
Bo Han5, Mingming Gong6, Nannan Wang2, Tongliang Liu1
1TML Lab, Sydney AI Centre, The University of Sydney
2Xidian University
3JD Explore Academy
4University of Science and Technology of China
5Hong Kong Baptist University
6The University of Melbourne
{chyu8051,tongliang.liu}@sydney.edu.au, dwzhou.xidian@gmail.com,
mathshenli@gmail.com, harryjun@ustc.edu.cn, bhanml@comp.hkbu.edu.hk,
mingming.gong@unimelb.edu.au, nnwang@xidian.edu.cn
ABSTRACT
Adversarial training (AT) has been proved to reliably improve a network's robustness
against adversarial data. However, current AT with a pre-specified perturbation
budget has limitations in learning a robust network. Firstly, applying a pre-specified
perturbation budget to networks of various model capacities yields divergent degrees
of robustness disparity between natural and robust accuracies, which deviates from a
robust network's desideratum. Secondly, the attack strength of adversarial training data
constrained by the pre-specified perturbation budget fails to increase as network
robustness grows, which leads to robust overfitting and further degrades the adversarial
robustness. To overcome these limitations, we propose Strength-Adaptive Adversarial
Training (SAAT). Specifically, the adversary employs an adversarial loss constraint to
generate adversarial training data. Under this constraint, the perturbation budget is
adaptively adjusted according to the training state of the adversarial data, which
effectively avoids robust overfitting. Besides, SAAT explicitly constrains the attack
strength of training data through the adversarial loss, which manipulates model capacity
scheduling during training, and thereby can flexibly control the degree of robustness
disparity and adjust the tradeoff between natural accuracy and robustness. Extensive
experiments show that our proposal boosts the robustness of adversarial training.
1 INTRODUCTION
Current deep neural networks (DNNs) achieve impressive breakthroughs in a variety of fields such
as computer vision (He et al., 2016), speech recognition (Wang et al., 2017), and NLP (Devlin et al.,
2018), but it is well-known that DNNs are vulnerable to adversarial data: small perturbations of the
input which are imperceptible to humans will cause wrong outputs (Szegedy et al., 2013; Goodfellow
et al., 2014). As countermeasures against adversarial data, adversarial training (AT) is a method for
hardening networks against adversarial attacks (Madry et al., 2017). AT trains the network using
adversarial data that are constrained by a pre-specified perturbation budget, which aims to obtain the
output network with the minimum adversarial risk of a sample being wrongly classified under the
same perturbation budget. Among existing defense techniques, AT has been proved to be one of the
most effective and reliable methods against adversarial attacks (Athalye et al., 2018).
Although promising to improve the network’s robustness, AT with a pre-specified perturbation bud-
get still has limitations in learning a robust network. Firstly, the pre-specified perturbation budget is
inadaptable for networks of various model capacities, yielding divergent degrees of robustness
disparity between natural and robust accuracies, which deviates from a robust network's desideratum.
Ideally, for a robust network, perturbing the attack budget within a small range should not cause
significant accuracy degradation. Unfortunately, the degree of robustness disparity is intractable for
AT with a pre-specified perturbation budget. In standard AT, there could be a prominent degree of
[Figure 1 appears here; see caption below. Panels (a) and (b) plot accuracy (%) against the test perturbation budget (·/255); panel (c) plots training/test accuracy (%) against the training epoch and the robustness of the "best" and "last" checkpoints against the test perturbation budget.]
Figure 1: Robustness evaluation on different test perturbation budgets of (a) standard AT; (b) AT
with different pre-specified training perturbation budgets. (c) The learning curve of standard AT
with pre-specified perturbation budget $\epsilon = 8/255$ on PreAct ResNet-18 under the $\ell_\infty$ threat model and the
robustness evaluation of its "best" and "last" checkpoints.
robustness disparity in output networks. For instance, a standard PGD adversarially trained PreAct
ResNet-18 network has 84% natural accuracy but only 46% robust accuracy on CIFAR-10 under the $\ell_\infty$
threat model, as shown in Figure 1(a). Empirically, we have to increase the pre-specified perturbation
budget to allocate more model capacity to the defense against adversarial attacks in order to mitigate the
degree of robustness disparity, as shown in Figure 1(b). However, the feasible range of perturbation
budgets differs across networks with different model capacities. For example, AT with perturbation
budget $\epsilon = 40/255$ will make the optimization of PreAct ResNet-18 collapse, while Wide ResNet-34-10
can learn normally. In order to maintain a steady degree of robustness disparity, we have to find
separate perturbation budgets for each network with different model capacities. Therefore, it may
be pessimistic to use AT with a pre-specified perturbation budget to learn a robust network.
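To make the kind of sweep behind Figures 1(a) and 1(b) concrete, the short sketch below evaluates a trained network under a range of test-time perturbation budgets. It is only an illustration: it assumes a hypothetical `pgd_attack(model, x, y, eps, alpha, steps)` helper (such as the PGD sketch given later in Section 2.1) and a standard CIFAR-10 test loader, and the budget values and attack settings are placeholders rather than the exact ones used for Figure 1.

```python
import torch

def robustness_curve(model, loader, budgets=(0, 2, 4, 8, 12, 16, 20), device="cuda"):
    """Accuracy (%) as a function of the test-time perturbation budget (in x/255 units).

    Assumes a hypothetical `pgd_attack(model, x, y, eps, alpha, steps)` helper, e.g. the
    PGD sketch in Section 2.1. A budget of 0 corresponds to natural accuracy.
    """
    model.eval()
    curve = {}
    for b in budgets:
        eps = b / 255.0
        correct, total = 0, 0
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            # generate adversarial test data within the current budget (needs gradients)
            x_eval = x if eps == 0 else pgd_attack(model, x, y, eps=eps, alpha=eps / 4, steps=20)
            with torch.no_grad():
                correct += (model(x_eval).argmax(dim=1) == y).sum().item()
            total += y.size(0)
        curve[b] = 100.0 * correct / total
    return curve
```

A flat curve over small budget perturbations corresponds to the small robustness disparity that the paper argues a robust network should exhibit.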
Secondly, the attack strength of adversarial training data constrained by the pre-specified perturbation
budget gradually weakens as network robustness grows. During the training process, adversarial
training data are generated on the fly and change as the network is updated. As the network's
adversarial robustness continues to increase, the attack strength of adversarial training data under the
pre-specified perturbation budget becomes relatively weaker. Given the limited network capacity, a
degenerate or stagnant adversary accompanied by an evolving network easily causes a training bias:
adversarial training becomes more inclined toward defending against weak-strength attacks, and thereby
erodes the defense against strong-strength attacks, leading to undesirable robust overfitting, as shown in
Figure 1(c). Moreover, compared with the "best" checkpoint in AT with robust overfitting, the "last"
checkpoint's defense advantage under weak-strength attacks is slight, while its defense disadvantage
under strong-strength attacks is significant, which indicates that robust overfitting not only exacerbates
the degree of robustness disparity but also further degrades the adversarial robustness. Thus, it may be
deficient to use adversarial data with a pre-specified perturbation budget to train a robust network.
To overcome these limitations, we propose strength-adaptive adversarial training (SAAT), which
employs an adversarial loss constraint to generate adversarial training data. The adversarial pertur-
bation generated under this constraint is adaptive to the dynamic training schedule and networks
of various model capacities. Specifically, as adversarial training progresses, a larger perturbation
budget is required to satisfy the adversarial loss constraint since the network becomes more robust.
Thus, the perturbation budget in our SAAT is adaptively adjusted according to the training state of the
adversarial data, which restrains the training bias and effectively avoids robust overfitting. Besides,
SAAT explicitly constrains the attack strength of training data via the adversarial loss constraint,
which guides model capacity scheduling in adversarial training, and thereby can flexibly adjust the
tradeoff between natural accuracy and robustness, ensuring that the output network maintains a
steady degree of robustness disparity even across networks with different model capacities.
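As a rough illustration of this idea (a minimal sketch of our own reading, not the exact algorithm described in Section 3), an adversarial-loss-constrained attack can be written as a PGD-style loop that keeps enlarging the perturbation and stops as soon as the adversarial loss reaches a target value; the effective budget then grows automatically as the network becomes more robust. The threshold `target_loss`, the step size, and the safety caps `max_budget`/`max_steps` below are illustrative placeholders, and the constraint is checked on the batch-average loss for simplicity (a per-example version would stop each sample individually).

```python
import torch
import torch.nn.functional as F

def loss_constrained_attack(model, x, y, target_loss=1.5, alpha=2 / 255,
                            max_budget=32 / 255, max_steps=20):
    """Sketch of an adversarial-loss-constrained attack: rather than projecting onto a
    pre-specified eps-ball, keep taking sign-gradient steps (so the effective budget can
    grow by alpha per step) and stop once the adversarial loss reaches target_loss.
    All hyper-parameters here are illustrative, not values from the paper."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(max_steps):
        loss = F.cross_entropy(model(x + delta), y)
        if loss.item() >= target_loss:  # adversarial loss constraint satisfied: stop enlarging
            break
        grad = torch.autograd.grad(loss, delta)[0]
        # one FGSM-like step; the loose max_budget cap only guards against divergence
        delta = (delta + alpha * grad.sign()).clamp(-max_budget, max_budget).detach()
        delta = ((x + delta).clamp(0.0, 1.0) - x).requires_grad_(True)  # stay in the valid pixel range
    return (x + delta).detach()
```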
Our contributions are as follows. (a) In standard AT, we characterize the pessimism of an adversary
with a pre-specified perturbation budget, which is due to the intractable robustness disparity and
undesirable robust overfitting (in Section 3.1). (b) We propose a new adversarial training method,
i.e., SAAT (its learning objective in Section 3.2 and its realization in Section 3.3). SAAT is a
general adversarial training method that can be easily converted to natural training or standard AT.
(c) Empirically, we find that adversarial training loss is well-correlated with the degree of robustness
disparity and robust generalization gap (in Section 4.2), which enables our SAAT to overcome the
issue of robust overfitting and flexibly adjust the tradeoff of adversarial training, leading to
improved natural accuracy and robustness (in Section 4.3).
2 PRELIMINARY AND RELATED WORK
In this section, we review the adversarial training method and related works.
2.1 ADVERSARIAL TRAINING
Learning objective. Let $f_\theta$, $\mathcal{X}$, and $\ell$ be the network $f$ with trainable model parameters $\theta$, the input
feature space, and the loss function, respectively. Given a $C$-class dataset $S = \{(x_i, y_i)\}_{i=1}^{n}$, where
$x_i \in \mathcal{X}$ and $y_i \in \mathcal{Y} = \{0, 1, \ldots, C-1\}$ is its associated label, most machine learning tasks in natural
training can be formulated as solving the following optimization problem:
$$\min_{\theta} \frac{1}{n}\sum_{i=1}^{n} \ell(f_\theta(x_i), y_i). \qquad (1)$$
The learning objective of natural training is to obtain a network that has the minimum empirical
risk of a natural input being wrongly classified. In adversarial training, the adversary adds an
adversarial perturbation to each sample, i.e., transforms $S = \{(x_i, y_i)\}_{i=1}^{n}$ into
$S' = \{(x'_i = x_i + \delta_i, y_i)\}_{i=1}^{n}$. The adversarial perturbations $\{\delta_i\}_{i=1}^{n}$ are constrained by a
pre-specified budget $\epsilon$, i.e., $\delta \in \Delta = \{\delta : \|\delta\|_p \le \epsilon\}$, where $p$ can be $1$, $2$, $\infty$, etc. In order to defend
against such attacks, standard adversarial training (AT) (Madry et al., 2017) resorts to solving the following
objective function:
$$\min_{\theta} \frac{1}{n}\sum_{i=1}^{n} \max_{\delta_i \in \Delta} \ell(f_\theta(x_i + \delta_i), y_i). \qquad (2)$$
Note that the outer minimization remains the same as Eq. (1), and the inner maximization operator
can also be re-written as
$$\delta_i = \arg\max_{\delta_i \in \Delta} \ell(f_\theta(x_i + \delta_i), y_i), \qquad (3)$$
where $x'_i = x_i + \delta_i$ is the most adversarial data within the perturbation budget $\epsilon$. Standard AT
employs the most adversarial data generated according to Eq. (3) to update the current model.
The learning objective of standard AT is to obtain a network that has the minimum adversarial
risk of an input being wrongly classified under the pre-specified perturbation budget.
Realizations. The objective function of standard AT (Eq. (2)) is a composition of an inner maximization
problem and an outer minimization problem, with one step generating adversarial data and
one step minimizing the loss on the generated adversarial data w.r.t. the model parameters $\theta$. For the
outer minimization problem, Stochastic Gradient Descent (SGD) (Bottou, 1999) and its variants are
widely used to optimize the model parameters (Rice et al., 2020). For the inner maximization problem,
Projected Gradient Descent (PGD) (Madry et al., 2017) is the most common approximation
method for generating adversarial perturbations, and can be viewed as a multi-step variant of the Fast
Gradient Sign Method (FGSM) (Goodfellow et al., 2014). Given a normal example $x \in \mathcal{X}$ and step
size $\alpha > 0$, PGD works as follows:
$$\delta^{k+1} = \Pi_{\epsilon}\big(\alpha \cdot \mathrm{sign}(\nabla_{x} \ell(f(x + \delta^{k}), y)) + \delta^{k}\big), \quad k \in \mathbb{N}, \qquad (4)$$
where $\delta^{k}$ is the adversarial perturbation at step $k$, and $\Pi_{\epsilon}$ is the projection function that projects the
adversarial perturbation back into the pre-specified budget if necessary.
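A minimal PyTorch sketch of this inner/outer decomposition is given below: `pgd_attack` implements the $\ell_\infty$ PGD update of Eq. (4) (random start, sign-gradient steps, projection back into the $\epsilon$-ball), and `adversarial_training_step` performs one outer SGD update on the generated adversarial data as in Eq. (2). Hyper-parameter choices such as $\epsilon = 8/255$, the step size, and the number of steps are common defaults, not prescriptions from the text.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Multi-step l_inf PGD (Eq. (4)): ascend along the sign of the input gradient,
    then project the perturbation back into the eps-ball (the Pi_eps operator)."""
    delta = torch.empty_like(x).uniform_(-eps, eps)  # random start inside the budget
    delta.requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()   # step + projection
        delta = ((x + delta).clamp(0.0, 1.0) - x).requires_grad_(True)    # keep valid pixel range
    return (x + delta).detach()

def adversarial_training_step(model, optimizer, x, y, eps=8 / 255):
    """One outer-minimization step of standard AT (Eq. (2)): generate the most adversarial
    data within the pre-specified budget, then take an SGD step on it."""
    x_adv = pgd_attack(model, x, y, eps=eps)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Iterating `adversarial_training_step` over the training loader is the PGD-K recipe discussed in the stopping-criteria paragraph below, with K fixed to `steps`.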
2.2 RELATED WORK
Stopping criteria. There are different stopping criteria for PGD-based adversarial training. For
example, standard AT (Madry et al., 2017) employs a fixed number of iterations K, namely PGD-K,
which is commonly used in many outstanding adversarial training variants, such as TRADES (Zhang
et al., 2019), MART (Wang et al., 2019b), and RST (Carmon et al., 2019). Besides, some works
have further enhanced the PGD-K method by incorporating additional optimization mechanisms,
such as curriculum learning (Cai et al., 2018), FOSC (Wang et al., 2019a), and geometry reweight-
ing (Zhang et al., 2020b). On the other hand, some works adopt a different PGD stopping criterion,
i.e., misclassification-aware criterion, which stops the iterations once the network misclassifies the
adversarial data. This misclassification-aware criterion is widely used in the emerging adversarial
training variants, such as FAT (Zhang et al., 2020a), MMA (Ding et al., 2018), IAAT (Balaji et al.,