Strength-Adaptive Adversarial Training
STRENGTH-ADAPTIVE ADVERSARIAL TRAINING
Chaojian Yu1, Dawei Zhou2, Li Shen3, Jun Yu4,
Bo Han5, Mingming Gong6, Nannan Wang2, Tongliang Liu1
1TML Lab, Sydney AI Centre, The University of Sydney
2Xidian University
3JD Explore Academy
4University of Science and Technology of China
5Hong Kong Baptist University
6The University of Melbourne
{chyu8051,tongliang.liu}@sydney.edu.au, dwzhou.xidian@gmail.com,
mathshenli@gmail.com, harryjun@ustc.edu.cn, bhanml@comp.hkbu.edu.hk,
mingming.gong@unimelb.edu.au, nnwang@xidian.edu.cn
ABSTRACT
Adversarial training (AT) has been proved to reliably improve a network's robustness
against adversarial data. However, current AT with a pre-specified perturbation
budget has limitations in learning a robust network. Firstly, applying a pre-specified
perturbation budget to networks of various model capacities yields divergent degrees
of robustness disparity between natural and robust accuracies, which deviates from a
robust network's desideratum. Secondly, the attack strength of adversarial training data
constrained by the pre-specified perturbation budget fails to increase as network
robustness grows, which leads to robust overfitting and further degrades the adversarial
robustness. To overcome these limitations, we propose Strength-Adaptive Adversarial
Training (SAAT). Specifically, the adversary employs an adversarial loss constraint to
generate adversarial training data. Under this constraint, the perturbation budget is
adaptively adjusted according to the training state of the adversarial data, which
effectively avoids robust overfitting. Besides, SAAT explicitly constrains the attack
strength of training data through the adversarial loss, which manipulates model capacity
scheduling during training, and thereby can flexibly control the degree of robustness
disparity and adjust the tradeoff between natural accuracy and robustness. Extensive
experiments show that our proposal boosts the robustness of adversarial training.
1 INTRODUCTION
Current deep neural networks (DNNs) achieve impressive breakthroughs in a variety of fields such
as computer vision (He et al., 2016), speech recognition (Wang et al., 2017), and NLP (Devlin et al.,
2018), but it is well-known that DNNs are vulnerable to adversarial data: small perturbations of the
input which are imperceptible to humans will cause wrong outputs (Szegedy et al., 2013; Goodfellow
et al., 2014). As countermeasures against adversarial data, adversarial training (AT) is a method for
hardening networks against adversarial attacks (Madry et al., 2017). AT trains the network using
adversarial data that are constrained by a pre-specified perturbation budget, which aims to obtain the
output network with the minimum adversarial risk of a sample being wrongly classified under the
same perturbation budget. Among existing defense techniques, AT has been proved to be one of the
most effective and reliable methods against adversarial attacks (Athalye et al., 2018).
Although promising to improve the network’s robustness, AT with a pre-specified perturbation bud-
get still has limitations in learning a robust network. Firstly, the pre-specified perturbation budget is
inadaptable for networks of various model capacities, yielding divergent degrees of robustness
disparity between natural and robust accuracies, which deviates from a robust network's desideratum.
Ideally, for a robust network, perturbing the attack budget within a small range should not cause
significant accuracy degradation. Unfortunately, the degree of robustness disparity is intractable for
AT with a pre-specified perturbation budget. In standard AT, there could be a prominent degree of
[Figure 1 appears here; see caption below. Panels (a) and (b) plot accuracy (%) against the test perturbation budget (·/255); panel (c) plots training/test accuracy (%) against the training epoch and the robustness of the "best" and "last" checkpoints against the test perturbation budget.]
Figure 1: Robustness evaluation on different test perturbation budgets of (a) standard AT; (b) AT
with different pre-specified training perturbation budgets. (c) The learning curve of standard AT
with pre-specified perturbation budget $\epsilon = 8/255$ on PreAct ResNet-18 under the $\ell_\infty$ threat model and the
robustness evaluation of its "best" and "last" checkpoints.
robustness disparity in output networks. For instance, a standard PGD adversarially trained PreAct
ResNet-18 network has 84% natural accuracy but only 46% robust accuracy on CIFAR-10 under the $\ell_\infty$
threat model, as shown in Figure 1(a). Empirically, we have to increase the pre-specified perturbation
budget to allocate more model capacity to the defense against adversarial attacks in order to mitigate the
degree of robustness disparity, as shown in Figure 1(b). However, the feasible range of perturbation
budgets differs across networks with different model capacities. For example, AT with perturbation
budget $\epsilon = 40/255$ will make the optimization of PreAct ResNet-18 collapse, while Wide ResNet-34-10
can learn normally. In order to maintain a steady degree of robustness disparity, we have to find
separate perturbation budgets for each network with different model capacities. Therefore, it may
be pessimistic to use AT with a pre-specified perturbation budget to learn a robust network.
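To make the kind of sweep behind Figures 1(a) and 1(b) concrete, the short sketch below evaluates a trained network under a range of test-time perturbation budgets. It is only an illustration: it assumes a hypothetical `pgd_attack(model, x, y, eps, alpha, steps)` helper (such as the PGD sketch given later in Section 2.1) and a standard CIFAR-10 test loader, and the budget values and attack settings are placeholders rather than the exact ones used for Figure 1.

```python
import torch

def robustness_curve(model, loader, budgets=(0, 2, 4, 8, 12, 16, 20), device="cuda"):
    """Accuracy (%) as a function of the test-time perturbation budget (in x/255 units).

    Assumes a hypothetical `pgd_attack(model, x, y, eps, alpha, steps)` helper, e.g. the
    PGD sketch in Section 2.1. A budget of 0 corresponds to natural accuracy.
    """
    model.eval()
    curve = {}
    for b in budgets:
        eps = b / 255.0
        correct, total = 0, 0
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            # generate adversarial test data within the current budget (needs gradients)
            x_eval = x if eps == 0 else pgd_attack(model, x, y, eps=eps, alpha=eps / 4, steps=20)
            with torch.no_grad():
                correct += (model(x_eval).argmax(dim=1) == y).sum().item()
            total += y.size(0)
        curve[b] = 100.0 * correct / total
    return curve
```

A flat curve over small budget perturbations corresponds to the small robustness disparity that the paper argues a robust network should exhibit.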
Secondly, the attack strength of adversarial training data constrained by the pre-specified perturbation
budget gradually weakens as network robustness grows. During the training process, adversarial
training data are generated on the fly and change as the network is updated. As the network's
adversarial robustness continues to increase, the attack strength of adversarial training data under the
pre-specified perturbation budget becomes relatively weaker. Given the limited network capacity, a
degenerate or stagnant adversary accompanied by an evolving network easily causes a training bias:
adversarial training becomes more inclined toward defending against weak-strength attacks, and thereby
erodes the defense against strong-strength attacks, leading to undesirable robust overfitting, as shown in
Figure 1(c). Moreover, compared with the "best" checkpoint in AT with robust overfitting, the "last"
checkpoint's defense advantage under weak-strength attacks is slight, while its defense disadvantage
under strong-strength attacks is significant, which indicates that robust overfitting not only exacerbates
the degree of robustness disparity but also further degrades the adversarial robustness. Thus, it may be
deficient to use adversarial data with a pre-specified perturbation budget to train a robust network.
To overcome these limitations, we propose strength-adaptive adversarial training (SAAT), which
employs an adversarial loss constraint to generate adversarial training data. The adversarial pertur-
bation generated under this constraint is adaptive to the dynamic training schedule and networks
of various model capacities. Specifically, as adversarial training progresses, a larger perturbation
budget is required to satisfy the adversarial loss constraint since the network becomes more robust.
Thus, the perturbation budget in our SAAT is adaptively adjusted according to the training state of the
adversarial data, which restrains the training bias and effectively avoids robust overfitting. Besides,
SAAT explicitly constrains the attack strength of training data via the adversarial loss constraint,
which guides model capacity scheduling in adversarial training, and thereby can flexibly adjust the
tradeoff between natural accuracy and robustness, ensuring that the output network maintains a
steady degree of robustness disparity even across networks with different model capacities.
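As a rough illustration of this idea (a minimal sketch of our own reading, not the exact algorithm described in Section 3), an adversarial-loss-constrained attack can be written as a PGD-style loop that keeps enlarging the perturbation and stops as soon as the adversarial loss reaches a target value; the effective budget then grows automatically as the network becomes more robust. The threshold `target_loss`, the step size, and the safety caps `max_budget`/`max_steps` below are illustrative placeholders, and the constraint is checked on the batch-average loss for simplicity (a per-example version would stop each sample individually).

```python
import torch
import torch.nn.functional as F

def loss_constrained_attack(model, x, y, target_loss=1.5, alpha=2 / 255,
                            max_budget=32 / 255, max_steps=20):
    """Sketch of an adversarial-loss-constrained attack: rather than projecting onto a
    pre-specified eps-ball, keep taking sign-gradient steps (so the effective budget can
    grow by alpha per step) and stop once the adversarial loss reaches target_loss.
    All hyper-parameters here are illustrative, not values from the paper."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(max_steps):
        loss = F.cross_entropy(model(x + delta), y)
        if loss.item() >= target_loss:  # adversarial loss constraint satisfied: stop enlarging
            break
        grad = torch.autograd.grad(loss, delta)[0]
        # one FGSM-like step; the loose max_budget cap only guards against divergence
        delta = (delta + alpha * grad.sign()).clamp(-max_budget, max_budget).detach()
        delta = ((x + delta).clamp(0.0, 1.0) - x).requires_grad_(True)  # stay in the valid pixel range
    return (x + delta).detach()
```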
Our contributions are as follows. (a) In standard AT, we characterize the pessimism of an adversary
with a pre-specified perturbation budget, which is due to the intractable robustness disparity and
undesirable robust overfitting (in Section 3.1). (b) We propose a new adversarial training method,
i.e., SAAT (its learning objective in Section 3.2 and its realization in Section 3.3). SAAT is a
general adversarial training method that can be easily converted to natural training or standard AT.
(c) Empirically, we find that adversarial training loss is well-correlated with the degree of robustness
disparity and robust generalization gap (in Section 4.2), which enables our SAAT to overcome the
issue of robust overfitting and flexibly adjust the tradeoff of adversarial training, leading to
improved natural accuracy and robustness (in Section 4.3).
2 PRELIMINARY AND RELATED WORK
In this section, we review the adversarial training method and related works.
2.1 ADVERSARIAL TRAINING
Learning objective. Let $f_\theta$, $\mathcal{X}$, and $\ell$ be the network $f$ with trainable model parameters $\theta$, the input
feature space, and the loss function, respectively. Given a $C$-class dataset $S = \{(x_i, y_i)\}_{i=1}^{n}$, where
$x_i \in \mathcal{X}$ and $y_i \in \mathcal{Y} = \{0, 1, \ldots, C-1\}$ is its associated label, most machine learning tasks in natural
training can be formulated as solving the following optimization problem:
$$\min_{\theta} \frac{1}{n}\sum_{i=1}^{n} \ell(f_\theta(x_i), y_i). \qquad (1)$$
The learning objective of natural training is to obtain a network that has the minimum empirical
risk of a natural input being wrongly classified. In adversarial training, the adversary adds an
adversarial perturbation to each sample, i.e., transforms $S = \{(x_i, y_i)\}_{i=1}^{n}$ into
$S' = \{(x'_i = x_i + \delta_i, y_i)\}_{i=1}^{n}$. The adversarial perturbations $\{\delta_i\}_{i=1}^{n}$ are constrained by a
pre-specified budget $\epsilon$, i.e., $\delta \in \Delta = \{\delta : \|\delta\|_p \le \epsilon\}$, where $p$ can be $1$, $2$, $\infty$, etc. In order to defend
against such attacks, standard adversarial training (AT) (Madry et al., 2017) resorts to solving the following
objective function:
$$\min_{\theta} \frac{1}{n}\sum_{i=1}^{n} \max_{\delta_i \in \Delta} \ell(f_\theta(x_i + \delta_i), y_i). \qquad (2)$$
Note that the outer minimization remains the same as Eq. (1), and the inner maximization operator
can also be re-written as
$$\delta_i = \arg\max_{\delta_i \in \Delta} \ell(f_\theta(x_i + \delta_i), y_i), \qquad (3)$$
where $x'_i = x_i + \delta_i$ is the most adversarial data within the perturbation budget $\epsilon$. Standard AT
employs the most adversarial data generated according to Eq. (3) to update the current model.
The learning objective of standard AT is to obtain a network that has the minimum adversarial
risk of an input being wrongly classified under the pre-specified perturbation budget.
Realizations. The objective function of standard AT (Eq. (2)) is a composition of an inner maximization
problem and an outer minimization problem, with one step generating adversarial data and
one step minimizing the loss on the generated adversarial data w.r.t. the model parameters $\theta$. For the
outer minimization problem, Stochastic Gradient Descent (SGD) (Bottou, 1999) and its variants are
widely used to optimize the model parameters (Rice et al., 2020). For the inner maximization problem,
Projected Gradient Descent (PGD) (Madry et al., 2017) is the most common approximation
method for generating adversarial perturbations, and can be viewed as a multi-step variant of the Fast
Gradient Sign Method (FGSM) (Goodfellow et al., 2014). Given a normal example $x \in \mathcal{X}$ and step
size $\alpha > 0$, PGD works as follows:
$$\delta^{k+1} = \Pi_{\epsilon}\big(\alpha \cdot \mathrm{sign}(\nabla_{x} \ell(f(x + \delta^{k}), y)) + \delta^{k}\big), \quad k \in \mathbb{N}, \qquad (4)$$
where $\delta^{k}$ is the adversarial perturbation at step $k$, and $\Pi_{\epsilon}$ is the projection function that projects the
adversarial perturbation back into the pre-specified budget if necessary.
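A minimal PyTorch sketch of this inner/outer decomposition is given below: `pgd_attack` implements the $\ell_\infty$ PGD update of Eq. (4) (random start, sign-gradient steps, projection back into the $\epsilon$-ball), and `adversarial_training_step` performs one outer SGD update on the generated adversarial data as in Eq. (2). Hyper-parameter choices such as $\epsilon = 8/255$, the step size, and the number of steps are common defaults, not prescriptions from the text.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Multi-step l_inf PGD (Eq. (4)): ascend along the sign of the input gradient,
    then project the perturbation back into the eps-ball (the Pi_eps operator)."""
    delta = torch.empty_like(x).uniform_(-eps, eps)  # random start inside the budget
    delta.requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()   # step + projection
        delta = ((x + delta).clamp(0.0, 1.0) - x).requires_grad_(True)    # keep valid pixel range
    return (x + delta).detach()

def adversarial_training_step(model, optimizer, x, y, eps=8 / 255):
    """One outer-minimization step of standard AT (Eq. (2)): generate the most adversarial
    data within the pre-specified budget, then take an SGD step on it."""
    x_adv = pgd_attack(model, x, y, eps=eps)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Iterating `adversarial_training_step` over the training loader is the PGD-K recipe discussed in the stopping-criteria paragraph below, with K fixed to `steps`.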
2.2 RELATED WORK
Stopping criteria. There are different stopping criteria for PGD-based adversarial training. For
example, standard AT (Madry et al., 2017) employs a fixed number of iterations K, namely PGD-K,
which is commonly used in many outstanding adversarial training variants, such as TRADES (Zhang
et al., 2019), MART (Wang et al., 2019b), and RST (Carmon et al., 2019). Besides, some works
have further enhanced the PGD-K method by incorporating additional optimization mechanisms,
such as curriculum learning (Cai et al., 2018), FOSC (Wang et al., 2019a), and geometry reweight-
ing (Zhang et al., 2020b). On the other hand, some works adopt a different PGD stopping criterion,
i.e., misclassification-aware criterion, which stops the iterations once the network misclassifies the
adversarial data. This misclassification-aware criterion is widely used in the emerging adversarial
training variants, such as FAT (Zhang et al., 2020a), MMA (Ding et al., 2018), IAAT (Balaji et al.,