
and early stopping, but it still has a large generalization gap. Therefore, it is essential to study this
issue from a theoretical perspective. In this paper, we study the robust overfitting issue of adversarial
training using tools from uniform stability analysis.
Uniform stability analysis (Bousquet and Elisseeff, 2002) has been introduced to measure the generalization gap of learning problems, as an alternative to uniform convergence analysis such as the classical VC-dimension (Vapnik and Chervonenkis, 2015) and Rademacher complexity (Bartlett and Mendelson, 2002). The generalization gap can be bounded in terms of uniform argument stability (UAS). Formally, UAS is the gap between the output parameters θ of running an algorithm A on two datasets S and S′ that differ in at most one sample, denoted as δ(S, S′) = ‖θ(S) − θ(S′)‖. In a standard training problem with n training samples, assuming that the loss function is convex, L-Lipschitz, and β-gradient Lipschitz, running stochastic gradient descent (SGD) with step size α ≤ 1/β for T steps yields a UAS bounded by O(LαT/n) (Hardt et al., 2016). Since the generalization gap is controlled by the number of samples n, this bound (at least partially) explains the good generalization of standard training. Beyond (Hardt et al., 2016), Bassily et al. (2020) considered the case in which the loss function is non-smooth. Without the β-gradient Lipschitz assumption, they showed that the UAS is bounded by O(Lα√T + LαT/n) and provided a lower bound showing that the additional term O(Lα√T) is unavoidable. In adversarial settings, to our knowledge, two works have discussed the stability of adversarial training.
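To make the notion concrete, the following minimal sketch (our own illustration, not code from any of the cited works) estimates UAS empirically: it runs SGD with a shared sample sequence on a synthetic logistic-regression problem for a dataset S and a neighboring dataset S′ that differs in one example, then measures ‖θ(S) − θ(S′)‖. The model, data, and hyperparameters are arbitrary assumptions.

```python
# Minimal sketch: empirically estimating uniform argument stability (UAS).
# The logistic-regression model, synthetic Gaussian data, and hyperparameters
# are illustrative assumptions, not a setup from the cited papers.
import numpy as np

def sgd(X, y, steps=1000, alpha=0.01, seed=0):
    """Run SGD on the logistic loss and return the final parameter vector."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(steps):
        i = rng.integers(n)                                  # pick one sample
        margin = y[i] * (X[i] @ theta)
        grad = -y[i] * X[i] / (1.0 + np.exp(margin))         # logistic-loss gradient
        theta -= alpha * grad
    return theta

rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
y = np.sign(rng.normal(size=n))

# Neighboring dataset S': replace exactly one sample of S.
X_prime, y_prime = X.copy(), y.copy()
X_prime[0], y_prime[0] = rng.normal(size=d), -y[0]

# Same seed => both runs visit the same sample indices, mirroring the coupling
# used in stability arguments. delta(S, S') = ||theta(S) - theta(S')||.
delta = np.linalg.norm(sgd(X, y) - sgd(X_prime, y_prime))
print(f"empirical UAS estimate: {delta:.4f}")
```

Averaging this quantity over many choices of the replaced sample (and many seeds) gives an empirical proxy for the O(LαT/n) behavior described above.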
Firstly, Farnia and Ozdaglar (2021) considered the stability of minimax problems. Their work includes a discussion of an algorithm called GDmax (gradient descent on the maximization of the inner problem), which can be viewed as a general form of adversarial training. They establish the stability of GDmax when the inner problem is further assumed to be strongly concave. Under the strong concavity assumption, the outer function is smooth, so the generalization bound O(LαT/n) applies in this case. However, the inner problem is not strongly concave in practice, and the bound O(LαT/n) does not match the poor generalization of adversarial training observed empirically.
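For reference, here is a minimal sketch of GDmax-style adversarial training in the simplest setting we can write down: a linear classifier with logistic loss and an ℓ∞ perturbation of radius ε, for which the inner maximization has a closed form, so each outer step is a stochastic gradient step on the exact adversarial loss. The model, data, and constants are illustrative assumptions, not the setup analyzed by Farnia and Ozdaglar (2021).

```python
# Minimal sketch of GDmax-style adversarial training (stochastic outer steps).
# Linear model + logistic loss + l_inf ball of radius eps: the inner max is
# attained at delta* = -eps * y * sign(theta), so the adversarial example is
# x_adv = x - eps * y * sign(theta). All choices below are illustrative.
import numpy as np

def adversarial_training(X, y, eps=0.1, steps=1000, alpha=0.01, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(steps):
        i = rng.integers(n)
        # Inner problem: exact worst-case perturbation within the l_inf ball.
        x_adv = X[i] - eps * y[i] * np.sign(theta)
        # Outer problem: gradient step on the adversarial (max) loss;
        # by Danskin's theorem the gradient is taken at the maximizer x_adv.
        margin = y[i] * (x_adv @ theta)
        grad = -y[i] * x_adv / (1.0 + np.exp(margin))
        theta -= alpha * grad
    return theta

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = np.sign(rng.normal(size=200))
theta_robust = adversarial_training(X, y, eps=0.1)
print("norm of robust parameters:", np.linalg.norm(theta_robust))
```

In deep networks the inner maximum has no closed form and must be approximated (e.g., by projected gradient ascent), and the resulting adversarial loss is in general non-smooth, which is the difficulty the analysis in this paper targets.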
Another stability analysis of adversarial training is the work of Xing et al. (2021a). Before proposing their algorithm, they use the stability bound O(Lα√T + LαT/n) (Bassily et al., 2020) to characterize the issue of adversarial generalization. However, this bound is ε-independent. Consider two cases: ε → 0 and ε large (e.g., 16/255). In the first case, adversarial training is very close to standard training and has good generalization ability; in the second case, adversarial generalization is very poor. Yet the bound is the same in these two different cases. Therefore, it cannot provide an interpretation of adversarial generalization.
In summary, existing stability-based generalization bounds (Hardt et al., 2016; Bassily et al., 2020) provide only a limited interpretation of adversarial generalization. In this paper, we give a new stability analysis of adversarial training using the notion of η-approximate smoothness. We first show that, under the same assumptions as (Hardt et al., 2016; Xing et al., 2021a), even though the outer function (the adversarial loss) is non-smooth, it is η-approximately smooth (see Definition 4.1), where η is a constant depending linearly on the gradient Lipschitz constant of the inner function and the perturbation intensity ε. We then derive stability-based generalization bounds (Thm. 5.1 to 5.4) for stochastic gradient descent (SGD) on this general class of η-approximately smooth functions, which covers the adversarial loss. Our main result can be summarized in the following equation: running SGD on the adversarial loss for T steps with step size α ≤ 1/β, the excess risk, which is the sum of the generalization error and the optimization error, satisfies
\[
\text{Excess Risk} \;\le\; E_{\mathrm{gen}} + E_{\mathrm{opt}} \;\le\; \underbrace{\overbrace{L\eta T\alpha}^{\text{additional}} \;+\; \overbrace{\frac{2L^2 T\alpha}{n} + \frac{D^2}{T\alpha} + L^2\alpha}^{\text{for standard training}}}_{\text{for adversarial training}}. \tag{1.1}
\]
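As a purely numerical illustration of this decomposition, the snippet below evaluates the right-hand side of (1.1) for a few values of T; the constants L, η, D, n, and α are arbitrary placeholders, not values taken from the paper.

```python
# Evaluate the upper bound in Eq. (1.1) with placeholder constants to see how
# the additional adversarial term L*eta*T*alpha grows linearly in T while the
# optimization term D^2/(T*alpha) shrinks. These constants are illustrative.
L, eta, D, n, alpha = 1.0, 0.5, 1.0, 10_000, 0.01

def excess_risk_bound(T):
    additional = L * eta * T * alpha                                   # adversarial-only term
    standard = 2 * L**2 * T * alpha / n + D**2 / (T * alpha) + L**2 * alpha
    return additional + standard

for T in (10, 100, 1_000, 10_000):
    print(f"T = {T:>6}: bound = {excess_risk_bound(T):.3f}")
```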
The excess risk of adversarial training has an additional term LηTα. We also provide a lower bound on the UAS of running SGD on the adversarial loss. Our results suggest that robust test accuracy decreases in ε when T is large, with a speed between Ω(ε√T) and O(εT). This phenomenon is also observed in practice, and it provides an understanding of robust overfitting from the perspective
of uniform stability. Additionally, we show that a few popular techniques for adversarial training
of uniform stability. Additionally, we show that a few popular techniques for adversarial training
(e.g., early stopping, cyclic learning rate, and stochastic weight averaging) are stability-promoting
in theory and also empirically improve adversarial training. Experiments on SVHN, CIFAR-10,
CIFAR-100, and ImageNet confirm our results. Our contributions are listed as follows: