i.e., the validation robustness under a multi-step attack, e.g.,
projected gradient descent (PGD) [27], suddenly drops to
zero, whereas the training robustness against the FGSM attack
keeps increasing. Later, it was found that catastrophic overfitting
is not limited to FGSM-based AT but also occurs in
diverse single-step AT methods [2]. A few attempts [2,24,34]
have been made to identify the underlying cause of catastrophic
overfitting and to develop strategies to prevent
this failure. However, these works did not provide a fundamental
explanation for the problem, and the proposed methods are
computationally inefficient [26].
In this work, we first uncover the connection between catastrophic
overfitting and local linearity. Recall that a single-step
adversary such as FGSM produces perturbations based on a
linear approximation of the loss function. However, gradient
masking renders this linear assumption unreliable, so the
perturbations generated during training no longer correspond to
strong attacks that maximize the loss. As catastrophic overfitting
typically happens in non-iterative AT, we compare
single-step AT and multi-step AT in our empirical study.
Fig. 1 shows that catastrophic overfitting coincides with a drastic
change in the local linearity of the loss surface. More specifically,
the linear approximation error of FGSM-AT abruptly increases at the
moment catastrophic overfitting occurs, and the robust accuracy
deteriorates suddenly within a single epoch. In contrast,
TRADES, which generates adversarial examples with multiple
iterations, maintains a negligible local linearity error
throughout training.
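For concreteness, FGSM crafts the perturbation $\delta_{\mathrm{FGSM}} = \epsilon \cdot \mathrm{sign}(\nabla_x \ell(x, y))$, which maximizes the first-order Taylor expansion of the loss $\ell$ within the $\ell_\infty$ ball of radius $\epsilon$. One standard way to formalize the local linearity error (notation introduced here for illustration; the quantity actually measured in Fig. 1 may be defined slightly differently) is
$$\gamma(\epsilon, x) \;=\; \max_{\|\delta\|_\infty \le \epsilon} \big| \,\ell(x + \delta, y) - \ell(x, y) - \delta^{\top} \nabla_x \ell(x, y) \,\big| ,$$
which remains small as long as the loss is approximately linear around $x$, and grows exactly when the FGSM perturbation ceases to be a reliable approximate maximizer of the loss.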
Based on this observation, we conduct a rigorous
and comprehensive study on addressing catastrophic overfitting
by retaining the local linearity of the loss function.
The proposed Stable and Efficient Adversarial Training (SEAT)
harnesses the local linearity exhibited by models trained with
multi-step methods and instills this salient property
into models trained with single-step AT. Fig. C.2 in the Appendix
shows that the model resulting from the proposed SEAT
behaves strikingly similarly to that of TRADES, indicating
robustness competitive with multi-step AT.
Our main contributions are summarized as follows:
• We empirically identify a clear correlation between
catastrophic overfitting and local linearity in DNNs,
which motivates our theoretical analysis of the failure mode
of FGSM-AT and the remedy we develop to overcome it.
• We propose a novel regularization, Stable and Efficient
Adversarial Training (SEAT), which prevents catastrophic
overfitting by explicitly penalizing violations of the linearity
assumption so as to preserve the validity of the FGSM solution
(a minimal illustrative sketch is given after this list).
• We conduct a thorough experimental study and show
that the proposed SEAT consistently achieves superior stability
and adversarial robustness among existing single-step AT methods,
and is even comparable to most multi-step AT methods at a much
lower cost. We also verify the effectiveness of SEAT under
different attack setups and analyze it in terms of loss surface
smoothness and decision boundary distortion.
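To make the regularization idea above concrete, the snippet below is a minimal, illustrative sketch, not the exact SEAT objective derived in Sec. 3: it estimates, for a single FGSM-style perturbation, the gap between the adversarial loss and its first-order Taylor prediction. The function name linearity_violation and the weighting factor lam mentioned afterwards are placeholders introduced here for exposition.

import torch
import torch.nn.functional as F

def linearity_violation(model, x, y, eps):
    # Sketch only: measure how much the loss deviates from its linear
    # (first-order Taylor) approximation along an FGSM-style perturbation.
    x = x.clone().detach().requires_grad_(True)
    loss_clean = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss_clean, x, create_graph=True)[0]
    delta = eps * grad.sign()                       # FGSM direction
    loss_adv = F.cross_entropy(model(x + delta), y)
    # First-order prediction of the loss after applying the perturbation
    linear_pred = loss_clean + (delta * grad).sum() / x.size(0)
    return (loss_adv - linear_pred).abs()

During FGSM-AT, such a term would be added to the standard adversarial loss with a weight lam, e.g., total_loss = adv_loss + lam * linearity_violation(model, x, y, eps), pushing training away from the gradient-masking regime in which the single-step attack becomes invalid.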
2. Related Work
2.1. Adversarial Robustness and Attack Strength
Adversarial training is widely regarded as the most effective
defense. According to the number of gradient computations
involved in attack generation, methods can be
grouped mainly into multi-step AT [7,27] and single-step
AT [15,22]. Multi-step AT, such as PGD-AT [27], generally
achieves robustness by training on strong perturbations
generated by iterative optimization. More recently,
TRADES [47] and AWP [44] yield enhanced robustness
through regularization, and [31] further improves performance
with a judicious choice of hyperparameters. Although
multi-step AT is empirically the best-performing way to train
robust models, it is time-consuming.
The high cost of multi-step AT has motivated an efficient
alternative, namely single-step AT. It
trains with shared gradient computations [35], with
cheaper adversaries such as FGSM [15,36], or
by first using FGSM and later switching to PGD [40]. While
these single-step AT methods point in a promising direction, their
robust performance is not on par with multi-step AT. Worse still,
they are prone to the serious problem of catastrophic overfitting, i.e.,
after a few epochs of adversarial training, the robust accuracy
of the model against PGD sharply decreases to 0%.
2.2. Adversarial Generalization and Flat Minima
There exists a large body of work investigating the correlation
between the flatness of local minima and the generalization
performance of DNNs on natural samples [21,25]. It
has been empirically verified and commonly accepted that a
flatter loss surface tends to yield better generalization, and
this understanding has been further utilized to design
regularization (e.g., [13,20,42]).
An analogous connection has also been identified in the adversarial
training scenario, where the flatness of the loss surface
helps improve robust generalization on adversarial samples
[30]. Several well-recognized improvements of AT,
i.e., TRADES [47], MART [41], and RST [7], all implicitly
flatten the loss surface to improve robust generalization.
Moreover, a line of works proposes explicit regularization
to directly encourage the flatness of local minima [33,44].
3. Methodology
Our aim is to develop a technique that resolves catastrophic
overfitting so as to stabilize single-step adversarial
training. In this section, we first theoretically analyze
the pitfalls of existing single-step AT methods. Then,
we provide theoretical justifications for our regularization,