
[Figure 1 appears here: three panels plotting Accuracy (%) against the test perturbation budget (·/255) in panels (a) and (b), and against the training epoch in panel (c). Panel (a) is annotated with 84.65% natural and 46.12% adversarial accuracy; panel (b) compares training perturbation budgets ε = 0, 8, 16, 24, 32; panel (c) shows natural/adversarial training and test curves over 200 epochs together with the "best" and "last" checkpoints.]
Figure 1: Robustness evaluation across different test perturbation budgets of (a) standard AT and (b) AT with different pre-specified training perturbation budgets. (c) The learning curves of standard AT with pre-specified perturbation budget ε = 8/255 on PreAct ResNet-18 under the ℓ∞ threat model, and the robustness evaluation of its "best" and "last" checkpoints.
robustness disparity in output networks. For instance, a standard PGD adversarially trained PreAct ResNet-18 achieves 84% natural accuracy but only 46% robust accuracy on CIFAR-10 under the ℓ∞ threat model, as shown in Figure 1(a). Empirically, we have to increase the pre-specified perturbation budget, allocating more model capacity to the defense against adversarial attacks, in order to mitigate the degree of robustness disparity, as shown in Figure 1(b). However, the feasible range of perturbation budgets differs across networks of different model capacities. For example, AT with perturbation budget ε = 40/255 makes the optimization of PreAct ResNet-18 collapse, whereas Wide ResNet-34-10 still learns normally. To maintain a steady degree of robustness disparity, we therefore have to find a separate perturbation budget for each network capacity. It may thus be pessimistic to use AT with a pre-specified perturbation budget to learn a robust network.
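For reference, a minimal PyTorch sketch of the PGD attack that standard AT uses with a fixed, pre-specified budget might look as follows. The hyperparameter values (ε = 8/255, step size 2/255, 10 steps) are common CIFAR-10 choices rather than values prescribed here, and the function name is ours.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, step_size=2/255, num_steps=10):
    """PGD within a fixed l_inf ball of radius eps around x."""
    # Random start inside the eps-ball, clipped to the valid pixel range.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0).detach()
    for _ in range(num_steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + step_size * grad.sign()                # ascent on the loss
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project into the fixed eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)                          # keep a valid pixel range
    return x_adv.detach()
```

The key point is that eps is frozen before training begins, regardless of the network's capacity or its current robustness.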
Secondly, the attack strength of adversarial training data constrained by the pre-specified perturbation budget gradually weakens as network robustness grows. During training, adversarial training data are generated on the fly and change as the network is updated. As the network's adversarial robustness keeps increasing, the attack strength of adversarial training data under the pre-specified perturbation budget becomes relatively weaker. Given the limited network capacity, a degenerate or stagnant adversary accompanying an evolving network easily causes a training bias: adversarial training leans toward the defense against weak attacks and thereby erodes the defense against strong attacks, leading to undesirable robust overfitting, as shown in Figure 1(c). Moreover, compared with the "best" checkpoint in AT with robust overfitting, the "last" checkpoint's defense advantage under weak attacks is slight, while its defense disadvantage under strong attacks is significant, which indicates that robust overfitting not only exacerbates the degree of robustness disparity but also further degrades adversarial robustness. Thus, it may be deficient to use adversarial data with a pre-specified perturbation budget to train a robust network.
To overcome these limitations, we propose strength-adaptive adversarial training (SAAT), which employs an adversarial loss constraint to generate adversarial training data. The adversarial perturbation generated under this constraint is adaptive to the dynamic training schedule and to networks of various model capacities. Specifically, as adversarial training progresses, a larger perturbation budget is required to satisfy the adversarial loss constraint because the network becomes more robust. Thus, the perturbation budget in SAAT is adaptively adjusted according to the training state of the adversarial data, which restrains the training bias and effectively avoids robust overfitting. Besides, SAAT explicitly constrains the attack strength of training data through the adversarial loss constraint, which guides model capacity scheduling in adversarial training and can thereby flexibly adjust the trade-off between natural accuracy and robustness, ensuring that the output network maintains a steady degree of robustness disparity even across networks with different model capacities.
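To make this idea concrete, the following is a minimal sketch of generating adversarial training data under an adversarial loss constraint. It is an illustration under our own assumptions rather than the paper's exact algorithm (given in Section 3.3): rho (the loss target), eps_step, max_eps, and the inner PGD settings are illustrative names and values, and the loss is checked at the batch level for brevity.

```python
import torch
import torch.nn.functional as F

def strength_adaptive_example(model, x, y, rho=1.0,
                              eps_step=2/255, max_eps=32/255,
                              pgd_step=2/255, pgd_iters=2):
    """Enlarge the l_inf budget until the (batch-mean) adversarial loss
    reaches rho, or until max_eps is hit."""
    eps = 0.0
    x_adv = x.clone().detach()
    while eps < max_eps:
        with torch.no_grad():
            if F.cross_entropy(model(x_adv), y) >= rho:
                break                        # adversarial loss constraint satisfied
        eps = min(eps + eps_step, max_eps)   # adaptively enlarge the budget
        for _ in range(pgd_iters):           # a few PGD steps at the new budget
            x_adv.requires_grad_(True)
            loss = F.cross_entropy(model(x_adv), y)
            grad = torch.autograd.grad(loss, x_adv)[0]
            with torch.no_grad():
                x_adv = x_adv + pgd_step * grad.sign()                 # ascent on the loss
                x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project into the current eps-ball
                x_adv = x_adv.clamp(0.0, 1.0)                          # keep a valid pixel range
    return x_adv.detach()
```

Note that in this sketch a loss target of ρ → 0 leaves the budget at zero (natural training), while capping the budget at a fixed eps recovers standard AT, in the spirit of contribution (b) below.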
Our contributions are as follows. (a) In standard AT, we characterize the pessimism of an adversary with a pre-specified perturbation budget, which stems from the intractable robustness disparity and undesirable robust overfitting (in Section 3.1). (b) We propose a new adversarial training method, i.e., SAAT (its learning objective in Section 3.2 and its realization in Section 3.3). SAAT is a general adversarial training method that can easily be converted to natural training or standard AT. (c) Empirically, we find that the adversarial training loss is well correlated with the degree of robustness disparity and the robust generalization gap (in Section 4.2), which enables SAAT to overcome the issue of robust overfitting and to flexibly adjust the trade-off of adversarial training, leading to improved natural accuracy and robustness (in Section 4.3).