
Table 1: Impact of augmentations: Performance (%) of ACAT models on Base augmentations and AutoAugment (Auto). Clean and robust accuracy against the GAMA attack [42] are reported. The use of AutoAugment results in a ∼1.5−2% drop in robust accuracy.

                          Test:  No Aug            AutoAugment
Model             Train          Clean   Robust    Clean   Robust
ResNet-18         Base           82.41   50.00     63.79   37.07
ResNet-18         Auto           82.54   48.11     76.40   43.22
WideResNet-34-10  Base           86.71   55.58     68.24   40.83
WideResNet-34-10  Auto           86.80   53.99     82.64   48.98
Figure 2: Comparison of BN layer statistics for a WRN-34-10 model trained on CIFAR-10 using DAJAT. BN layers of the Base augmentations (Pad+Crop, H-Flip) are compared with those of AutoAugment. Initial layer (L3) parameters are diverse, while those of deeper layers (L25) are similar.
expected to generalize to a target domain (test data). We use the theoretical formulation by Ben-David
et al. [4] shown below to justify the respective claims in Conjecture-1:
ϵ_t(f) ≤ ϵ_s(f) + ½ d_{F∆F}(s, t) + λ        (1)
(i) The use of a more diverse or larger source dataset reduces overfitting, improving the performance of the network on the source distribution. From Eq. (1), the expected error on the target distribution ϵ_t (the test set in this case) is upper bounded by the expected error on the source distribution ϵ_s (the augmented dataset) along with other terms. Therefore, improved performance on the augmented distribution can improve performance on the test set as well.
(ii) The expected error on the target distribution ϵ_t is upper bounded by the distribution shift between the source and target distributions, ½ d_{F∆F}(s, t), along with other terms. Therefore, a larger domain shift between the augmented and test data distributions can indeed limit the performance gains on the test set.
(iii) The constant λ in Eq. (1) measures the risk of the optimal joint classifier: λ = min_{f∈F} [ϵ_s(f) + ϵ_t(f)]. Neural networks with a higher capacity can effectively minimize both the expected risk on the source set ϵ_s and the risk of the optimal joint classifier. Therefore, the capacity of the network and the complexity of the task influence the gains that can be obtained using augmentations.
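As a toy numerical illustration of the trade-off captured by Eq. (1) (a sketch with made-up numbers, not values measured in this work), the bound can be evaluated directly to see how the divergence term can overwhelm a reduction in source error:

```python
# Toy evaluation of the Ben-David et al. domain-adaptation bound:
#   eps_t(f) <= eps_s(f) + 0.5 * d(s, t) + lam
# All numbers below are illustrative assumptions, not measured values.

def target_error_bound(eps_s: float, divergence: float, lam: float) -> float:
    """Upper bound on the target-domain error eps_t(f)."""
    return eps_s + 0.5 * divergence + lam

# A mild augmentation: slightly higher source error, small domain shift.
mild = target_error_bound(eps_s=0.10, divergence=0.08, lam=0.02)

# A strong augmentation: lower source error (less overfitting), but a
# much larger shift between the augmented and test distributions.
strong = target_error_bound(eps_s=0.07, divergence=0.30, lam=0.02)

print(f"mild:   {mild:.3f}")   # 0.10 + 0.04 + 0.02 = 0.160
print(f"strong: {strong:.3f}") # 0.07 + 0.15 + 0.02 = 0.240
```

Despite the lower source error, the stronger augmentation yields the looser bound, which is the mechanism behind Conjecture-1(ii).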
4.2 Analysing the role of Augmentations in Adversarial Training
We analyse the trade-off between the factors described in Conjecture-1 for adversarial training when
compared to standard ERM training. In addition to the goal of improving accuracy on clean samples,
adversarial training aims to achieve local smoothness of the loss landscape as well. Hence, the
complexity of Adversarial Training is higher than that of standard ERM training, making it important
to use larger model capacities to obtain gains using data augmentations (based on Conjecture-1 (iii)).
This justifies the gains obtained by Rebuffi et al. [37] on the WRN-70-16 architecture by using
CutMix based augmentations (2.9% higher robust accuracy and 1.23% higher clean accuracy). The
same method does not obtain significant gains on smaller architectures such as ResNet-18 where a
1.76% boost in robust accuracy is accompanied by a 2.55% drop in clean accuracy.
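Since CutMix is central to the comparison above, a minimal NumPy sketch of the augmentation may be helpful; the box-sampling details below follow the standard formulation but are a simplified assumption, not the exact pipeline of Rebuffi et al.:

```python
import numpy as np

def cutmix(xa, ya, xb, yb, alpha=1.0, rng=None):
    """Paste a random box from image xb into xa; mix labels by box area.

    xa, xb: (H, W, C) images; ya, yb: one-hot label vectors.
    Returns the mixed image and the area-weighted label.
    """
    rng = rng or np.random.default_rng()
    h, w = xa.shape[:2]
    lam = rng.beta(alpha, alpha)               # mixing ratio ~ Beta(alpha, alpha)
    cut = np.sqrt(1.0 - lam)                   # relative box side length
    bh, bw = int(h * cut), int(w * cut)
    cy, cx = rng.integers(h), rng.integers(w)  # box centre
    y0, y1 = np.clip([cy - bh // 2, cy + bh // 2], 0, h)
    x0, x1 = np.clip([cx - bw // 2, cx + bw // 2], 0, w)
    out = xa.copy()
    out[y0:y1, x0:x1] = xb[y0:y1, x0:x1]
    # Recompute the mixing weight from the actual (clipped) box area.
    lam_adj = 1.0 - (y1 - y0) * (x1 - x0) / (h * w)
    return out, lam_adj * ya + (1.0 - lam_adj) * yb
```

The mixed label preserves low-level pixel statistics of both images only inside or outside the box, which is why such patch-based augmentations behave differently from colour- or policy-based ones like AutoAugment.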
Secondly, while the distribution shift between augmented data and test data (½ d_{F∆F}(s, t)) may be sufficiently low for natural images, leading to improved generalization to the test set (Conjecture-1(i, ii)), the same may not be true for adversarial images. There is a large difference between the augmented data and test data in pixel space, although they may be similar in feature space. Since adversarial attacks perturb images in pixel space, the distribution shift between the corresponding perturbations widens further, as shown in Fig. 7, 8 and 9. Based on Conjecture-1(ii), unless this difference is accounted for, complex augmentations cannot improve the performance of adversarial training (Ref. Appendix-B). This trend has also been observed empirically by Rebuffi et al. [37], based on which they conclude that augmentations designed for robustness need to preserve low-level features.
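The divergence term ½ d_{F∆F}(s, t) is not computable exactly, but a common empirical proxy (sketched below as a toy illustration, not the procedure used in this work) trains a classifier to distinguish source from target samples: the more separable the two sets, the larger the shift. Here a nearest-centroid classifier and synthetic Gaussian "pixel" data stand in for the hypothesis class and the image distributions:

```python
import numpy as np

def proxy_distance(src, tgt):
    """Proxy A-distance: 2 * (1 - 2 * err), where err is the error of a
    simple domain classifier separating source from target samples."""
    mu_s, mu_t = src.mean(axis=0), tgt.mean(axis=0)
    xs = np.vstack([src, tgt])
    labels = np.r_[np.zeros(len(src)), np.ones(len(tgt))]
    # Assign each point to the nearer of the two domain centroids.
    pred = (np.linalg.norm(xs - mu_t, axis=1)
            < np.linalg.norm(xs - mu_s, axis=1)).astype(float)
    err = np.mean(pred != labels)
    return 2.0 * (1.0 - 2.0 * err)

rng = np.random.default_rng(0)
base = rng.normal(0.0, 1.0, size=(500, 8))     # stand-in for base-augmented data
shifted = rng.normal(2.0, 1.0, size=(500, 8))  # stand-in for heavily augmented data
same = rng.normal(0.0, 1.0, size=(500, 8))     # fresh draw from the base distribution

print(proxy_distance(base, shifted))  # large: the two sets are easily separated
print(proxy_distance(base, same))     # near 0: the two sets overlap
```

A large proxy distance between augmented and test perturbation distributions would correspond to the widening shift described above.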
We present the performance of Adversarial Training using either Base augmentations (Pad+Crop, Flip) or AutoAugment [10] during training and inference on the CIFAR-10 dataset, using ResNet-18 and WideResNet-34-10 architectures, in Table-1. Firstly, we note that by using AutoAugment during training alone, robust accuracy on the test set drops by ∼1.5−2%, as observed in prior work [18]. Secondly, the clean and robust accuracy drop by ∼6.5% when augmented images are