
• Inspired by the adapters literature, we evaluate various adapters. We show that training separate classification tokens of a VIT for the clean and adversarial domains is enough to match the classification performance of dual normalization layers with 49× fewer domain-specific parameters. This classification token acts as a conditioning token which can modify the behaviour of the network to be either in clean or robust mode (Figure 1); see the sketch after this list.
• Unlike Xie et al. (2019a) and Herrmann et al. (2022), we also aim at preserving the robust performance of the network against adversarial attacks. We show that our conditional token can obtain SOTA nominal accuracy in the clean mode while at the same time achieving competitive $\ell_\infty$-robustness in the robust mode. As a by-product of our study, we show that adversarial training of VIT-B16 on IMAGENET leads to state-of-the-art robustness against $\ell_\infty$-norm bounded perturbations of size 4/255.
• We empirically demonstrate that training with adapters enables model soups (Wortsman et al., 2022). This allows us to introduce adversarial model soups, models that trade off between clean and robust accuracy through linear interpolation of the clean and adversarial adapters (see the interpolation sketch after this list). To the best of our knowledge, our work is the first to study adversarial model soups. We also show that adversarial model soups perform better on IMAGENET variants than the state-of-the-art with masked auto-encoding (He et al., 2022).
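The following is a minimal PyTorch sketch of the conditioning-token idea above, not the paper's actual implementation: the class name DualTokenViT, its attribute names, and the generic `encoder` module are assumptions made for illustration. The only domain-specific parameters are the two class tokens; everything else is shared.

```python
import torch
import torch.nn as nn

class DualTokenViT(nn.Module):
    """Sketch: a ViT whose only domain-specific parameters are two class tokens."""

    def __init__(self, encoder, embed_dim=768, num_classes=1000):
        super().__init__()
        self.encoder = encoder  # shared transformer blocks (assumed given)
        # One conditioning token per domain; everything else is shared.
        self.cls_token_clean = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.cls_token_adv = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.head = nn.Linear(embed_dim, num_classes)  # shared classifier

    def forward(self, patch_tokens, mode="clean"):
        # patch_tokens: (B, N, D) embedded image patches (positional
        # embeddings assumed to be applied already).
        cls = self.cls_token_clean if mode == "clean" else self.cls_token_adv
        cls = cls.expand(patch_tokens.shape[0], -1, -1)
        x = torch.cat([cls, patch_tokens], dim=1)  # prepend conditioning token
        x = self.encoder(x)                        # shared computation
        return self.head(x[:, 0])                  # read out the class token
```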
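Under the same assumptions, an adversarial model soup then reduces to linearly interpolating the domain-specific parameters; the helper below is hypothetical and only illustrates the interpolation described above.

```python
import torch

def soup_token(model, alpha):
    """Blend the two conditioning tokens of a DualTokenViT (sketch above):
    alpha = 1 recovers the clean mode, alpha = 0 the robust mode, and
    intermediate values trade off clean and robust accuracy."""
    with torch.no_grad():
        return alpha * model.cls_token_clean + (1.0 - alpha) * model.cls_token_adv
```

At inference, the blended token would simply be used in place of either domain-specific token.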
2 RELATED WORK
Adversarial training.
Although more recent approaches have been proposed, the most successful method to reduce the vulnerability of image classifiers to adversarial attacks is adversarial training, which generates on-the-fly adversarial counterparts for the training images and uses them to augment the training set (Croce et al., 2020). Goodfellow et al. (2015) used the single-step Fast Gradient Sign Method (FGSM) attack to craft such adversarial images. Later, Madry et al. (2018) found that using iterative Projected Gradient Descent (PGD) yields models robust to stronger attacks. Their scheme has been subsequently improved by several modifications, e.g., a different loss function (Zhang et al., 2019), unlabelled or synthetic data (Carmon et al., 2019; Uesato et al., 2019; Gowal et al., 2021), model weight averaging (Gowal et al., 2020), adversarial weight perturbations (Wu et al., 2020), and better data augmentation (Rebuffi et al., 2021). While the main drawback of adversarial training is the degradation of performance of robust models on clean images (Tsipras et al., 2018), Xie et al. (2019a) showed that adversarial images can be leveraged as a strong regularizer to improve the clean accuracy of classifiers on IMAGENET. In particular, they propose AdvProp, which introduces separate BatchNorm layers specific to clean or adversarial inputs, with the remaining layers being shared. This approach and the role of normalization layers when training with both clean and adversarial points have been further studied by Xie & Yuille (2019) and Walter et al. (2022). Recently, Wang et al. (2022) suggest removing BatchNorm layers from the standard RESNET architecture (He et al., 2016) to retain high clean accuracy with adversarial training, but this negatively affects the robustness against stronger attacks.¹ Finally, Kireev et al. (2021) and Herrmann et al. (2022) showed that carefully tuning the threat model in adversarial training might improve the performance on clean images and in the presence of distribution shifts, such as common corruptions (Hendrycks & Dietterich, 2018).
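For concreteness, here is a minimal PyTorch sketch of the untargeted $\ell_\infty$ PGD attack of Madry et al. (2018) used to generate adversarial training images; the function name and the assumption that inputs lie in [0, 1] are ours, not taken from the cited papers.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=4/255, step_size=1/255, steps=10):
    """Untargeted l_inf PGD: repeated signed-gradient ascent steps on the
    loss, projected back into the eps-ball around the clean input x."""
    # Random start inside the eps-ball, clipped to valid pixel range.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + step_size * grad.sign()                 # ascent step
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)   # project to ball
        x_adv = x_adv.clamp(0.0, 1.0)                           # valid pixels
    return x_adv.detach()
```

Adversarial training then minimizes the classification loss on these perturbed images, optionally alongside the clean ones as in AdvProp.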
Adapters.
In early work on deep neural networks, Caruana (1997) showed that sharing network parameters among tasks acts as a regularizer. Aiming at more efficient parameter sharing, Rebuffi et al. (2017) and Rosenfeld & Tsotsos (2018) introduced adapters – small trainable modules specific to each task which can be stitched all along the network. In other lines of work, Mallya et al. (2018) and Mancini et al. (2018) adapt a model to new tasks using efficient weight masking, while Li et al. (2016) and Maria Carlucci et al. (2017) perform domain adaptation by modulating batch statistics. While these approaches require having as many adapters as tasks, Perez et al. (2018) propose an adapter layer whose weights are generated by a conditioning network. Besides computer vision, adapters are also used in natural language processing for efficient fine-tuning (Houlsby et al., 2019; Pfeiffer et al., 2020; Wang et al., 2020) and multi-task learning (Stickland & Murray, 2019).
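As an illustration of what such a module looks like in practice, below is a minimal PyTorch sketch of a residual bottleneck adapter in the spirit of Houlsby et al. (2019); the class name and bottleneck width are illustrative, not taken from the cited papers.

```python
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """A small per-task module: down-project, nonlinearity, up-project,
    added residually so the adapter can start as an identity mapping."""

    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()
        nn.init.zeros_(self.up.weight)  # identity at initialization
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))
```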
Merging multiple models.
While ensembles are a popular and successful way to combine multiple independently trained classifiers to improve on individual performance (Ovadia et al., 2019; Gontijo-Lopes et al., 2021), they increase the inference cost as they require a forward pass for each sub-network
¹See https://github.com/amazon-research/normalizer-free-robust-training/issues/2.