REVISITING ADAPTERS WITH ADVERSARIAL TRAINING
Sylvestre-Alvise Rebuffi, Francesco Croce& Sven Gowal
DeepMind, London
{sylvestre,sgowal}@deepmind.com
ABSTRACT
While adversarial training is generally used as a defense mechanism, recent works
show that it can also act as a regularizer. By co-training a neural network on clean
and adversarial inputs, it is possible to improve classification accuracy on the clean,
non-adversarial inputs. We demonstrate that, contrary to previous findings, it is
not necessary to separate batch statistics when co-training on clean and adversarial
inputs, and that it is sufficient to use adapters with few domain-specific parameters
for each type of input. We establish that using the classification token of a Vision
Transformer (VIT) as an adapter is enough to match the classification performance
of dual normalization layers, while using significantly fewer additional parameters.
First, we improve upon the top-1 accuracy of a non-adversarially trained VIT-B16
model by +1.12% on IMAGENET (reaching 83.76% top-1 accuracy). Second, and
more importantly, we show that training with adapters enables model soups through
linear combinations of the clean and adversarial tokens. These model soups, which
we call adversarial model soups, allow us to trade-off between clean and robust
accuracy without sacrificing efficiency. Finally, we show that we can easily adapt
the resulting models in the face of distribution shifts. Our VIT-B16 obtains top-1
accuracies on IMAGENET variants that are on average +4.00% better than those
obtained with Masked Autoencoders.
1 INTRODUCTION
Neural networks are inherently susceptible to adversarial perturbations. Adversarial perturbations
fool neural networks by adding an imperceptible amount of noise which leads to an incorrect
prediction with high confidence (Carlini & Wagner, 2017; Goodfellow et al., 2015; Kurakin et al.,
2016b; Szegedy et al., 2014). There has been a lot of work on building defenses against adversarial
perturbations (Papernot et al., 2016; Kannan et al., 2018); the most commonly used defense is
adversarial training as proposed by Madry et al. (2018) and its variants (Zhang et al., 2019; Pang
et al., 2020; Huang et al., 2020; Rice et al., 2020; Gowal et al., 2020), which use adversarially
perturbed images at each training step as training data. Earlier studies (Kurakin et al., 2016a; Xie
et al., 2019b) showed that using adversarial samples during training leads to performance degradation
on clean images. However, AdvProp (Xie et al., 2019a) challenged this observation by showing that
adversarial training can act as a regularizer, and therefore improve nominal accuracy, when using
dual batch normalization (BatchNorm) layers (Ioffe & Szegedy, 2015) to disentangle the clean and
adversarial distributions.
We draw attention to the broad similarity between the AdvProp approach and the adapters literature
(Rebuffi et al., 2017; Houlsby et al., 2019) where a single backbone network is trained on multiple
domains by means of adapters, where a few parameters specific to each domain are trained separately
while the rest of the parameters are shared. In light of this comparison, we further develop the
line of work introduced by AdvProp and analyze it from an adapter perspective. In particular, we
explore various adapters and aim to obtain the best classification performance with minimal additional
parameters. Our contributions are as follows:
• We show that, in order to benefit from co-training on clean and adversarial samples, it is not
necessary to separate the batch statistics of clean and adversarial images in BatchNorm layers. We
demonstrate empirically that it is enough to use domain-specific trainable parameters to achieve
similar results.
Work done during an internship at DeepMind
• Inspired by the adapters literature, we evaluate various adapters. We show that training separate
classification tokens of a VIT for the clean and adversarial domains is enough to match the
classification performance of dual normalization layers with 49× fewer domain-specific parameters.
This classification token acts as a conditioning token which can modify the behaviour of the network
to be either in clean or robust mode (Figure 1).
• Unlike Xie et al. (2019a) and Herrmann et al. (2022), we also aim at preserving the robust
performance of the network against adversarial attacks. We show that our conditional token can
obtain SOTA nominal accuracy in the clean mode while at the same time achieving competitive
ℓ∞-robustness in the robust mode. As a by-product of our study, we show that adversarial
training of VIT-B16 on IMAGENET leads to state-of-the-art robustness against ℓ∞-norm bounded
perturbations of size 4/255.
• We empirically demonstrate that training with adapters enables model soups (Wortsman et al.,
2022). This allows us to introduce adversarial model soups, models that trade-off between clean
and robust accuracy through linear interpolation of the clean and adversarial adapters. To the
best of our knowledge, our work is the first to study adversarial model soups. We also show
that adversarial model soups perform better on IMAGENET variants than the state-of-the-art with
masked auto-encoding (He et al., 2022).
2 RELATED WORK
Adversarial training.
Although more recent approaches have been proposed, the most successful
method to reduce the vulnerability of image classifiers to adversarial attacks is adversarial training,
which generates on-the-fly adversarial counterparts for the training images and uses them to augment
the training set (Croce et al., 2020). Goodfellow et al. (2015) used the single-step Fast Gradient Sign
Method (FGSM) attack to craft such adversarial images. Later, Madry et al. (2018) found that using
iterative Projected Gradient Descent (PGD) yields models robust to stronger attacks. Their scheme
has been subsequently improved by several modifications, e.g. a different loss function (Zhang et al.,
2019), unlabelled or synthetic data (Carmon et al., 2019; Uesato et al., 2019; Gowal et al., 2021),
model weight averaging (Gowal et al., 2020), adversarial weight perturbations (Wu et al., 2020), and
better data augmentation (Rebuffi et al., 2021). While the main drawback of adversarial training is
the degradation of performance of robust models on clean images (Tsipras et al., 2018), Xie et al.
(2019a) showed that adversarial images can be leveraged as a strong regularizer to improve the clean
accuracy of classifiers on IMAGENET. In particular, they propose AdvProp, which introduces separate
BatchNorm layers specific to clean or adversarial inputs, with the remaining layers being shared. This
approach and the role of normalization layers when training with both clean and adversarial points
has been further studied by Xie & Yuille (2019); Walter et al. (2022). Recently, Wang et al. (2022)
suggest removing BatchNorm layers from the standard RESNET architecture (He et al., 2016) to
retain high clean accuracy with adversarial training, but this negatively affects the robustness against
stronger attacks.
1
Finally, Kireev et al. (2021); Herrmann et al. (2022) showed that carefully tuning
the threat model in adversarial training might improve the performance on clean images and in the
presence of distribution shifts, such as common corruptions (Hendrycks & Dietterich,2018).
Adapters.
In early work on deep neural networks, Caruana (1997) showed that sharing network
parameters among tasks acts as a regularizer. Aiming at a more efficient parameter sharing, Rebuffi
et al. (2017); Rosenfeld & Tsotsos (2018) introduced adapters – small training modules specific to
each task which can be stitched all along the network. In other lines of work, Mallya et al. (2018);
Mancini et al. (2018) adapt a model to new tasks using efficient weight masking and Li et al. (2016);
Maria Carlucci et al. (2017) perform domain adaptation by batch statistics modulation. While these
approaches require having as many adapters as tasks, Perez et al. (2018) propose an adapter layer
whose weights are generated by a conditioning network. Besides computer vision, adapters are also
used in natural language processing for efficient fine-tuning (Houlsby et al., 2019; Pfeiffer et al.,
2020; Wang et al., 2020) and multi-task learning (Stickland & Murray, 2019).
Merging multiple models.
While ensembles are a popular and successful way to combine multiple
independently trained classifiers to improve on individual performance (Ovadia et al., 2019; Gontijo-
Lopes et al., 2021), they increase the inference cost as they require a forward pass for each sub-network
of the ensemble. An alternative approach is taken by Wortsman et al. (2022) who propose to fine-tune
a fully trained model with different hyperparameter configurations and then average the entire set of
weights of the various networks. The obtained model soups get better performance than each individual
model and even ensembles. Model soups are in spirit similar to Stochastic Weight Averaging (Izmailov
et al., 2018) which consists in averaging weights along an optimization trajectory rather than averaging
over independent runs.

Figure 1: Classification token as adapter. (a) Clean mode; (b) robust mode; (c) model soup. The image is
embedded into visual tokens (in pink). The behaviour of the model can be set to the clean mode, the robust
mode or a model soup by respectively using the clean token (in blue), the adversarial token (in orange) or a
linear combination of these two tokens. The parameters of the embedder, the transformer blocks and the
classifier are shared between modes.
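To make the weight-averaging recipe concrete, the sketch below shows a uniform soup in PyTorch: it assumes a list of fine-tuned models sharing one architecture and simply averages their parameters. The function name and structure are ours, not the code of Wortsman et al. (2022).

```python
import copy
import torch

@torch.no_grad()
def uniform_soup(models):
    """Average the weights of several fine-tuned models with identical architecture.

    Sketch of the 'uniform soup' of Wortsman et al. (2022): every model contributes
    equally, and the result is a single network (one forward pass at inference,
    unlike an ensemble).
    """
    souped = copy.deepcopy(models[0])
    state_dicts = [m.state_dict() for m in models]
    averaged = {}
    for name, reference in state_dicts[0].items():
        stacked = torch.stack([sd[name].float() for sd in state_dicts], dim=0)
        averaged[name] = stacked.mean(dim=0).to(reference.dtype)
    souped.load_state_dict(averaged)
    return souped

# Usage (hypothetical models): soup = uniform_soup([finetuned_a, finetuned_b, finetuned_c])
```

An adversarial model soup (Section 3.3 and Figure 1) follows the same principle, except that only the clean and adversarial adapters are interpolated while all other weights are already shared.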
3 METHOD
3.1 CO-TRAINING WITH NOMINAL AND ADVERSARIAL TRAINING
Goodfellow et al. (2015) propose adversarial training as a way to regularize standard training. They
jointly optimize the model parameters θ on clean and adversarial images with the co-training loss

    α L(f(x; θ), y) + (1 − α) max_{δ ∈ S} L(f(x + δ; θ), y),        (1)

where pairs of associated examples x and labels y are sampled from the training dataset, f(·; θ) is a
model parametrized by θ, L defines the loss function (such as the cross-entropy loss in the classification
context), and S is the set of allowed perturbations. Setting α = 1 boils down to nominal training on clean
images and setting α = 0 leads to adversarial training as defined by Madry et al. (2018). In our case, we
consider ℓ∞ norm-bounded perturbations of size ε = 4/255, so we have S = {δ : ‖δ‖∞ ≤ ε}, and we use
untargeted attacks to generate the adversarial perturbations δ (see details in Section 4).
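As a rough illustration of how Eq. 1 can be optimized in practice, the sketch below pairs the co-training loss with a basic untargeted ℓ∞ PGD attack for the inner maximization (PyTorch; the attack hyperparameters such as the number of steps are placeholders, not the settings of Section 4).

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=4 / 255, step_size=1 / 255, num_steps=5):
    """Untargeted l_inf PGD: search for delta in S that maximizes the loss (sketch)."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(num_steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += step_size * grad.sign()           # gradient ascent on the loss
            delta.clamp_(-eps, eps)                    # project back onto S
            delta.copy_((x + delta).clamp(0, 1) - x)   # keep x + delta a valid image
    return delta.detach()

def cotraining_loss(model, x, y, alpha=0.5, eps=4 / 255):
    """Eq. 1: alpha * clean loss + (1 - alpha) * worst-case adversarial loss."""
    delta = pgd_linf(model, x, y, eps=eps)
    clean_loss = F.cross_entropy(model(x), y)
    adv_loss = F.cross_entropy(model(x + delta), y)
    return alpha * clean_loss + (1 - alpha) * adv_loss
```

Setting alpha=1 recovers nominal training and alpha=0 recovers adversarial training, mirroring the discussion above; with adapters, the clean and adversarial terms would additionally be routed through their respective domain-specific parameters.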
3.2 SEPARATING BATCH STATISTICS IS NOT NECESSARY
BatchNorm is a widely used normalization layer shown to improve performance and training stability
of image classifiers (Ioffe & Szegedy, 2015). We recall that a BatchNorm layer, given a batch as input,
first normalizes it by subtracting the mean and dividing by the standard deviation computed over
the entire batch, then it applies an affine transformation, with learnable scale and offset parameters.
During training, it accumulates these so-called batch statistics to use during test time, so that the
output of the classifier for each image is independent of the other images in the batch. The batch
statistics can be seen as an approximation of the statistics over the image distribution.
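For reference, a stripped-down version of this computation for NCHW feature maps might look as follows (a didactic sketch, not the library implementation; details such as unbiased variance estimates are omitted).

```python
import torch

def batchnorm_forward(x, scale, offset, running_mean, running_var,
                      training, momentum=0.1, eps=1e-5):
    """Per-channel batch normalization of an NCHW tensor (didactic sketch)."""
    if training:
        # Use (and accumulate) the statistics of the current batch.
        mean = x.mean(dim=(0, 2, 3))
        var = x.var(dim=(0, 2, 3), unbiased=False)
        running_mean.mul_(1 - momentum).add_(momentum * mean.detach())
        running_var.mul_(1 - momentum).add_(momentum * var.detach())
    else:
        # Use the accumulated statistics, so each image is processed independently.
        mean, var = running_mean, running_var
    x_hat = (x - mean[None, :, None, None]) / torch.sqrt(var[None, :, None, None] + eps)
    return scale[None, :, None, None] * x_hat + offset[None, :, None, None]
```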
Xie et al. (2019a) show that optimizing the co-training loss in Eq. 1 can yield worse results on clean
images than simple nominal training. This is especially the case when the network has a low capacity
or the attack (i.e., the inner maximization) is too strong (such as using a large perturbation radius ε).
To solve this issue, they propose AdvProp, which consists in using distinct BatchNorm layers for
clean and adversarial images. They argue that “maintaining one set of [BatchNorm] statistics results
in incorrect statistics estimation”, which could be the reason for the performance degradation. We
note that using two sets of BatchNorm layers for the clean and adversarial samples as in AdvProp
Figure 2: Dual parameters are enough.
We report the clean (solid lines) and robust accuracy (dashed lines)
over training steps of RESNET-50 trained on IMAGENET with the co-training loss of Eq. 1 (ε = 4/255): for
models with dual layers, clean accuracy refers to the clean mode and the robust accuracy to the robust mode.
Left panel. We compare models with different normalization layers with no domain-specific parameters (Shared
BatchNorm, Shared LayerNorm, Shared GroupNorm) to Dual BatchNorm as proposed by Xie et al. (2019a):
regardless of the type of normalization, the robustness of classifiers without dual layers drops to (almost) zero
at the end of training. Right panel. We use domain-specific normalization layers (Dual BatchNorm, Dual
LayerNorm, Dual GroupNorm) and a model with BatchNorm with shared batch statistics but domain-specific
scale and offset (DualParams BatchNorm): all models achieve high clean and robust accuracy.
creates two sets of batch statistics but also two sets of learnable scale and offset parameters. In
the following we investigate whether having separate batch statistics is a necessary condition for
successful co-training.
Figure 2 shows the clean and robust accuracy of various model architectures as training progresses.
The left panel demonstrates that, if we share both batch statistics and scales/offsets (Shared Batch-
Norm, orange curves), the robust accuracy (orange dashed line) quickly drops, far from the one
obtained by AdvProp (Dual BatchNorm, blue curve) which is above 34%. However, if we use a
single set of batch statistics but specific scales and offsets for clean and adversarial images, we can
observe on the right panel of Figure 2that the robust accuracy (DualParams BatchNorm, orange
dashed line) matches the one (blue dashed line) obtained by AdvProp. This demonstrates that it is
possible to achieve nominal and robust classification results similar to those of AdvProp without
separate batch statistics.
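One way to realize such a "DualParams" normalization layer is to share a BatchNorm without affine parameters and keep per-domain scales and offsets on top; the sketch below (PyTorch, with class and argument names of our choosing, not the paper's code) illustrates the idea.

```python
import torch
import torch.nn as nn

class DualParamsBatchNorm2d(nn.Module):
    """BatchNorm with shared batch statistics but per-domain scale and offset.

    Both the clean and the adversarial branches go through the same normalization
    (one set of running statistics); only the affine parameters differ per domain.
    """

    def __init__(self, num_features, num_domains=2):
        super().__init__()
        # affine=False: no learnable scale/offset inside the shared BatchNorm.
        self.norm = nn.BatchNorm2d(num_features, affine=False)
        self.scales = nn.Parameter(torch.ones(num_domains, num_features))
        self.offsets = nn.Parameter(torch.zeros(num_domains, num_features))

    def forward(self, x, domain):
        # domain: 0 for clean inputs, 1 for adversarial inputs.
        x = self.norm(x)
        scale = self.scales[domain][None, :, None, None]
        offset = self.offsets[domain][None, :, None, None]
        return x * scale + offset
```

During co-training, clean mini-batches would be passed with domain=0 and their adversarial counterparts with domain=1, so both contribute to the same running statistics.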
Furthermore, there exist normalization layers such as LayerNorm (Ba et al., 2016) or GroupNorm
(Wu & He, 2018) which do not use batch statistics, as their normalization step is done per sample and
not per batch. Hence, according to the hypothesis of Xie et al. (2019a), these types of normalization
layer should not suffer from performance degradation. Nevertheless, the left panel of Figure 2 shows
that their robust accuracy (green and red dashed lines) does not match the robust accuracy of AdvProp
(Dual BatchNorm), and is unstable over training steps. However, by making the scales and offsets of
LayerNorm and GroupNorm specific to clean and adversarial images, their robust accuracy matches
that obtained with dual BatchNorm layers, as shown in the right panel of Figure 2. This suggests that
a key element to make the co-training loss of Eq. 1 work for various normalization layers is to have
trainable parameters which are specific to the clean and adversarial images.2
3.3 REVISITING ADAPTERS WITH ADVERSARIAL TRAINING
The last observation strongly relates this setting to the adapters literature where a single backbone
architecture has some parameters, called adapters, which are specific to different domains while
the rest of the parameters are shared among tasks. In our case, the clean images form one domain
and the adversarial images constitute another domain. In this work, we go beyond having separate
normalization layers for the clean and adversarial images and investigate other types of adapters.
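As an illustration of the classification token acting as an adapter (Figure 1), the schematic module below keeps one clean and one adversarial class token on top of a shared embedder, transformer encoder and head, and interpolates the two tokens to form an adversarial model soup. The wrapper and its argument names are ours, and the embedder/encoder/head are placeholders rather than the exact VIT-B16 used in the experiments.

```python
import torch
import torch.nn as nn

class TokenAdapterViT(nn.Module):
    """Schematic ViT wrapper whose only domain-specific parameters are the class tokens.

    Positional embeddings and other details are omitted for brevity.
    """

    def __init__(self, embedder, encoder, head, embed_dim):
        super().__init__()
        self.embedder = embedder          # shared patch embedder
        self.encoder = encoder            # shared transformer blocks
        self.head = head                  # shared classifier
        self.clean_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.adv_token = nn.Parameter(torch.zeros(1, 1, embed_dim))

    def forward(self, images, beta=0.0):
        # beta = 0: clean mode; beta = 1: robust mode; 0 < beta < 1: adversarial model soup.
        cls = (1 - beta) * self.clean_token + beta * self.adv_token
        patches = self.embedder(images)                  # (batch, num_patches, embed_dim)
        cls = cls.expand(patches.shape[0], -1, -1)
        tokens = torch.cat([cls, patches], dim=1)
        return self.head(self.encoder(tokens)[:, 0])     # classify from the class token
```

During co-training, the clean term of Eq. 1 would use beta=0 and the adversarial term beta=1, so that each token receives gradients from its own domain while every other parameter is trained on both.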
2 Interestingly, contrary to our observation that standard GroupNorm fails to retain robustness, Xie & Yuille
(2019) report that GroupNorm matches Dual BatchNorm. We attribute this difference to the stronger untargeted
attack used in this manuscript, compared to the targeted attack of Xie & Yuille (2019); the stronger attack
reveals failure modes that would otherwise remain hidden.