REVISITING ADAPTERS WITH ADVERSARIAL TRAINING
Sylvestre-Alvise Rebuffi, Francesco Croce& Sven Gowal
DeepMind, London
{sylvestre,sgowal}@deepmind.com
ABSTRACT
While adversarial training is generally used as a defense mechanism, recent works
show that it can also act as a regularizer. By co-training a neural network on clean
and adversarial inputs, it is possible to improve classification accuracy on the clean,
non-adversarial inputs. We demonstrate that, contrary to previous findings, it is
not necessary to separate batch statistics when co-training on clean and adversarial
inputs, and that it is sufficient to use adapters with few domain-specific parameters
for each type of input. We establish that using the classification token of a Vision
Transformer (VIT) as an adapter is enough to match the classification performance
of dual normalization layers, while using significantly fewer additional parameters.
First, we improve upon the top-1 accuracy of a non-adversarially trained VIT-B16
model by +1.12% on IMAGENET (reaching 83.76% top-1 accuracy). Second, and
more importantly, we show that training with adapters enables model soups through
linear combinations of the clean and adversarial tokens. These model soups, which
we call adversarial model soups, allow us to trade-off between clean and robust
accuracy without sacrificing efficiency. Finally, we show that we can easily adapt
the resulting models in the face of distribution shifts. Our VIT-B16 obtains top-1
accuracies on IMAGENET variants that are on average +4.00% better than those
obtained with Masked Autoencoders.
1 INTRODUCTION
Neural networks are inherently susceptible to adversarial perturbations. Adversarial perturbations
fool neural networks by adding an imperceptible amount of noise which leads to an incorrect
prediction with high confidence (Carlini & Wagner, 2017; Goodfellow et al., 2015; Kurakin et al.,
2016b; Szegedy et al., 2014). There has been a lot of work on building defenses against adversarial
perturbations (Papernot et al., 2016; Kannan et al., 2018); the most commonly used defense is
adversarial training as proposed by Madry et al. (2018) and its variants (Zhang et al., 2019; Pang
et al., 2020; Huang et al., 2020; Rice et al., 2020; Gowal et al., 2020), which use adversarially
perturbed images at each training step as training data. Earlier studies (Kurakin et al., 2016a; Xie
et al., 2019b) showed that using adversarial samples during training leads to performance degradation
on clean images. However, AdvProp (Xie et al., 2019a) challenged this observation by showing that
adversarial training can act as a regularizer, and therefore improve nominal accuracy, when using
dual batch normalization (BatchNorm) layers (Ioffe & Szegedy, 2015) to disentangle the clean and
adversarial distributions.
We draw attention to the broad similarity between the AdvProp approach and the adapters literature
(Rebuffi et al., 2017; Houlsby et al., 2019) where a single backbone network is trained on multiple
domains by means of adapters, where a few parameters specific to each domain are trained separately
while the rest of the parameters are shared. In light of this comparison, we further develop the
line of work introduced by AdvProp and analyze it from an adapter perspective. In particular, we
explore various adapters and aim to obtain the best classification performance with minimal additional
parameters. Our contributions are as follows:
• We show that, in order to benefit from co-training on clean and adversarial samples, it is not
necessary to separate the batch statistics of clean and adversarial images in BatchNorm layers. We
demonstrate empirically that it is enough to use domain-specific trainable parameters to achieve
similar results.
Work done during an internship at DeepMind
• Inspired by the adapters literature, we evaluate various adapters. We show that training separate
classification tokens of a VIT for the clean and adversarial domains is enough to match the
classification performance of dual normalization layers with 49× fewer domain-specific parameters.
This classification token acts as a conditioning token which can modify the behaviour of the network
to be either in clean or robust mode (Figure 1).
• Unlike Xie et al. (2019a) and Herrmann et al. (2022), we also aim at preserving the robust
performance of the network against adversarial attacks. We show that our conditional token can
obtain SOTA nominal accuracy in the clean mode while at the same time achieving competitive
ℓ∞-robustness in the robust mode. As a by-product of our study, we show that adversarial
training of VIT-B16 on IMAGENET leads to state-of-the-art robustness against ℓ∞-norm bounded
perturbations of size 4/255.
• We empirically demonstrate that training with adapters enables model soups (Wortsman et al.,
2022). This allows us to introduce adversarial model soups, models that trade-off between clean
and robust accuracy through linear interpolation of the clean and adversarial adapters. To the
best of our knowledge, our work is the first to study adversarial model soups. We also show
that adversarial model soups perform better on IMAGENET variants than the state-of-the-art with
masked auto-encoding (He et al., 2022).
2 RELATED WORK
Adversarial training.
Although more recent approaches have been proposed, the most successful
method to reduce the vulnerability of image classifiers to adversarial attacks is adversarial training,
which generates on-the-fly adversarial counterparts for the training images and uses them to augment
the training set (Croce et al., 2020). Goodfellow et al. (2015) used the single-step Fast Gradient Sign
Method (FGSM) attack to craft such adversarial images. Later, Madry et al. (2018) found that using
iterative Projected Gradient Descent (PGD) yields models robust to stronger attacks. Their scheme
has been subsequently improved by several modifications, e.g. a different loss function (Zhang et al.,
2019), unlabelled or synthetic data (Carmon et al., 2019; Uesato et al., 2019; Gowal et al., 2021),
model weight averaging (Gowal et al., 2020), adversarial weight perturbations (Wu et al., 2020), and
better data augmentation (Rebuffi et al., 2021). While the main drawback of adversarial training is
the degradation of performance of robust models on clean images (Tsipras et al., 2018), Xie et al.
(2019a) showed that adversarial images can be leveraged as a strong regularizer to improve the clean
accuracy of classifiers on IMAGENET. In particular, they propose AdvProp, which introduces separate
BatchNorm layers specific to clean or adversarial inputs, with the remaining layers being shared. This
approach and the role of normalization layers when training with both clean and adversarial points
has been further studied by Xie & Yuille (2019); Walter et al. (2022). Recently, Wang et al. (2022)
suggest removing BatchNorm layers from the standard RESNET architecture (He et al., 2016) to
retain high clean accuracy with adversarial training, but this negatively affects the robustness against
stronger attacks.
1
Finally, Kireev et al. (2021); Herrmann et al. (2022) showed that carefully tuning
the threat model in adversarial training might improve the performance on clean images and in the
presence of distribution shifts, such as common corruptions (Hendrycks & Dietterich,2018).
Adapters.
In early work on deep neural networks, Caruana (1997) showed that sharing network
parameters among tasks acts as a regularizer. Aiming at a more efficient parameter sharing, Rebuffi
et al. (2017); Rosenfeld & Tsotsos (2018) introduced adapters – small training modules specific to
each task which can be stitched all along the network. In other lines of work, Mallya et al. (2018);
Mancini et al. (2018) adapt a model to new tasks using efficient weight masking and Li et al. (2016);
Maria Carlucci et al. (2017) perform domain adaptation by batch statistics modulation. While these
approaches require having as many adapters as tasks, Perez et al. (2018) propose an adapter layer
whose weights are generated by a conditioning network. Besides computer vision, adapters are also
used in natural language processing for efficient fine-tuning (Houlsby et al., 2019; Pfeiffer et al.,
2020; Wang et al., 2020) and multi-task learning (Stickland & Murray, 2019).
Merging multiple models.
While ensembles are a popular and successful way to combine multiple
independently trained classifiers to improve on individual performance (Ovadia et al., 2019; Gontijo-
Lopes et al., 2021), they increase the inference cost as they require a forward pass for each sub-network
of the ensemble. An alternative approach is taken by Wortsman et al. (2022) who propose to fine-tune
a fully trained model with different hyperparameter configurations and then average the entire set of
weights of the various networks. The obtained model soups get better performance than each individual
model and even ensembles. Model soups are in spirit similar to Stochastic Weight Averaging (Izmailov
et al., 2018) which consists in averaging weights along an optimization trajectory rather than averaging
over independent runs.

Figure 1: Classification token as adapter. (a) Clean mode; (b) robust mode; (c) model soup. The image is
embedded into visual tokens (in pink). The behaviour of the model can be set to the clean mode, the robust
mode or a model soup by respectively using the clean token (in blue), the adversarial token (in orange) or a
linear combination of these two tokens. The parameters of the embedder, the transformer blocks and the
classifier are shared between modes.
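To make the weight-averaging recipe concrete, the sketch below shows a uniform soup in PyTorch: it assumes a list of fine-tuned models sharing one architecture and simply averages their parameters. The function name and structure are ours, not the code of Wortsman et al. (2022).

```python
import copy
import torch

@torch.no_grad()
def uniform_soup(models):
    """Average the weights of several fine-tuned models with identical architecture.

    Sketch of the 'uniform soup' of Wortsman et al. (2022): every model contributes
    equally, and the result is a single network (one forward pass at inference,
    unlike an ensemble).
    """
    souped = copy.deepcopy(models[0])
    state_dicts = [m.state_dict() for m in models]
    averaged = {}
    for name, reference in state_dicts[0].items():
        stacked = torch.stack([sd[name].float() for sd in state_dicts], dim=0)
        averaged[name] = stacked.mean(dim=0).to(reference.dtype)
    souped.load_state_dict(averaged)
    return souped

# Usage (hypothetical models): soup = uniform_soup([finetuned_a, finetuned_b, finetuned_c])
```

An adversarial model soup (Section 3.3 and Figure 1) follows the same principle, except that only the clean and adversarial adapters are interpolated while all other weights are already shared.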
3 METHOD
3.1 CO-TRAINING WITH NOMINAL AND ADVERSARIAL TRAINING
Goodfellow et al. (2015) propose adversarial training as a way to regularize standard training. They
jointly optimize the model parameters θ on clean and adversarial images with the co-training loss

    α L(f(x; θ), y) + (1 − α) max_{δ ∈ S} L(f(x + δ; θ), y),        (1)

where pairs of associated examples x and labels y are sampled from the training dataset, f(·; θ) is a
model parametrized by θ, L defines the loss function (such as the cross-entropy loss in the classification
context), and S is the set of allowed perturbations. Setting α = 1 boils down to nominal training on clean
images and setting α = 0 leads to adversarial training as defined by Madry et al. (2018). In our case, we
consider ℓ∞ norm-bounded perturbations of size ε = 4/255, so we have S = {δ : ‖δ‖∞ ≤ ε}, and we use
untargeted attacks to generate the adversarial perturbations δ (see details in Section 4).
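As a rough illustration of how Eq. 1 can be optimized in practice, the sketch below pairs the co-training loss with a basic untargeted ℓ∞ PGD attack for the inner maximization (PyTorch; the attack hyperparameters such as the number of steps are placeholders, not the settings of Section 4).

```python
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=4 / 255, step_size=1 / 255, num_steps=5):
    """Untargeted l_inf PGD: search for delta in S that maximizes the loss (sketch)."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(num_steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += step_size * grad.sign()           # gradient ascent on the loss
            delta.clamp_(-eps, eps)                    # project back onto S
            delta.copy_((x + delta).clamp(0, 1) - x)   # keep x + delta a valid image
    return delta.detach()

def cotraining_loss(model, x, y, alpha=0.5, eps=4 / 255):
    """Eq. 1: alpha * clean loss + (1 - alpha) * worst-case adversarial loss."""
    delta = pgd_linf(model, x, y, eps=eps)
    clean_loss = F.cross_entropy(model(x), y)
    adv_loss = F.cross_entropy(model(x + delta), y)
    return alpha * clean_loss + (1 - alpha) * adv_loss
```

Setting alpha=1 recovers nominal training and alpha=0 recovers adversarial training, mirroring the discussion above; with adapters, the clean and adversarial terms would additionally be routed through their respective domain-specific parameters.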
3.2 SEPARATING BATCH STATISTICS IS NOT NECESSARY
BatchNorm is a widely used normalization layer shown to improve performance and training stability
of image classifiers (Ioffe & Szegedy, 2015). We recall that a BatchNorm layer, given a batch as input,
first normalizes it by subtracting the mean and dividing by the standard deviation computed over
the entire batch, then it applies an affine transformation, with learnable scale and offset parameters.
During training, it accumulates these so-called batch statistics to use during test time, so that the
output of the classifier for each image is independent of the other images in the batch. The batch
statistics can be seen as an approximation of the statistics over the image distribution.
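For reference, a stripped-down version of this computation for NCHW feature maps might look as follows (a didactic sketch, not the library implementation; details such as unbiased variance estimates are omitted).

```python
import torch

def batchnorm_forward(x, scale, offset, running_mean, running_var,
                      training, momentum=0.1, eps=1e-5):
    """Per-channel batch normalization of an NCHW tensor (didactic sketch)."""
    if training:
        # Use (and accumulate) the statistics of the current batch.
        mean = x.mean(dim=(0, 2, 3))
        var = x.var(dim=(0, 2, 3), unbiased=False)
        running_mean.mul_(1 - momentum).add_(momentum * mean.detach())
        running_var.mul_(1 - momentum).add_(momentum * var.detach())
    else:
        # Use the accumulated statistics, so each image is processed independently.
        mean, var = running_mean, running_var
    x_hat = (x - mean[None, :, None, None]) / torch.sqrt(var[None, :, None, None] + eps)
    return scale[None, :, None, None] * x_hat + offset[None, :, None, None]
```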
Xie et al. (2019a) show that optimizing the co-training loss in Eq. 1 can yield worse results on clean
images than simple nominal training. This is especially the case when the network has a low capacity
or the attack (i.e., the inner maximization) is too strong (such as using a large perturbation radius ε).
To solve this issue, they propose AdvProp, which consists in using distinct BatchNorm layers for
clean and adversarial images. They argue that “maintaining one set of [BatchNorm] statistics results
in incorrect statistics estimation”, which could be the reason for the performance degradation. We
note that using two sets of BatchNorm layers for the clean and adversarial samples as in AdvProp
Figure 2: Dual parameters are enough.
We report the clean (solid lines) and robust accuracy (dashed lines)
over training steps of RESNET-50 trained on IMAGENET with the co-training loss of Eq. 1 (ε = 4/255): for
models with dual layers, clean accuracy refers to the clean mode and the robust accuracy to the robust mode.
Left panel. We compare models with different normalization layers with no domain-specific parameters (Shared
BatchNorm, Shared LayerNorm, Shared GroupNorm) to Dual BatchNorm as proposed by Xie et al. (2019a):
regardless of the type of normalization, the robustness of classifiers without dual layers drops to (almost) zero
at the end of training. Right panel. We use domain-specific normalization layers (Dual BatchNorm, Dual
LayerNorm, Dual GroupNorm) and a model with BatchNorm with shared batch statistics but domain-specific
scale and offset (DualParams BatchNorm): all models achieve high clean and robust accuracy.
creates two sets of batch statistics but also two sets of learnable scale and offset parameters. In
the following we investigate whether having separate batch statistics is a necessary condition for
successful co-training.
Figure 2 shows the clean and robust accuracy of various model architectures as training progresses.
The left panel demonstrates that, if we share both batch statistics and scales/offsets (Shared Batch-
Norm, orange curves), the robust accuracy (orange dashed line) quickly drops, far from the one
obtained by AdvProp (Dual BatchNorm, blue curve) which is above 34%. However, if we use a
single set of batch statistics but specific scales and offsets for clean and adversarial images, we can
observe on the right panel of Figure 2that the robust accuracy (DualParams BatchNorm, orange
dashed line) matches the one (blue dashed line) obtained by AdvProp. This demonstrates that it is
possible to achieve nominal and robust classification results similar to those of AdvProp without
separate batch statistics.
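One way to realize such a "DualParams" normalization layer is to share a BatchNorm without affine parameters and keep per-domain scales and offsets on top; the sketch below (PyTorch, with class and argument names of our choosing, not the paper's code) illustrates the idea.

```python
import torch
import torch.nn as nn

class DualParamsBatchNorm2d(nn.Module):
    """BatchNorm with shared batch statistics but per-domain scale and offset.

    Both the clean and the adversarial branches go through the same normalization
    (one set of running statistics); only the affine parameters differ per domain.
    """

    def __init__(self, num_features, num_domains=2):
        super().__init__()
        # affine=False: no learnable scale/offset inside the shared BatchNorm.
        self.norm = nn.BatchNorm2d(num_features, affine=False)
        self.scales = nn.Parameter(torch.ones(num_domains, num_features))
        self.offsets = nn.Parameter(torch.zeros(num_domains, num_features))

    def forward(self, x, domain):
        # domain: 0 for clean inputs, 1 for adversarial inputs.
        x = self.norm(x)
        scale = self.scales[domain][None, :, None, None]
        offset = self.offsets[domain][None, :, None, None]
        return x * scale + offset
```

During co-training, clean mini-batches would be passed with domain=0 and their adversarial counterparts with domain=1, so both contribute to the same running statistics.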
Furthermore, there exist normalization layers such as LayerNorm (Ba et al., 2016) or GroupNorm
(Wu & He, 2018) which do not use batch statistics, as their normalization step is done per sample and
not per batch. Hence, according to the hypothesis of Xie et al. (2019a), these types of normalization
layer should not suffer from performance degradation. Nevertheless, the left panel of Figure 2 shows
that their robust accuracy (green and red dashed lines) does not match the robust accuracy of AdvProp
(Dual BatchNorm), and is unstable over training steps. However, by making the scales and offsets of
LayerNorm and GroupNorm specific to clean and adversarial images, their robust accuracy matches
that obtained with dual BatchNorm layers, as shown in the right panel of Figure 2. This suggests that
a key element to make the co-training loss of Eq. 1 work for various normalization layers is to have
trainable parameters which are specific to the clean and adversarial images.2
3.3 REVISITING ADAPTERS WITH ADVERSARIAL TRAINING
The last observation strongly relates this setting to the adapters literature where a single backbone
architecture has some parameters, called adapters, which are specific to different domains while
the rest of the parameters are shared among tasks. In our case, the clean images form one domain
and the adversarial images constitute another domain. In this work, we go beyond having separate
normalization layers for the clean and adversarial images and investigate other types of adapters.
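As an illustration of the classification token acting as an adapter (Figure 1), the schematic module below keeps one clean and one adversarial class token on top of a shared embedder, transformer encoder and head, and interpolates the two tokens to form an adversarial model soup. The wrapper and its argument names are ours, and the embedder/encoder/head are placeholders rather than the exact VIT-B16 used in the experiments.

```python
import torch
import torch.nn as nn

class TokenAdapterViT(nn.Module):
    """Schematic ViT wrapper whose only domain-specific parameters are the class tokens.

    Positional embeddings and other details are omitted for brevity.
    """

    def __init__(self, embedder, encoder, head, embed_dim):
        super().__init__()
        self.embedder = embedder          # shared patch embedder
        self.encoder = encoder            # shared transformer blocks
        self.head = head                  # shared classifier
        self.clean_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.adv_token = nn.Parameter(torch.zeros(1, 1, embed_dim))

    def forward(self, images, beta=0.0):
        # beta = 0: clean mode; beta = 1: robust mode; 0 < beta < 1: adversarial model soup.
        cls = (1 - beta) * self.clean_token + beta * self.adv_token
        patches = self.embedder(images)                  # (batch, num_patches, embed_dim)
        cls = cls.expand(patches.shape[0], -1, -1)
        tokens = torch.cat([cls, patches], dim=1)
        return self.head(self.encoder(tokens)[:, 0])     # classify from the class token
```

During co-training, the clean term of Eq. 1 would use beta=0 and the adversarial term beta=1, so that each token receives gradients from its own domain while every other parameter is trained on both.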
2 Interestingly, contrary to our observation that standard GroupNorm fails to retain robustness, Xie & Yuille
(2019) report that GroupNorm matches Dual BatchNorm. We attribute this difference to the stronger untargeted
attack used in this manuscript, compared to the targeted attack of Xie & Yuille (2019); the stronger attack
reveals failure modes that would otherwise remain hidden.