Symmetry Defense Against CNN Adversarial Perturbation Attacks

Blerta Lindqvist [0000-0002-4950-2250]
Aalto University, Espoo, Finland
blerta.lindqvist@aalto.fi
Abstract. This paper uses symmetry to make Convolutional Neural Network classifiers (CNNs) robust against adversarial perturbation attacks. Such attacks add perturbation to original images to generate adversarial images that fool classifiers, such as the road sign classifiers of autonomous vehicles. Although symmetry is a pervasive aspect of the natural world, CNNs are unable to handle symmetry well. For example, a CNN can classify an image differently from its mirror image. For an adversarial image that is misclassified with a wrong label l_w, the CNN inability to handle symmetry means that a symmetric adversarial image can be classified differently from the wrong label l_w. Moreover, we find that the classification of a symmetric adversarial image reverts to the correct label. To classify an image when adversaries are unaware of the defense, we apply symmetry to the image and use the classification label of the symmetric image. To classify an image when adversaries are aware of the defense, we use mirror symmetry and pixel inversion symmetry to form a symmetry group. We apply all the group symmetries to the image and decide on the output label based on the agreement of any two of the classification labels of the symmetric images. Adaptive attacks fail because they need to rely on loss functions that use conflicting CNN output values for symmetric images. Without attack knowledge, the proposed symmetry defense succeeds against both gradient-based and random-search attacks, with up to near-default accuracies for ImageNet. The defense even improves the classification accuracy of original images.

Keywords: Adversarial perturbation defense · Symmetry · CNN adversarial robustness.
1 Introduction
Despite achieving state-of-the-art status in computer vision [24,30], convolutional neural network classifiers (CNNs) lack adversarial robustness because they can classify imperceptibly perturbed images incorrectly [11,23,36,47]. One of the first and still undefeated defenses against adversarial perturbation attacks is adversarial training (AT) [31,36,47], which uses adversarial images in training. However, AT's reliance on attack knowledge during training [36] is a significant drawback, since such knowledge might not be available.
[Figure 1: an attack perturbs an original image (classified as teapot) into an adversarial image (classified as panda); the defense horizontally flips the adversarial image, which the classifier again labels teapot.]
Fig. 1. The flip symmetry defense against zero-knowledge adversaries reverts adversarial images to their correct classification by horizontally flipping the images before classification. The defense classifies non-adversarial images in the same way.
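As an illustration, the zero-knowledge defense amounts to a single preprocessing step before classification. The sketch below is a minimal PyTorch rendering of that idea; the classifier interface (a model that returns class logits for an (N, C, H, W) batch) is our own assumption for illustration, not a detail fixed by the paper.

import torch

def defended_predict(model: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Zero-knowledge flip defense (sketch): classify the horizontally
    flipped image instead of the possibly adversarial input.
    x: batch of images with shape (N, C, H, W)."""
    x_flipped = torch.flip(x, dims=[-1])  # mirror along the width axis
    with torch.no_grad():
        logits = model(x_flipped)
    return logits.argmax(dim=1)

Because horizontal flipping is its own inverse, the defense treats adversarial and non-adversarial inputs identically and adds only one extra image transformation per classification.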
Although engineered to incorporate symmetries such as horizontal flipping, translations, and rotations, CNNs lack invariance with respect to these symmetries [19] in the classification of datasets such as ImageNet [15], CIFAR10 [29], and MNIST [34]. This lack of invariance means that CNNs can classify images differently after they have been horizontally flipped, or even slightly shifted or rotated [3,19]. Furthermore, CNNs only provide approximate translation invariance [3,4,19,26] and are unable to learn invariances with respect to symmetries such as rotation and horizontal flipping through data augmentation [3,4,19].
Against adversarial perturbation attacks that cause misclassification, the CNN inability to handle symmetry well can be beneficial. Although an adversarial image is classified with a wrong label, a symmetric adversarial image generated by applying a symmetry to the adversarial image can be classified with a label that differs from the wrong label of the adversarial image. Aiming to classify adversarial images correctly, we ask:
Can we use the CNN inability to handle symmetry correctly for a defense that provides robustness against adversarial perturbation attacks?
Addressing this question, we design a novel symmetry defense that only uses
symmetry to counter adversarial perturbation attacks. The proposed symmetry
defense makes the following main contributions:
- We show that the proposed symmetry defense succeeds against gradient-based attacks and a random-search attack without using adversarial images or attack knowledge. In contrast, the current best defense needs attack knowledge to train the classifier with adversarial images.
- The symmetry defense counters zero-knowledge adversaries with near-default accuracies by using either the horizontal flip symmetry or an artificial pixel inversion symmetry. Results are shown in Table 1 and Table 2.
- The defense also counters perfect-knowledge adversaries with near-default accuracies, as shown in Table 4. Against such adversaries, the defense uses a symmetry subgroup that consists of the identity symmetry, the mirror symmetry (also called horizontal flip), the pixel inversion symmetry, and the symmetry that combines the mirror flip and the pixel inversion; a minimal sketch of this subgroup defense follows this list.
- The defense counters adaptive attacks that use symmetry against the defense because an attack loss function applied to symmetric images depends on the function value of the symmetric images, that is, on the CNN output evaluated at these images. Loss functions measure the distance between the function output and a label value. Since the function output can differ for symmetric images due to the CNN inability to handle symmetry well, the optimization of adaptive attacks that incorporate symmetry in their loss functions is not optimal.
- Because the pixel intensity inversion symmetry, discussed in Section 5.1 and Section 5.2, does not exist in natural images of the dataset, the proposed defense could be applied even to datasets without existing symmetries.
- The symmetry defense maintains and even exceeds the non-adversarial accuracy against perfect-knowledge adversaries, as shown in Table 4.
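The following sketch makes the perfect-knowledge defense concrete. It is a minimal PyTorch rendering under our own assumptions: the pixel inversion symmetry is taken to map an image x with values in [0, 1] to 1 - x (the paper defines its inversion symmetry in Section 5.1, so this exact form is an assumption), and the most-common-label fallback is an illustrative tie-breaking choice, not one prescribed by the paper.

import torch
from collections import Counter

def apply_subgroup(x: torch.Tensor) -> list:
    """Apply the four subgroup symmetries to a batch x in [0, 1]:
    identity, horizontal flip, pixel inversion (assumed to be x -> 1 - x),
    and flip combined with inversion."""
    flipped = torch.flip(x, dims=[-1])
    return [x, flipped, 1.0 - x, 1.0 - flipped]

def group_defense_predict(model: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Classify each symmetric copy and output, per image, the label on
    which at least two copies agree (most frequent label overall)."""
    with torch.no_grad():
        votes = torch.stack([model(s).argmax(dim=1) for s in apply_subgroup(x)])  # (4, N)
    labels = []
    for per_image_votes in votes.t():  # one column of votes per image
        label, _count = Counter(per_image_votes.tolist()).most_common(1)[0]
        labels.append(label)
    return torch.tensor(labels)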
2 Related Work and Background
2.1 Symmetry, Equivariance, and Invariance in CNNs
A symmetry of an object is a transformation that leaves that object invariant. Image symmetries include rotation, horizontal flipping, and inversion [38]. We provide definitions related to symmetry groups in Appendix 1. A function f is equivariant with respect to a transformation T if they commute with each other [44]: f ∘ T = T ∘ f. Invariance is the special case of equivariance where the transformation applied after the function is the identity transformation [44]: f ∘ T = f.
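A concrete way to observe the lack of invariance that the paper exploits is to compare a classifier's predictions on an image and on its transformed copy. The check below is a minimal sketch assuming a PyTorch classifier and the horizontal flip as T; for an exactly invariant f, the two predictions would always match.

import torch

def is_flip_invariant(model: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Check f(T(x)) == f(x) per image for T = horizontal flip.
    Returns a boolean tensor; False entries witness the lack of invariance."""
    with torch.no_grad():
        pred = model(x).argmax(dim=1)                            # f(x)
        pred_t = model(torch.flip(x, dims=[-1])).argmax(dim=1)   # f(T(x))
    return pred == pred_t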
CNNs stack equivariant convolution and pooling layers [22] followed by an invariant map in order to learn functions that are invariant [5] with respect to symmetries, following a standard blueprint used in machine learning [5,25]. Translation invariance for image classification means that the position of an object in an image should not affect its classification. To achieve translation invariance, CNN convolutional layers [30,32] compute feature maps over the translation symmetry group [21,46] using kernel sliding [21,33]. Pooling layers positioned after convolutional layers enable local translation invariance [5,16,22] because the output of the pooling operation does not change when the position of features changes within the pooling region [16]. Cohen and Welling [12] show that convolutional layers, pooling, arbitrary pointwise nonlinearities, batch normalization, and residual blocks are equivariant to translation. CNNs learn invariance with respect to symmetries such as rotation, horizontal flipping, and scaling through data augmentation, which adds to the training dataset images obtained by applying symmetries to original images [30]. For ImageNet, data augmentation can consist of a random crop, horizontal flip, color jitter, and color transforms of original images [18].
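For reference, the ImageNet-style augmentation described above is typically written as a composition of random transforms. The snippet below is a sketch using torchvision; the specific parameter values are common defaults and our own illustrative choices, not values taken from [18] or from this paper.

from torchvision import transforms

# Illustrative ImageNet-style training augmentation: random crop,
# horizontal flip, and color jitter, followed by tensor conversion and
# normalization.
train_augmentation = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])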
CNN Lack of Translation Equivariance. Studies suggest that CNNs are not equivariant to translation [3,4,19,26,49], not even to small translations or rotations [19]. Bouchacourt et al. [4] claim that CNN translation invariance is approximate and that it is primarily learned from the data and data augmentation. The lack of translation invariance has been attributed to aliasing effects caused by the subsampling of the convolutional stride [3], by max pooling, average pooling, and strides [49], or by image boundary effects [26].
CNN Data Augmentation Marginally Effective. Studies show that data augmentation is only marginally effective [12,3,27,4,19] at incorporating symmetries because CNNs cannot learn invariances with data augmentation [3,4,19]. Engstrom et al. [19] find that data augmentation only marginally improves invariance. Azulay and Weiss [3] find that data augmentation only enables invariance to symmetries of images that resemble dataset images. Bouchacourt et al. [4] claim that non-translation invariance is learned from the data independently of data augmentation.
Other Equivariant CNN Approaches Have Dataset Limitations. CNN architectures that handle symmetry better have only been shown to work for simple datasets such as MNIST [34] or CIFAR10 [28] or for synthetic datasets, not for ImageNet [44,6,45,21,12,16,50,37,20,42].
2.2 Adversarial Perturbation Attacks
Szegedy et al. [47] defined the problem of generating adversarial images as starting from original images and adding a small perturbation that results in misclassification. They formalized the generation of adversarial images as the minimization of the sum of the perturbation and an adversarial loss function, as shown in Appendix 2. The loss function measures the distance between the obtained classifier output values and the desired output values.
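In the well-known box-constrained form of this formulation (the paper's own notation is given in its Appendix 2, so the symbols below are our rendering of the version from [47]), the perturbation r for an original image x with m pixels and target label l is found by solving

\min_{r} \; c\,\|r\| + \mathcal{L}_f(x + r,\, l) \quad \text{subject to} \quad x + r \in [0,1]^m ,

where \mathcal{L}_f is the adversarial loss measuring the distance between the classifier output on x + r and the output desired for label l, and the constant c balances perturbation size against loss.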
Most attacks use the classifier gradient to generate adversarial perturbation [11,36], but random search [1] is also used.
PGD Attack. PGD is an iterative white-box attack with a parameter that defines the magnitude of the perturbation at each step. PGD starts from an initial sample point x_0 and then iteratively computes the perturbation of each step and projects it onto an L_p ball.
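As a sketch, one PGD iteration under the L-infinity norm can be written as below; the cross-entropy loss, step size alpha, and radius eps are generic placeholders, and the sign-gradient step is the usual L-infinity instantiation rather than a detail taken from this paper.

import torch
import torch.nn.functional as F

def pgd_linf_step(model, x_adv, x_orig, y, alpha=2 / 255, eps=8 / 255):
    """One L-infinity PGD iteration (sketch): ascend the loss along the
    sign of the gradient, then project back onto the eps-ball around the
    original images and onto the valid pixel range [0, 1]."""
    x_adv = x_adv.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    x_next = x_adv.detach() + alpha * grad.sign()
    # Project onto the L-infinity ball of radius eps around x_orig.
    x_next = torch.max(torch.min(x_next, x_orig + eps), x_orig - eps)
    return torch.clamp(x_next, 0.0, 1.0)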
Auto-PGD Attack. Auto-PGD (APGD) [14] is a variant of PGD that
varies the step size and can use two different loss functions to achieve a stronger
attack.
Square Attack. The Square Attack [1] is a score-based, black-box, random-
search attack based on local randomized square-shaped updates.
Fast Adaptive Boundary Attack. The white-box Fast Adaptive Boundary (FAB) attack [13] aims to find the minimum perturbation needed to change the classification of an original sample. However, FAB does not scale to ImageNet because of the large number of dataset classes.
AutoAttack. AutoAttack [14] is a parameter-free ensemble of attacks that includes APGD-CE, APGD-DLR, FAB [13], and the Square Attack [1].
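For evaluation, the reference implementation of AutoAttack is typically invoked as sketched below; this assumes the publicly available autoattack package and a classifier that takes inputs in [0, 1], and the epsilon value and batch size are illustrative choices rather than settings from this paper.

from autoattack import AutoAttack  # reference implementation by Croce and Hein

def run_autoattack(model, x_test, y_test, eps=8 / 255, batch_size=128):
    """Run the standard AutoAttack ensemble against a classifier that
    expects inputs in [0, 1]; returns the adversarial images."""
    adversary = AutoAttack(model, norm='Linf', eps=eps, version='standard')
    x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=batch_size)
    return x_adv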
2.3 Adversarial Defenses
Adversarial Training. AT [31,36,47] trains classifiers with correctly-labeled adversarial images and is one of the first and few defenses that have not been defeated.
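To make the contrast with the symmetry defense concrete, a minimal adversarial training step looks roughly as follows; the use of an attack callable to craft the training batch and the optimizer details are generic assumptions, not specifics of [31,36,47].

import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, craft_attack):
    """One adversarial training step (sketch): replace the clean batch with
    correctly-labeled adversarial examples and train on those. The
    craft_attack(model, x, y) callable is a placeholder for any attack,
    e.g. PGD; supplying it is exactly the attack knowledge AT requires."""
    model.eval()
    x_adv = craft_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()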