Symmetry Defense Against CNN Adversarial Perturbation Attacks

Blerta Lindqvist [0000-0002-4950-2250]
Aalto University, Espoo, Finland
blerta.lindqvist@aalto.fi
Abstract. This paper uses symmetry to make Convolutional Neural Network classifiers (CNNs) robust against adversarial perturbation attacks. Such attacks add perturbation to original images to generate adversarial images that fool classifiers, such as the road sign classifiers of autonomous vehicles. Although symmetry is a pervasive aspect of the natural world, CNNs are unable to handle symmetry well. For example, a CNN can classify an image differently from its mirror image. For an adversarial image that is misclassified with a wrong label l_w, the CNN inability to handle symmetry means that a symmetric adversarial image can be classified differently from the wrong label l_w. Moreover, we find that the classification of a symmetric adversarial image reverts to the correct label. To classify an image when adversaries are unaware of the defense, we apply symmetry to the image and use the classification label of the symmetric image. To classify an image when adversaries are aware of the defense, we use mirror symmetry and pixel inversion symmetry to form a symmetry group. We apply all the group symmetries to the image and decide on the output label based on the agreement of any two of the classification labels of the symmetric images. Adaptive attacks fail because they need to rely on loss functions that use conflicting CNN output values for symmetric images. Without attack knowledge, the proposed symmetry defense succeeds against both gradient-based and random-search attacks, with up to near-default accuracies for ImageNet. The defense even improves the classification accuracy of original images.

Keywords: Adversarial perturbation defense · Symmetry · CNN adversarial robustness.
1 Introduction
Despite achieving state-of-the-art status in computer vision [24,30], convolutional neural network classifiers (CNNs) lack adversarial robustness because they can classify imperceptibly perturbed images incorrectly [11,23,36,47]. One of the first and still undefeated defenses against adversarial perturbation attacks is adversarial training (AT) [31,36,47], which uses adversarial images in training. However, AT's reliance on attack knowledge during training [36] is a significant drawback, since such knowledge might not be available.
[Figure 1: an attack perturbs an original image (classified as teapot) into an adversarial image (classified as panda); the defense horizontally flips the adversarial image, which the classifier again labels teapot.]
Fig. 1. The flip symmetry defense against zero-knowledge adversaries reverts adversarial images to their correct classification by horizontally flipping the images before classification. The defense classifies non-adversarial images in the same way.
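As an illustration, the zero-knowledge defense amounts to a single preprocessing step before classification. The sketch below is a minimal PyTorch rendering of that idea; the classifier interface (a model that returns class logits for an (N, C, H, W) batch) is our own assumption for illustration, not a detail fixed by the paper.

import torch

def defended_predict(model: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Zero-knowledge flip defense (sketch): classify the horizontally
    flipped image instead of the possibly adversarial input.
    x: batch of images with shape (N, C, H, W)."""
    x_flipped = torch.flip(x, dims=[-1])  # mirror along the width axis
    with torch.no_grad():
        logits = model(x_flipped)
    return logits.argmax(dim=1)

Because horizontal flipping is its own inverse, the defense treats adversarial and non-adversarial inputs identically and adds only one extra image transformation per classification.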
Although engineered to incorporate symmetries such as horizontal flipping, translations, and rotations, CNNs lack invariance with respect to these symmetries [19] in the classification of datasets such as ImageNet [15], CIFAR10 [29], and MNIST [34]. This lack of invariance means that CNNs can classify images differently after they have been horizontally flipped, or even slightly shifted or rotated [3,19]. Furthermore, CNNs only provide approximate translation invariance [3,4,19,26] and are unable to learn invariances with respect to symmetries such as rotation and horizontal flipping through data augmentation [3,4,19].
Against adversarial perturbation attacks that cause misclassification, the CNN inability to handle symmetry well can be beneficial. Although an adversarial image is classified with a wrong label, a symmetric adversarial image generated by applying a symmetry to the adversarial image can be classified with a label that differs from the wrong label of the adversarial image. Aiming to classify adversarial images correctly, we ask:
Can we use the CNN inability to handle symmetry correctly for a defense that provides robustness against adversarial perturbation attacks?
Addressing this question, we design a novel symmetry defense that only uses
symmetry to counter adversarial perturbation attacks. The proposed symmetry
defense makes the following main contributions:
- We show that the proposed symmetry defense succeeds against gradient-based attacks and a random-search attack without using adversarial images or attack knowledge. In contrast, the current best defense needs attack knowledge to train the classifier with adversarial images.
- The symmetry defense counters zero-knowledge adversaries with near-default accuracies by using either the horizontal flip symmetry or an artificial pixel inversion symmetry. Results are shown in Table 1 and Table 2.
- The defense also counters perfect-knowledge adversaries with near-default accuracies, as shown in Table 4. Against such adversaries, the defense uses a symmetry subgroup that consists of the identity symmetry, the mirror symmetry (also called horizontal flip), the pixel inversion symmetry, and the symmetry that combines the mirror flip and the pixel inversion; a minimal sketch of this subgroup defense follows this list.
- The defense counters adaptive attacks that use symmetry against the defense because an attack loss function applied to symmetric images depends on the function value of the symmetric images, that is, on the CNN output evaluated at these images. Loss functions measure the distance between the function output and a label value. Since the function output can differ for symmetric images due to the CNN inability to handle symmetry well, the optimization of adaptive attacks that incorporate symmetry in their loss functions is not optimal.
- Because the pixel intensity inversion symmetry, discussed in Section 5.1 and Section 5.2, does not exist in natural images of the dataset, the proposed defense could be applied even to datasets without existing symmetries.
- The symmetry defense maintains and even exceeds the non-adversarial accuracy against perfect-knowledge adversaries, as shown in Table 4.
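The following sketch makes the perfect-knowledge defense concrete. It is a minimal PyTorch rendering under our own assumptions: the pixel inversion symmetry is taken to map an image x with values in [0, 1] to 1 - x (the paper defines its inversion symmetry in Section 5.1, so this exact form is an assumption), and the most-common-label fallback is an illustrative tie-breaking choice, not one prescribed by the paper.

import torch
from collections import Counter

def apply_subgroup(x: torch.Tensor) -> list:
    """Apply the four subgroup symmetries to a batch x in [0, 1]:
    identity, horizontal flip, pixel inversion (assumed to be x -> 1 - x),
    and flip combined with inversion."""
    flipped = torch.flip(x, dims=[-1])
    return [x, flipped, 1.0 - x, 1.0 - flipped]

def group_defense_predict(model: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Classify each symmetric copy and output, per image, the label on
    which at least two copies agree (most frequent label overall)."""
    with torch.no_grad():
        votes = torch.stack([model(s).argmax(dim=1) for s in apply_subgroup(x)])  # (4, N)
    labels = []
    for per_image_votes in votes.t():  # one column of votes per image
        label, _count = Counter(per_image_votes.tolist()).most_common(1)[0]
        labels.append(label)
    return torch.tensor(labels)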
2 Related Work and Background
2.1 Symmetry, Equivariance, and Invariance in CNNs
A symmetry of an object is a transformation that leaves that object invariant. Image symmetries include rotation, horizontal flipping, and inversion [38]. We provide definitions related to symmetry groups in Appendix 1. A function f is equivariant with respect to a transformation T if they commute with each other [44]: f ∘ T = T ∘ f. Invariance is the special case of equivariance where the transformation applied after the function is the identity transformation [44]: f ∘ T = f.
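A concrete way to observe the lack of invariance that the paper exploits is to compare a classifier's predictions on an image and on its transformed copy. The check below is a minimal sketch assuming a PyTorch classifier and the horizontal flip as T; for an exactly invariant f, the two predictions would always match.

import torch

def is_flip_invariant(model: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Check f(T(x)) == f(x) per image for T = horizontal flip.
    Returns a boolean tensor; False entries witness the lack of invariance."""
    with torch.no_grad():
        pred = model(x).argmax(dim=1)                            # f(x)
        pred_t = model(torch.flip(x, dims=[-1])).argmax(dim=1)   # f(T(x))
    return pred == pred_t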
CNNs stack equivariant convolution and pooling layers [22] followed by an invariant map in order to learn functions that are invariant [5] with respect to symmetries, following a standard blueprint used in machine learning [5,25]. Translation invariance for image classification means that the position of an object in an image should not affect its classification. To achieve translation invariance, CNN convolutional layers [30,32] compute feature maps over the translation symmetry group [21,46] using kernel sliding [21,33]. Pooling layers positioned after convolutional layers enable local translation invariance [5,16,22] because the output of the pooling operation does not change when the position of features changes within the pooling region [16]. Cohen and Welling [12] show that convolutional layers, pooling, arbitrary pointwise nonlinearities, batch normalization, and residual blocks are equivariant to translation. CNNs learn invariance with respect to symmetries such as rotation, horizontal flipping, and scaling through data augmentation, which adds to the training dataset images obtained by applying symmetries to original images [30]. For ImageNet, data augmentation can consist of a random crop, horizontal flip, color jitter, and color transforms of original images [18].
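For reference, the ImageNet-style augmentation described above is typically written as a composition of random transforms. The snippet below is a sketch using torchvision; the specific parameter values are common defaults and our own illustrative choices, not values taken from [18] or from this paper.

from torchvision import transforms

# Illustrative ImageNet-style training augmentation: random crop,
# horizontal flip, and color jitter, followed by tensor conversion and
# normalization.
train_augmentation = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])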
CNN Lack of Translation Equivariance. Studies suggest that CNNs are not equivariant to translation [3,4,19,26,49], not even to small translations or rotations [19]. Bouchacourt et al. [4] claim that CNN translation invariance is approximate and that it is primarily learned from the data and data augmentation. The lack of translation invariance has been attributed to aliasing effects caused by the subsampling of the convolutional stride [3], by max pooling, average pooling, and strides [49], or by image boundary effects [26].
CNN Data Augmentation Marginally Effective. Studies show that data augmentation is only marginally effective [12,3,27,4,19] at incorporating symmetries because CNNs cannot learn invariances with data augmentation [3,4,19]. Engstrom et al. [19] find that data augmentation only marginally improves invariance. Azulay and Weiss [3] find that data augmentation only enables invariance to symmetries of images that resemble dataset images. Bouchacourt et al. [4] claim that non-translation invariance is learned from the data independently of data augmentation.
Other Equivariant CNN Approaches Have Dataset Limitations. CNN architectures that handle symmetry better have only been shown to work for simple datasets such as MNIST [34] or CIFAR10 [28] or for synthetic datasets, not for ImageNet [44,6,45,21,12,16,50,37,20,42].
2.2 Adversarial Perturbation Attacks
Szegedy et al. [47] defined the problem of generating adversarial images as starting from original images and adding a small perturbation that results in misclassification. They formalized the generation of adversarial images as the minimization of the sum of the perturbation and an adversarial loss function, as shown in Appendix 2. The loss function measures the distance between the obtained classifier output values and the desired output values.
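In the well-known box-constrained form of this formulation (the paper's own notation is given in its Appendix 2, so the symbols below are our rendering of the version from [47]), the perturbation r for an original image x with m pixels and target label l is found by solving

\min_{r} \; c\,\|r\| + \mathcal{L}_f(x + r,\, l) \quad \text{subject to} \quad x + r \in [0,1]^m ,

where \mathcal{L}_f is the adversarial loss measuring the distance between the classifier output on x + r and the output desired for label l, and the constant c balances perturbation size against loss.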
Most attacks use the classifier gradient to generate adversarial perturbation [11,36], but random search [1] is also used.
PGD Attack. PGD is an iterative white-box attack with a parameter that defines the magnitude of the perturbation at each step. PGD starts from an initial sample point x_0 and then iteratively computes the perturbation of each step and projects it onto an L_p ball.
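As a sketch, one PGD iteration under the L-infinity norm can be written as below; the cross-entropy loss, step size alpha, and radius eps are generic placeholders, and the sign-gradient step is the usual L-infinity instantiation rather than a detail taken from this paper.

import torch
import torch.nn.functional as F

def pgd_linf_step(model, x_adv, x_orig, y, alpha=2 / 255, eps=8 / 255):
    """One L-infinity PGD iteration (sketch): ascend the loss along the
    sign of the gradient, then project back onto the eps-ball around the
    original images and onto the valid pixel range [0, 1]."""
    x_adv = x_adv.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    x_next = x_adv.detach() + alpha * grad.sign()
    # Project onto the L-infinity ball of radius eps around x_orig.
    x_next = torch.max(torch.min(x_next, x_orig + eps), x_orig - eps)
    return torch.clamp(x_next, 0.0, 1.0)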
Auto-PGD Attack. Auto-PGD (APGD) [14] is a variant of PGD that
varies the step size and can use two different loss functions to achieve a stronger
attack.
Square Attack. The Square Attack [1] is a score-based, black-box, random-
search attack based on local randomized square-shaped updates.
Fast Adaptive Boundary Attack. The white-box Fast Adaptive Boundary (FAB) attack [13] aims to find the minimum perturbation needed to change the classification of an original sample. However, FAB does not scale to ImageNet because of the large number of dataset classes.
AutoAttack. AutoAttack [14] is a parameter-free ensemble of attacks that includes APGD-CE, APGD-DLR, FAB [13], and the Square Attack [1].
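For evaluation, the reference implementation of AutoAttack is typically invoked as sketched below; this assumes the publicly available autoattack package and a classifier that takes inputs in [0, 1], and the epsilon value and batch size are illustrative choices rather than settings from this paper.

from autoattack import AutoAttack  # reference implementation by Croce and Hein

def run_autoattack(model, x_test, y_test, eps=8 / 255, batch_size=128):
    """Run the standard AutoAttack ensemble against a classifier that
    expects inputs in [0, 1]; returns the adversarial images."""
    adversary = AutoAttack(model, norm='Linf', eps=eps, version='standard')
    x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=batch_size)
    return x_adv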
2.3 Adversarial Defenses
Adversarial Training. AT [31,36,47] trains classifiers with correctly-labeled adversarial images and is one of the first and few defenses that have not been defeated.
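To make the contrast with the symmetry defense concrete, a minimal adversarial training step looks roughly as follows; the use of an attack callable to craft the training batch and the optimizer details are generic assumptions, not specifics of [31,36,47].

import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, craft_attack):
    """One adversarial training step (sketch): replace the clean batch with
    correctly-labeled adversarial examples and train on those. The
    craft_attack(model, x, y) callable is a placeholder for any attack,
    e.g. PGD; supplying it is exactly the attack knowledge AT requires."""
    model.eval()
    x_adv = craft_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()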