Understanding Adversarial Robustness Against On-manifold
Adversarial Examples
Jiancong Xiao1, Liusha Yang3, Yanbo Fan2,*, Jue Wang2, Zhi-Quan Luo1,3,*
1The Chinese University of Hong Kong, Shenzhen;
2Tencent AI Lab; 3Shenzhen Research Institute of Big Data
jiancongxiao@link.cuhk.edu.cn, yangliusha@sribd.cn,
fanyanbo0124@gmail.com, arphid@gmail.com, luozq@cuhk.edu.cn
Abstract
Deep neural networks (DNNs) are shown to be vulnerable to adversarial examples: a well-trained
model can be easily attacked by adding small perturbations to the original data. One hypothesis for
the existence of adversarial examples is the off-manifold assumption: adversarial examples lie off the
data manifold. However, recent research has shown that on-manifold adversarial examples also exist. In this
paper, we revisit the off-manifold assumption and study the question: at what level is the poor performance
of neural networks against adversarial attacks due to on-manifold adversarial examples? Since
the true data manifold is unknown in practice, we consider two types of approximated on-manifold adversarial
examples on both real and synthetic datasets. On real datasets, we show that on-manifold adversarial
examples have higher attack rates than off-manifold adversarial examples on both standard-trained and
adversarially-trained models. On synthetic datasets, we theoretically prove that on-manifold adversarial
examples are powerful, yet adversarial training focuses on off-manifold directions and ignores the
on-manifold adversarial examples. Furthermore, we provide analysis showing that the properties derived
theoretically can also be observed in practice. Our analysis suggests that on-manifold adversarial examples
are important, and that more attention should be paid to them when training robust models.
1 Introduction
In recent years, deep neural networks (DNNs) (Krizhevsky et al., 2012; Hochreiter and Schmidhuber, 1997)
have become popular and successful in many machine learning tasks. However, DNNs are shown to be vulnerable
to adversarial examples (Szegedy et al., 2013; Goodfellow et al., 2014b): a well-trained model can be easily
attacked by adding small perturbations to the images. One hypothesis for the existence of adversarial examples
is the off-manifold assumption (Szegedy et al., 2013):
Clean data lies in a low-dimensional manifold. Even though adversarial examples are close to the clean
data, they lie off the underlying data manifold.
DNNs only fit the data on the manifold and perform badly on adversarial examples outside the manifold.
Much research supports this point of view. PixelDefend (Song et al., 2017) leveraged a generative model to
show that adversarial examples lie in a low-probability region of the data distribution. The work of (Gilmer
et al., 2018; Khoury and Hadfield-Menell, 2018) studied the geometry of adversarial examples, showing that
adversarial examples are related to the high dimensionality of the data manifold and are constructed in
directions off the data manifold. The work of (Ma et al., 2018) used Local Intrinsic Dimensionality (LID)
*Corresponding authors.
Table 1: Summary of the main results of our paper. In Sec. 4, we show that (approximated) on-manifold
adversarial examples (Gen-AE, Eigen-AE) have higher attack rates than off-manifold adversarial examples.
In Sec. 5, we provide a theoretical analysis of on-manifold attacks on GMMs, where the true data manifold
is known. In Sec. 6, we provide further analysis to show the similarity of these four cases.

            Real datasets                   Synthetic datasets (GMMs, known manifold)
Gen-AE      Higher attack rates (Table 2)   Excess risk: Thm. 5.1; adversarial distribution shift: Thm. 5.3
Eigen-AE    Higher attack rates (Fig. 1)    Excess risk: Thm. 5.2; adversarial distribution shift: Thm. 5.4
to characterize the adversarial region, arguing that the adversarial subspaces are of low probability and
lie off (but close to) the data submanifold.
One of the most effective approaches to improving the adversarial robustness of DNNs is to augment
the training set with adversarial examples, i.e., adversarial training. However, the performance of adversarially-
trained models is still far from satisfactory. Based on the off-manifold assumption, previous studies explained
the poor performance of adversarial training by arguing that the adversarial data lies in a higher-dimensional
manifold: DNNs can work well on a low-dimensional manifold but not on a high-dimensional one. This is
discussed in the papers mentioned above (Gilmer et al., 2018; Khoury and Hadfield-Menell, 2018). There are
also other interpretations of the existence of adversarial examples; see Sec. 2.
In recent years, researchers have found that on-manifold adversarial examples also exist. They can fool
target models (Lin et al., 2020), boost clean generalization (Stutz et al., 2019), improve uncertainty calibration
(Patel et al., 2021), and improve model compression (Kwon and Lee, 2021). Therefore, the off-manifold
assumption may not be a perfect hypothesis for explaining the existence of adversarial examples. This motivates
us to revisit the off-manifold assumption. Specifically, we study the following question:
At what level is the poor performance of neural networks against adversarial attacks due to on-manifold
adversarial examples?
The main difficulty in studying this question is that the true data manifold is unknown in practice, so it is
hard to obtain true on-manifold adversarial examples. To take a closer look at on-manifold adversarial
examples, we consider two types of approximated on-manifold adversarial examples, on both real and
synthetic datasets.
Approximate on-manifold adversarial examples We use two approaches to approximate on-manifold
adversarial examples. The first is generative adversarial examples (Gen-AE): generative models, such as
generative adversarial networks (GANs) (Goodfellow et al., 2014a) and variational autoencoders (VAEs)
(Kingma and Welling, 2013), are used to craft adversarial examples. Since the generative model is an
approximation of the data manifold, the crafted data lies on the data manifold, and we perform the perturbation
in the latent space of the generative model. Since this first approach relies on the quality of the generative
models, we also consider a second approach: using the eigenspace of the training dataset to approximate the
data manifold, which we call eigenspace adversarial examples (Eigen-AE). In this method, the crafted adversarial
examples are closer to the original samples. For more details, see Section 4.
To start, we provide experiments with on-manifold attacks on both standard-trained and adversarially-
trained models on MNIST, CIFAR-10, CIFAR-100, and ImageNet. The experiments show that on-manifold
attacks are powerful, with higher attack rates than off-manifold adversarial examples. This helps justify that
the off-manifold assumption might not be a perfect hypothesis for the existence of adversarial examples.
The experiments motivate us to study on-manifold adversarial examples from a theoretical perspective.
Since the true data manifold is unknown in practice, we provide a theoretical study on synthetic datasets
using Gaussian mixture models (GMMs), where the true data manifold is given. We study the excess
risk (Thm. 5.1 and Thm. 5.2) and the adversarial distribution shift (Thm. 5.3 and Thm. 5.4) of these two types
of on-manifold adversarial examples. Our main technical contribution is providing closed-form solutions to
the min-max problems of on-manifold adversarial training. Our theoretical results show that on-manifold
adversarial examples incur a large excess risk and fool the target models. Compared to regular adversarial
attacks, we show how adversarial training focuses on off-manifold directions and ignores the on-manifold
adversarial examples.
Finally, we provide a comprehensive analysis to connect the four settings (two approximate on-manifold
attacks on both real and synthetic datasets). We show the similarity of these four cases: 1) the attack
directions of Gen-AE and Eigen-AE are similar on common datasets, and 2) the theoretical properties
derived using GMMs (Thm. 5.1 to Thm. 5.4) can also be observed in practice.
Based on our study, we find that on-manifold adversarial examples are important for adversarial robustness,
and we emphasize that more attention should be paid to them when training robust models. Our
contributions are listed as follows:
• We develop two approximate on-manifold adversarial attacks to take a closer look at on-manifold
adversarial examples.
• We provide comprehensive analyses of on-manifold adversarial examples, both empirically and theoretically.
We summarize our main results in Table 1. Our results suggest the importance of on-manifold adversarial
examples and that more attention should be paid to them to train robust models.
• Technical contributions: our main technical contribution is providing closed-form solutions to the
min-max problems of on-manifold adversarial training (Thm. 5.3 and 5.4) in GMMs. We also provide
upper and lower bounds on the excess risk (Thm. 5.1 and 5.2) of on-manifold attacks.
2 Related Work
Attack Adversarial examples for deep neural networks were first introduced in (Szegedy et al., 2013),
although adversarial machine learning, or robust machine learning, has been studied for a long time (Biggio
and Roli, 2018). In the white-box setting (Kurakin et al., 2016; Papernot et al., 2016; Moosavi-Dezfooli
et al., 2016; Carlini and Wagner, 2017), the attackers have full access to the model (weights, gradients, etc.).
In the black-box setting (Chen et al., 2017; Su et al., 2019; Ilyas et al., 2018), the attackers have limited
access to the model. First-order optimization methods, which use gradient information to craft adversarial
examples, such as PGD (Madry et al., 2017), are widely used for white-box attacks. Zeroth-order optimization
methods (Chen et al., 2017) are used in the black-box setting, and (Li et al., 2019) improved the query
efficiency of black-box attacks. Generative adversarial examples: generative models have also been used to
craft adversarial examples (Xiao et al., 2018; Song et al., 2018; Kos et al., 2018), which are more natural
(Zhao et al., 2017).
Defense Training algorithms against adversarial attacks can be subdivided into the following categories.
Adversarial training: the training data is augmented with adversarial examples to make the models more
robust (Madry et al., 2017; Szegedy et al., 2013; Tramèr et al., 2017). Preprocessing: inputs or hidden layers
are quantized, projected onto different sets, or preprocessed in other ways (Buckman et al., 2018; Guo et al.,
2017; Kabilan et al., 2018). Stochasticity: inputs or hidden activations are randomized (Prakash et al., 2018;
Dhillon et al., 2018; Xie et al., 2017). However, some of these were shown to be ineffective defenses that rely
on obfuscated gradients (Athalye et al., 2018). Adaptive attacks (Tramer et al., 2020) are used to evaluate
defenses against adversarial examples. (Xiao et al., 2022) studied Rademacher complexity in the adversarial
training setting.
3 On-manifold Adversarial Attacks
Adversarial Attacks Given a classifier $f_\theta$ and a data sample $(x, y)$, the goal of a regular adversarial
attack is to find an adversarial example $x'$ that fools the classifier $f_\theta$. A widely studied norm-based adversarial
attack solves the optimization problem $\max_{\|x - x'\| \le \varepsilon} \ell(f_\theta(x'), y)$, where $\ell(\cdot,\cdot)$ is the loss function and $\varepsilon$
is the perturbation intensity. In this paper, we focus on on-manifold adversarial attacks, so we must introduce
additional constraints that restrict $x'$ to the data manifold and preserve the label $y$. In practice, the true
data manifold is unknown. Assume that the true data manifold is given by a push-forward function $p(z): \mathcal{Z} \to \mathcal{X}$,
where $\mathcal{Z}$ is a low-dimensional Euclidean space and $\mathcal{X}$ is the support of the data distribution. One way to
approximate $p(z)$ is to use a generative model $G(z)$ such that $G(z) \approx p(z)$. The second way is to consider the
Taylor expansion of $p(z)$ and use the first-order term as the approximation. These correspond to the following
two approximated on-manifold adversarial examples.
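For concreteness, a minimal PyTorch sketch of the regular norm-based attack above, solved with PGD (Madry et al., 2017). The step size alpha, iteration count, and the [0, 1] pixel clipping are illustrative choices, not settings taken from this paper.

```python
import torch

def pgd_attack(model, x, y, loss_fn, eps=8/255, alpha=2/255, steps=10):
    """L-infinity PGD: approximately solve max_{||x' - x||_inf <= eps} loss_fn(model(x'), y)."""
    x_adv = (x.detach() + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()        # gradient ascent step
            x_adv = x + (x_adv - x).clamp(-eps, eps)   # project back into the eps-ball around x
            x_adv = x_adv.clamp(0, 1)                  # keep a valid pixel range
    return x_adv.detach()
```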
Generative Adversarial Examples One method to approximate the data manifold is to use generative
models, such as GANs and VAEs. Let $G: \mathcal{Z} \to \mathcal{X}$ be a generative model and $I: \mathcal{X} \to \mathcal{Z}$ be the inverse
mapping of $G(z)$. The generative adversarial attack can be formulated as the following problem:

$$\max_{\|z' - I(x)\| \le \varepsilon} \ell(f_\theta(G(z')), y). \qquad (3.1)$$

Then, $x' = G(z')$ is an (approximate) on-manifold adversarial example. In this method, $\|x - x'\|$ can be
large. To preserve the label $y$, we use conditional generative models (e.g., C-GAN (Mirza and Osindero,
2014) and C-VAE (Sohn et al., 2015)), i.e., the generator $G_y(z)$ and the inverse mapping $I_y(x)$ are conditioned
on the label $y$. In the experiments, we apply two widely used gradient-based attack algorithms, the fast gradient
sign method (FGSM) (Goodfellow et al., 2014b) and projected gradient descent (PGD) (Madry et al., 2017),
in the latent space $\mathcal{Z}$, and call the resulting attacks GFGSM and GPGD, respectively.
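A minimal sketch of how Eq. (3.1) can be solved with PGD in the latent space, assuming a pretrained conditional generator G(z, y) and inverse mapping I(x, y) with this (hypothetical) call signature; the l_inf latent projection and the step size are our illustrative choices, not the paper's exact implementation.

```python
import torch

def gpgd_attack(model, G, I, x, y, loss_fn, eps=0.1, alpha=0.02, steps=10):
    """PGD in the latent space Z, Eq. (3.1):
    max_{||z' - I_y(x)|| <= eps} loss_fn(model(G_y(z')), y).
    G(z, y) and I(x, y) are a hypothetical conditional generator / inverse-mapping API."""
    with torch.no_grad():
        z0 = I(x, y)                                  # latent code of the clean sample
    z_adv = z0.clone()
    for _ in range(steps):
        z_adv.requires_grad_(True)
        loss = loss_fn(model(G(z_adv, y)), y)
        grad = torch.autograd.grad(loss, z_adv)[0]
        with torch.no_grad():
            z_adv = z_adv + alpha * grad.sign()         # ascent step in latent space
            z_adv = z0 + (z_adv - z0).clamp(-eps, eps)  # L_inf projection (our choice of norm)
    with torch.no_grad():
        return G(z_adv, y).detach()                     # x' = G_y(z') is the on-manifold example
```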
Eigenspace Adversarial Examples The on-manifold adversarial examples found by the above method
are not norm-bounded. To study norm-based on-manifold adversarial examples, we use an eigenspace to
approximate the data manifold. Consider the following problem:

$$\max_{x'} \; \ell(f_\theta(x'), y) \quad \text{s.t.} \quad A_y^{\perp}(x - x') = 0, \;\; \|x - x'\| \le \varepsilon, \qquad (3.2)$$

where the rows of $A_y$ are the eigenvectors corresponding to the top eigenvalues of the covariance matrix of
the training data with label $y$, and $A_y^{\perp}$ collects the remaining eigenvectors. Then, $x'$ is restricted to the
eigenspace. A baseline algorithm for standard adversarial attacks is the PGD attack (Madry et al., 2017). To
make a fair comparison, we should use the same algorithm to solve the attack problem. Notice that the inner
maximization problem in Eq. (3.2) cannot be solved by PGD in general, because the projection step is itself
an optimization problem. We therefore consider $\ell_2$ adversarial attacks in this case. Then, Eq. (3.2) is equivalent to

$$\max_{\|\Delta z\|_2 \le \varepsilon} \; \ell(f_\theta(x + \Delta z^T A_y), y), \qquad (3.3)$$

and we can use PGD to find eigenspace adversarial examples. In this case, the perturbation constraint is a subset
of the constraint of regular adversarial attacks. For comparison, we also consider the case where the rows of $A_y$
are the eigenvectors corresponding to the bottom eigenvalues. We call these two types of on-manifold and
off-manifold adversarial examples top-eigenvector and bottom-eigenvector subspace adversarial examples, respectively.
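A minimal sketch of Eq. (3.3), assuming inputs small enough that the per-class covariance matrix can be formed directly; eps, the step size, and the helper names are our illustrative choices rather than the paper's settings.

```python
import torch

def class_top_eigvecs(X_class, k):
    """Return A_y: rows are the top-k eigenvectors of the covariance matrix of the
    (flattened) training data with a given label y."""
    Xf = X_class.reshape(X_class.shape[0], -1)
    Xf = Xf - Xf.mean(dim=0, keepdim=True)
    cov = Xf.T @ Xf / (Xf.shape[0] - 1)
    _, eigvecs = torch.linalg.eigh(cov)        # eigenvalues returned in ascending order
    return eigvecs[:, -k:].T                   # shape (k, d): top-k eigenvectors as rows

def eigen_pgd_attack(model, x, y, A_y, loss_fn, eps=1.0, alpha=0.25, steps=10):
    """L2 PGD restricted to the eigenspace, Eq. (3.3):
    max_{||dz||_2 <= eps} loss_fn(model(x + dz^T A_y), y)."""
    dz = torch.zeros(x.shape[0], A_y.shape[0], device=x.device, dtype=x.dtype)
    for _ in range(steps):
        dz.requires_grad_(True)
        x_adv = x + (dz @ A_y).reshape(x.shape)        # perturbation lives in span of rows of A_y
        loss = loss_fn(model(x_adv), y)
        grad = torch.autograd.grad(loss, dz)[0]
        with torch.no_grad():
            dz = dz + alpha * grad / (grad.norm(dim=1, keepdim=True) + 1e-12)  # normalized step
            norm = dz.norm(dim=1, keepdim=True).clamp(min=1e-12)
            dz = dz * torch.clamp(eps / norm, max=1.0)  # project onto the L2 ball of radius eps
    return (x + (dz @ A_y).reshape(x.shape)).detach()
```

The off-manifold (bottom-eigenvector) variant is obtained by returning the first k columns, eigvecs[:, :k].T, instead of the last k.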
Target Models In this paper, we aim to study the performance of on-manifold adversarial attacks on
both standard-trained and adversarially-trained models. For standard training, only the original data
is used in training. For adversarial training, adversarial examples are augmented to the training dataset. For
ablation studies, we also consider on-manifold adversarial training, i.e., on-manifold adversarial examples
crafted by the above-mentioned algorithms are augmented to the training dataset.
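The adversarially-trained target models can be viewed as instances of the following generic loop; this is an illustrative sketch reusing the attack sketches above, not the paper's exact training code.

```python
import torch

def adversarial_training_epoch(model, loader, optimizer, loss_fn, attack_fn):
    """One epoch of adversarial training: train on examples crafted by attack_fn,
    which can be any of the attack sketches above (pgd_attack, gpgd_attack with a
    fixed G/I, or eigen_pgd_attack with a precomputed A_y)."""
    for x, y in loader:
        model.eval()                              # use fixed batch-norm statistics while attacking
        x_adv = attack_fn(model, x, y, loss_fn)   # inner maximization: craft adversarial examples
        model.train()
        optimizer.zero_grad()
        loss = loss_fn(model(x_adv), y)           # outer minimization on the crafted examples
        loss.backward()
        optimizer.step()
```

In Table 2, joint-PGD-AT augments both regular and generative adversarial examples; one natural realization (our assumption, not specified in this excerpt) is to compute both losses inside the loop and sum them.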
4 Warm-up: Performance of On-manifold Adversarial Attacks
In this section, we provide experiments in different settings to study the performance of on-manifold adver-
sarial attacks.
Table 2: Test accuracy of different defense algorithms (PGD-AT, GPGD-AT, and joint-PGD-AT) against
different attacks (regular attacks (FGSM, PGD) and generative attacks (GFGSM, GPGD)) on MNIST,
CIFAR-10, and ImageNet.
MNIST clean data FGSM-Attack PGD-Attack GFGSM-Attack GPGD-Attack
Std training 98.82% 47.38% 3.92% 42.37% 10.74%
PGD-AT 98.73% 96.50% 95.51% 52.17% 15.53%
GPGD-AT 98.63% 20.63% 2.11% 99.66% 96.78%
joint-PGD-AT 98.45% 97.31% 95.70% 99.27% 96.03%
CIFAR-10 clean data FGSM-Attack PGD-Attack GFGSM-Attack GPGD-Attack
Std training 91.80% 15.08% 5.39% 7.07% 3.41%
PGD-AT 80.72% 56.42% 50.18% 14.27% 8.51%
GPGD-AT 78.93% 10.64% 3.21% 40.18% 26.66%
joint-PGD-AT 79.21% 50.19% 49.77% 42.87% 28.54%
ImageNet clean data FGSM-Attack PGD-Attack GFGSM-Attack GPGD-Attack
Std training 74.72% 2.59% 0.00% / 0.26%
PGD-AT 73.31% 48.02% 38.88% / 7.23%
GPGD-AT 78.10% 21.68% 0.03% / 27.53%
joint-PGD-AT 77.96% 49.12% 37.86% / 20.53%
4.1 Experiments of Generative Adversarial Attacks
To study the performance of generative adversarial attacks, we compare different attacks (FGSM, PGD,
GFGSM, and GPGD) against different models (standard training, PGD-adversarial training, and GPGD-adversarial
training).
Experiments setup In this section, we report experimental results for training LeNet on MNIST
(LeCun et al., 1998), ResNet18 (He et al., 2016) on CIFAR-10 (Krizhevsky et al., 2009), and ResNet50 on
ImageNet (Deng et al., 2009). The experiments on CIFAR-100 are discussed in Appendix B.3. On MNIST,
we use $\varepsilon = 0.3$ with 40-step PGD for adversarial training and $\varepsilon = 1$ with 40-step PGD for generative
adversarial training. On CIFAR-10 and CIFAR-100, we use $\varepsilon = 8/255$ with PGD-10 for adversarial training,
and $\varepsilon = 0.1$ with PGD-10 for generative adversarial training. On ImageNet, we adopt the settings in (Lin et al.,
2020) and use $\varepsilon = 4/255$ with PGD-5 for adversarial training, and $\varepsilon = 0.02$ with PGD-5 for generative adversarial
training. The choice of $\varepsilon$ in the latent space is based on the quality of the generated examples. In the
outer minimization, we use the SGD optimizer with momentum 0.9 and weight decay $5 \cdot 10^{-4}$.
Details of the hyperparameter settings are given in Appendix B.1.
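For reference, a sketch of how the CIFAR-10 settings above fit together, reusing pgd_attack from the earlier sketch; the learning rate and PGD step size are placeholders, not values stated in this excerpt (see Appendix B.1 of the paper).

```python
import torch
from torchvision.models import resnet18

# Outer minimization: SGD with momentum 0.9 and weight decay 5e-4, as stated above.
model = resnet18(num_classes=10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
loss_fn = torch.nn.CrossEntropyLoss()

# Inner maximization: eps = 8/255 with 10-step PGD for CIFAR-10 adversarial training.
# pgd_attack is the sketch from Section 3; alpha is a placeholder step size.
attack = lambda m, x, y, l: pgd_attack(m, x, y, l, eps=8/255, alpha=2/255, steps=10)
```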
Generative AT cannot defend against regular attacks, and vice versa In Table 2, the test accuracy of GPGD-
AT under the PGD attack is 3.21%, which means that on-manifold adversarial training cannot defend against a regular
norm-based attack. Similarly, the test accuracy of PGD-AT under the GPGD attack is 15.53%: an adversarially-
trained model performs badly against on-manifold attacks. The results on CIFAR-10 and ImageNet are similar;
PGD-AT is not able to defend against the GPGD attack, and vice versa. Moreover, in the experiments on CIFAR-10
and ImageNet, generative attacks achieve higher attack rates than regular PGD attacks.