
Table 1: Summary of the main results of our paper. In Sec. 4, we show that (approximated) on-manifold
adversarial examples (Gen-AE, Eigen-AE) have higher attack rates than off-manifold adversarial examples.
In Sec. 5, we provide a theoretical analysis of on-manifold attacks on GMMs, where the true data manifold
is known. In Sec. 6, we provide further analysis to show the similarity of these four cases.
             | Real datasets                 | Synthetic datasets (GMMs, known manifold)
    Gen-AE   | higher attack rates (Table 2) | Excess risk: Thm. 5.1; adversarial distribution shift: Thm. 5.3
    Eigen-AE | higher attack rates (Fig. 1)  | Excess risk: Thm. 5.2; adversarial distribution shift: Thm. 5.4
to characterize the adversarial region and argues that the adversarial subspaces are of low probability, and
lie off (but are close to) the data submanifold.
One of the most effective approaches to improving the adversarial robustness of DNNs is to augment the training set with adversarial examples, i.e., adversarial training. However, the performance of adversarially-trained models is still far from satisfactory. Based on the off-manifold assumption, previous studies explained the poor performance of adversarial training by arguing that adversarial data lie on a higher-dimensional manifold: DNNs can work well on a low-dimensional manifold but not on a high-dimensional one. This is discussed in the papers mentioned above (Gilmer et al., 2018; Khoury and Hadfield-Menell, 2018). There are also other interpretations of the existence of adversarial examples; see Sec. 2.
In recent years, researchers have found that on-manifold adversarial examples also exist. They can fool target models (Lin et al., 2020), boost clean generalization (Stutz et al., 2019), improve uncertainty calibration (Patel et al., 2021), and improve model compression (Kwon and Lee, 2021). Therefore, the off-manifold assumption may not be a perfect hypothesis for explaining the existence of adversarial examples. This motivates us to revisit the off-manifold assumption. Specifically, we study the following question:
To what extent is the poor performance of neural networks against adversarial attacks due to on-manifold adversarial examples?
The main difficulty in studying this question is that the true data manifold is unknown in practice, so it is hard to obtain true on-manifold adversarial examples. To take a closer look at on-manifold adversarial examples, we consider two approximations of them, and we experiment on both real and synthetic datasets.
Approximate on-manifold adversarial examples We use two approaches to approximate on-manifold adversarial examples. The first is generative adversarial examples (Gen-AE): generative models, such as generative adversarial networks (GANs) (Goodfellow et al., 2014a) and variational autoencoders (VAEs) (Kingma and Welling, 2013), are used to craft adversarial examples by perturbing in the latent space of the generative model. Since the generative model approximates the data manifold, the crafted examples lie (approximately) on the data manifold. Because this first approach depends on the quality of the generative model, we also consider a second approach: using the eigenspace of the training dataset to approximate the data manifold, which we call eigenspace adversarial examples (Eigen-AE). In this method, the crafted adversarial examples stay closer to the original samples. For more details, see Section 4.
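To make the Gen-AE construction concrete, here is a minimal sketch of a latent-space attack, assuming placeholder `model`, `encoder`, and `decoder` objects (e.g., a pretrained VAE encoder/decoder pair); it illustrates the idea rather than the exact attack of Section 4.

```python
import torch
import torch.nn.functional as F

def gen_ae_attack(model, encoder, decoder, x, y, eps=0.1, alpha=0.02, steps=20):
    """Craft approximate on-manifold (Gen-AE) adversarial examples by
    perturbing the latent code of a generative model instead of the pixels."""
    with torch.no_grad():
        z0 = encoder(x)                        # latent code of the clean input
    delta = torch.zeros_like(z0, requires_grad=True)
    for _ in range(steps):
        x_gen = decoder(z0 + delta)            # decoded point stays on the learned manifold
        loss = F.cross_entropy(model(x_gen), y)
        grad = torch.autograd.grad(loss, delta)[0]
        with torch.no_grad():
            delta += alpha * grad.sign()       # ascend the classification loss
            delta.clamp_(-eps, eps)            # keep the latent perturbation bounded
    return decoder(z0 + delta).detach()
```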
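A matching sketch of the Eigen-AE idea, in which the manifold is approximated by the span of the top principal components of the training data and the perturbation is parameterized directly by eigenspace coefficients; the function names, step sizes, and bounds are again illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def top_eigenspace(train_x, k):
    """Basis of the top-k principal directions of the flattened training data."""
    flat = train_x.flatten(1)
    flat = flat - flat.mean(0, keepdim=True)
    # rows of vh are the eigenvectors of the sample covariance
    _, _, vh = torch.linalg.svd(flat, full_matrices=False)
    return vh[:k]                              # shape (k, d)

def eigen_ae_attack(model, x, y, basis, eps=0.5, alpha=0.1, steps=10):
    """Eigen-AE: the perturbation is a combination of top eigen-directions,
    so it stays inside the eigenspace approximation of the data manifold."""
    shape = x.shape
    # coefficients of the perturbation in the eigenspace
    c = torch.zeros(x.size(0), basis.size(0), device=x.device, requires_grad=True)
    for _ in range(steps):
        x_adv = (x.flatten(1) + c @ basis).view(shape)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, c)[0]
        with torch.no_grad():
            c += alpha * grad.sign()           # step on the eigen-coefficients
            c.clamp_(-eps, eps)                # bound the on-manifold perturbation
    return (x.flatten(1) + c @ basis).view(shape).detach()
```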
To start, we provide experiments with on-manifold attacks on both standard-trained and adversarially-trained models on MNIST, CIFAR-10, CIFAR-100, and ImageNet. The experiments show that on-manifold attacks are powerful, achieving higher attack rates than off-manifold adversarial examples. This helps justify that the off-manifold assumption might not be a perfect hypothesis for the existence of adversarial examples.
The experiments motivate us to study on-manifold adversarial examples from a theoretical perspective. Since the true data manifold is unknown in practice, we provide a theoretical study on synthetic datasets generated by Gaussian mixture models (GMMs), where the true data manifold is given. We study the excess risk (Thm. 5.1 and Thm. 5.2) and adversarial distribution shift (Thm. 5.3 and Thm. 5.4) of these two types of on-manifold adversarial examples.
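For intuition on the synthetic setting, the snippet below constructs a toy two-component GMM whose support is a known low-dimensional subspace, so the "true manifold" is available by construction; this is an illustrative sketch, not necessarily the exact model analyzed in Sec. 5.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 100, 5, 1000          # ambient dim, manifold dim, samples per class

# Orthonormal basis U of a known k-dimensional subspace: the "true manifold".
U, _ = np.linalg.qr(rng.standard_normal((d, k)))
mu = U @ rng.standard_normal(k)              # class mean, inside the subspace

# Two-component GMM: class +1 around +mu, class -1 around -mu, supported on span(U).
x_pos = mu + rng.standard_normal((n, k)) @ U.T
x_neg = -mu + rng.standard_normal((n, k)) @ U.T

# With the manifold known, an on-manifold attack may only move points within span(U):
P = U @ U.T                                  # orthogonal projector onto the manifold
def project_on_manifold(delta):
    return delta @ P                         # strips off-manifold components
```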