
a high expected loss value and (2) the samples have only
a few features that differ from the original clean sample
(with high probability). Based on this property, we are able
to leverage existing gradient-based algorithms for continuous
data, such as FGSM (Goodfellow et al., 2015) and
PGD (Madry et al., 2018), to find adversarial examples
for categorical data. In this way, we can successfully
obtain categorical adversarial examples by optimizing
the adversarial distribution and taking samples from this
distribution. Meanwhile, our attack can also be easily
incorporated into existing powerful defenses for continuous
data, such as adversarial training (Madry et al., 2018) and
TRADES (Zhang et al., 2019). Empirically, we verify that our
attack achieves a better optimality vs. efficiency trade-off
in finding strong adversarial examples, and we demonstrate the
advantages of our defense over other categorical defenses.
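
To make this recipe concrete, below is a minimal PyTorch sketch of the general idea, not the exact PCAA algorithm: a per-feature categorical distribution over perturbations is optimized by gradient ascent on the loss and discrete adversarial examples are then sampled from it. The Gumbel-Softmax relaxation, the function name, and the assumption that model takes one-hot encoded inputs are our own illustrative choices; the sparsity constraint on the number of perturbed features is omitted for brevity.

import torch
import torch.nn.functional as F

def categorical_attack_sketch(model, x_onehot, y, steps=50, lr=0.1, n_samples=10):
    """x_onehot: (n_features, n_categories) one-hot clean sample; y: scalar label tensor."""
    n_categories = x_onehot.size(1)
    # phi[i, c]: logit that feature i takes category c (the "adversarial distribution");
    # initializing from the clean one-hot encoding biases samples toward the clean input.
    phi = x_onehot.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([phi], lr=lr)
    for _ in range(steps):
        # Gumbel-Softmax gives a differentiable surrogate for sampling from phi,
        # so gradient-based attacks from the continuous domain become applicable.
        x_soft = F.gumbel_softmax(phi, tau=0.5, hard=False)
        loss = F.cross_entropy(model(x_soft.unsqueeze(0)), y.view(1))
        opt.zero_grad()
        (-loss).backward()  # gradient ascent: maximize the expected loss
        opt.step()
    # Draw discrete samples from the optimized distribution and keep the strongest one.
    best_x, best_loss = x_onehot, -float("inf")
    for _ in range(n_samples):
        idx = torch.distributions.Categorical(logits=phi).sample()
        x_adv = F.one_hot(idx, n_categories).float()
        with torch.no_grad():
            l = F.cross_entropy(model(x_adv.unsqueeze(0)), y.view(1)).item()
        if l > best_loss:
            best_x, best_loss = x_adv, l
    return best_x
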
To summarize, the major contributions of this paper are:
• We propose an efficient and effective framework, PCAA, to bridge the gap between categorical data and continuous data, which allows us to generate adversarial examples for categorical data by leveraging methods from continuous data.
• Equipped with PCAA, existing defenses for continuous data, such as adversarial training and TRADES, can be easily adapted to categorical data. This enables us to generalize new advances in defenses from continuous data to categorical data.
• We empirically validate the benefits of PCAA from the perspectives of both attack and defense.
2. Related Work
In this section, we provide a brief review of existing method-
ologies for adversarial attacks and defenses on continuous
and categorical data, highlighting the gap between the
major methodologies for these two types of data.
2.1. Attacks and Defenses on Continuous Data
The adversarial attacks and defenses in the image domain
have been extensively studied (Madry et al., 2018; Ilyas
et al., 2019; Xu et al., 2020). The most frequently studied
attack methods, such as FGSM (Goodfellow et al., 2015),
PGD (Madry et al., 2018), and the C&W attack (Carlini & Wag-
ner, 2017), can only be conducted in the continuous domain,
since they require calculating the gradients of the model output
with respect to the input data. As countermeasures against adversarial
examples in the image domain, most defense strategies are
also based on the assumption that the input data space is con-
tinuous. For example, adversarial training methods (Madry
et al., 2018; Zhang et al., 2019) train DNNs on the ad-
versarial examples generated by PGD. A state-of-the-art certified de-
fense method, Randomized Smoothing (Cohen et al., 2019),
computes certified bounds by adding Gaussian noise
to the input samples. However, these methods are hardly
applicable to categorical data, whose input space is discrete.
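
As a concrete illustration of why these attacks presuppose a continuous input space, here is a minimal FGSM sketch in PyTorch (our own illustrative code, not taken from the cited papers); model, x, and y are placeholders, and the single signed-gradient step produces real-valued perturbations that have no direct analogue for discrete categorical features.

import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8 / 255):
    # Single signed-gradient step within an L-infinity ball of radius eps.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # keep pixels in the valid [0, 1] range
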
2.2. Attacks and Defenses on Categorical Data
To generate adversarial examples for categorical data, most
existing methods apply a search-based approach. For ex-
ample, the works (Yang et al., 2020; Bao et al., 2022; Lei
et al., 2019) first find the top-K features of a given sample that
have the maximal influence on the model output, and then
apply a greedy or brute-force search to obtain the
optimal combination of perturbations of these K features. In
NLP tasks, there are also many attack methods proposed
to find adversarial “sentences” or “articles”, which follow
a similar approach to general search-based categorical at-
tacks. For example, Ebrahimi et al. (2017) propose to
search for important characters in the text based on the gradient
and then apply a greedy search to find the optimal character
flips. Samanta & Mehta (2017) generate adversarial
embeddings in the word embedding space and then search for
the closest meaningful adversarial example that is legitimate
in the text domain. These attack methods are very different
from those in the continuous domain.
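
For intuition, the following is a hedged sketch of this generic search-based recipe (our own illustration, not any specific published attack): features are ranked by gradient magnitude, the top-K are kept, and each selected feature is greedily flipped to the category that most increases the loss. The assumption that model takes one-hot encoded inputs, and the function name, are ours.

import torch
import torch.nn.functional as F

def topk_greedy_attack(model, x_onehot, y, k=3):
    """x_onehot: (n_features, n_categories) one-hot sample; y: scalar label tensor."""
    x = x_onehot.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x.unsqueeze(0)), y.view(1))
    loss.backward()
    # Rank features by the magnitude of the gradient w.r.t. their one-hot encoding.
    scores = x.grad.abs().sum(dim=1)
    topk_features = torch.topk(scores, k).indices
    x_adv = x_onehot.clone().detach()
    for i in topk_features:
        best_loss, best_row = -float("inf"), x_adv[i].clone()
        for c in range(x_onehot.size(1)):  # try every category for feature i
            candidate = x_adv.clone()
            candidate[i] = F.one_hot(torch.tensor(c), x_onehot.size(1)).float()
            with torch.no_grad():
                l = F.cross_entropy(model(candidate.unsqueeze(0)), y.view(1)).item()
            if l > best_loss:
                best_loss, best_row = l, candidate[i]
        x_adv[i] = best_row  # greedily keep the most damaging flip
    return x_adv
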
To defend against categorical attacks, most defenses have
been proposed for NLP tasks and rely exclusively on a
property of word embeddings: similar words are close
in the embedding space. Miyato et al. (2016) and
Barham & Feizi (2019) conduct adversarial training in the em-
bedding space, where the adversarial examples lie within an
l2-ball around the embeddings of clean samples. Dong et al.
(2021) also propose an adversarial training method, ASCC,
which conducts adversarial training in a space composed
of the convex hulls of adversarial word vectors. How-
ever, these methods can hardly be applied to defend DNNs
on general categorical data beyond NLP tasks. Different
from existing defense methods for NLP tasks, our proposed
defense does not rely on the properties of word embeddings,
yet achieves defense performance similar to these NLP
defenses. Moreover, our defense can be applied
to general categorical ML tasks beyond NLP.
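
For reference, here is a minimal sketch of the embedding-space adversarial training idea described above (our own illustration under the assumed module names embed and classifier): the clean token embeddings are perturbed within an l2-ball of radius eps, using a single gradient ascent step instead of multi-step PGD for brevity, and the model is then trained on the resulting loss.

import torch
import torch.nn.functional as F

def embedding_adv_loss(embed, classifier, token_ids, y, eps=1.0):
    # Find a perturbation of the clean embeddings inside an l2-ball of radius eps.
    with torch.no_grad():
        e_clean = embed(token_ids)
    delta = torch.zeros_like(e_clean, requires_grad=True)
    loss = F.cross_entropy(classifier(e_clean + delta), y)
    grad = torch.autograd.grad(loss, delta)[0]
    delta = eps * grad / (grad.norm() + 1e-12)  # one ascent step, scaled to the l2-ball
    # Train on the adversarially perturbed embeddings.
    return F.cross_entropy(classifier(embed(token_ids) + delta), y)
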
3. Probabilistic Categorical Adversarial Attack
In this section, we first introduce the necessary notations
and definitions of our studied problem. Then, we provide
the details of our proposed attack framework PCAA.
3.1. Problem Setup
In this work, we consider a classifier f that predicts la-
bels y ∈ Y based on categorical inputs x ∈ X. Each