Probabilistic Categorical Adversarial Attack and Adversarial Training

Han Xu 1, Pengfei He 1, Jie Ren 1, Yuxuan Wan 1, Zitao Liu 2, Hui Liu 1, Jiliang Tang 1
Abstract
The studies on adversarial attacks and defenses have greatly improved the robustness of Deep Neural Networks (DNNs). Most advanced approaches have been overwhelmingly designed for continuous data such as images. However, these achievements are still hard to generalize to categorical data. To bridge this gap, we propose a novel framework, Probabilistic Categorical Adversarial Attack (or PCAA). It transforms the discrete optimization problem of finding categorical adversarial examples into a continuous problem that can be solved via gradient-based methods. We analyze the optimality (attack success rate) and time complexity of PCAA to demonstrate its significant advantage over current search-based attacks. More importantly, through extensive empirical studies, we demonstrate that well-established defenses for continuous data, such as adversarial training and TRADES, can be easily adapted to defend DNNs on categorical data.
1. Introduction
Adversarial examples (Goodfellow et al., 2015) have raised great concerns for the applications of Deep Neural Networks (DNNs) in many security-critical domains (Cui et al., 2019; Stringhini et al., 2010; Cao & Tay, 2001). Recent years have witnessed an increasing number of adversarial attack and defense methods (Goodfellow et al., 2015; Madry et al., 2018; Ilyas et al., 2019). These studies have not only greatly deepened our understanding of the vulnerabilities of DNNs but also tremendously advanced the robustness of DNNs. Until now, the majority of existing accomplishments have been achieved on continuous data such as images, where gradient-based approaches can be leveraged.
1 Department of Computer Science and Engineering, Michigan State University, East Lansing, MI, USA. 2 Guangdong Institute of Smart Education, Jinan University, Guangzhou, China. Correspondence to: Zitao Liu <liuzitao@jnu.edu.cn>.

Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. PMLR 202, 2023. Copyright 2023 by the author(s).
However, there are also many machine learning tasks where the input data is categorical. For example, data in ML-based intrusion detection systems (Khraisat et al., 2019) contains records of the type of system operations; in financial transaction systems, data includes categorical features such as the region and card information of transactions; and in NLP tasks, the words in a sentence can only be chosen from a
given vocabulary, which is categorical. To generate categorical adversarial examples, recent search-based approaches have been proposed (Yang et al., 2020; Lei et al., 2019; Bao et al., 2022). For example, the method of Yang et al. (2020) first finds the top-$K$ features of a given sample that have the maximal influence on the model output, and then applies a greedy search to obtain the optimal perturbation in these $K$ features. However, these search-based attack methods usually suffer from a poor trade-off between efficiency and optimality (attack success rate) when finding strong adversarial examples. Moreover, if these attack methods are applied in defenses like adversarial training (Madry et al., 2018), they can only search for adversarial examples one training sample at a time, instead of efficiently producing adversarial examples in batches. In a nutshell, these drawbacks of existing categorical attack methods largely preclude applying recent advances in attacks and defenses established on continuous data to categorical data.
Therefore, a natural question is: can we generalize the well-studied methods for continuous data to categorical data? We face tremendous challenges in answering this question. First, the input data space is categorical, so the gradient-based methods of adversarial attacks and defenses (Goodfellow et al., 2015; Madry et al., 2018) for continuous data are not directly applicable. Second, most attacks for categorical data aim to constrain the number of perturbed features (Yang et al., 2020; Bao et al., 2022), which is different from the commonly considered $\ell_2$ and $\ell_\infty$ norm constraints in continuous data. To address these challenges, we propose a novel framework: Probabilistic Categorical Adversarial Attack (PCAA). Overall, it transforms the discrete optimization problem of finding categorical adversarial examples into a continuous problem by estimating the probabilistic distribution of categorical adversarial examples. In detail, given a clean data sample, we assume that (each feature of) its adversarial example follows a categorical distribution, and
satisfies: (1) the samples following this distribution have
a high expected loss value; and (2) the samples have only a few features that differ from the original clean sample (with high probability). Based on this property, we are able to leverage existing gradient-based algorithms from continuous data (Goodfellow et al., 2015; Madry et al., 2018) to find adversarial examples for categorical data. In this way, we can successfully obtain categorical adversarial examples by optimizing the adversarial distribution and taking samples from it. Meanwhile, our attack can also be easily incorporated into existing powerful defenses for continuous data, such as adversarial training (Madry et al., 2018) and TRADES (Zhang et al., 2019). Empirically, we verify that our attack achieves a better optimality vs. efficiency trade-off in finding strong adversarial examples, and we demonstrate the advantages of our defense over other categorical defenses.
To summarize, the major contributions of this paper are:
• We propose an efficient and effective framework, PCAA, to bridge the gap between categorical data and continuous data, which allows us to generate adversarial examples for categorical data by leveraging methods from continuous data.

• Equipped with PCAA, existing defenses for continuous data, such as adversarial training and TRADES, can be easily adapted to categorical data. This contribution enables us to generalize new advances in defenses from continuous data to categorical data.

• We empirically validate the great benefits of PCAA from the perspectives of both attack and defense.
2. Related Work
In this section, we provide a brief review of existing methodologies of adversarial attacks and defenses for continuous and categorical data, which highlights the gap between the major methodologies in these two types of data.
2.1. Attacks and Defenses on Continuous Data
Adversarial attacks and defenses in the image domain have been extensively studied (Madry et al., 2018; Ilyas et al., 2019; Xu et al., 2020). The most frequently studied attack methods, such as FGSM (Goodfellow et al., 2015), PGD (Madry et al., 2018) and the C&W attack (Carlini & Wagner, 2017), can only be conducted in the continuous domain, since they require calculating the gradients of the model outputs with respect to the input data. As countermeasures to resist adversarial examples in the image domain, most defense strategies are also based on the assumption that the input data space is continuous. For example, adversarial training methods (Madry et al., 2018; Zhang et al., 2019) train DNNs on the adversarial examples generated by PGD. A state-of-the-art certified defense method, Randomized Smoothing (Cohen et al., 2019), calculates certifiable bounds by adding Gaussian noise to the input samples. However, these methods are hard to apply directly to categorical data, as the input data space is discrete.
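For concreteness, the following minimal PGD sketch (assuming a PyTorch classifier `model`; the names and hyperparameters are illustrative, not taken from any particular codebase) shows why such attacks need a continuous input space: each step takes a gradient of the loss with respect to the input and projects back into an $\ell_\infty$-ball.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.03, alpha=0.01, steps=10):
    """l_inf PGD in the style of Madry et al. (2018) for continuous inputs.

    This only works because x lives in a continuous space: we can take
    gradient steps on the input and project back into the eps-ball.
    """
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss along the gradient sign, then project.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.clamp(x_adv, min=x - eps, max=x + eps).clamp(0.0, 1.0)
    return x_adv
```

For adversarial training in this setting, the model is simply trained on `pgd_attack(model, x, y)` outputs instead of the clean inputs; this is exactly the gradient dependence that categorical inputs break.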
2.2. Attacks and Defenses on Categorical Data
To generate adversarial examples for categorical data, most existing methods apply a search-based approach. For example, the works (Yang et al., 2020; Bao et al., 2022; Lei et al., 2019) first find the top-$K$ features of a given sample that have the maximal influence on the model output, and then apply a greedy search or brute-force search to obtain the optimal combination of perturbations in these $K$ features. In NLP tasks, many attack methods have also been proposed to find adversarial "sentences" or "articles", which follow a similar approach to general search-based categorical attacks. For example, Ebrahimi et al. (2017) propose to search for important characters in the text based on the gradient, and then apply a greedy search to find the optimal character flipping. Samanta & Mehta (2017) generate adversarial embeddings in the word embedding space, and then search for the closest meaningful adversarial example that is legitimate in the text domain. These attack methods are very different from those in the continuous domain.
To defend against categorical attacks, most defenses have been proposed for NLP tasks and exclusively rely on the property of word embeddings: similar words are close in the embedding space. Miyato et al. (2016) and Barham & Feizi (2019) conduct adversarial training in the embedding space, where the adversarial examples lie within an $\ell_2$-ball around the embeddings of clean samples. Dong et al. (2021) also propose an adversarial training method, ASCC, which conducts adversarial training in the space composed of convex hulls of adversarial word vectors. However, these methods can hardly be applied to defend DNNs for general categorical data beyond NLP tasks. Different from existing defense methods for NLP tasks, our proposed defense does not rely on the property of word embeddings and achieves a defense performance similar to these NLP defenses. Moreover, our defense can be applied to general categorical ML tasks beyond NLP.
3. Probabilistic Categorical Adversarial Attack
In this section, we first introduce the necessary notations
and definitions of our studied problem. Then, we provide
the details of our proposed attack framework PCAA.
3.1. Problem Setup
In this work, we consider a classifier $f$ that predicts labels $y \in \mathcal{Y}$ based on categorical inputs $x \in \mathcal{X}$. Each
input sample $x$ contains $n$ categorical features, and each feature $x_i$ can be perturbed to take a value from $d$ allowed categories. Namely, we define the space of all allowed perturbed samples of $x$ to be $S(x)$. Meanwhile, to keep the perturbation "un-noticeable", we follow the setups of existing works (Yang et al., 2020; Bao et al., 2022; Wang et al., 2020) and limit the number of perturbed features, which is the $\ell_0$ distance between the clean sample $x$ and the adversarial example $x'$. This is because constraining the $\ell_0$ distance is most intuitive and of broad interest for categorical data, e.g., only a few nucleotides are mutated in genetic data. Therefore, we formally define the objective of our considered attack as follows: given the budget size $\epsilon$, we aim to find an adversarial example $x'$ which maximizes the model's loss value while having a small $\ell_0$ distance to $x$:

$\max_{x' \in S(x)} \mathcal{L}(f(x'), y) \quad \text{s.t.} \quad \|x' - x\|_0 \leq \epsilon. \qquad (1)$
Notably, the objective in Eq.(1) is general, and it can be applied to find adversarial examples in various applications by accommodating the space $S(x)$. For example, in NLP tasks such as sentiment analysis, we can define the space $S(x)$ to be the set of sentences where some words in $x$ are changed to their synonyms. In this way, we can keep the semantic meaning of $x$ during attacking. More details about how to obtain $S(x)$ in NLP tasks are given in Section 5.2.
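To make the setup concrete, the sketch below (with hypothetical names, not code from the paper) represents a categorical sample as a list of integer category indices and checks a candidate perturbation against the constraints of Eq.(1): membership in $S(x)$ and the $\ell_0$ budget.

```python
import numpy as np

def l0_distance(x, x_adv):
    """Number of features whose category differs between x and x_adv, i.e. ||x_adv - x||_0."""
    return int(np.sum(np.asarray(x) != np.asarray(x_adv)))

def is_feasible(x, x_adv, allowed, eps):
    """Check that x_adv lies in S(x) (each feature stays within its allowed
    category set) and that the l0 budget of Eq.(1) is respected."""
    in_space = all(value in allowed[i] for i, value in enumerate(x_adv))
    return in_space and l0_distance(x, x_adv) <= eps

# Toy example: n = 4 categorical features, d = 3 categories each.
x = [0, 2, 1, 1]                  # clean sample as category indices
allowed = [set(range(3))] * 4     # S(x): every category allowed for every feature
x_adv = [0, 1, 1, 1]              # candidate adversarial example (one feature changed)

print(is_feasible(x, x_adv, allowed, eps=1))   # True
```

A search-based attack would enumerate or greedily explore such candidates; PCAA instead replaces this discrete search with a continuous optimization, as described next.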
3.2. The Objective of PCAA
To solve the problem in Eq.(1), there are existing search-based methods (Lei et al., 2019; Yang et al., 2020) that search for adversarial examples via either a greedy search or a brute-force search. However, both of these search methods can suffer from a poor optimality vs. efficiency trade-off during attacking. For example, a brute-force search (Lei et al., 2019) that traverses the whole discrete space incurs an extremely high computational cost. Meanwhile, a greedy search narrows down the search space, so it usually finds weak adversarial examples (which cannot maximize the loss in Eq.(1)). Moreover, the poor optimality vs. efficiency trade-off makes these attacks impossible to incorporate into the most studied defense methods (in the continuous domain), such as adversarial training.
To address these problems, we are motivated to leverage gradient-based methods to conduct adversarial attacks in the categorical domain. In general, we first define a continuous probabilistic space from which the adversarial examples are sampled, and we devise a new objective in Eq.(2) to approximately solve Eq.(1). Specifically, following the illustration in Figure 1, we assume that each feature $x'_i$ of the (adversarial) categorical data follows a categorical distribution $\mathrm{Categorical}(\pi_i)$, where $\pi_i \in \Pi_i = (0,1)^d$. Each element $\pi_{i,j}$ represents the probability that feature $i$ takes the category value $j$. In the remainder of the paper, we will
Figure 1. An illustration of PCAA when $n = 3$ and $d = 4$: each feature $x'_i$ of the adversarial example $x'$ is sampled from the probabilistic distribution $\pi_i$ (probabilistic space) to form the perturbed input (feature space) before being fed into the model (neural network output).
use $\pi_i$ to denote the categorical distribution $\mathrm{Categorical}(\pi_i)$ without loss of generality. Then, the distribution of the input sample $x'$ is the joint distribution of all $\pi_i$, which is denoted as $\pi = [\pi_0; \pi_1; \ldots; \pi_n] \in \Pi \subseteq \mathbb{R}^{n \times d}$.
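As a minimal sketch of this parameterization (assuming PyTorch; the function and variable names are illustrative, not from the paper's released code), $\pi$ can be stored as an $n \times d$ matrix of probabilities obtained by a softmax over free logits, and adversarial candidates are then drawn feature-wise as in Figure 1.

```python
import torch

n, d = 3, 4                                      # features and categories, as in Figure 1
logits = torch.zeros(n, d, requires_grad=True)   # free parameters to be optimized

def sample_adv(logits, num_samples=8):
    """Draw adversarial candidates x' ~ pi, one categorical draw per feature.

    pi_{i,j} = softmax(logits_i)_j is the probability that feature i takes
    category j. Returns integer category indices of shape (num_samples, n).
    """
    pi = torch.softmax(logits, dim=-1)            # rows of pi sum to 1
    dist = torch.distributions.Categorical(probs=pi)
    return dist.sample((num_samples,))            # shape: (num_samples, n)

x_adv = sample_adv(logits)
print(x_adv.shape)                                # torch.Size([8, 3])
```

Optimizing the logits rather than $\pi$ directly keeps every row on the probability simplex without an explicit projection step.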
Then, we define a new continuous optimization problem to find a probability distribution $\pi$ in the space $\Pi$:

$\max_{\pi \in \Pi} \; \mathbb{E}_{x' \sim \pi}\, \mathcal{L}(f(x'), y) \quad \text{s.t.} \quad \Pr_{x' \sim \pi}\left(\|x' - x\|_0 > \epsilon\right) \leq \delta \qquad (2)$
where $\epsilon$ denotes the perturbation budget size and $\delta$ is the tail probability constraint. By solving the problem in Eq.(2), we aim to find a distribution with parameter $\pi$ such that: (1) on average, the samples $x'$ generated from the distribution $\pi$ have a high loss value; and (2) only with low probability does a sampled $x'$ have an $\ell_0$ distance to the clean sample $x$ larger than $\epsilon$. Thus, the generated samples $x'$ are likely to mislead the model prediction while preserving most features of $x$. As shown in Figure 1, the probabilistic distribution $\pi$ obtained from Eq.(2) is first used to sample adversarial examples $x'$, and then the model makes predictions on $x'$.
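Both terms in Eq.(2) can be estimated by drawing samples from $\pi$. The sketch below (again with illustrative names and a hypothetical model interface that takes one-hot inputs) shows one way to compute Monte Carlo estimates of the expected loss and the tail probability; it is only a check of the two quantities, not the paper's optimization procedure.

```python
import torch
import torch.nn.functional as F

def estimate_objective(model, logits, x_clean, y, eps, num_samples=64):
    """Monte Carlo estimates of the two quantities in Eq.(2):
       E_{x'~pi}[ L(f(x'), y) ]   and   Pr_{x'~pi}( ||x' - x||_0 > eps ).

    Assumptions (not from the paper's code): `model` takes one-hot encoded
    inputs of shape (n*d,), `x_clean` holds integer category indices of
    shape (n,), and `y` is an integer class label.
    """
    n, d = logits.shape
    pi = torch.softmax(logits, dim=-1)
    samples = torch.distributions.Categorical(probs=pi).sample((num_samples,))  # (S, n)

    one_hot = F.one_hot(samples, num_classes=d).float().view(num_samples, n * d)
    labels = torch.full((num_samples,), y, dtype=torch.long)
    losses = F.cross_entropy(model(one_hot), labels, reduction="none")

    l0 = (samples != x_clean.unsqueeze(0)).sum(dim=1)   # changed features per sample
    tail_prob = (l0 > eps).float().mean()               # estimate of Pr(||x'-x||_0 > eps)
    return losses.mean(), tail_prob
```

Since the sampling step itself is not differentiable, Section 3.3 describes how the constraint is replaced by a differentiable surrogate.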
It is worth mentioning that, in our attack, each feature $x'_i$ of the adversarial example $x'$ is sampled independently. However, this does not mean that the features themselves in the data distribution are independent of each other. Therefore, our framework is general and applicable to various data types and model architectures, including sequential data such as sentences in NLP or DNA sequences.
3.3. An Efficient Algorithm of PCAA
Solving the problem in Eq.(2) is not trivial because the probability and the $\ell_0$ term are not differentiable. Thus, we provide a feasible algorithm to solve Eq.(2) by substituting the constraint in Eq.(2) with a differentiable term. In detail, we substitute the $\ell_0$ distance between $x'$ and $x$ by calculating the sum of the cross-entropy losses between $\pi_i$ and $x_i$, i.e., $\mathcal{L}_{CE}(\pi_i, x_i)$, over all features $i \in [n]$. This is because $\mathcal{L}_{CE}(\pi_i, x_i)$ measures the probability that the categorical variable $x'_i$ following the distribution $\pi_i$ differs from $x_i$. Thus, we use the sum of cross-entropy losses $\sum_{i \in [n]} \mathcal{L}_{CE}(\pi_i, x_i)$ to approximate the total number of changed features in $x'$, which is the $\ell_0$ difference $\|x' - x\|_0$.
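A minimal sketch of this cross-entropy surrogate is given below, reusing the illustrative PyTorch parameterization from Section 3.2. The penalty-weighted combination of the adversarial loss and the surrogate, and the use of the probability vectors themselves as a soft relaxed input to the model, are simplifying assumptions for illustration rather than the paper's exact algorithm.

```python
import torch
import torch.nn.functional as F

def ce_l0_surrogate(logits, x_clean):
    """Differentiable surrogate for ||x' - x||_0: the summed cross-entropy
    between each pi_i and the clean category x_i. It is small when pi_i puts
    most of its mass on the clean category (feature i is unlikely to change)."""
    log_pi = torch.log_softmax(logits, dim=-1)            # (n, d)
    return -log_pi.gather(1, x_clean.view(-1, 1)).sum()   # sum_i L_CE(pi_i, x_i)

def pcaa_style_step(model, logits, x_clean, y, lam=1.0, lr=0.1):
    """One gradient-ascent step on a penalty-weighted relaxation of Eq.(2).

    The expectation over x' ~ pi is approximated by feeding the probability
    matrix pi itself (a soft, one-hot-like input) to the model, so the whole
    objective stays differentiable; lam trades off loss maximization against
    the cross-entropy proxy of the l0 budget.
    """
    pi = torch.softmax(logits, dim=-1)                    # (n, d)
    soft_input = pi.view(1, -1)                           # relaxed stand-in for a sample x'
    adv_loss = F.cross_entropy(model(soft_input), torch.tensor([y]))
    objective = adv_loss - lam * ce_l0_surrogate(logits, x_clean)

    grad = torch.autograd.grad(objective, logits)[0]
    return (logits + lr * grad).detach().requires_grad_(True)
```

In practice one would iterate such steps on the logits and finally draw discrete adversarial examples from the resulting $\pi$, as described in Section 3.2.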
In our algorithm, we