
a high expected loss value and (2) the samples have only
a few features that differ from the original clean sample
(with high probability). Based on this property, we are able
to leverage existing gradient-based algorithms for continuous
data, such as FGSM (Goodfellow et al., 2015) and
PGD (Madry et al., 2018), to find adversarial examples
for categorical data. In this way, we can successfully
obtain categorical adversarial examples by optimizing
the adversarial distribution and taking samples from this
distribution. Meanwhile, our attack can also be easily
incorporated into existing powerful defenses for continuous
data, such as adversarial training (Madry et al., 2018) and
TRADES (Zhang et al., 2019). Empirically, we verify that our
attack achieves a better optimality vs. efficiency trade-off
in finding strong adversarial examples, and we demonstrate the
advantages of our defense over other categorical defenses.
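
To make this recipe concrete, below is a minimal PyTorch sketch of the general idea, not the exact PCAA algorithm: a per-feature categorical distribution over perturbations is optimized by gradient ascent on the loss and discrete adversarial examples are then sampled from it. The Gumbel-Softmax relaxation, the function name, and the assumption that model takes one-hot encoded inputs are our own illustrative choices; the sparsity constraint on the number of perturbed features is omitted for brevity.

import torch
import torch.nn.functional as F

def categorical_attack_sketch(model, x_onehot, y, steps=50, lr=0.1, n_samples=10):
    """x_onehot: (n_features, n_categories) one-hot clean sample; y: scalar label tensor."""
    n_categories = x_onehot.size(1)
    # phi[i, c]: logit that feature i takes category c (the "adversarial distribution");
    # initializing from the clean one-hot encoding biases samples toward the clean input.
    phi = x_onehot.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([phi], lr=lr)
    for _ in range(steps):
        # Gumbel-Softmax gives a differentiable surrogate for sampling from phi,
        # so gradient-based attacks from the continuous domain become applicable.
        x_soft = F.gumbel_softmax(phi, tau=0.5, hard=False)
        loss = F.cross_entropy(model(x_soft.unsqueeze(0)), y.view(1))
        opt.zero_grad()
        (-loss).backward()  # gradient ascent: maximize the expected loss
        opt.step()
    # Draw discrete samples from the optimized distribution and keep the strongest one.
    best_x, best_loss = x_onehot, -float("inf")
    for _ in range(n_samples):
        idx = torch.distributions.Categorical(logits=phi).sample()
        x_adv = F.one_hot(idx, n_categories).float()
        with torch.no_grad():
            l = F.cross_entropy(model(x_adv.unsqueeze(0)), y.view(1)).item()
        if l > best_loss:
            best_x, best_loss = x_adv, l
    return best_x
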
To summarize, the major contributions of this paper are:
• We propose an efficient and effective framework, PCAA, to bridge the gap between categorical data and continuous data, which allows us to generate adversarial examples for categorical data by leveraging methods from continuous data.
• Equipped with PCAA, existing defenses for continuous data, such as adversarial training and TRADES, can be easily adapted to categorical data. This enables us to generalize new advances in defenses from continuous data to categorical data.
• We empirically validate the benefits of PCAA from the perspectives of both attack and defense.
2. Related Work
In this section, we provide a brief review of existing method-
ologies for adversarial attacks and defenses on continuous
and categorical data, highlighting the gap between the
major methodologies for these two types of data.
2.1. Attacks and Defenses on Continuous Data
The adversarial attacks and defenses in the image domain
have been extensively studied (Madry et al., 2018; Ilyas
et al., 2019; Xu et al., 2020). The most frequently studied
attack methods, such as FGSM (Goodfellow et al., 2015),
PGD (Madry et al., 2018), and the C&W attack (Carlini & Wag-
ner, 2017), can only be conducted in the continuous domain,
since they require calculating the gradients of the model output
with respect to the input data. As countermeasures against adversarial
examples in the image domain, most defense strategies are
also based on the assumption that the input data space is con-
tinuous. For example, adversarial training methods (Madry
et al., 2018; Zhang et al., 2019) train DNNs on the ad-
versarial examples generated by PGD. A state-of-the-art certified de-
fense method, Randomized Smoothing (Cohen et al., 2019),
computes certified bounds by adding Gaussian noise
to the input samples. However, these methods are hardly
applicable to categorical data, whose input space is discrete.
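
As a concrete illustration of why these attacks presuppose a continuous input space, here is a minimal FGSM sketch in PyTorch (our own illustrative code, not taken from the cited papers); model, x, and y are placeholders, and the single signed-gradient step produces real-valued perturbations that have no direct analogue for discrete categorical features.

import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8 / 255):
    # Single signed-gradient step within an L-infinity ball of radius eps.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # keep pixels in the valid [0, 1] range
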
2.2. Attacks and Defenses on Categorical Data
To generate adversarial examples for categorical data, most
existing methods apply a search-based approach. For ex-
ample, the works (Yang et al., 2020; Bao et al., 2022; Lei
et al., 2019) first find the top-K features of a given sample that
have the maximal influence on the model output, and then
apply a greedy or brute-force search to obtain the
optimal combination of perturbations of these K features. In
NLP tasks, there are also many attack methods proposed
to find adversarial “sentences” or “articles”, which follow
a similar approach to general search-based categorical at-
tacks. For example, Ebrahimi et al. (2017) propose to
search for important characters in the text based on the gradient
and then apply a greedy search to find the optimal character
flips. Samanta & Mehta (2017) generate adversarial
embeddings in the word embedding space and then search for
the closest meaningful adversarial example that is legitimate
in the text domain. These attack methods are very different
from those in the continuous domain.
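
For intuition, the following is a hedged sketch of this generic search-based recipe (our own illustration, not any specific published attack): features are ranked by gradient magnitude, the top-K are kept, and each selected feature is greedily flipped to the category that most increases the loss. The assumption that model takes one-hot encoded inputs, and the function name, are ours.

import torch
import torch.nn.functional as F

def topk_greedy_attack(model, x_onehot, y, k=3):
    """x_onehot: (n_features, n_categories) one-hot sample; y: scalar label tensor."""
    x = x_onehot.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x.unsqueeze(0)), y.view(1))
    loss.backward()
    # Rank features by the magnitude of the gradient w.r.t. their one-hot encoding.
    scores = x.grad.abs().sum(dim=1)
    topk_features = torch.topk(scores, k).indices
    x_adv = x_onehot.clone().detach()
    for i in topk_features:
        best_loss, best_row = -float("inf"), x_adv[i].clone()
        for c in range(x_onehot.size(1)):  # try every category for feature i
            candidate = x_adv.clone()
            candidate[i] = F.one_hot(torch.tensor(c), x_onehot.size(1)).float()
            with torch.no_grad():
                l = F.cross_entropy(model(candidate.unsqueeze(0)), y.view(1)).item()
            if l > best_loss:
                best_loss, best_row = l, candidate[i]
        x_adv[i] = best_row  # greedily keep the most damaging flip
    return x_adv
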
To defend against categorical attacks, most defenses have
been proposed for NLP tasks and rely exclusively on a
property of word embeddings: similar words are close
in the embedding space. Miyato et al. (2016) and
Barham & Feizi (2019) conduct adversarial training in the em-
bedding space, where the adversarial examples lie within an
l2-ball around the embeddings of clean samples. Dong et al.
(2021) also propose an adversarial training method, ASCC,
which conducts adversarial training in a space composed
of the convex hulls of adversarial word vectors. How-
ever, these methods can hardly be applied to defend DNNs
on general categorical data beyond NLP tasks. Different
from existing defense methods for NLP tasks, our proposed
defense does not rely on the properties of word embeddings,
yet achieves defense performance similar to these NLP
defenses. Moreover, our defense can be applied
to general categorical ML tasks beyond NLP.
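
For reference, here is a minimal sketch of the embedding-space adversarial training idea described above (our own illustration under the assumed module names embed and classifier): the clean token embeddings are perturbed within an l2-ball of radius eps, using a single gradient ascent step instead of multi-step PGD for brevity, and the model is then trained on the resulting loss.

import torch
import torch.nn.functional as F

def embedding_adv_loss(embed, classifier, token_ids, y, eps=1.0):
    # Find a perturbation of the clean embeddings inside an l2-ball of radius eps.
    with torch.no_grad():
        e_clean = embed(token_ids)
    delta = torch.zeros_like(e_clean, requires_grad=True)
    loss = F.cross_entropy(classifier(e_clean + delta), y)
    grad = torch.autograd.grad(loss, delta)[0]
    delta = eps * grad / (grad.norm() + 1e-12)  # one ascent step, scaled to the l2-ball
    # Train on the adversarially perturbed embeddings.
    return F.cross_entropy(classifier(embed(token_ids) + delta), y)
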
3. Probabilistic Categorical Adversarial Attack
In this section, we first introduce the necessary notations
and definitions of our studied problem. Then, we provide
the details of our proposed attack framework PCAA.
3.1. Problem Setup
In this work, we consider a classifier f that predicts la-
bels y ∈ Y based on categorical inputs x ∈ X. Each