Towards Generating Adversarial Examples on
Mixed-type Data
Han Xu
Michigan State University, East Lansing, Michigan, USA
xuhan1@msu.edu
Zhimeng Jiang
Texas A&M University, College Station, Texas, USA
zhimengj@tamu.edu
Menghai Pan, Huiyuan Chen, Xiaoting Li, Mahashweta Das, Hao Yang
VISA Research, Palo Alto, California, USA
Abstract—The existence of adversarial attacks (or adversarial examples) raises serious concerns about the safety of machine learning (ML) models. In many safety-critical ML tasks, such as financial forecasting, fraud detection, and anomaly detection, the data samples are usually mixed-type: they contain numerical and categorical features at the same time. However, how to generate adversarial examples on mixed-type data remains largely unstudied. In this paper, we propose a novel attack algorithm, M-Attack, which effectively generates adversarial examples for mixed-type data. With M-Attack, an attacker can mislead a targeted classification model's prediction by only slightly perturbing both the numerical and categorical features of a given data sample. More importantly, by adding carefully designed regularizations, the generated adversarial examples can evade potential detection models, which makes the attack truly insidious. Through extensive empirical studies, we validate the effectiveness and efficiency of our attack method and evaluate the robustness of existing classification models against the proposed attack. The experimental results highlight the feasibility of generating adversarial examples against machine learning models in real-world applications.
I. INTRODUCTION
The existence of adversarial attacks (or adversarial examples) [1]–[4] has raised serious concerns about applying machine learning (ML) models to safety-critical tasks. In many ML applications in web services or mobile applications, the data inputs are often mixed-type, containing both numerical and categorical features. For example, an online financial institution may train ML models to evaluate whether a loan applicant is able to repay a loan. In this scenario, the data has numerical features, e.g., the applicant's age, account balance, and annual income, as well as categorical features, including his/her educational background, occupation type, and marital status. Similarly, in recommender systems for online shopping, the data also contains both numerical and categorical information, such as the price and categories of the recommended items. However, how to conduct adversarial attacks on mixed-type data still lacks thorough exploration. Therefore, in this paper, we focus on the problem of generating adversarial examples for mixed-type data. Specifically, we aim to answer the question: given a well-trained ML model, how can we perturb a data sample with an unnoticeable distortion such that the model is misled into a wrong prediction? The studied problem is crucial in practice. Recall the financial institution example above: if an unqualified applicant provides a fake profile containing a few fraudulent records, so that he/she can fool the trained ML model into approving the loan, it will cause a substantial loss to the financial institution.
Achieving this attack goal poses significant challenges. First, the attacker is expected to modify as few features of the original (clean) data sample as possible, so that the perturbation remains unnoticeable and insidious. This requires the perturbation added to the original sample to be sparse in the input data space. Notably, there are existing methods for sparse adversarial attacks in either the numerical or the categorical data domain, separately. In the numerical domain, Projected Gradient Descent (PGD) methods are adopted to guide the search for adversarial examples and project the perturbation into a continuous l1-norm-bounded space to enforce sparsity of the perturbation [1], [5]. In the categorical domain, search-based methods [6], [7] are employed to iteratively find the top-K categorical features that have the largest influence on the model prediction, and then search for the optimal perturbation of these K features. However, our task involves both numerical and categorical features, and there is still no established method to generate optimal sparse adversarial attacks over the targeted search space, which is a Cartesian product of a discrete space (for categorical features) and a continuous space (for numerical features). Moreover, our experimental results in Section V suggest that simply combining existing strategies usually provides sub-optimal solutions. For example, an algorithm that first applies the search-based methods [6], [7] to perturb categorical features, and then applies the l1-PGD method [1], [5] to find numerical perturbations, cannot reliably find the strongest adversarial examples (i.e., those leading to the maximal loss value of the targeted classifier); a sketch of such a two-stage baseline is given below. This fact highlights the necessity of devising new adversarial attack
algorithms exclusively designed for mixed-type data.
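To make the two-stage baseline concrete, the following is a minimal, hypothetical sketch (not the paper's exact baseline): a greedy search over categorical substitutions followed by l1-PGD on the numerical features. It assumes a PyTorch classifier `model` that takes the concatenation of numerical features and one-hot encoded categorical features; the helper names, budget `k`, and step sizes are illustrative.

```python
import torch

def project_l1_ball(v, eps):
    """Euclidean projection of v onto the l1 ball of radius eps
    (sorting-based method of Duchi et al., 2008)."""
    if v.abs().sum() <= eps:
        return v
    u, _ = torch.sort(v.abs().flatten(), descending=True)
    css = torch.cumsum(u, dim=0)
    idx = torch.arange(1, u.numel() + 1, dtype=v.dtype, device=v.device)
    rho = torch.nonzero(u * idx > css - eps).max()
    theta = (css[rho] - eps) / (rho + 1.0)
    return torch.sign(v) * torch.clamp(v.abs() - theta, min=0.0)

def greedy_categorical_step(model, x_num, x_cat, y, cat_dims, k=2):
    """Stage 1: greedily substitute the k category values that most
    increase the classification loss (search-based attack)."""
    loss_fn = torch.nn.CrossEntropyLoss()
    x_cat = x_cat.clone()
    for _ in range(k):
        best = (None, None, None, None)            # (loss, offset, dim, value)
        offset = 0
        for dim in cat_dims:                       # each categorical field
            for v in range(dim):                   # each candidate category
                cand = x_cat.clone()
                cand[0, offset:offset + dim] = 0.0
                cand[0, offset + v] = 1.0
                loss = loss_fn(model(torch.cat([x_num, cand], dim=1)), y).item()
                if best[0] is None or loss > best[0]:
                    best = (loss, offset, dim, v)
            offset += dim
        _, o, d, v = best                          # commit the best substitution
        x_cat[0, o:o + d] = 0.0
        x_cat[0, o + v] = 1.0
    return x_cat

def l1_pgd_numerical_step(model, x_num, x_cat, y, eps=1.0, steps=20, alpha=0.2):
    """Stage 2: projected gradient ascent on numerical features,
    with the perturbation projected back into an l1 ball."""
    loss_fn = torch.nn.CrossEntropyLoss()
    delta = torch.zeros_like(x_num, requires_grad=True)
    for _ in range(steps):
        loss = loss_fn(model(torch.cat([x_num + delta, x_cat], dim=1)), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += alpha * grad.sign()
            delta.copy_(project_l1_ball(delta, eps))
    return (x_num + delta).detach()
```

Because the categorical search is frozen before the numerical step begins, the two stages cannot trade perturbation budget against each other, which is one intuition for why such decoupled baselines tend to be sub-optimal.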
Second, the attacker should also keep the generated adversarial examples seemingly benign. In other words, adversarial examples should stay close to the distribution of the original clean data samples. This goal is hard to achieve because the features in mixed-type data are usually highly correlated. For example, in the Home Credit Default Risk dataset [8], where the task is to predict a person's qualification for a loan, the feature "age" of an applicant is strongly related to other numerical features (such as "number of children") and categorical features (such as "family status"). In this dataset, if an attacker perturbs the feature "family status" from "child" to "parent" for an 18-year-old loan applicant, the perturbed sample obviously deviates from the true distribution of clean data samples (because 18-year-old parents are rare in reality). As a result, the generated adversarial example can easily be detected as an "abnormal" sample, either by a defender that applies out-of-distribution (OOD) detection (or anomaly detection) methods [9]–[11], or by human experts who can judge the authenticity of data samples based on their domain knowledge. Thus, the attacker should generate adversarial examples that do not significantly violate the correlation between any pair of numerical features, any pair of categorical features, or any pair of one categorical and one numerical feature.
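As an illustration of why such correlation violations are risky for the attacker, the snippet below is a minimal sketch, not taken from the paper, of how a defender could flag an out-of-distribution sample with a plain Mahalanobis-distance test on numerically encoded features; the synthetic data, encoding, and threshold are all illustrative assumptions.

```python
import numpy as np

def fit_gaussian(X_clean):
    """Estimate mean and (regularised) inverse covariance of clean encoded data."""
    mu = X_clean.mean(axis=0)
    cov = np.cov(X_clean, rowvar=False) + 1e-6 * np.eye(X_clean.shape[1])
    return mu, np.linalg.inv(cov)

def mahalanobis(x, mu, cov_inv):
    """Mahalanobis distance of a single sample to the clean-data Gaussian."""
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

# Usage: flag any sample whose distance exceeds, say, the 99th percentile
# of distances observed on clean data.
rng = np.random.default_rng(0)
X_clean = rng.normal(size=(1000, 8))        # stand-in for encoded clean samples
mu, cov_inv = fit_gaussian(X_clean)
threshold = np.percentile([mahalanobis(x, mu, cov_inv) for x in X_clean], 99)
x_adv = X_clean[0] + 3.0                    # crude stand-in for a correlation-breaking perturbation
print(mahalanobis(x_adv, mu, cov_inv) > threshold)   # likely flagged as abnormal
```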
In this paper, to tackle the aforementioned challenges, we propose M-Attack, a novel attack framework for mixed-type data. In detail, we first transform the search space of mixed-type adversarial examples into a unified continuous space (see Figure 1). To achieve this, we convert the problem of finding adversarial categorical perturbations into the problem of finding the probabilistic (categorical) distribution from which the adversarial categorical features are sampled. This allows us to find sparse adversarial examples in the unified continuous space by simultaneously updating the numerical and categorical perturbations via gradient-based methods (a minimal sketch of this relaxation follows the contribution list below). Furthermore, to generate in-distribution adversarial examples, we propose a Mixed-Mahalanobis Distance to measure and regularize the distance of an (adversarial) example to the distribution of the clean mixed-type data samples. Through extensive experiments, we verify that: (1) M-Attack achieves better attack performance than representative baselines, which combine existing numerical and categorical attack methods; (2) M-Attack achieves better (or comparable) efficiency compared with these baselines; and (3) the Mixed-Mahalanobis Distance helps the generated adversarial examples stay close to the true distribution of the clean data. Our contributions can be summarized as follows:
• We propose an efficient and effective algorithm, M-Attack, to generate adversarial examples for mixed-type data.
• We propose a novel measurement, the Mixed-Mahalanobis Distance, which helps the generated adversarial examples stay close to the true distribution.
• We conduct extensive experiments to validate the feasibility of M-Attack, and demonstrate the vulnerability of popular classification models against M-Attack.
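The sketch below illustrates the unified-continuous-space idea referenced above; it is an assumption-laden simplification, not the paper's exact M-Attack procedure. Each categorical field is relaxed into a probability vector via a softmax over trainable logits, so the numerical perturbation and all categorical distributions can be updated jointly by gradient ascent on the classification loss; the function name, learning rate, and step count are hypothetical.

```python
import torch
import torch.nn.functional as F

def unified_space_attack(model, x_num, y, cat_dims, steps=50, lr=0.1):
    """Jointly optimise a numerical perturbation and one relaxed categorical
    distribution per field in a single continuous search space."""
    loss_fn = torch.nn.CrossEntropyLoss()
    delta = torch.zeros_like(x_num, requires_grad=True)                  # numerical perturbation
    logits = [torch.zeros(1, d, requires_grad=True) for d in cat_dims]   # per-field logits
    opt = torch.optim.Adam([delta] + logits, lr=lr)
    for _ in range(steps):
        # relaxed "one-hot" vectors: one probability distribution per field
        probs = torch.cat([F.softmax(l, dim=1) for l in logits], dim=1)
        out = model(torch.cat([x_num + delta, probs], dim=1))
        loss = -loss_fn(out, y)     # minimising the negative loss = ascending the loss
        # a sparsity penalty on delta and a Mahalanobis-style regulariser
        # would be added to `loss` here in a fuller implementation
        opt.zero_grad()
        loss.backward()
        opt.step()
    # discretise each learned distribution back to a concrete category
    x_cat_adv = torch.cat(
        [F.one_hot(l.argmax(dim=1), d).float() for l, d in zip(logits, cat_dims)],
        dim=1)
    return (x_num + delta).detach(), x_cat_adv
```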
II. RELATED WORKS
The adversarial robustness of machine learning models has become increasingly important in recent years. Most existing works, including evasion attacks and poisoning attacks, focus on continuous input spaces, especially in the image domain [1], [12], [13]. In the image domain, because the data space (the space of pixel values) is continuous, gradient-based methods [1], [5] are used to find adversarial examples. Based on these gradient attack methods, various defense methods, such as adversarial training [1], [14], have been proposed to improve model robustness. Meanwhile, adversarial attacks on discrete input data, such as text data with categorical features, have also started to attract researchers' attention. In the natural language processing domain, the work [15] discusses the potential to attack text classification models, such as sentiment analysis, by replacing words with their synonyms. The study [16] proposes to modify text tokens based on the gradient of the input one-hot vectors. The method [17] develops a scoring function to select the most effective attack, together with a simple character-level transformation that replaces projected-gradient or multiple linguistic-driven steps on text data. In the domain of graph data learning, there are methods [18] that greedily search for perturbations of the graph structure to mislead the targeted models. In conclusion, when the data space is discrete, these methods share a similar core idea: they apply search-based methods to find adversarial perturbations. Although there are well-established methods for either the numerical or the categorical domain separately, studies on mixed-type data are still lacking. However, mixed-type data widely exist in various machine learning tasks in the physical world. For example, in the fraudulent transaction detection systems [19], [20] of credit card institutions, the transaction records from cardholders may include features such as the transaction amount (numerical) and the type of the purchased product (categorical). Similarly, for ML models applied in AI healthcare [21], [22], e.g., in epidemiological studies [23], [24], the data can be collected from surveys that ask for the respondents' information, including their age, gender, race, the type of medical treatment, and the expenditure on each type of medical supplies used. In recommender systems [25]–[27] of online-shopping websites built for product recommendation, the data can include the purchase history of the clients or the properties of the products, both containing numerical and categorical features. In this paper, we focus on the problem of how to slightly perturb the input (mixed-type) data sample of a model to mislead the model's prediction. This study is of great significance in helping us understand the potential risk of ML models under malicious perturbations in the applications mentioned above.