Are Synonym Substitution Attacks Really Synonym Substitution Attacks?
Cheng-Han Chiang
National Taiwan University,
Taiwan
dcml0714@gmail.com
Hung-yi Lee
National Taiwan University,
Taiwan
hungyilee@ntu.edu.tw
Abstract

In this paper, we explore the following question: Are synonym substitution attacks really synonym substitution attacks (SSAs)? We approach this question by examining how SSAs replace words in the original sentence and show that there are still unresolved obstacles that make current SSAs generate invalid adversarial samples. We reveal that four widely used word substitution methods generate a large fraction of invalid substitution words that are ungrammatical or do not preserve the original sentence's semantics. Next, we show that the semantic and grammatical constraints used in SSAs for detecting invalid word replacements are highly insufficient in detecting invalid adversarial samples.
1 Introduction

Deep learning-based natural language processing models have been extensively used across many tasks and domains and have shown strong performance. However, these models seem to be astonishingly vulnerable: their predictions can be misled by small perturbations of the original input (Gao et al., 2018; Tan et al., 2020). These imperceptible perturbations, while not changing humans' predictions, can make a well-trained model behave worse than random.
One important type of adversarial attack in natural language processing (NLP) is the synonym substitution attack (SSA). In SSAs, an adversarial sample is constructed by substituting some words in the original sentence with their synonyms (Alzantot et al., 2018; Ren et al., 2019; Garg and Ramakrishnan, 2020; Jin et al., 2020; Li et al., 2020; Maheshwary et al., 2021). This ensures that the adversarial sample is semantically similar to the original sentence, thus fulfilling the imperceptibility requirement of a valid adversarial sample. While substituting words with semantically related counterparts can retain the semantics of the original sentence, these attacks often utilize constraints to further guarantee that the generated adversarial samples are grammatically correct and semantically similar to the original sentence. These SSAs have all been shown to successfully degrade well-trained text classifiers' performance.
However, some recent works observe, via human evaluation, that the quality of the adversarial samples generated by those SSAs is fairly low and is highly perceptible to humans (Morris et al., 2020a; Hauser et al., 2021). These adversarial samples often contain grammatical errors and do not preserve the semantics of the original samples, making them difficult to understand. These characteristics violate the fundamental criteria of a valid adversarial sample: preserving semantics and being imperceptible to humans. This motivates us to investigate what causes those SSAs to generate invalid adversarial samples. Only by answering this question can we move on to design more realistic SSAs in the future.
In this paper, we set out to answer the following question: Are synonym substitution attacks in the literature really synonym substitution attacks? We explore the answer by scrutinizing the key components of several important SSAs and why they fail to generate valid adversarial samples. Specifically, we conduct a detailed analysis of how the word substitution sets are obtained in SSAs, and we look into the semantic and grammatical constraints used to filter invalid adversarial samples. We make the following striking observations:
- When substituting words by WordNet synonym sets, current methods neglect the word sense differences within the substitution set. (Section 3.1)
- When using the counter-fitted GloVe embedding space or BERT to generate the substitution set, the substitution set contains only a tiny fraction of synonyms. (Section 3.2)
- Using word embedding cosine similarity or sentence embedding cosine similarity to filter words in the substitution set does not necessarily exclude semantically invalid word substitutions. (Section 4.1 and Section 4.2)
- The grammar checker used for filtering ungrammatical adversarial samples fails to detect most erroneous verb inflectional forms in a sentence. (Section 4.3)

arXiv:2210.02844v3 [cs.CL] 8 May 2023
2 Background

In this section, we provide an overview of SSAs and introduce related notation that will be used throughout the paper.
2.1 Synonym Substitution Attacks (SSAs)

Consider a victim text classifier trained on a dataset D_train and a clean testing sample x_ori drawn from the same distribution as D_train, where x_ori = {x_1, ..., x_T} is a sequence of T tokens. An SSA attacks the victim model by constructing an adversarial sample x_adv = {x'_1, ..., x'_T}, obtained by swapping words in x_ori with semantically related counterparts. For x_adv to be considered a valid adversarial sample of x_ori, a few requirements must be met (Morris et al., 2020a): (0) x_adv should make the model yield a wrong prediction while the model correctly classifies x_ori. (1) x_adv should be semantically similar to x_ori. (2) x_adv should not introduce new grammar errors compared with x_ori. (3) The word-level overlap between x_adv and x_ori should be high enough. (4) The modifications made in x_adv should be natural and non-suspicious. In this paper, we refer to adversarial samples that fail to meet the above criteria as invalid adversarial samples.
SSAs rely on heuristic procedures to ensure that x_adv satisfies the preceding requirements. Here, we describe a canonical pipeline for generating x_adv from x_ori (Morris et al., 2020b). Given a clean testing sample x_ori that the text classifier correctly predicts, an SSA first generates a candidate word substitution set S_xi for each word x_i. The process of generating the candidate set S_xi is called a transformation. Next, the SSA determines which word in x_ori should be substituted first, which word should be swapped next, and so on. After the word substitution order is decided, the SSA iteratively substitutes each word x_i in x_ori using the candidate words in S_xi according to the pre-determined order. In each substitution step, an x_i is replaced by a word in S_xi, and a new x_swap is obtained. When an x_swap is obtained, some constraints are used to verify the validity of x_swap. The iterative word substitution process ends when the model's prediction is successfully corrupted by a substituted sentence that satisfies the constraints, yielding the desired x_adv.

Clearly, the transformations and the constraints are critical to the quality of the final x_adv. In the remainder of the paper, we look deeper into the transformations and constraints used in SSAs and their role in creating adversarial samples.¹ Next, we briefly introduce the transformations and constraints that have been used in SSAs.
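The canonical pipeline just described can be condensed into a short sketch. Everything below (the classifier, the candidate sets, the word-overlap constraint) is a hypothetical toy stand-in for illustration, not any published attack's implementation:

```python
# Toy sketch of the canonical SSA pipeline: iterate over a pre-determined
# substitution order, try candidates, keep swaps that pass the constraint,
# and stop once the model's prediction flips.

def greedy_ssa(x_ori, candidate_sets, order, classifier, constraint):
    y_ori = classifier(x_ori)
    x_swap = list(x_ori)
    for i in order:                            # pre-determined order
        for cand in candidate_sets.get(i, []):
            trial = list(x_swap)
            trial[i] = cand
            if not constraint(x_ori, trial):   # validity check on x_swap
                continue
            if classifier(trial) != y_ori:     # prediction corrupted
                return trial                   # this is x_adv
            x_swap = trial                     # keep swap, move on (greedy)
    return None                                # attack failed

# Toy victim classifier: predicts "pos" iff the word "good" appears.
clf = lambda toks: "pos" if "good" in toks else "neg"
x = ["the", "movie", "is", "good"]
adv = greedy_ssa(x, {3: ["great", "fine"]}, [3], clf,
                 lambda a, b: sum(u != v for u, v in zip(a, b)) <= 1)
# adv == ["the", "movie", "is", "great"]
```

Real attacks differ mainly in how they build `candidate_sets` (the transformations below) and what `constraint` checks.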
2.2 Transformations

A transformation is the process of generating the substitution set S_xi for a word x_i in x_ori. There are four representative transformations in the literature.

WordNet Synonym Transformation constructs S_xi by querying a word's synonyms in WordNet (Miller, 1995; University, 2010), a lexical database containing word sense definitions, synonyms, and antonyms of English words. This transformation is used in PWWS (Ren et al., 2019) and LexicalAT (Xu et al., 2019).
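A sense-agnostic WordNet-style lookup can be illustrated with a hand-built toy lexicon (a stand-in for the real database, which attacks query via `nltk.corpus.wordnet`): the substitution set is the union of synonyms across all senses, which is exactly where mismatched senses creep in (see Section 3.1):

```python
# Toy stand-in for a WordNet lookup: each word maps to per-sense synonym
# sets. A sense-agnostic transformation takes the union over all senses.
LEXICON = {
    "recommend": {
        "express_a_good_opinion": {"recommend", "commend"},
        "push_for_something": {"recommend", "urge", "advocate"},
    },
}

def substitution_set(word):
    """Union of all senses' synonyms, minus the word itself."""
    senses = LEXICON.get(word, {})
    union = set().union(*senses.values()) if senses else set()
    return union - {word}

sorted(substitution_set("recommend"))  # ['advocate', 'commend', 'urge']
```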
Word Embedding Space Nearest Neighbor Transformation constructs S_xi by looking up the word embedding of x_i in a word embedding space and finding its k nearest neighbors (kNN). Using kNN for word substitution is based on the assumption that semantically related words are closer in the word embedding space. The counter-fitted GloVe embedding space (Mrkšić et al., 2016) is obtained by post-processing the GloVe embedding space (Pennington et al., 2014); counter-fitting refers to the process of pulling antonyms apart and narrowing the distance between synonyms. This transformation is adopted in TextFooler (Jin et al., 2020), the Genetic Algorithm attack (Alzantot et al., 2018), and TextFooler-Adj (Morris et al., 2020a).
¹ In this paper, we do not discuss the relationship between the validity of an SSA and how an SSA determines which word in x_ori should be substituted. Most SSAs use word importance scores to determine the most salient words and substitute those first. Since most SSAs use similar methods to determine which word should be replaced, our analyses generalize to those SSAs.
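The kNN transformation above can be sketched with toy 3-dimensional vectors (made up for illustration; real attacks use the 300-dimensional counter-fitted GloVe space):

```python
import numpy as np

# Toy embedding table: "good"/"great"/"fine" point in similar directions,
# "bad" is roughly opposite, "table" is orthogonal.
vocab = ["good", "great", "fine", "bad", "table"]
E = np.array([[0.90, 0.10, 0.00],
              [0.85, 0.15, 0.00],
              [0.70, 0.30, 0.10],
              [-0.90, 0.10, 0.00],
              [0.00, 0.00, 1.00]])
E = E / np.linalg.norm(E, axis=1, keepdims=True)  # unit-normalize rows

def knn(word, k=2):
    """k nearest neighbors of `word` by cosine similarity, excluding itself."""
    sims = E @ E[vocab.index(word)]               # cosine sims to all words
    ranked = [vocab[j] for j in np.argsort(-sims) if vocab[j] != word]
    return ranked[:k]

knn("good", k=2)  # ['great', 'fine']
```

Note that for larger k the neighbor list drifts to unrelated words ("table") well before the vocabulary runs out of synonyms, which is the failure mode Section 3.2 measures.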
Masked Language Model (MLM) Mask-Infilling Transformation constructs S_xi by masking x_i in x_ori and asking an MLM to predict the masked token; the MLM's top-k predictions for the masked token form the word substitution set of x_i. Widely adopted MLMs include BERT (Devlin et al., 2019) and RoBERTa (Liu et al., 2019). Using MLM mask-infilling to generate a candidate set relies on the belief that MLMs can generate fluent and semantically consistent substitutions for x_ori. This method is used in BERT-ATTACK (Li et al., 2020) and CLARE (Li et al., 2021).
MLM Reconstruction Transformation also uses MLMs. To generate the candidate set, one feeds the MLM the original sentence x_ori without masking any tokens. Here, the MLM is not performing mask-infilling but reconstructing the input tokens from the unmasked inputs. For each word x_i, one takes its top-k token reconstruction predictions as the candidates. This transformation relies on the intuition that reconstruction can generate more semantically similar words than mask-infilling. This method is used in BAE (Garg and Ramakrishnan, 2020).
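Both MLM-based transformations reduce to reading the top-k token predictions at one position; the only difference is whether that position was masked in the input. A toy sketch, with a made-up logits vector standing in for the MLM's output at the position of interest:

```python
import numpy as np

# Hypothetical MLM scores at one token position. With mask-infilling these
# would come from a [MASK]ed input; with reconstruction, from the unmasked
# input. The values below are invented for illustration.
vocab = ["good", "great", "bad", "movie", "the"]
logits = np.array([2.0, 1.5, 1.2, 0.3, 0.1])

def top_k_candidates(logits, vocab, original_word, k=2):
    """Top-k predicted tokens at this position, excluding the original word."""
    order = np.argsort(-logits)                   # descending by score
    cands = [vocab[j] for j in order if vocab[j] != original_word]
    return cands[:k]

top_k_candidates(logits, vocab, "good", k=2)  # ['great', 'bad']
```

Even in this toy, the antonym "bad" lands in the candidate set: an MLM scores tokens by contextual fit, not by synonymy, which foreshadows the antonym counts in Table 1.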
2.3 Constraints

When an x_ori is perturbed by swapping some of its words, constraints are used to check whether the perturbed sentence, x_swap, is semantically and grammatically valid. We write x_swap instead of x_adv here because x_swap does not necessarily flip the model's prediction and is thus not necessarily an adversarial sample.
Word Embedding Cosine Similarity requires a word x_i and its perturbed counterpart x'_i to be close enough in the counter-fitted GloVe embedding space in terms of cosine similarity. A substitution is valid if the cosine similarity between its word embedding and the original word's embedding is higher than a pre-defined threshold. This is used in the Genetic Algorithm attack (Alzantot et al., 2018) and TextFooler (Jin et al., 2020).
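The constraint itself is a one-liner; a minimal sketch follows, where the threshold value is illustrative since each attack picks its own:

```python
import numpy as np

def word_cos_ok(e_orig, e_sub, threshold=0.5):
    """Accept a substitution iff the cosine similarity of the two word
    embeddings exceeds `threshold` (value here is illustrative)."""
    cos = float(np.dot(e_orig, e_sub) /
                (np.linalg.norm(e_orig) * np.linalg.norm(e_sub)))
    return cos >= threshold
```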
Sentence Embedding Cosine Similarity demands that the sentence embedding cosine similarity between x_swap and x_ori be higher than a pre-defined threshold. Most previous works (Jin et al., 2020; Li et al., 2020; Garg and Ramakrishnan, 2020; Morris et al., 2020a) use the Universal Sentence Encoder (USE) (Cer et al., 2018) as the sentence encoder; A2T (Yoo and Qi, 2021) uses a DistilBERT (Sanh et al., 2019) fine-tuned on STS-B (Cer et al., 2017).

In some previous work (Li et al., 2020), the sentence embedding is computed over the whole sentences x_ori and x_swap. But most previous works (Jin et al., 2020; Garg and Ramakrishnan, 2020) extract only a context window around the currently swapped word in x_ori and x_swap to compute the sentence embedding. For example, if x_i is substituted in the current substitution step, one computes the sentence embeddings of x_ori[i−w : i+w+1] and x_adv[i−w : i+w+1], where w determines the window size. w is set to 7 in Jin et al. (2020) and Garg and Ramakrishnan (2020).
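The window extraction can be sketched directly from the index expression above (clamping at the sequence boundary, a detail the notation leaves implicit):

```python
def context_window(tokens, i, w=7):
    """Tokens in x[i-w : i+w+1] around position i, clamped at the sequence
    start; this is the span whose sentence embedding gets compared."""
    return tokens[max(0, i - w): i + w + 1]

toks = list("abcdefghijklmnop")   # 16 dummy one-character "tokens"
context_window(toks, 8, w=3)      # ['f', 'g', 'h', 'i', 'j', 'k', 'l']
```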
LanguageTool (language-tool-python, 2022) is an open-source grammar tool that detects spelling and grammar mistakes in an input sentence. It is used in TextFooler-Adj (Morris et al., 2020a) to evaluate the grammaticality of the adversarial samples.
3 Problems with the Transformations in SSAs

In this section, we show that the transformations introduced in Section 2.2 are largely to blame for the invalid adversarial samples in SSAs: the substitution set S_xi for x_i is mostly invalid, either semantically or grammatically.
3.1 WordNet Synonym Substitution Set Ignores Word Senses

In WordNet, each word is associated with one or more word senses, and each word sense has its corresponding synonym set. Thus, the substitution set S_xi proposed by WordNet is the union of the synonym sets of the different senses of x_i. When swapping x_i with a synonym from WordNet, it would be more sensible to first identify the word sense of x_i in x_ori and use the synonym set of that very sense as the substitution set. However, current attacks using WordNet synonym substitution neglect the sense differences within the substitution set (Ren et al., 2019), which may result in adversarial samples that semantically deviate from the original input.

As a working example, consider a movie review that reads "I highly recommend it". The word "recommend" here corresponds to the word sense "express a good opinion of" according to WordNet and has the synonym set {recommend, commend}. Aside from this word sense, "recommend" also has another word sense, "push for something", as in "The travel agent recommends not to travel amid the pandemic". This second word sense has the synonym set {recommend, urge, advocate}.² Clearly, the only valid substitution is "commend", which preserves the semantics of the original movie review. While "urge" is a synonym of "recommend", it obviously does not fit the context and should not be considered a possible substitution. We call substituting x_i with a synonym that matches the word sense of x_i in x_ori a matched sense substitution, and we use mismatched sense substitution to refer to swapping a word with a synonym that belongs to the synonym set of a different word sense.
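The matched/mismatched split can be illustrated with the toy senses of "recommend" from the working example, assuming the in-context sense is already known (e.g., via word sense disambiguation); the sense names and sets below are hand-built stand-ins for WordNet entries:

```python
# Per-sense synonym sets for one word, WordNet-style (toy data).
SENSES = {
    "express_a_good_opinion": {"recommend", "commend"},
    "push_for_something": {"recommend", "urge", "advocate"},
}

def split_by_sense(word, sense_in_context):
    """Split the union of synonyms into matched vs. mismatched sense
    substitutions, given the word's sense in the original sentence."""
    matched = SENSES[sense_in_context] - {word}
    others = set().union(*(s for name, s in SENSES.items()
                           if name != sense_in_context))
    mismatched = others - {word} - matched
    return matched, mismatched

m, mm = split_by_sense("recommend", "express_a_good_opinion")
# m == {'commend'}; mm == {'urge', 'advocate'}
```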
3.1.1 Experiments

To illustrate that mismatched sense substitution is a problem in practical attack algorithms, we conduct the following analysis. We examine the adversarial samples generated by PWWS (Ren et al., 2019), which substitutes words using WordNet synonym sets. We use a benchmark dataset (Yoo et al., 2022) that contains the adversarial samples generated by PWWS against a BERT-based classifier fine-tuned on AG-News (Zhang et al., 2015). AG-News is a news topic classification dataset whose goal is to classify a piece of news into four categories: world, sports, business, and sci/tech. The attack success rate on the testing set of 7.6K samples is 57.25%. More statistics about the datasets can be found in Appendix B.

We categorize the words replaced by PWWS into three disjoint categories: matched sense substitution, mismatched sense substitution, and morphological substitution. The last category, morphological substitution, refers to substituting a word with a word that differs from the original only in inflectional morphemes³ or derivational morphemes⁴. We specifically isolate morphological substitution since it is hard to categorize into either matched or mismatched sense substitution.
The detailed procedure for categorizing a replaced word's substitution type is as follows. Given a pair (x_ori, x_adv), we first use NLTK (Bird et al., 2009) to perform word sense disambiguation on each word x_i in x_ori. We use LemmInflect and NLTK to generate the morphological substitution set ML_xi of x_i. The matched sense substitution set M_xi is constructed from the WordNet synonym set of the sense of x_i in x_ori; since this synonym set includes the original word x_i and may also include some words in ML_xi, we remove x_i and the words already included in ML_xi from the synonym set, forming the final matched sense substitution set M_xi. The mismatched sense substitution set MM_xi is constructed by first collecting, using WordNet, all synonyms of x_i that belong to senses of x_i other than its sense in x_ori, and then removing all words already included in ML_xi and M_xi.

² The word senses and synonyms are from WordNet.

³ Inflectional morphemes are suffixes that change a grammatical property of a word but do not create a new word, such as a verb's tense or a noun's number. For example, recommends → recommend.

⁴ Derivational morphemes are affixes that change the form of a word and create a new word, such as changing a verb into a noun. For example, recommend → recommendation.
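Once the three sets ML_xi, M_xi, and MM_xi are built as described, the per-word categorization reduces to set membership checks. A sketch of that step (not the authors' exact code; the sets below are toy examples for the word "recommend"):

```python
# Three-way categorization of a replaced word. ML (morphological) is
# checked first, mirroring the construction order: M and MM were built
# with ML's members removed, so the sets are disjoint by construction.
def categorize(sub, ML, M, MM):
    if sub in ML:
        return "morphological"
    if sub in M:
        return "matched"
    if sub in MM:
        return "mismatched"
    return "other"

ML = {"recommends", "recommendation"}   # inflections/derivations
M = {"commend"}                         # matched-sense synonyms
MM = {"urge", "advocate"}               # mismatched-sense synonyms
categorize("urge", ML, M, MM)           # 'mismatched'
```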
After inspecting 4,140 adversarial samples produced by PWWS, we find that among the 26,600 words swapped by PWWS, only 5,398 (20.2%) fall into the category of matched sense substitution. A majority of 20,055 (75.4%) word substitutions are mismatched sense substitutions, which should be considered invalid, since a mismatched sense substitution cannot preserve the semantics of x_ori and makes x_adv incomprehensible. Last, about 3.8% of words are substituted with morphologically related words, e.g., by converting the part of speech (POS) from verb to noun or changing the verb tense. These substitutions, while maintaining the semantics of the original sentence and perhaps remaining human-readable, are mostly ungrammatical and lead to unnatural adversarial samples. These statistics show that only about 20% of the word substitutions produced by PWWS are real synonym substitutions; the high attack success rate of 57.25% should thus not be surprising, since most word replacements are highly questionable.
3.2 Counter-fitted Embedding kNN and MLM Mask-Infilling/Reconstruction Contain Few Matched Sense Synonyms

As shown in Section 3.1.1, even when WordNet synonyms are used as the candidate sets, the proportion of valid substitutions is strikingly low. This makes us more concerned about the word substitution quality of the other three heuristic transformations introduced in Section 2.2. These three word substitution methods rely mostly on assumptions about the quality of the embedding space or the ability of the MLM, and they require setting a hyperparameter k for the size of the substitution set. To the best of our knowledge, no previous work has systematically studied what the candidate sets proposed by these three transformations look like; still, they have been widely used in SSAs.

Transformations       Syn. (matched)  Syn. (mismatched)  Antonyms  Morphemes  Others
GloVe-kNN             0.22            1.01               0         1.55       27.22
BERT mask-infill      0.08            0.36               0.06      0.57       28.93
BERT reconstruction   0.14            0.58               0.09      1.19       27.99

Table 1: The average number of words of each substitution type in the candidate word set of k = 30 words. Syn. is short for synonym.
3.2.1 Experiments

To understand what those substitution sets look like, we conduct the following experiment. We use the benchmark dataset generated by Yoo et al. (2022), which attacks 7.6K samples in the AG-News testing data using TextFooler. For each word x_i in x_ori that is perturbed into an x'_i in x_adv, we use the following three transformations to obtain the candidate substitution set: the counter-fitted GloVe embedding space, BERT mask-infilling, and BERT reconstruction.⁵ We only consider the substitution sets of the x_i that are perturbed in x_adv, because not all words in x_ori will be perturbed by an SSA, and it is thus more reasonable to consider only the words that are actually perturbed. We set the k in the kNN of the counter-fitted GloVe embedding transformation and in the top-k prediction of BERT mask-infilling/reconstruction to 30, a reasonable number compared with many previous works.
We categorize the candidate words into five disjoint word substitution types. Aside from the three types discussed in Section 3.1.1, we include two others. The first is antonym substitution, obtained by querying the antonyms of a word x_i using WordNet. Unlike synonym substitutions, we do not separate antonyms into those that match the word sense of x_i in x_ori and sense-mismatched ones, since neither should be considered a valid swap in SSAs. The other substitution type is others, which simply consists of the candidate words not falling into the category of synonyms, antonyms, or morphological substitutions.
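With the five categories fixed, the per-type averages reported in Table 1 could be computed by typing each of the k candidates and averaging the counts over all substitution sets. A sketch of the typing step, with toy reference sets for a single word:

```python
from collections import Counter

# Five-way typing of a candidate set. The reference sets (matched/
# mismatched synonyms, antonyms, morphological variants) are toy examples;
# in the actual analysis they come from WordNet and LemmInflect.
def type_counts(cands, matched, mismatched, antonyms, morphs):
    c = Counter()
    for w in cands:
        if w in matched:
            c["syn_matched"] += 1
        elif w in mismatched:
            c["syn_mismatched"] += 1
        elif w in antonyms:
            c["antonym"] += 1
        elif w in morphs:
            c["morpheme"] += 1
        else:
            c["other"] += 1
    return c

counts = type_counts(["commend", "urge", "bad", "recommends", "table"],
                     matched={"commend"}, mismatched={"urge"},
                     antonyms={"bad"}, morphs={"recommends"})
# counts["other"] == 1
```

Averaging such counters over every perturbed word's candidate set yields one row of Table 1 per transformation.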
In Table 1, we show how the different substitution types comprise, on average, the 30 words in the candidate set for each transformation.⁵ It is easy to see that only a small proportion of the substitution set is made up of synonym substitutions for all three transformation methods, with counter-fitted GloVe embedding substitution containing the most synonyms among the three, but still only about one word on average. Moreover, the synonym substitutions are mostly mismatched sense substitutions. When using BERT mask-infilling as the transformation, there are only 0.08 matched sense substitutions in the top 30 predictions. When using BERT reconstruction to produce the candidate set, the number of matched sense substitutions increases slightly compared with mask-infilling, but still accounts for less than one word in BERT's top-30 reconstruction predictions.

⁵ For BERT mask-infilling and reconstruction substitutions, we remove punctuation and incomplete subword tokens.
Within the substitution set, there is on average about one word that is a morphological substitution of the original word. Surprisingly, when using MLM mask-infilling or reconstruction as the transformation, there is a slight chance that the candidate set contains antonyms of the original word; it is highly doubtful whether semantics is preserved when words in the original sentence are swapped with antonyms.

The vast majority of the substitution set is composed of words that do not fall into the previous four categories. We provide examples of the substitution sets proposed by different transformations in Table 6 in the Appendix, showing that the candidate words of the others substitution type are mostly unrelated words that should not be used for word replacement. It is understandable that words falling into the others substitution type are invalid candidates: the core of SSAs is to replace words with semantically close counterparts so as to preserve the semantics of the original sentence, and if a substitution word does not belong to the synonym set proposed by WordNet, it is unlikely that swapping the original word with it can preserve the semantics of x_ori. We also show some randomly selected adversarial samples generated by different SSAs using different transformations in Table 5 in the Appendix,