Are Synonym Substitution Attacks Really Synonym Substitution Attacks?
Cheng-Han Chiang
National Taiwan University,
Taiwan
dcml0714@gmail.com
Hung-yi Lee
National Taiwan University,
Taiwan
hungyilee@ntu.edu.tw
Abstract

In this paper, we explore the following question: Are synonym substitution attacks really synonym substitution attacks (SSAs)? We approach this question by examining how SSAs replace words in the original sentence and show that there are still unresolved obstacles that make current SSAs generate invalid adversarial samples. We reveal that four widely used word substitution methods generate a large fraction of invalid substitution words that are ungrammatical or do not preserve the original sentence's semantics. Next, we show that the semantic and grammatical constraints used in SSAs for detecting invalid word replacements are highly insufficient in detecting invalid adversarial samples.
1 Introduction

Deep learning-based natural language processing models have been extensively used across many tasks and domains and have shown strong performance. However, these models seem to be astonishingly vulnerable: their predictions can be misled by small perturbations of the original input (Gao et al., 2018; Tan et al., 2020). These imperceptible perturbations, while not changing humans' predictions, can make a well-trained model behave worse than random.
One important type of adversarial attack in natural language processing (NLP) is the synonym substitution attack (SSA). In SSAs, an adversarial sample is constructed by substituting some words in the original sentence with their synonyms (Alzantot et al., 2018; Ren et al., 2019; Garg and Ramakrishnan, 2020; Jin et al., 2020; Li et al., 2020; Maheshwary et al., 2021). This ensures that the adversarial sample is semantically similar to the original sentence, thus fulfilling the imperceptibility requirement of a valid adversarial sample. While substituting words with semantically related counterparts can retain the semantics of the original sentence, these attacks often utilize constraints to further guarantee that the generated adversarial samples are grammatically correct and semantically similar to the original sentence. These SSAs have all been shown to successfully degrade well-trained text classifiers' performance.
However, some recent works observe, via human evaluation, that the quality of the adversarial samples generated by those SSAs is fairly low and is highly perceptible to humans (Morris et al., 2020a; Hauser et al., 2021). These adversarial samples often contain grammatical errors and do not preserve the semantics of the original samples, making them difficult to understand. These characteristics violate the fundamental criteria of a valid adversarial sample: preserving semantics and being imperceptible to humans. This motivates us to investigate what causes those SSAs to generate invalid adversarial samples. Only by answering this question can we move on to design more realistic SSAs in the future.
In this paper, we set out to answer the following question: Are synonym substitution attacks in the literature really synonym substitution attacks? We explore the answer by scrutinizing the key components of several important SSAs and why they fail to generate valid adversarial samples. Specifically, we conduct a detailed analysis of how the word substitution sets are obtained in SSAs, and we look into the semantic and grammatical constraints used to filter invalid adversarial samples. We make the following striking observations:
- When substituting words by WordNet synonym sets, current methods neglect the word sense differences within the substitution set. (Section 3.1)
- When using the counter-fitted GloVe embedding space or BERT to generate the substitution set, the substitution set contains only a tiny fraction of synonyms. (Section 3.2)
- Using word embedding cosine similarity or sentence embedding cosine similarity to filter words in the substitution set does not necessarily exclude semantically invalid word substitutions. (Section 4.1 and Section 4.2)
- The grammar checker used for filtering ungrammatical adversarial samples fails to detect most erroneous verb inflectional forms in a sentence. (Section 4.3)

arXiv:2210.02844v3 [cs.CL] 8 May 2023
2 Background

In this section, we provide an overview of SSAs and introduce related notation that will be used throughout the paper.
2.1 Synonym Substitution Attacks (SSAs)

Consider a victim text classifier trained on a dataset D_train and a clean testing sample x_ori drawn from the same distribution as D_train, where x_ori = {x_1, ..., x_T} is a sequence of T tokens. An SSA attacks the victim model by constructing an adversarial sample x_adv = {x'_1, ..., x'_T}, obtained by swapping words in x_ori with semantically related counterparts. For x_adv to be considered a valid adversarial sample of x_ori, a few requirements must be met (Morris et al., 2020a): (0) x_adv should make the model yield a wrong prediction while the model correctly classifies x_ori. (1) x_adv should be semantically similar to x_ori. (2) x_adv should not introduce new grammar errors compared with x_ori. (3) The word-level overlap between x_adv and x_ori should be high enough. (4) The modifications made in x_adv should be natural and non-suspicious. In this paper, we refer to adversarial samples that fail to meet the above criteria as invalid adversarial samples.
SSAs rely on heuristic procedures to ensure that x_adv satisfies the preceding requirements. Here, we describe a canonical pipeline for generating x_adv from x_ori (Morris et al., 2020b). Given a clean testing sample x_ori that the text classifier correctly predicts, an SSA first generates a candidate word substitution set S_xi for each word x_i. The process of generating the candidate set S_xi is called a transformation. Next, the SSA determines which word in x_ori should be substituted first, which word should be swapped next, and so on. After the word substitution order is decided, the SSA iteratively substitutes each word x_i in x_ori using the candidate words in S_xi according to the pre-determined order. In each substitution step, an x_i is replaced by a word in S_xi, and a new x_swap is obtained. When an x_swap is obtained, some constraints are used to verify the validity of x_swap. The iterative word substitution process ends when the model's prediction is successfully corrupted by a substituted sentence that satisfies the constraints, yielding the desired x_adv.

Clearly, the transformations and the constraints are critical to the quality of the final x_adv. In the remainder of the paper, we look deeper into the transformations and constraints used in SSAs and their role in creating adversarial samples.¹ Next, we briefly introduce the transformations and constraints that have been used in SSAs.
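The canonical pipeline just described can be condensed into a short sketch. Everything below (the classifier, the candidate sets, the word-overlap constraint) is a hypothetical toy stand-in for illustration, not any published attack's implementation:

```python
# Toy sketch of the canonical SSA pipeline: iterate over a pre-determined
# substitution order, try candidates, keep swaps that pass the constraint,
# and stop once the model's prediction flips.

def greedy_ssa(x_ori, candidate_sets, order, classifier, constraint):
    y_ori = classifier(x_ori)
    x_swap = list(x_ori)
    for i in order:                            # pre-determined order
        for cand in candidate_sets.get(i, []):
            trial = list(x_swap)
            trial[i] = cand
            if not constraint(x_ori, trial):   # validity check on x_swap
                continue
            if classifier(trial) != y_ori:     # prediction corrupted
                return trial                   # this is x_adv
            x_swap = trial                     # keep swap, move on (greedy)
    return None                                # attack failed

# Toy victim classifier: predicts "pos" iff the word "good" appears.
clf = lambda toks: "pos" if "good" in toks else "neg"
x = ["the", "movie", "is", "good"]
adv = greedy_ssa(x, {3: ["great", "fine"]}, [3], clf,
                 lambda a, b: sum(u != v for u, v in zip(a, b)) <= 1)
# adv == ["the", "movie", "is", "great"]
```

Real attacks differ mainly in how they build `candidate_sets` (the transformations below) and what `constraint` checks.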
2.2 Transformations

A transformation is the process of generating the substitution set S_xi for a word x_i in x_ori. There are four representative transformations in the literature.

WordNet Synonym Transformation constructs S_xi by querying a word's synonyms in WordNet (Miller, 1995; University, 2010), a lexical database containing word sense definitions, synonyms, and antonyms of English words. This transformation is used in PWWS (Ren et al., 2019) and LexicalAT (Xu et al., 2019).
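A sense-agnostic WordNet-style lookup can be illustrated with a hand-built toy lexicon (a stand-in for the real database, which attacks query via `nltk.corpus.wordnet`): the substitution set is the union of synonyms across all senses, which is exactly where mismatched senses creep in (see Section 3.1):

```python
# Toy stand-in for a WordNet lookup: each word maps to per-sense synonym
# sets. A sense-agnostic transformation takes the union over all senses.
LEXICON = {
    "recommend": {
        "express_a_good_opinion": {"recommend", "commend"},
        "push_for_something": {"recommend", "urge", "advocate"},
    },
}

def substitution_set(word):
    """Union of all senses' synonyms, minus the word itself."""
    senses = LEXICON.get(word, {})
    union = set().union(*senses.values()) if senses else set()
    return union - {word}

sorted(substitution_set("recommend"))  # ['advocate', 'commend', 'urge']
```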
Word Embedding Space Nearest Neighbor Transformation constructs S_xi by looking up the word embedding of x_i in a word embedding space and finding its k nearest neighbors (kNN). Using kNN for word substitution is based on the assumption that semantically related words are closer in the word embedding space. The counter-fitted GloVe embedding space (Mrkšić et al., 2016) is obtained by post-processing the GloVe embedding space (Pennington et al., 2014); counter-fitting refers to the process of pulling antonyms apart and narrowing the distance between synonyms. This transformation is adopted in TextFooler (Jin et al., 2020), the Genetic Algorithm attack (Alzantot et al., 2018), and TextFooler-Adj (Morris et al., 2020a).
¹ In this paper, we do not discuss the relationship between the validity of an SSA and how an SSA determines which word in x_ori should be substituted. Most SSAs use word importance scores to determine the most salient words and substitute those first. Since most SSAs use similar methods to determine which word should be replaced, our analyses generalize to those SSAs.
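The kNN transformation above can be sketched with toy 3-dimensional vectors (made up for illustration; real attacks use the 300-dimensional counter-fitted GloVe space):

```python
import numpy as np

# Toy embedding table: "good"/"great"/"fine" point in similar directions,
# "bad" is roughly opposite, "table" is orthogonal.
vocab = ["good", "great", "fine", "bad", "table"]
E = np.array([[0.90, 0.10, 0.00],
              [0.85, 0.15, 0.00],
              [0.70, 0.30, 0.10],
              [-0.90, 0.10, 0.00],
              [0.00, 0.00, 1.00]])
E = E / np.linalg.norm(E, axis=1, keepdims=True)  # unit-normalize rows

def knn(word, k=2):
    """k nearest neighbors of `word` by cosine similarity, excluding itself."""
    sims = E @ E[vocab.index(word)]               # cosine sims to all words
    ranked = [vocab[j] for j in np.argsort(-sims) if vocab[j] != word]
    return ranked[:k]

knn("good", k=2)  # ['great', 'fine']
```

Note that for larger k the neighbor list drifts to unrelated words ("table") well before the vocabulary runs out of synonyms, which is the failure mode Section 3.2 measures.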
Masked Language Model (MLM) Mask-Infilling Transformation constructs S_xi by masking x_i in x_ori and asking an MLM to predict the masked token; the MLM's top-k predictions for the masked token form the word substitution set of x_i. Widely adopted MLMs include BERT (Devlin et al., 2019) and RoBERTa (Liu et al., 2019). Using MLM mask-infilling to generate a candidate set relies on the belief that MLMs can generate fluent and semantically consistent substitutions for x_ori. This method is used in BERT-ATTACK (Li et al., 2020) and CLARE (Li et al., 2021).
MLM Reconstruction Transformation also uses MLMs. To generate the candidate set, one feeds the MLM the original sentence x_ori without masking any tokens. Here, the MLM is not performing mask-infilling but reconstructing the input tokens from the unmasked inputs. For each word x_i, one takes its top-k token reconstruction predictions as the candidates. This transformation relies on the intuition that reconstruction can generate more semantically similar words than mask-infilling. This method is used in BAE (Garg and Ramakrishnan, 2020).
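Both MLM-based transformations reduce to reading the top-k token predictions at one position; the only difference is whether that position was masked in the input. A toy sketch, with a made-up logits vector standing in for the MLM's output at the position of interest:

```python
import numpy as np

# Hypothetical MLM scores at one token position. With mask-infilling these
# would come from a [MASK]ed input; with reconstruction, from the unmasked
# input. The values below are invented for illustration.
vocab = ["good", "great", "bad", "movie", "the"]
logits = np.array([2.0, 1.5, 1.2, 0.3, 0.1])

def top_k_candidates(logits, vocab, original_word, k=2):
    """Top-k predicted tokens at this position, excluding the original word."""
    order = np.argsort(-logits)                   # descending by score
    cands = [vocab[j] for j in order if vocab[j] != original_word]
    return cands[:k]

top_k_candidates(logits, vocab, "good", k=2)  # ['great', 'bad']
```

Even in this toy, the antonym "bad" lands in the candidate set: an MLM scores tokens by contextual fit, not by synonymy, which foreshadows the antonym counts in Table 1.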
2.3 Constraints

When an x_ori is perturbed by swapping some of its words, constraints are used to check whether the perturbed sentence, x_swap, is semantically and grammatically valid. We write x_swap instead of x_adv here because x_swap does not necessarily flip the model's prediction and is thus not necessarily an adversarial sample.
Word Embedding Cosine Similarity requires a word x_i and its perturbed counterpart x'_i to be close enough in the counter-fitted GloVe embedding space in terms of cosine similarity. A substitution is valid if the cosine similarity between its word embedding and the original word's embedding is higher than a pre-defined threshold. This is used in the Genetic Algorithm attack (Alzantot et al., 2018) and TextFooler (Jin et al., 2020).
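The constraint itself is a one-liner; a minimal sketch follows, where the threshold value is illustrative since each attack picks its own:

```python
import numpy as np

def word_cos_ok(e_orig, e_sub, threshold=0.5):
    """Accept a substitution iff the cosine similarity of the two word
    embeddings exceeds `threshold` (value here is illustrative)."""
    cos = float(np.dot(e_orig, e_sub) /
                (np.linalg.norm(e_orig) * np.linalg.norm(e_sub)))
    return cos >= threshold
```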
Sentence Embedding Cosine Similarity demands that the sentence embedding cosine similarity between x_swap and x_ori be higher than a pre-defined threshold. Most previous works (Jin et al., 2020; Li et al., 2020; Garg and Ramakrishnan, 2020; Morris et al., 2020a) use the Universal Sentence Encoder (USE) (Cer et al., 2018) as the sentence encoder; A2T (Yoo and Qi, 2021) uses a DistilBERT (Sanh et al., 2019) fine-tuned on STS-B (Cer et al., 2017).

In some previous work (Li et al., 2020), the sentence embedding is computed over the whole sentences x_ori and x_swap. But most previous works (Jin et al., 2020; Garg and Ramakrishnan, 2020) extract only a context window around the currently swapped word in x_ori and x_swap to compute the sentence embedding. For example, if x_i is substituted in the current substitution step, one computes the sentence embeddings of x_ori[i−w : i+w+1] and x_adv[i−w : i+w+1], where w determines the window size. w is set to 7 in Jin et al. (2020) and Garg and Ramakrishnan (2020).
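The window extraction can be sketched directly from the index expression above (clamping at the sequence boundary, a detail the notation leaves implicit):

```python
def context_window(tokens, i, w=7):
    """Tokens in x[i-w : i+w+1] around position i, clamped at the sequence
    start; this is the span whose sentence embedding gets compared."""
    return tokens[max(0, i - w): i + w + 1]

toks = list("abcdefghijklmnop")   # 16 dummy one-character "tokens"
context_window(toks, 8, w=3)      # ['f', 'g', 'h', 'i', 'j', 'k', 'l']
```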
LanguageTool (language-tool-python, 2022) is an open-source grammar tool that detects spelling and grammar mistakes in an input sentence. It is used in TextFooler-Adj (Morris et al., 2020a) to evaluate the grammaticality of the adversarial samples.
3 Problems with the Transformations in SSAs

In this section, we show that the transformations introduced in Section 2.2 are largely to blame for the invalid adversarial samples in SSAs: the substitution set S_xi for x_i is mostly invalid, either semantically or grammatically.
3.1 WordNet Synonym Substitution Set Ignores Word Senses

In WordNet, each word is associated with one or more word senses, and each word sense has its corresponding synonym set. Thus, the substitution set S_xi proposed by WordNet is the union of the synonym sets of the different senses of x_i. When swapping x_i with a synonym from WordNet, it would be more sensible to first identify the word sense of x_i in x_ori and use the synonym set of that very sense as the substitution set. However, current attacks using WordNet synonym substitution neglect the sense differences within the substitution set (Ren et al., 2019), which may result in adversarial samples that semantically deviate from the original input.

As a working example, consider a movie review that reads "I highly recommend it". The word "recommend" here corresponds to the word sense "express a good opinion of" according to WordNet and has the synonym set {recommend, commend}. Aside from this word sense, "recommend" also has another word sense, "push for something", as in "The travel agent recommends not to travel amid the pandemic". This second word sense has the synonym set {recommend, urge, advocate}.² Clearly, the only valid substitution is "commend", which preserves the semantics of the original movie review. While "urge" is a synonym of "recommend", it obviously does not fit the context and should not be considered a possible substitution. We call substituting x_i with a synonym that matches the word sense of x_i in x_ori a matched sense substitution, and we use mismatched sense substitution to refer to swapping a word with a synonym that belongs to the synonym set of a different word sense.
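The matched/mismatched split can be illustrated with the toy senses of "recommend" from the working example, assuming the in-context sense is already known (e.g., via word sense disambiguation); the sense names and sets below are hand-built stand-ins for WordNet entries:

```python
# Per-sense synonym sets for one word, WordNet-style (toy data).
SENSES = {
    "express_a_good_opinion": {"recommend", "commend"},
    "push_for_something": {"recommend", "urge", "advocate"},
}

def split_by_sense(word, sense_in_context):
    """Split the union of synonyms into matched vs. mismatched sense
    substitutions, given the word's sense in the original sentence."""
    matched = SENSES[sense_in_context] - {word}
    others = set().union(*(s for name, s in SENSES.items()
                           if name != sense_in_context))
    mismatched = others - {word} - matched
    return matched, mismatched

m, mm = split_by_sense("recommend", "express_a_good_opinion")
# m == {'commend'}; mm == {'urge', 'advocate'}
```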
3.1.1 Experiments

To illustrate that mismatched sense substitution is a problem in practical attack algorithms, we conduct the following analysis. We examine the adversarial samples generated by PWWS (Ren et al., 2019), which substitutes words using WordNet synonym sets. We use a benchmark dataset (Yoo et al., 2022) that contains the adversarial samples generated by PWWS against a BERT-based classifier fine-tuned on AG-News (Zhang et al., 2015). AG-News is a news topic classification dataset whose goal is to classify a piece of news into four categories: world, sports, business, and sci/tech. The attack success rate on the testing set of 7.6K samples is 57.25%. More statistics about the datasets can be found in Appendix B.

We categorize the words replaced by PWWS into three disjoint categories: matched sense substitution, mismatched sense substitution, and morphological substitution. The last category, morphological substitution, refers to substituting a word with a word that differs from the original only in inflectional morphemes³ or derivational morphemes⁴. We specifically isolate morphological substitution since it is hard to categorize into either matched or mismatched sense substitution.
The detailed procedure for categorizing a replaced word's substitution type is as follows. Given a pair (x_ori, x_adv), we first use NLTK (Bird et al., 2009) to perform word sense disambiguation on each word x_i in x_ori. We use LemmInflect and NLTK to generate the morphological substitution set ML_xi of x_i. The matched sense substitution set M_xi is constructed from the WordNet synonym set of the sense of x_i in x_ori; since this synonym set includes the original word x_i and may also include some words in ML_xi, we remove x_i and the words already included in ML_xi from the synonym set, forming the final matched sense substitution set M_xi. The mismatched sense substitution set MM_xi is constructed by first collecting, using WordNet, all synonyms of x_i that belong to senses of x_i other than its sense in x_ori, and then removing all words already included in ML_xi and M_xi.

² The word senses and synonyms are from WordNet.

³ Inflectional morphemes are suffixes that change a grammatical property of a word but do not create a new word, such as a verb's tense or a noun's number. For example, recommends → recommend.

⁴ Derivational morphemes are affixes that change the form of a word and create a new word, such as changing a verb into a noun. For example, recommend → recommendation.
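Once the three sets ML_xi, M_xi, and MM_xi are built as described, the per-word categorization reduces to set membership checks. A sketch of that step (not the authors' exact code; the sets below are toy examples for the word "recommend"):

```python
# Three-way categorization of a replaced word. ML (morphological) is
# checked first, mirroring the construction order: M and MM were built
# with ML's members removed, so the sets are disjoint by construction.
def categorize(sub, ML, M, MM):
    if sub in ML:
        return "morphological"
    if sub in M:
        return "matched"
    if sub in MM:
        return "mismatched"
    return "other"

ML = {"recommends", "recommendation"}   # inflections/derivations
M = {"commend"}                         # matched-sense synonyms
MM = {"urge", "advocate"}               # mismatched-sense synonyms
categorize("urge", ML, M, MM)           # 'mismatched'
```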
After inspecting 4,140 adversarial samples produced by PWWS, we find that among the 26,600 words swapped by PWWS, only 5,398 (20.2%) fall into the category of matched sense substitution. A majority of 20,055 (75.4%) word substitutions are mismatched sense substitutions, which should be considered invalid, since a mismatched sense substitution cannot preserve the semantics of x_ori and makes x_adv incomprehensible. Last, about 3.8% of words are substituted with morphologically related words, e.g., by converting the part of speech (POS) from verb to noun or changing the verb tense. These substitutions, while maintaining the semantics of the original sentence and perhaps remaining human-readable, are mostly ungrammatical and lead to unnatural adversarial samples. These statistics show that only about 20% of the word substitutions produced by PWWS are real synonym substitutions; the high attack success rate of 57.25% should thus not be surprising, since most word replacements are highly questionable.
3.2 Counter-fitted Embedding kNN and MLM Mask-Infilling/Reconstruction Contain Few Matched Sense Synonyms

As shown in Section 3.1.1, even when WordNet synonyms are used as the candidate sets, the proportion of valid substitutions is strikingly low. This makes us more concerned about the word substitution quality of the other three heuristic transformations introduced in Section 2.2. These three word substitution methods rely mostly on assumptions about the quality of the embedding space or the ability of the MLM, and they require setting a hyperparameter k for the size of the substitution set. To the best of our knowledge, no previous work has systematically studied what the candidate sets proposed by these three transformations look like; still, they have been widely used in SSAs.

Transformations       Syn. (matched)  Syn. (mismatched)  Antonyms  Morphemes  Others
GloVe-kNN             0.22            1.01               0         1.55       27.22
BERT mask-infill      0.08            0.36               0.06      0.57       28.93
BERT reconstruction   0.14            0.58               0.09      1.19       27.99

Table 1: The average number of words of each substitution type in the candidate word set of k = 30 words. Syn. is short for synonym.
3.2.1 Experiments

To understand what those substitution sets look like, we conduct the following experiment. We use the benchmark dataset generated by Yoo et al. (2022), which attacks 7.6K samples in the AG-News testing data using TextFooler. For each word x_i in x_ori that is perturbed into an x'_i in x_adv, we use the following three transformations to obtain the candidate substitution set: the counter-fitted GloVe embedding space, BERT mask-infilling, and BERT reconstruction.⁵ We only consider the substitution sets of the x_i that are perturbed in x_adv, because not all words in x_ori will be perturbed by an SSA, and it is thus more reasonable to consider only the words that are actually perturbed. We set the k in the kNN of the counter-fitted GloVe embedding transformation and in the top-k prediction of BERT mask-infilling/reconstruction to 30, a reasonable number compared with many previous works.
We categorize the candidate words into five disjoint word substitution types. Aside from the three types discussed in Section 3.1.1, we include two others. The first is antonym substitution, obtained by querying the antonyms of a word x_i using WordNet. Unlike synonym substitutions, we do not separate antonyms into those that match the word sense of x_i in x_ori and sense-mismatched ones, since neither should be considered a valid swap in SSAs. The other substitution type is others, which simply consists of the candidate words not falling into the category of synonyms, antonyms, or morphological substitutions.
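With the five categories fixed, the per-type averages reported in Table 1 could be computed by typing each of the k candidates and averaging the counts over all substitution sets. A sketch of the typing step, with toy reference sets for a single word:

```python
from collections import Counter

# Five-way typing of a candidate set. The reference sets (matched/
# mismatched synonyms, antonyms, morphological variants) are toy examples;
# in the actual analysis they come from WordNet and LemmInflect.
def type_counts(cands, matched, mismatched, antonyms, morphs):
    c = Counter()
    for w in cands:
        if w in matched:
            c["syn_matched"] += 1
        elif w in mismatched:
            c["syn_mismatched"] += 1
        elif w in antonyms:
            c["antonym"] += 1
        elif w in morphs:
            c["morpheme"] += 1
        else:
            c["other"] += 1
    return c

counts = type_counts(["commend", "urge", "bad", "recommends", "table"],
                     matched={"commend"}, mismatched={"urge"},
                     antonyms={"bad"}, morphs={"recommends"})
# counts["other"] == 1
```

Averaging such counters over every perturbed word's candidate set yields one row of Table 1 per transformation.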
In Table 1, we show how the different substitution types comprise, on average, the 30 words in the candidate set for each transformation.⁵ It is easy to see that only a small proportion of the substitution set is made up of synonym substitutions for all three transformation methods, with counter-fitted GloVe embedding substitution containing the most synonyms among the three, but still only about one word on average. Moreover, the synonym substitutions are mostly mismatched sense substitutions. When using BERT mask-infilling as the transformation, there are only 0.08 matched sense substitutions in the top 30 predictions. When using BERT reconstruction to produce the candidate set, the number of matched sense substitutions increases slightly compared with mask-infilling, but still accounts for less than one word in BERT's top-30 reconstruction predictions.

⁵ For BERT mask-infilling and reconstruction substitutions, we remove punctuation and incomplete subword tokens.
Within the substitution set, there is on average about one word that is a morphological substitution of the original word. Surprisingly, when using MLM mask-infilling or reconstruction as the transformation, there is a slight chance that the candidate set contains antonyms of the original word; it is highly doubtful whether semantics is preserved when words in the original sentence are swapped with antonyms.

The vast majority of the substitution set is composed of words that do not fall into the previous four categories. We provide examples of the substitution sets proposed by different transformations in Table 6 in the Appendix, showing that the candidate words of the others substitution type are mostly unrelated words that should not be used for word replacement. It is understandable that words falling into the others substitution type are invalid candidates: the core of SSAs is to replace words with semantically close counterparts so as to preserve the semantics of the original sentence, and if a substitution word does not belong to the synonym set proposed by WordNet, it is unlikely that swapping the original word with it can preserve the semantics of x_ori. We also show some randomly selected adversarial samples generated by different SSAs using different transformations in Table 5 in the Appendix,