not only in terms of backdoor similarity, but also in terms of
its effectiveness in evading existing detections, as observed in
our experiments. Further, we demonstrate that a backdoor with
high backdoor similarity is indeed hard to detect, through
theoretical analysis as well as extensive experimental studies
on four datasets under six representative detections, using our
TSA attack together with five representative attacks proposed
in prior research.
Contributions. Our contributions are as follows:
• New direction on backdoor analysis. Our research brings a
new aspect to backdoor research through the lens of backdoor
similarity. Our study reveals the great impact backdoor
similarity has on both backdoor attacks and detection, which
can potentially help determine the limits of the adversary's
capability in a backdoor attack and therefore enable the
development of the best possible response.
• New stealthy backdoor attack. Based upon our understanding
of backdoor similarity, we developed a novel technique, the
TSA attack, to generate a stealthy backdoor under a given
backdoor similarity constraint, helping us better understand
the adversary's potential and more effectively calibrate the
capability of backdoor detections.
2 Background
2.1 Neural Network
We model a neural network model $f$ as a mapping function from the
input space $\mathcal{X}$ to the output space $\mathcal{Y}$, i.e.,
$f: \mathcal{X} \mapsto \mathcal{Y}$. Further, the model $f$ can be
decomposed into two sub-functions: $f(x) = c(g(x))$. Specifically,
for a classification task with $L$ classes where the output space
$\mathcal{Y} = \{0, 1, \ldots, L-1\}$, we define
$g: \mathcal{X} \mapsto [0,1]^L$, $c: [0,1]^L \mapsto \mathcal{Y}$ and
$c(g(x)) = \arg\max_j g(x)_j$, where $g(x)_j$ is the $j$-th element of
$g(x)$. According to the common understanding, a well-trained $g(x)$
approximates the conditional probability of observing $y$ given $x$,
i.e., $g(x)_y \approx \Pr(y \mid x)$, for $y \in \mathcal{Y}$ and
$x \in \mathcal{X}$.
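To make the decomposition concrete, below is a minimal sketch of $f(x) = c(g(x))$; the toy two-layer network, its random weights, and the dimensions are illustrative assumptions, not any particular model used in our experiments.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax: maps logits to a probability vector in [0,1]^L.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def g(x, W1, b1, W2, b2):
    # Sub-function g: X -> [0,1]^L, here a toy two-layer network with ReLU.
    h = np.maximum(0, W1 @ x + b1)
    return softmax(W2 @ h + b2)

def c(p):
    # Sub-function c: [0,1]^L -> Y, picks the class with the highest probability.
    return int(np.argmax(p))

def f(x, params):
    # Full classifier f(x) = c(g(x)).
    return c(g(x, *params))

# Example with random (untrained) parameters: input dim 8, hidden dim 16, L = 3 classes.
rng = np.random.default_rng(0)
params = (rng.normal(size=(16, 8)), np.zeros(16), rng.normal(size=(3, 16)), np.zeros(3))
x = rng.normal(size=8)
print(g(x, *params), f(x, params))  # probability vector g(x) and predicted label f(x)
```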
2.2 Backdoor Attack & Detection
Backdoor attack. In our research, we focus on targeted backdoors that
cause the backdoor-infected model $f_b$ to map a trigger-carrying
input $A(x)$ to the target label $t$, which differs from the
ground-truth label of $x$ [5,59,77,82]:
$$ f_b(A(x)) = t \neq f_P(x) \qquad (1) $$
where $f_P$ is the benign model that outputs the ground-truth label
for $x$ and $A$ is the trigger function that transforms a benign
input into its trigger-carrying counterpart. Many attack methods
have been proposed to inject backdoors, e.g., [12,14,20,47,49,50,61,72].
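For illustration only, the sketch below gives one hypothetical instance of the trigger function $A$, a BadNets-style corner patch, together with a helper that measures how often a model satisfies $f_b(A(x)) = t$ from Equation (1); the patch location, size, and value, as well as the callable `f_b`, are assumptions and not the construction used in our attack.

```python
import numpy as np

def apply_patch_trigger(x, patch_value=1.0, size=3):
    """A simple trigger function A: stamp a small square of fixed value
    into the bottom-right corner of an image x with shape (H, W, C).
    Patch location, size, and value are illustrative choices."""
    x_triggered = x.copy()
    x_triggered[-size:, -size:, :] = patch_value
    return x_triggered

def attack_success_rate(f_b, inputs, target_label):
    """Fraction of trigger-carrying inputs A(x) that a (hypothetical)
    backdoored classifier f_b maps to the target label t,
    i.e., how often f_b(A(x)) == t holds."""
    hits = sum(f_b(apply_patch_trigger(x)) == target_label for x in inputs)
    return hits / len(inputs)
```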
Backdoor detection. Backdoor detection has been extensively studied
recently [21,25,35,44,78]. The proposed approaches can be categorized
based on the model information they focus on: model outputs, model
weights, and model inputs. This categorization is used in our
research to analyze different detection approaches (Section 4).
More specifically, detection on model outputs captures backdoored
models by detecting the difference between the outputs of backdoored
models and benign models on some inputs. Such detection methods
include NC [77], K-ARM [68], MNTD [83], Spectre [27], TABOR [26],
MESA [58], STRIP [22], SentiNet [13], ABL [43], ULP [38], etc.
Detection on model weights finds a backdoored model by distinguishing
its weights from those of benign models. Such detection approaches
include ABS [48], ANP [80], NeuronInspect [31], etc. Detection on
model inputs identifies a backdoored model by detecting differences
between the inputs that cause a backdoored model and a benign model
to produce similar outputs. Prominent detections in this category
include SCAn [72], AC [11], SS [74], etc.
2.3 Threat Model
We focus on backdoors for image classification tasks, while
assuming a white-box attack scenario where the adversary can
access the training process. The attacker injects the backdoor
to accomplish the goal formally defined in Section 3.2 and to
evade backdoor detections.
The backdoor defender aims to distinguish backdoored models from
benign models. She has white-box access to those backdoored models
and owns a small set of benign inputs. Besides, the defender may
obtain a mixed set of inputs containing a large number of benign
inputs together with a few trigger-carrying inputs; however, which
inputs in this set carry the trigger is unknown to her.
3 TSA on Backdoor Attack
Not only does a backdoor attack aim to induce a victim model to
misclassify trigger-carrying inputs, but it is also meant to achieve
high stealthiness against backdoor detections. For this purpose, some
attacks [17,49] reduce the $L_p$-norm of the trigger, i.e.,
$\|A(x) - x\|_p$, to make trigger-carrying inputs similar to benign
inputs, while some others construct the trigger using benign
features [46,66]. All these tricks are designed to evade specific
detection methods. Less clear, however, is the stealthiness guarantee
that those tricks can provide against other detection methods.
Understanding such a stealthiness guarantee requires modeling the
detectability of backdoored models, which depends on measuring
fundamental differences between backdoored and benign models,
something that has not been studied before.
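As a concrete illustration of this kind of stealthiness constraint, the sketch below measures $\|A(x) - x\|_p$ and projects a trigger onto an $L_\infty$ budget; the budget $\epsilon = 8/255$ and the assumption that pixels lie in $[0, 1]$ are illustrative choices, not parameters of any specific attack discussed above.

```python
import numpy as np

def trigger_norm(x, x_triggered, p=np.inf):
    # L_p-norm of the trigger, i.e., ||A(x) - x||_p, computed over all pixels.
    return np.linalg.norm((x_triggered - x).ravel(), ord=p)

def project_trigger(x, x_triggered, eps=8 / 255):
    # Project the perturbation onto an L_inf ball of radius eps around x,
    # a common way to keep trigger-carrying inputs visually close to benign ones.
    delta = np.clip(x_triggered - x, -eps, eps)
    return np.clip(x + delta, 0.0, 1.0)  # assumes pixels are encoded in [0, 1]
```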
To fill this gap, we analyze the difference between the task a
backdoored model intends to accomplish (called the backdoor task)
and that of its benign counterpart (called the primary task), which
indicates the detectability of the backdoored model, as demonstrated
by our experimental study (see Section 4). Between these two tasks,
we define the concept of backdoor similarity, i.e., the similarity
between the primary and the backdoor task, by leveraging the task
similarity metrics