Understanding Impacts of Task Similarity on Backdoor Attack and Detection
Di Tang
Indiana University Bloomington
Rui Zhu
Indiana University Bloomington
XiaoFeng Wang
Indiana University Bloomington
Haixu Tang
Indiana University Bloomington
Yi Chen
Indiana University Bloomington
Abstract
With extensive studies on backdoor attack and detection, fundamental questions remain unanswered regarding the limits of the adversary's capability to attack and the defender's capability to detect. We believe that answers to these questions can be found through an in-depth understanding of the relations between the primary task that a benign model is supposed to accomplish and the backdoor task that a backdoored model actually performs. For this purpose, we leverage similarity metrics in multi-task learning to formally define the backdoor distance (similarity) between the primary task and the backdoor task, and analyze existing stealthy backdoor attacks, revealing that most of them fail to effectively reduce the backdoor distance and that even for those that do, much room is left to further improve their stealthiness. We therefore design a new method, called the TSA attack, to automatically generate a backdoored model under a given distance constraint, and demonstrate that our new attack indeed outperforms existing attacks, taking a step closer to understanding the attacker's limits. Most importantly, we provide both theoretical results and experimental evidence on various datasets for the positive correlation between the backdoor distance and backdoor detectability, demonstrating that our task similarity analysis indeed helps us better understand backdoor risks and has the potential to identify more effective mitigations.
1 Introduction
A backdoor is a function hidden inside a machine learning (ML) model, through which a special pattern on the model's input, called a trigger, can induce misclassification of the input. The backdoor attack is considered a serious threat to trustworthy AI, allowing the adversary to control the operations of an ML model, a deep neural network (DNN) in particular, for purposes such as evading malware detection [67], gaming a facial-recognition system to gain unauthorized access [50], etc.
Task similarity analysis on backdoor. With continued effort on backdoor attack and detection, this emerging threat has never been fully understood. Even though new attacks and detections continue to show up, they mostly respond to specific techniques, and therefore offer little insight into the best the adversary could do and the most effective strategies the detector could possibly deploy.
Such understanding is related to the similarity between the primary task that a benign model is supposed to accomplish and the backdoor task that a backdoored model actually performs, which is fundamental to distinguishing a backdoored model from its benign counterpart. Therefore, a Task Similarity Analysis (TSA) between these two tasks can help us calibrate the extent to which a backdoor is detectable (differentiable from a benign model) by not only known but also new detection techniques, inform us which characteristics of a backdoor trigger contribute to improving the similarity, thereby making the attack stealthy, and further guide us to develop even stealthier backdoors so as to better understand what the adversary could possibly do and what the limitation of detection could actually be.
Methodology and discoveries. This paper reports the first TSA on backdoor attacks and detections. We formally model the backdoor attack and define backdoor similarity based upon the task similarity metrics utilized in multi-task learning, measuring the similarity between the backdoor task and its related primary task. On top of this metric, we further define the concept of $\alpha$-backdoor to compare backdoor similarity across different backdoors, and present a technique to estimate the $\alpha$ of an attack in practice. With the concept of the $\alpha$-backdoor, we analyze representative attacks proposed so far to understand the stealthiness they intend to achieve, based upon their effectiveness in increasing the backdoor similarity. We find that current attacks only marginally increase the overall similarity between the backdoor task and the primary task, because they fail to simultaneously increase the similarity of inputs and that of outputs between these two tasks. Based on this finding, we develop a new attack/analysis technique, called the TSA attack, to automatically generate a backdoored model under a given similarity constraint. The new technique is found to be much stealthier than existing attacks,
not only in terms of backdoor similarity, but also in terms of its effectiveness in evading existing detections, as observed in our experiments. Further, we demonstrate that a backdoor with high backdoor similarity is indeed hard to detect, through theoretical analysis as well as extensive experimental studies on four datasets under six representative detections, using our TSA attack together with five representative attacks proposed in prior research.
Contributions. Our contributions are as follows:

New direction on backdoor analysis. Our research brings a new aspect to backdoor research, through the lens of backdoor similarity. Our study reveals the great impact backdoor similarity has on both backdoor attack and detection, which can potentially help determine the limits of the adversary's capability in a backdoor attack and therefore enables the development of the best possible response.

New stealthy backdoor attack. Based upon our understanding of backdoor similarity, we developed a novel technique, the TSA attack, to generate a stealthy backdoor under a given backdoor similarity constraint, helping us better understand the adversary's potential and more effectively calibrate the capability of backdoor detections.
2 Background
2.1 Neural Network
We model a neural network model $f$ as a mapping function from the input space $X$ to the output space $Y$, i.e., $f: X \mapsto Y$. Further, the model $f$ can be decomposed into two sub-functions: $f(x) = c(g(x))$. Specifically, for a classification task with $L$ classes where the output space $Y = \{0, 1, \dots, L-1\}$, we define $g: X \mapsto [0,1]^L$, $c: [0,1]^L \mapsto Y$ and $c(g(x)) = \arg\max_j g(x)_j$, where $g(x)_j$ is the $j$-th element of $g(x)$. According to the common understanding, after being well trained, $g(x)$ approximates the conditional probability of presenting $y$ given $x$, i.e., $g(x)_y \approx \Pr(y|x)$, for $y \in Y$ and $x \in X$.
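To make this decomposition concrete, here is a minimal PyTorch-style sketch of our own (not code from the paper): a model split into a probability head $g$ (softmax outputs) and the decision rule $c$ (argmax).

```python
import torch
import torch.nn as nn

class DecomposedClassifier(nn.Module):
    """f(x) = c(g(x)): g outputs class probabilities, c applies the argmax rule."""

    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone          # any network producing L logits
        self.softmax = nn.Softmax(dim=-1)

    def g(self, x: torch.Tensor) -> torch.Tensor:
        # g: X -> [0,1]^L; after training, g(x)_y approximates Pr(y | x)
        return self.softmax(self.backbone(x))

    def c(self, probs: torch.Tensor) -> torch.Tensor:
        # c: [0,1]^L -> Y; the predicted label is the class with maximal probability
        return probs.argmax(dim=-1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.c(self.g(x))
```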
2.2 Backdoor Attack & Detection
Backdoor attack. In our research, we focus on targeted backdoors that cause the backdoor-infected model $f_b$ to map trigger-carrying inputs $A(x)$ to a target label $t$ different from the ground-truth label of $x$ [5, 59, 77, 82]:

$$f_b(A(x)) = t \neq f_P(x), \quad (1)$$

where $f_P$ is the benign model that outputs the ground-truth label for $x$ and $A$ is the trigger function that transforms a benign input into its trigger-carrying counterpart. Many attack methods have been proposed to inject backdoors, e.g., [12, 14, 20, 47, 49, 50, 61, 72].
Backdoor detection. Backdoor detection has been extensively studied recently [21, 25, 35, 44, 78]. The proposed approaches can be categorized based upon the model information they focus on: model outputs, model weights, and model inputs. This categorization is used in our research to analyze different detection approaches (Section 4).

More specifically, detection on model outputs captures backdoored models by detecting the difference between the outputs of backdoored models and benign models on some inputs. Such detection methods include NC [77], K-ARM [68], MNTD [83], Spectre [27], TABOR [26], MESA [58], STRIP [22], SentiNet [13], ABL [43], ULP [38], etc. Detection on model weights finds a backdoored model by distinguishing its weights from those of benign models. Such detection approaches include ABS [48], ANP [80], NeuronInspect [31], etc. Detection on model inputs identifies a backdoored model by detecting differences between the inputs that lead a backdoored model and a benign model to produce similar outputs. Prominent detections in this category include SCAn [72], AC [11], SS [74], etc.
2.3 Threat Model
We focus on backdoors for image classification tasks, while assuming a white-box attack scenario where the adversary can access the training process. The attacker injects the backdoor to accomplish the goal formally defined in Section 3.2 and to evade backdoor detections.

The backdoor defender aims to distinguish backdoored models from benign models. She has white-box access to the suspect models and owns a small set of benign inputs. Besides, the defender may obtain a set of mixed inputs containing a large number of benign inputs together with a few trigger-carrying inputs; however, which inputs in this set carry the trigger is unknown to her.
3 TSA on Backdoor Attack
Not only does a backdoor attack aim at inducing misclassification of trigger-carrying inputs to a victim model, it is also meant to achieve high stealthiness against backdoor detections. For this purpose, some attacks [17, 49] reduce the $L_p$-norm of the trigger, i.e., $\|A(x) - x\|_p$, to make trigger-carrying inputs similar to benign inputs, while others construct the trigger using benign features [46, 66]. All these tricks are designed to evade specific detection methods. Less clear is the stealthiness guarantee that those tricks can provide against other detection methods. Understanding such a stealthiness guarantee requires modeling the detectability of backdoored models, which depends on measuring fundamental differences between backdoored and benign models that have not been studied before.

To fill this gap, we analyze the difference between the task a backdoored model intends to accomplish (called the backdoor task) and that of its benign counterpart (called the primary task), which indicates the detectability of the backdoored model, as demonstrated by our experimental study (see Section 4). Between these two tasks, we define the concept of backdoor similarity, i.e., the similarity between the primary and the backdoor task, by leveraging the task similarity metrics
used in multi-task learning studies, and further demonstrate how to compute the similarity in practice. Applying the metric to existing backdoor attacks, we analyze their impacts on the backdoor similarity, which consequently affects their stealthiness against detection techniques (see Section 4). We further present a new algorithm that automatically generates a backdoored model under a desirable backdoor similarity, which leads to a stealthier backdoor attack.
3.1 Task Similarity
Backdoor detection is essentially a problem of differentiating between the legitimate task (primary task) a model is supposed to perform and the compromised task (backdoor task), involving backdoor activities, that the backdoored model actually runs. To this end, a detection mechanism needs to figure out the difference between these two tasks. According to modern learning theory [54], a task can be fully characterized by the distribution on the graph of the function [8], a joint distribution on the input space $X$ and the output space $Y$. Formally, a task $T$ is characterized by the joint distribution $D_T$: $T := D_T(X, Y) = \{\Pr_{D_T}(x, y) : (x, y) \in X \times Y\}$. Note that, for a well-trained model $f = c \circ g$ (defined in Section 2.2) for task $T$, we have $g(x)_y \approx \Pr_{D_T}(y|x)$ for all $(x, y) \in X \times Y$.

With this task modeling, the mission of backdoor detection becomes distinguishing the distribution of a backdoor task from that of its primary task. Fisher's discriminant theorem [52] tells us that two distributions become easier to distinguish when they are less similar in terms of some distance metric, indicating that the distinguishability (or separability) of two tasks is positively correlated with their distance. This motivates us to measure the distance between the distributions of two tasks. For this purpose, we define the $d_{HW1}$ distance, which covers both the Wasserstein-1 distance and the H-divergence, two of the most common distance metrics for distributions.
Definition 1 ($d_{HW1}$ distance). For two distributions $D$ and $D'$ defined on $X \times Y$, $d_{HW1}(D, D')$ measures the distance between the two as:

$$d_{HW1}(D, D') = \sup_{h \in H} \left[ \mathbb{E}_{\Pr_D(x,y)} h(x, y) - \mathbb{E}_{\Pr_{D'}(x,y)} h(x, y) \right], \quad (2)$$

where $H = \{h : X \times Y \mapsto [0,1]\}$.
Proposition 1. $0 \le d_{HW1}(D, D') \le 1$, and

$$d_{W1}(D, D') \le d_{HW1}(D, D') = \frac{1}{2} d_H(D, D'), \quad (3)$$

where $d_{W1}(D, D')$ is the Wasserstein-1 distance [4] between $D$ and $D'$, and $d_H(D, D')$ is their H-divergence [7].

Proof. See Appendix 10.1.
Proposition 1 shows that $d_{HW1}$ is representative: it is an upper bound of the Wasserstein-1 distance and half of the H-divergence. More importantly, $d_{HW1}$ can be easily computed: the optimal function $h$ in Eq. 2 that maximally separates the two distributions can be approximated with a neural network trained to distinguish them.
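As a rough illustration of this idea, the following sketch of our own (not the authors' implementation) trains a small network $h$ with outputs in $[0,1]$ on samples from the two distributions, where each sample is assumed to encode an $(x, y)$ pair (e.g., a flattened input concatenated with a one-hot label); the achieved score gap serves as an empirical lower bound on $d_{HW1}$.

```python
import torch
import torch.nn as nn

def estimate_dhw1(samples_d, samples_dprime, dim, steps=500, lr=1e-3):
    """Empirically estimate d_HW1 between two sample sets (tensors of shape [N, dim]).

    Trains h: R^dim -> [0,1] to maximize E_D[h] - E_D'[h]; the achieved gap
    is a lower bound on the supremum in Eq. 2.
    """
    h = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(),
                      nn.Linear(128, 1), nn.Sigmoid())
    opt = torch.optim.Adam(h.parameters(), lr=lr)
    for _ in range(steps):
        gap = h(samples_d).mean() - h(samples_dprime).mean()
        loss = -gap                      # maximize the gap
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return (h(samples_d).mean() - h(samples_dprime).mean()).item()
```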
Using the $d_{HW1}$ distance, we can now quantify the similarity between tasks. In particular, $d_{HW1}(D_{T_1}, D_{T_2}) = 0$ indicates that tasks $T_1$ and $T_2$ are identical, and $d_{HW1}(D_{T_1}, D_{T_2}) = 1$ indicates that these two tasks are totally different. Unless otherwise noted, we consider the task similarity between $T_1$ and $T_2$ to be $1 - d_{HW1}(D_{T_1}, D_{T_2})$.
3.2 Backdoor Similarity
In the following, we first define the primary task and the backdoor task, and then use $d_{HW1}$ to specify backdoor similarity, that is, the similarity between the primary task and the backdoor task.
Backdoor attack. As mentioned earlier (Section 2.2), the well-accepted definition of the backdoor attack is specified by Eq. 1 [5, 13, 57, 59, 72, 77, 82]. According to this definition, the attack aims to find a trigger function $A(\cdot)$ that maps benign inputs to their trigger-carrying counterparts and also ensures that these trigger-carrying inputs are misclassified to the target class $t$ by the backdoor-infected model $f_b$. In particular, Eq. 1 requires the target class $t$ to be different from the source class of the benign inputs, i.e., $t \neq f_P(x)$. This definition, however, is problematic, since there exists a trivial trigger function satisfying Eq. 1: $A(\cdot)$ simply replaces a benign input $x$ with another benign input $x_t$ in the target class $t$. Under this trigger function, even a completely clean model $f_P(\cdot)$ becomes "backdoored", as it outputs the target label on any "trigger-carrying" input $x_t = A(x)$.

Clearly, this trivial trigger function does not introduce any meaningful backdoor to the victim model, even though it satisfies Eq. 1. To address this issue, we adjust the objective of the backdoor attack (Eq. 1) as follows:

$$f_b(A(x)) = t, \quad \text{where } f_P(x) \neq t \neq f_P(A(x)). \quad (4)$$

Here, the constraint $f_P(x) \neq t \neq f_P(A(x))$ requires that under the benign model $f_P$, neither the input $x$ nor its trigger-carrying version $A(x)$ is mapped to the target class $t$, thereby excluding the trivial attack mentioned above.
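To illustrate the adjusted objective, the sketch below (our own, with a hypothetical patch trigger; it is not the paper's attack code) keeps only those poison candidates for which the benign model maps neither $x$ nor $A(x)$ to the target class:

```python
import torch

def apply_trigger(x: torch.Tensor, patch: torch.Tensor) -> torch.Tensor:
    # Hypothetical trigger function A: paste a small patch into the image corner.
    x = x.clone()
    ph, pw = patch.shape[-2:]
    x[..., :ph, :pw] = patch
    return x

def select_poison_candidates(f_p, xs, patch, target_t):
    """Keep inputs satisfying Eq. 4's constraint: f_P(x) != t and f_P(A(x)) != t."""
    keep = []
    with torch.no_grad():
        for x in xs:
            x_trig = apply_trigger(x, patch)
            pred_clean = f_p(x.unsqueeze(0)).argmax(dim=-1).item()
            pred_trig = f_p(x_trig.unsqueeze(0)).argmax(dim=-1).item()
            if pred_clean != target_t and pred_trig != target_t:
                keep.append((x_trig, target_t))   # poison sample labeled with class t
    return keep
```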
Generally speaking, the trigger function $A(\cdot)$ may not work on a model's whole input space, so we introduce the concept of a backdoor region:

Definition 2 (Backdoor region). The backdoor region $B \subseteq X$ of a backdoor with the trigger function $A(\cdot)$ is the set of inputs on which the backdoored model $f_b$ satisfies Eq. 4, i.e.,

$$f_b(A(x)) = \begin{cases} t, \; \neq f_P(A(x)), \; \neq f_P(x), & x \in B \\ f_P(A(x)), & x \in X \setminus B. \end{cases} \quad (5)$$

Accordingly, we denote by $A(B) = \{A(x) : x \in B\}$ the set of trigger-carrying inputs.
For example, the backdoor region of a source-agnostic backdoor, which maps any trigger-carrying input $A(x)$ whose label under the benign model is not $t$ into $t$, is $B = X \setminus (X_{f_P(x)=t} \cup X_{f_P(A(x))=t})$, while the backdoor region of a source-specific backdoor, which maps the trigger-carrying input $A(x)$ with the true label of the source class $s$ ($\neq t$) into $t$, is $B = X_{f_P(x)=s} \setminus X_{f_P(A(x))=t}$. Here, we use $X_C$ to denote the subset of all elements in $X$ that satisfy the condition $C$: $X_C = \{x \mid x \in X, C \text{ is True}\}$, e.g., $X_{f_P(x)=t} = \{x \mid x \in X, f_P(x) = t\}$.
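To make the two region definitions concrete, the following membership tests are a sketch of our own (the function names are hypothetical; $f_P$ is assumed to return a predicted label):

```python
def in_source_agnostic_region(f_p, x, A, t):
    # B = X \ (X_{f_P(x)=t} ∪ X_{f_P(A(x))=t})
    return f_p(x) != t and f_p(A(x)) != t

def in_source_specific_region(f_p, x, A, s, t):
    # B = X_{f_P(x)=s} \ X_{f_P(A(x))=t}
    return f_p(x) == s and f_p(A(x)) != t
```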
Definition of the primary and backdoor tasks. Now we can formally define the primary task and the backdoor task for a backdoored model. Here we denote the prior probability of an input $x$ (also the probability of presenting $x$ on the primary task) by $\Pr(x)$.

Definition 3 (Primary task & distribution). The primary task of a backdoored model is $T_P$, the task that its benign counterpart learns to accomplish. $T_P$ is characterized by the primary distribution $D_P$, a joint distribution over the input space $X$ and the output space $Y$. Specifically, $\Pr_{D_P}(x, y)$ is the probability of presenting $(x, y)$ in benign scenarios, and thus $\Pr_{D_P}(y|x) = \Pr_{D_P}(x, y)/\Pr(x)$ is the conditional probability that a benign model strives to approximate.
Definition 4 (Backdoor task & distribution). The backdoor task of a backdoored model is denoted by $T_{A,B,t}$, the task that the adversary intends to accomplish by training a backdoored model. $T_{A,B,t}$ is characterized by the backdoor distribution $D_{A,B,t}$, a joint distribution over $X \times Y$. Specifically, the probability of presenting $(x, y)$ in $D_{A,B,t}$ is $\Pr_{D_{A,B,t}}(x, y) = P(x, y)/Z_{A,B,t}$, where $Z_{A,B,t} = \int_{(x,y) \in X \times Y} P(x, y) = 1 - \Pr(A(B)) + \beta \Pr(B)$ and

$$P(x, y) = \begin{cases} \Pr_{D_{A,B,t}}(y|x) \Pr(A^{-1}(x)) \, \beta, & x \in A(B) \\ \Pr_{D_P}(x, y), & x \in X \setminus A(B). \end{cases} \quad (6)$$

Here, $A^{-1}(x) = \{z \mid A(z) = x\}$ represents the inverse of the trigger function, $\Pr_{D_{A,B,t}}(y|x)$ is the conditional probability that the adversary desires to train a backdoored model to approximate, and $\beta$ is a parameter selected by the adversary to amplify the probability that trigger-carrying inputs $A(x)$ are presented to the backdoor task. We consider $\frac{\beta}{1+\beta}$ to be the poisoning rate, under the assumption that the poisoned training data is randomly drawn from the backdoor distribution. Finally, it is worth noting that $\Pr_{D_{A,B,t}}(x, y)$ is proportional to $\Pr_{D_P}(x, y)$ except on the trigger-carrying inputs $A(B)$.
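Under the stated assumption that poisoned training data is drawn from the backdoor distribution, the relation between $\beta$ and the poisoning rate $\frac{\beta}{1+\beta}$ can be illustrated with a small sketch of our own (the dataset-construction routine is hypothetical, not the paper's procedure):

```python
import random

def build_poisoned_dataset(benign_pairs, trigger_pairs, beta):
    """Mix benign (x, y) pairs with trigger-carrying (A(x), t) pairs so that
    roughly a fraction beta / (1 + beta) of the result is poisoned."""
    n_benign = len(benign_pairs)
    n_poison = int(beta * n_benign)          # poison : benign = beta : 1
    poison = random.choices(trigger_pairs, k=n_poison)
    dataset = list(benign_pairs) + poison
    random.shuffle(dataset)
    # poisoning rate ≈ n_poison / (n_benign + n_poison) = beta / (1 + beta)
    return dataset
```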
Formalization of backdoor similarity. Putting together the definitions of the primary task, the backdoor task, and the $d_{HW1}$ distance between the two tasks (Eq. 2), we are ready to define backdoor similarity as follows:

Definition 5 (Backdoor distance & similarity). We define $d_{HW1}(D_P, D_{A,B,t})$ as the backdoor distance between the primary task $T_P$ and the backdoor task $T_{A,B,t}$, and $1 - d_{HW1}(D_P, D_{A,B,t})$ as the backdoor similarity.
Theorem 2 (Computing backdoor distance). When $Z_{A,B,t} \ge 1$, where $Z_{A,B,t}$ is defined in Eq. 6, the backdoor distance between $D_P$ and $D_{A,B,t}$ is

$$d_{HW1}(D_P, D_{A,B,t}) = \int_{(x,y) \in A(B) \times Y} \max(\Pr{}_{\text{gain}}(x, y), 0),$$

where $\Pr_{\text{gain}}(x, y) = \Pr_{D_{A,B,t}}(x, y) - \Pr_{D_P}(x, y)$.

Proof. See Appendix 10.2.
Theorem 2 shows that the calculation of the backdoor distance $d_{HW1}(D_P, D_{A,B,t})$ can be reduced to the calculation of the probability gain of $\Pr_{D_{A,B,t}}(x, y)$ over $\Pr_{D_P}(x, y)$ on the trigger-carrying inputs $A(B)$, when $Z_{A,B,t} \ge 1$. Notably, because $Z_{A,B,t} = 1 - \Pr(A(B)) + \beta \Pr(B)$, the condition $Z_{A,B,t} \ge 1$ is satisfied if $\Pr(A(B)) \le \beta \Pr(B)$. This implies that if trigger-carrying inputs show up more often on the backdoor distribution than on the primary distribution, we can use the aforementioned method to compute the backdoor distance.
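For concreteness, consider a hypothetical setting (numbers of our own choosing) with $\Pr(B) = 0.1$, $\Pr(A(B)) = 0.01$, and $\beta = 0.5$: then $Z_{A,B,t} = 1 - 0.01 + 0.5 \times 0.1 = 1.04 \ge 1$, so Theorem 2 applies, and the corresponding poisoning rate is $\beta/(1+\beta) = 1/3$.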
Parametrization of backdoor distance. The following lemma further reveals the impacts of the two parameters $\beta$ and $\kappa$ on the backdoor distance:

Lemma 3. When $Z_{A,B,t} \ge 1$ and $\Pr(B) = \kappa \Pr(A(B))$,

$$d_{HW1}(D_P, D_{A,B,t}) = \Pr(B) \int_{(x,y) \in A(B) \times Y} \max(\widetilde{\Pr}_{\text{gain}}(x, y), 0),$$

where $\widetilde{\Pr}_{\text{gain}}(x, y)$ equals

$$\frac{\beta}{Z_{A,B,t}} \cdot \frac{\Pr_{D_{A,B,t}}(x)}{\Pr_{D_{A,B,t}}(A(B))} \Pr{}_{D_{A,B,t}}(y|x) - \frac{1}{\kappa} \cdot \frac{\Pr(x)}{\Pr(A(B))} \Pr{}_{D_P}(y|x). \quad (7)$$

Proof. The derivation is straightforward, so we omit it.
As demonstrated by Lemma 3, the two parameters $\beta$ and $\kappa$ are important to the backdoor distance, where $\beta$ is related to the poisoning rate (Definition 4) and $\kappa$ describes how close the probability of presenting trigger-carrying inputs is to the probability of presenting their benign counterparts on the primary distribution (the bigger $\kappa$, the farther apart these two probabilities are).
Let us first consider the range of $\beta$. Intuitively, a large $\beta$ makes trigger-carrying inputs more likely to show up on the backdoor distribution, and therefore easier to detect. A reasonable backdoor attack should keep $\beta$ smaller than $1$, which is equivalent to constraining the poisoning rate ($\frac{\beta}{1+\beta}$) below $50\%$. On the other hand, a very small $\beta$ makes the backdoor task more difficult for a model to learn, which eventually reduces the attack success rate (ASR). A reasonable backdoor attack should use a $\beta$ greater than $\frac{1}{\kappa}$: that is, the chance of seeing trigger-carrying inputs on the backdoor distribution should be no lower than that on the primary distribution. Therefore, we assume $\frac{1}{\kappa} \le \beta \le 1$. Next, we consider the range of $\kappa$. A reasonable lower bound of $\kappa$ is $1$; if $\kappa < 1$, trigger-carrying inputs show up even more often than their benign counterparts on the primary distribution, which eventually makes the backdoored model output differently from benign models on a large portion of inputs and makes the backdoor easy to detect. So, we assume $\kappa \ge 1$.
With the above assumptions on the ranges of $\beta$ and $\kappa$, we obtain the following theorem describing the range of the backdoor distance.
Theorem 4 (Backdoor distance range). Supposing $\Pr(B) = \kappa \Pr(A(B))$, when $\kappa \ge 1$ and $\frac{1}{\kappa} \le \beta \le 1$, we have $Z_{A,B,t} \ge 1$ and

$$\left( \frac{\beta}{Z_{A,B,t}} - \frac{1}{\kappa}(1 - S) \right) \Pr(B) \le d_{HW1}(D_P, D_{A,B,t}) \le \frac{\beta}{Z_{A,B,t}} \Pr(B),$$

where $S = \int_{(x,y) \in A(B) \times Y} \max\{prob, 0\}$ and

$$prob = \frac{\Pr_{D_{A,B,t}}(x)}{\Pr_{D_{A,B,t}}(A(B))} \Pr{}_{D_{A,B,t}}(y|x) - \frac{\Pr(x)}{\Pr(A(B))} \Pr{}_{D_P}(y|x). \quad (8)$$

Proof. See Appendix 10.3.
Corollary 5 (Effects of $\beta$). Supposing $\Pr(B) = \kappa \Pr(A(B))$, $\kappa \ge 1$ and $\kappa$ is fixed, when $\beta$ varies in the range $[\frac{1}{\kappa}, 1]$, we have

$$\frac{S \Pr(B)}{\kappa} \le d_{HW1}(D_P, D_{A,B,t}) \le \frac{\kappa \Pr(B)}{\kappa + \kappa \Pr(B) - \Pr(B)},$$

where $S$ is defined in Theorem 4. Specifically, the lower bound $\frac{S \Pr(B)}{\kappa}$ is achieved when $\beta = \frac{1}{\kappa}$, and the upper bound $\frac{\kappa \Pr(B)}{\kappa + \kappa \Pr(B) - \Pr(B)}$ is achieved when $\beta = 1$.

Proof. See Appendix 10.4.
Corollary 6 (Effects of $\kappa$). Supposing $\Pr(B) = \kappa \Pr(A(B))$, $\beta \le 1$ and $\beta$ is fixed, when $\kappa$ varies in the range $[\frac{1}{\beta}, \infty)$, we have

$$S \beta \Pr(B) \le d_{HW1}(D_P, D_{A,B,t}) \le \beta \Pr(B),$$

where $S$ is defined in Theorem 4. Specifically, the lower bound $S \beta \Pr(B)$ and the upper bound $\beta \Pr(B)$ are achieved, respectively, when $\kappa = \frac{1}{\beta}$.

Proof. See Appendix 10.5.
3.3 α-Backdoor
Definition of α-backdoor. Through Lemma 3 and Theorem 4, we show that the backdoor distance and its boundaries are proportional to $\Pr(B)$, the probability of showing benign inputs in the backdoor region $B$ under the prior distribution of inputs. However, different backdoor attacks may have different backdoor regions, which is a factor we intend to remove so as to compare backdoor similarities across different attacks. For this purpose, we define the $\alpha$-backdoor, based upon the same backdoor region $B$ for different attacks, as follows:

Definition 6 ($\alpha$-backdoor). We define an $\alpha$-backdoor as a backdoor whose backdoor distribution is $D_{A,B,t}$, whose primary distribution is $D_P$, and whose backdoor distance equals the product of $\alpha$ and $\Pr(B)$, i.e.,

$$\alpha \cdot \Pr(B) = d_{HW1}(D_P, D_{A,B,t}).$$
Approximation of α. Lemma 3 actually provides an approach to approximate $\alpha$ in practice. Specifically, using the symbol $\widetilde{\Pr}_{\text{gain}}$ defined in Eq. 7, we get a simple formulation of $\alpha$: $\alpha = \int_{(x,y) \in A(B) \times Y} \max(\widetilde{\Pr}_{\text{gain}}(x, y), 0)$. Note that $\frac{\Pr(x)}{\Pr(A(B))} = \Pr(x \mid x \in A(B))$ and $\frac{\Pr_{D_{A,B,t}}(x)}{\Pr_{D_{A,B,t}}(A(B))} = \Pr_{D_{A,B,t}}(x \mid x \in A(B))$. This enables us to approximate $\alpha$ by sampling only trigger-carrying inputs $x \in A(B)$. Also, $\Pr_{D_{A,B,t}}(y|x)$ and $\Pr_{D_P}(y|x)$ can be approximated by a well-trained backdoored model $f_b = c_b \circ g_b$ and a well-trained benign model $f_P = c_P \circ g_P$, respectively, i.e., $g_b(x)_y \approx \Pr_{D_{A,B,t}}(y|x)$ and $g_P(x)_y \approx \Pr_{D_P}(y|x)$. Supposing that we have sampled $m$ trigger-carrying inputs $\{A(x_1), A(x_2), \dots, A(x_m)\}$, $\alpha$ can be approximated by:

$$\alpha \approx \sum_{i=1}^{m} \sum_{y=0}^{L-1} \max\left\{ \frac{\beta}{Z_{A,B,t}} g_b(A(x_i))_y - \frac{1}{\kappa} g_P(A(x_i))_y, \; 0 \right\}. \quad (9)$$

In Eq. 9, $\beta$ is chosen by the adversary. Thus, we assume that $\beta$ is known when using $\alpha$ to analyze different backdoor attacks. Different from $\beta$, $\kappa$ is determined by the trigger function $A$, which distinguishes different backdoor attacks from each other. Next, we demonstrate how to estimate $\kappa$.
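A minimal sketch of this estimator (our own code, assuming $\beta$, $\kappa$, and $Z_{A,B,t}$ are known and that $g_b$ and $g_P$ return class-probability vectors for a batch of inputs) could look as follows:

```python
import torch

def estimate_alpha(g_b, g_p, trigger_inputs, beta, kappa, z_abt):
    """Approximate alpha as in Eq. 9 from m sampled trigger-carrying inputs A(x_i)."""
    alpha = 0.0
    with torch.no_grad():
        for x_trig in trigger_inputs:                            # each x_trig is A(x_i)
            p_backdoor = g_b(x_trig.unsqueeze(0)).squeeze(0)     # approximates Pr_{D_{A,B,t}}(y|x)
            p_benign = g_p(x_trig.unsqueeze(0)).squeeze(0)       # approximates Pr_{D_P}(y|x)
            gain = (beta / z_abt) * p_backdoor - (1.0 / kappa) * p_benign
            alpha += torch.clamp(gain, min=0.0).sum().item()     # sum over y of max{., 0}
    return alpha
```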
Estimation of κ. Recall that $\kappa = \frac{\Pr(B)}{\Pr(A(B))}$. Through straightforward transformations, we get $\kappa = \frac{V(B)}{V(A(B))} \cdot \frac{\mathbb{E}_{\Pr(x|x \in B)} \Pr(x)}{\mathbb{E}_{\Pr(x|x \in A(B))} \Pr(x)}$, where $V(B)$ and $V(A(B))$ are the volumes of the sets $B$ and $A(B)$, respectively. Below, we demonstrate how to estimate $\Pr(x)$ and the volume ratio $\kappa_V = \frac{V(B)}{V(A(B))}$ separately.
To estimate the prior probability $\Pr(x)$ of an input $x$ for the primary task, we employ a Generative Adversarial Network (GAN) [34] and a GAN inversion algorithm [81]. Specifically, we build a generator network $G$ and a discriminator network $D$ using adversarial learning: the discriminator $D$ attempts to distinguish the outputs of $G$ from the inputs (e.g., the training samples) $x$ of the primary task, while $G$ takes as input $z$ randomly drawn from a Gaussian distribution with covariance matrix $I$, i.e., $z \sim N(0, I)$, and attempts to generate outputs that cannot be distinguished by $D$. When the adversarial learning converges, the output of $G$ approximately follows the prior probability distribution of $x$, i.e., $\Pr(x) \approx \Pr(G(z) = x)$. In addition, we incorporate a GAN inversion algorithm capable of recovering the input $z$ of $G$ from a given $x$, s.t. $G(z) = x$. Combining the GAN and the inversion algorithm, we can estimate $\Pr(x)$ for a given $x$: we first compute $z$ from $x$ using the GAN inversion algorithm, and then estimate $\Pr(x)$ using $\Pr_{N(0,I)}(z)$.
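The two-step estimate can be sketched as follows (our own code, assuming a pretrained generator G and a simple optimization-based inversion; the paper does not prescribe a specific inversion algorithm, and we return the log-density for numerical stability):

```python
import torch

def invert_gan(G, x, latent_dim, steps=300, lr=0.01):
    """Optimization-based GAN inversion: find z such that G(z) ~ x."""
    z = torch.randn(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        loss = torch.nn.functional.mse_loss(G(z), x.unsqueeze(0))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach().squeeze(0)

def log_prior_prob(G, x, latent_dim):
    """Estimate log Pr(x) via the standard-normal density of the inverted latent z."""
    z = invert_gan(G, x, latent_dim)
    normal = torch.distributions.Normal(0.0, 1.0)
    return normal.log_prob(z).sum().item()   # log Pr_{N(0,I)}(z)
```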
To estimate the volume ratio $\kappa_V$, we use a Monte Carlo algorithm similar to that proposed by prior work [32]. Briefly speaking, to estimate $V(B)$, we first randomly select an $x$ in the backdoor region $B$ as the origin, then uniformly sample many directions from the origin and approximate the extent (how far one can go from the origin to the boundary of $B$) along each of these directions, and finally calculate the expectation of the extents over these directions as $Ext(B)$. According to the prior work [32], $V(B)$ is approximately equal to the product of $Ext(B)$ and the volume of the $n$-dimensional unit sphere, assuming $B \subseteq \mathbb{R}^n$. Therefore, we estimate $\kappa_V$ by $\frac{Ext(B)}{Ext(A(B))}$.

In general, we estimate $\kappa$ as $\frac{Ext(B)}{Ext(A(B))} \cdot \frac{\mathbb{E}_{\Pr(x|x \in B)} \Pr(G^{-1}(x))}{\mathbb{E}_{\Pr(x|x \in A(B))} \Pr(G^{-1}(x))}$, where $G^{-1}(x)$ represents the output of a GAN inversion algorithm for a given $x$. We defer the details to Appendix 9.
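A rough sketch of the extent estimate (our own simplification; in_region is a hypothetical membership test for B or A(B), and the search bound max_radius is arbitrary) is shown below; the ratio of the two extent estimates then gives $\kappa_V$:

```python
import torch

def estimate_extent(in_region, origin, n_dirs=1000, max_radius=10.0, iters=20):
    """Monte Carlo estimate of the mean extent of a region around `origin`.

    For each random unit direction, bisect for the largest radius r such that
    origin + r * direction still lies inside the region."""
    extents = []
    for _ in range(n_dirs):
        d = torch.randn_like(origin)
        d = d / d.norm()
        lo, hi = 0.0, max_radius
        for _ in range(iters):                 # bisection on the boundary radius
            mid = (lo + hi) / 2
            if in_region(origin + mid * d):
                lo = mid
            else:
                hi = mid
        extents.append(lo)
    return sum(extents) / len(extents)

# kappa_V ≈ estimate_extent(in_B, x0) / estimate_extent(in_AB, A(x0))
```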