Invariant Aggregator for Defending against
Federated Backdoor Attacks
Xiaoyang Wang (University of Illinois Urbana-Champaign, xw28@illinois.edu)
Dimitrios Dimitriadis (ddimitriadis@gmail.com)
Sanmi Koyejo (Stanford University, sanmi@stanford.edu)
Shruti Tople (Azure Research, Shruti.Tople@microsoft.com)
Abstract
Federated learning enables training high-utility models across several clients without directly sharing their private data. As a downside, the federated setting makes the model vulnerable to various adversarial attacks in the presence of malicious clients. Despite the theoretical and empirical success in defending against attacks that aim to degrade models' utility, defense against backdoor attacks that increase model accuracy on backdoor samples exclusively, without hurting the utility on other samples, remains challenging. To this end, we first analyze the failure modes of existing defenses over a flat loss landscape, which is common for well-designed neural networks such as Resnet (He et al., 2015) but is often overlooked by previous works. Then, we propose an invariant aggregator that redirects the aggregated update to invariant directions that are generally useful by selectively masking out the update elements that favor a few and possibly malicious clients. Theoretical results suggest that our approach provably mitigates backdoor attacks and remains effective over flat loss landscapes. Empirical results on three datasets with different modalities and varying numbers of clients further demonstrate that our approach mitigates a broad class of backdoor attacks with a negligible cost on the model utility.
Work partially performed while at Microsoft Research. Authors are ordered alphabetically.

Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS) 2024, Valencia, Spain. PMLR: Volume 238. Copyright 2024 by the author(s).
1 Introduction
Federated learning enables multiple distrusting clients to jointly train a machine learning model without sharing their private data directly. However, a rising concern in this setting is the ability of potentially malicious clients to perpetrate backdoor attacks and control model predictions using a backdoor trigger (Liu et al., 2018; Bagdasaryan et al., 2020). Indeed, it has been argued that conducting backdoor attacks in a federated learning setup is practical (Shejwalkar et al., 2022) and can be effective (Wang et al., 2020).

The impact of such attacks is quite severe in many mission-critical federated learning applications. For example, anomaly detection is a common federated learning task where multiple parties (e.g., banks or email users) collaboratively train a model that detects fraud or phishing emails. Backdoor attacks allow the adversary to circumvent these detections successfully.
Motivating Setting. To better develop a defense approach, we first analyze the vulnerability of federated learning systems against backdoor attacks over a flat loss landscape. A flat loss landscape is considered an essential factor in the empirical success of neural network optimization (Li et al., 2017; Sun et al., 2020). Although neural networks are non-convex in general and may have complicated landscapes, recent works (Li et al., 2017; Santurkar et al., 2018) suggest that improved neural network architecture designs, such as Resnet with skip connections (He et al., 2015), can significantly flatten the loss landscape and ease optimization. As a downside, a flat loss landscape may allow manipulation of model parameters without hurting the utility on benign samples, which is precisely the phenomenon that backdoor adversaries easily exploit.

A key insight is that backdoor attacks over flat loss landscapes can succeed without incurring significant differences between benign and malicious client updates, due to the diminished gradient magnitudes from benign clients. We further show that this phenomenon, combined with other factors such as the stochastic nature of the update, can help backdoor adversaries circumvent existing defenses. Our analysis also broadly covers data-centric approaches such as the edge-case attack (Wang et al., 2020) and the trigger inversion defense (Wang et al., 2019; Zhang et al., 2023).
Our methodology. To avoid the failure modes of existing defenses over flat loss landscapes, we propose an invariant aggregator to defend against federated backdoor attacks under a minority-adversary setting (Shejwalkar et al., 2022). Our defense examines each dimension of the (pseudo-)gradients¹ so as not to overlook backdoor attacks that manipulate only a few elements without incurring much difference between gradient vectors. For each dimension, we force the aggregated update to point in invariant directions that are generally useful for most clients instead of favoring a few and possibly malicious clients. As a result, our defense remains effective over flat loss landscapes where the magnitudes of benign gradients can be small.
Our approach. We consider the gradient sign (i.e., positive, negative, or zero) as a magnitude-agnostic indicator of benefit. Two clients having a consistent sign implies that moving along the direction pointed to by the gradient can benefit both clients, and vice versa. Following this intuition, we employ an AND-mask (Parascandolo et al., 2021) that sets gradient dimensions whose sign consistency falls below a given threshold to zero, masking out gradient elements that benefit only a few clients. However, this alone is insufficient: malicious clients can still use outliers to mislead the aggregation result even when sign consistency is high. To address this issue, we further complement the AND-mask with the trimmed-mean estimator (Xie et al., 2020a; Lugosi and Mendelson, 2021) as a means to remove the outliers. We theoretically show that the combination of the AND-mask and the trimmed-mean estimator is necessary and sufficient for mitigating backdoor attacks.
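As a concrete illustration of this combination, the following sketch implements the two steps on a per-round batch of client pseudo-gradients. It is a minimal sketch rather than the paper's reference implementation: the function name, the unweighted aggregation, and the specific threshold and trim values are assumptions made for illustration.

```python
import numpy as np

def invariant_aggregate(client_grads, sign_threshold=0.8, trim_ratio=0.1):
    """Sketch of an AND-mask + trimmed-mean aggregator (names are illustrative).

    client_grads: array of shape (num_clients, num_params) holding the
                  pseudo-gradients reported by the clients.
    sign_threshold: minimum fraction of clients that must agree on the
                    dominant sign of a dimension for it to be kept.
    trim_ratio: fraction of the largest and smallest values removed
                per dimension before averaging.
    """
    grads = np.asarray(client_grads, dtype=float)
    num_clients, _ = grads.shape

    # Sign consistency per dimension: fraction of clients agreeing with the
    # dominant sign (zero-valued entries do not push toward either sign).
    signs = np.sign(grads)
    agreement = np.abs(signs.sum(axis=0)) / num_clients
    mask = (agreement >= sign_threshold).astype(float)  # AND-mask

    # Coordinate-wise trimmed mean to remove outlier magnitudes.
    k = int(trim_ratio * num_clients)
    sorted_grads = np.sort(grads, axis=0)
    trimmed = sorted_grads[k:num_clients - k] if k > 0 else sorted_grads
    aggregated = trimmed.mean(axis=0)

    # Keep only invariant directions; masked dimensions receive no update.
    return mask * aggregated

# Example: three benign clients agree on dimension 0, while a fourth
# (malicious) client pushes dimension 1 with a large outlier value.
grads = [[0.5, 0.0], [0.4, 0.01], [0.6, -0.01], [0.5, 5.0]]
print(invariant_aggregate(grads, sign_threshold=0.75, trim_ratio=0.25))
```

Dimensions whose sign agreement falls below the threshold receive no update at all in that round, so gradient elements that benefit only a few (possibly malicious) clients cannot steer the model, while the trimmed mean protects the surviving dimensions from magnitude outliers.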
Our empirical evaluation employs a broad class of backdoor attacks, as detailed in Section 6.1, to test our defense. Empirical results on tabular (phishing emails), visual (CIFAR-10) (Krizhevsky, 2009; McMahan et al., 2017), and text (Twitter) (Caldas et al., 2018) datasets demonstrate that our method is effective in defending against backdoor attacks without degrading utility compared to prior works. On average, our approach decreases the backdoor attack success rate by 61.6% and loses only 1.2% accuracy on benign samples compared to the standard FedAvg aggregator (McMahan et al., 2017).

¹ We overload "gradient" to mean any local model update communicated to the server in the federated setting; e.g., updates could be pseudo-gradients computed as differences between model parameters after several local steps.
Contributions. Our contributions are as follows:

- We analyze the failure modes of multiple prominent defenses against federated backdoor attacks over a flat loss landscape.
- We develop a combined defense, using an AND-mask and the trimmed-mean estimator, against backdoor attacks by focusing on dimension-wise invariant gradient directions.
- We theoretically analyze our strategy and demonstrate that the combination of an AND-mask and the trimmed-mean estimator is necessary and sufficient for mitigating backdoor attacks.
- We empirically evaluate our method on three datasets with varying modalities, trigger patterns, model architectures, and client numbers, and compare its performance to existing defenses.
2 Related Work
Backdoor Attack. Common backdoor attacks aim at misleading the model predictions using a trigger (Liu et al., 2018). The trigger can be digital (Bagdasaryan et al., 2020), physical (Wenger et al., 2021), semantic (Wang et al., 2020), or invisible (Li et al., 2021a). Recent works extended backdoor attacks to the federated learning setting and proposed effective improvements such as gradient scaling (Bagdasaryan et al., 2020) or generating edge-case backdoor samples (Wang et al., 2020). The edge-case backdoor attack shows that using backdoor samples with low probability density on benign clients (i.e., samples unlikely w.r.t. the training distribution) makes the attack hard to detect and defend against in the federated learning setting.
Centralized Defense. A line of work proposes centralized defenses against backdoor attacks, where the main aim is either detecting the backdoor samples (Tran et al., 2018) or purifying the poisoned model parameters (Li et al., 2021b). However, applying such centralized defenses to federated learning systems is infeasible in practice due to the limited access to client data in many implementations.
Federated Defenses. Recent works have attempted to defend against backdoor attacks in federated learning systems. Sun et al. (2019) shows that weak differentially-private (weak-dp) federated averaging can mitigate the backdoor attack. However, the weak-dp defense is circumvented by the improved edge-case federated backdoor attack (Wang et al., 2020). Nguyen et al. (2021) suggests that vector-wise cosine similarity can help detect malicious clients performing backdoor attacks. Vector-wise cosine similarity is insufficient when the backdoor attack can succeed with few poisoned parameters, incurring little vector-wise difference (Wu and Wang, 2021). Other defenses against untargeted poisoning attacks (Blanchard et al., 2017; Xie et al., 2020a) lack robustness against the backdoor attack. Sign-SGD with majority vote (Bernstein et al., 2018, 2019) is similar to our approach, but it always takes the majority direction instead of focusing on invariant directions. Unlike existing works, our defense encourages the model to pursue invariant directions during the optimization procedure.
3 Preliminaries
3.1 Notation
We assume a synchronous federated learning system, where $N$ clients collaboratively train an ML model $f: \mathcal{X} \rightarrow \mathcal{Y}$ with parameters $w \in \mathbb{R}^d$, coordinated by a server. An input to the model is a sample $x \in \mathcal{X}$ with a label $y$. There are $N' < \frac{N}{2}$ adversarial clients aiming to corrupt the ML model during training (Shejwalkar et al., 2022). The $i$-th client, $i \in [1, \ldots, N]$, has $n_i$ data samples and is benign for $i \in [1, \ldots, N - N']$ or adversarial for $i \in [N - N' + 1, \ldots, N]$. Synchronous federated learning is conducted over $T$ rounds. In each round $t \in [1, \ldots, T]$, the server broadcasts a model parameterized by $w_{t-1}$ to all the participating clients. We omit the subscript $t$ while focusing on a single round. Then, the $i$-th client optimizes $w_{t-1}$ on its local data samples indexed by $j$ and reports the locally optimized $w_{t,i}$ to the server. We define the pseudo-gradient $g_{t,i} = w_{t-1} - w_{t,i}$ as the difference between the locally optimized model and the broadcasted model from the previous round. For simplicity, we often use the term "gradient" to refer to the pseudo-gradient. Once all gradients are uploaded, the server aggregates them and produces a new model with parameters $w_t$ using the following rule: $w_t = w_{t-1} - \sum_{i=1}^{N} \frac{n_i}{\sum_{j=1}^{N} n_j}\, g_{t,i}$. The goal of federated learning is to minimize a weighted risk function over the $N$ clients: $\mathcal{L}(w) = \sum_{i=1}^{N} \frac{n_i}{\sum_{j=1}^{N} n_j}\, \mathcal{L}_i(w) = \sum_{i=1}^{N} \frac{n_i}{\sum_{j=1}^{N} n_j}\, \mathbb{E}_{\mathcal{D}_i}[\ell(f(x; w), y)]$, where $\ell: \mathbb{R} \times \mathcal{Y} \rightarrow \mathbb{R}$ is a loss function. $\mathrm{sign}(\cdot)$ denotes the element-wise sign operator, $\odot$ denotes the Hadamard product operator, and $W_1(\cdot,\cdot)$ denotes the Wasserstein-1 distance.
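For reference, a minimal sketch of the aggregation rule above, assuming the locally optimized parameters are available as in-memory arrays (the function name and example values are illustrative):

```python
import numpy as np

def fedavg_step(w_prev, client_updates, sample_counts):
    """One round of w_t = w_{t-1} - sum_i (n_i / sum_j n_j) * g_{t,i}.

    w_prev: previous global parameters, shape (d,).
    client_updates: locally optimized parameters w_{t,i}, shape (N, d).
    sample_counts: n_i per client, shape (N,).
    """
    w_prev = np.asarray(w_prev, dtype=float)
    updates = np.asarray(client_updates, dtype=float)
    n = np.asarray(sample_counts, dtype=float)

    pseudo_grads = w_prev - updates      # g_{t,i} = w_{t-1} - w_{t,i}
    weights = n / n.sum()                # n_i / sum_j n_j
    aggregated = weights @ pseudo_grads  # weighted average over clients
    return w_prev - aggregated           # new global parameters w_t

# Example with two clients holding 30 and 70 samples.
w0 = np.zeros(3)
locally_optimized = [[0.1, 0.0, -0.2], [0.3, 0.1, 0.0]]
print(fedavg_step(w0, locally_optimized, [30, 70]))
```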
3.2 Threat Model
The adversary generates a backdoor data sample $x'$ by embedding a trigger in a benign data sample $x$ and correlating the trigger with a label $y'$ that differs from the label $y$ of the benign data sample. We use $\mathcal{D}'$ to denote the distribution of backdoor data samples. The malicious clients then connect to the federated learning system and insert backdoor data samples into the training set. Since federated learning aims to minimize the risk over all clients' datasets, the model can entangle the backdoor signals while trying to minimize the risk over all clients.
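As a concrete example of this threat model, the sketch below builds a poisoned pair $(x', y')$ in the common pixel-patch style; the patch shape, location, and target label are illustrative assumptions, not the specific triggers evaluated in the paper.

```python
import numpy as np

def embed_trigger(x, target_label, patch_value=1.0, patch_size=3):
    """Return a backdoored copy (x', y'): stamp a small patch into the
    corner of an image-like array and relabel it with the target label."""
    x_bd = np.array(x, dtype=float, copy=True)
    x_bd[:patch_size, :patch_size] = patch_value  # trigger patch in the top-left corner
    return x_bd, target_label

# Example: a benign 8x8 grayscale sample becomes a backdoor sample
# carrying the trigger and the adversary's target label 7.
x = np.random.rand(8, 8)
x_bd, y_bd = embed_trigger(x, target_label=7)
```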
3.3 Assumptions
Bounded heterogeneity is a common assumption in the federated learning literature (Wang et al., 2021). Let $w^*_i$ be a minimum in client $i$'s loss landscape. We assume the distance between the minima of benign clients is bounded. Here, $w^*_i$ is not necessarily a global minimum or a minimum of any global federated learning model, but a parameter that a local model would converge to on its own if client $i$ had a sufficient amount of data.

Assumption 1. (Bounded heterogeneity) $\|w^*_i - w^*_j\| \le \delta, \ \forall i \neq j, \ i \le N - N', \ j \le N - N'$.

Let $\mathcal{W}^*$ be the convex hull of $\{w^*_i \mid i = 1, \ldots, N - N'\}$. We assume that malicious clients aim to converge to a model $w'^*$ that is not in the convex hull $\mathcal{W}^*$ of benign clients' minima. However, we do not assume that all parameters in the convex hull $\mathcal{W}^*$ lead to a zero backdoor success rate, especially since the convex hull may grow as the diameter $\delta$ of $\mathcal{W}^*$ increases. We empirically justify this separability assumption in Appendix D. Formally, the separability assumption is stated as follows.

Assumption 2. (Separable minimum) Let $\mathcal{W}^*$ be the convex hull, with diameter $\delta$, of the benign minima $\{w^*_i \mid i = 1, \ldots, N - N'\}$ and let $w'^*$ be a minimum of a malicious client; we have $w'^* \notin \mathcal{W}^*$.

The estimated gradient often differs from the expected gradient in stochastic gradient descent. One of the most common models for estimated gradients is the additive noise model, which adds a noise term (e.g., Gaussian noise) to the expected gradient (Wu et al., 2019). For a given noise magnitude, the directional change of an estimated gradient may increase as its corresponding expected gradient shrinks. Formally, this noise assumption is stated as follows.

Assumption 3. (Noisy gradient estimation) Let $g_i$ be an estimated gradient vector from client $i$; we assume $g_i = \mathbb{E}_{\mathcal{D}_i}[g_i] + \epsilon_i$, where the noise term $\epsilon_i \sim \mathcal{N}(0, \Sigma_i)$ and $\mathcal{N}(0, \Sigma_i)$ is a Gaussian distribution with finite-norm covariance matrix $\Sigma_i$.
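Under Assumption 3, for a fixed noise scale, the probability that an estimated gradient coordinate's sign disagrees with its expected sign grows as the expected gradient shrinks, which is precisely the regime of the flat landscapes discussed in the next section. The short simulation below (with an arbitrary, illustrative noise scale) makes this concrete.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.1          # noise scale (illustrative)
trials = 100_000

for expected_grad in [1.0, 0.1, 0.01]:
    # Sample noisy estimates g = E[g] + eps with eps ~ N(0, sigma^2).
    noisy = expected_grad + sigma * rng.standard_normal(trials)
    flip_rate = np.mean(np.sign(noisy) != np.sign(expected_grad))
    print(f"E[g] = {expected_grad:>5}: sign-flip probability ~ {flip_rate:.3f}")
```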
4 Motivating Setting
Many recent works (Keskar et al., 2017; Li et al., 2017; Santurkar et al., 2018; yeh Chiang et al., 2023) suggest that the loss landscape of neural networks is "well-behaved" and has a flat region around the minimum (e.g., Figure 3 in Li et al. (2017)). Following previous works, we discuss the difficulty of defending against federated backdoor attacks over flat loss landscapes and present concrete case studies where multiple prominent defenses can fail. Specifically, we consider a backdoor attack successful as long as the malicious clients can control the gradient direction and thereby mislead model parameters toward the malicious minimum (Assumption 2, Figure 1).

To begin, we formally define a flat region around a minimum $w^*_i$ as a path-connected set (i.e., a set in which any two points are connected by at least one path within the set) where the gradient magnitude is small. Note that a flat region may not span the entire parameter space but may exist within a subspace, and the flatness may depend on the weight norm (Petzka et al., 2020a,b).
Definition 4. (Flat region) Let $\mathcal{V}$ be a subspace of the parameter space $\mathbb{R}^d$. We define a $\gamma$-flat region spanning $\mathcal{V}$ around a minimum $w^*$ as a path-connected set $\mathcal{B}$ that includes $w^*$, in which the magnitude of the gradient within $\mathcal{V}$ is bounded by $\gamma$: $\|\mathbb{E}_{\mathcal{D}}[g_{\mathcal{V}}]\| \le \gamma$.
4.1 Backdoor Attacks over a Flat Loss Landscape

The magnitude of benign gradients is small over flat loss landscapes, making it easier for the adversary to (1) mislead the aggregated gradient toward the malicious minimum $w'^*$ and (2) mimic benign clients to circumvent detection, e.g., by suffering a lower penalty in attack effectiveness. Figure 1 provides some intuitive examples. In addition, the adversary can intentionally exploit the flatness property.
Less dimensional perturbation requirements. Let $w_t$ be the parameters of the global federated learning model in round $t$, where $w_t$ lies in regions of the benign clients' loss landscapes with flatness at least $\gamma$. If a malicious client wants to guarantee that the parameter $w_{t+1}$ is closer to the malicious minimum $w'^*$ along dimension $k$, the magnitude of its gradient $g'$ along dimension $k$ needs to be at least $\frac{\sum_{i=1}^{N-1} n_i}{n'}\,\gamma$, where $n'$ is the number of samples held by the malicious client; this requirement decreases as the flatness increases (i.e., as $\gamma$ becomes smaller). Intuitively, the flatter the loss landscapes of benign clients are, the easier it is for the malicious client to "overwrite" the aggregation result (see the horizontal axis of Figure 1b for an illustration).
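A quick numeric illustration of this bound, with made-up client counts and flatness values, shows how sharply the requirement drops as the benign landscape flattens:

```python
# Illustrative numbers only: the required malicious gradient magnitude along
# one dimension scales as (sum of benign sample counts / n') * gamma.
benign_counts = [100] * 99     # 99 benign clients, 100 samples each
n_malicious = 100              # samples held by the single malicious client
for gamma in [1e-1, 1e-2, 1e-3]:
    required = sum(benign_counts) / n_malicious * gamma
    print(f"gamma = {gamma:g}: required |g'_k| >= {required:g}")
```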
Further, backdoor adversaries do not necessarily need to "overwrite" the aggregation result along all dimensions. Instead, backdoor attacks may perturb only a few gradient elements to minimize the overall difference between malicious and benign gradients without losing effectiveness (see the red dashed gradient in Figure 1c for an illustration).
Less penalty for mimicking benign clients. Since a flat loss landscape is a general property of well-designed neural networks, the loss landscape of a malicious client in the unperturbed subspace can also be flat. Then, the malicious clients may partially mimic the behavior of benign clients to circumvent detection without significantly decreasing the attack success rate. Specifically, if the loss landscape of a malicious client within the unperturbed subspace is $\gamma'$-flat and the gradient magnitude of a benign client is upper bounded by $\gamma$, then, via the Lagrange mean value theorem, it is easy to see that mimicking the benign client decreases the effectiveness of the backdoor attack, measured by the loss on backdoor samples, by at most $\gamma'\gamma$, which decreases as the flatness increases (i.e., as $\gamma'$ becomes smaller).
So far, we have seen how flat landscapes can reduce the gradient perturbation required by backdoor attacks and help attacks remain effective while malicious clients mimic benign clients. Even worse, an adversary may intentionally work to flatten the loss landscape, e.g., through edge-case backdoor attacks (Wang et al., 2020), to further increase the attack's effectiveness and circumvent defenses.
Edge-case attack flattens the loss landscape. The main idea of the edge-case backdoor attack is to minimize the marginal probability of backdoor samples under the benign data distribution (Wang et al., 2020). If a backdoor sample appears on both benign and malicious clients, it receives different label assignments on the different types of clients. Thus, for such backdoor samples, the loss on benign clients would increase because at least one data sample in their datasets would be mispredicted. The more the loss increases, the less flat the loss landscape is, and vice versa. The edge-case backdoor attack intentionally prevents backdoor samples from appearing on benign clients, avoiding this observable prediction error and thereby flattening the loss landscapes of benign clients, as we empirically verify in Appendix D.
4.2 Limitation of Existing Defenses over a Flat Loss Landscape

Under the flat loss landscape setting, we discuss how existing defenses can fail to recover the correct gradient direction, including vector-wise, dimension-wise, and trigger inversion defenses. The following case study shows the failure mode of FLTrust, a vector-wise defense for federated learning systems.