
landscapes can succeed without incurring significant differences between benign and malicious client updates due to the diminished gradient magnitudes from benign clients. We further show that this phenomenon, combined with other factors such as the stochastic nature of the update, can help backdoor adversaries circumvent existing defenses. Our analysis also broadly includes data-centric approaches such as the edge-case attack (Wang et al., 2020) and the trigger inversion defense (Wang et al., 2019; Zhang et al., 2023).
Our methodology. To avoid the failure modes of existing defenses over flat loss landscapes, we propose an invariant aggregator to defend against federated backdoor attacks under a minority-adversary setting (Shejwalkar et al., 2022). Our defense examines each dimension of the (pseudo-)gradients¹ so that it does not overlook backdoor attacks that manipulate only a few elements without incurring much difference on the gradient vectors. For each dimension, we enforce that the aggregated update points in an invariant direction that is generally useful for most clients, instead of favoring a few, possibly malicious, clients. As a result, our defense remains effective over flat loss landscapes where the magnitudes of benign gradients can be small.
Our approach. We consider the gradient sign (positive, negative, or zero) as a magnitude-agnostic indicator of benefit: if two clients have consistent signs in a dimension, then moving along the direction pointed to by the gradient benefits both clients, and vice versa. Following this intuition, we employ an AND-mask (Parascandolo et al., 2021) that zeroes out every gradient dimension whose sign consistency falls below a given threshold, masking out gradient elements that benefit only a few clients. However, this alone is insufficient: malicious clients can still use outliers to mislead the aggregation result even when sign consistency is high. To address this issue, we further complement the AND-mask with a trimmed-mean estimator (Xie et al., 2020a; Lugosi and Mendelson, 2021) as a means to remove the outliers. We theoretically show that the combination of the AND-mask and the trimmed-mean estimator is necessary and sufficient for mitigating backdoor attacks.
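To make the aggregation rule concrete, the following is a minimal NumPy sketch of the dimension-wise defense described above. It is our own illustration rather than the authors' released implementation; the parameter names (`mask_threshold`, `trim_k`) and the exact way the mask is combined with the trimmed mean are assumptions.

```python
import numpy as np

def invariant_aggregate(updates, mask_threshold=0.8, trim_k=1):
    """Sketch of dimension-wise robust aggregation of client (pseudo-)gradients.

    updates: array of shape (n_clients, dim), one row per client.
    mask_threshold: minimum |average sign| per dimension for it to be kept
        (AND-mask-style sign-consistency test).
    trim_k: number of smallest and largest client values dropped per
        dimension before averaging (trimmed mean).
    """
    n, _ = updates.shape

    # AND-mask: keep a dimension only if clients largely agree on its sign.
    sign_consistency = np.abs(np.sign(updates).mean(axis=0))   # in [0, 1]
    mask = (sign_consistency >= mask_threshold).astype(updates.dtype)

    # Trimmed mean: per dimension, drop the trim_k extreme values on each
    # side to limit the influence of outliers from malicious clients.
    sorted_updates = np.sort(updates, axis=0)
    trimmed_mean = sorted_updates[trim_k:n - trim_k].mean(axis=0)

    return mask * trimmed_mean

# Toy usage: 5 clients, 4-dimensional pseudo-gradients.
rng = np.random.default_rng(0)
client_updates = rng.normal(size=(5, 4))
print(invariant_aggregate(client_updates, mask_threshold=0.6, trim_k=1))
```

With a high `mask_threshold`, a dimension survives only if it carries a consistent sign across almost all clients, which is the magnitude-agnostic invariance test; the trimmed mean then bounds the influence that any single client's value can have on the surviving dimensions.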
Our empirical evaluation employs a broad class of backdoor attacks, as detailed in Section 6.1, to test our defense. Empirical results on tabular (phishing emails), visual (CIFAR-10) (Krizhevsky, 2009; McMahan et al., 2017), and text (Twitter) (Caldas et al., 2018) datasets demonstrate that our method is effective in defending against backdoor attacks without degrading utility compared to prior works. On average, our approach decreases the backdoor attack success rate by 61.6% and loses only 1.2% accuracy on benign samples compared to the standard FedAvg aggregator (McMahan et al., 2017).

¹ We overload "gradient" to indicate any local model update communicated to the server in the federated setting; e.g., updates could be pseudo-gradients computed as differences between model parameters after several local steps.
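As a concrete example of the pseudo-gradient described in the footnote, one common construction is simply the difference between the global model a client received and its model after local training. The sketch below is a generic, hypothetical illustration with our own names; it is not taken from the paper's code.

```python
import numpy as np

def pseudo_gradient(global_params, local_params):
    """Pseudo-gradient a client sends to the server: the difference between
    the global parameters it received and its parameters after several
    local training steps, so the server can treat it like a gradient."""
    return np.asarray(global_params) - np.asarray(local_params)
```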
Contributions. Our contributions are as follows:
• We analyze the failure modes of multiple prominent defenses against federated backdoor attacks over a flat loss landscape.
• We develop a combined defense, using an AND-mask and the trimmed-mean estimator, against backdoor attacks by focusing on the dimension-wise invariant gradient directions.
• We theoretically analyze our strategy and demonstrate that the combination of an AND-mask and the trimmed-mean estimator is necessary and sufficient for mitigating backdoor attacks.
• We empirically evaluate our method on three datasets with varying modalities, trigger patterns, model architectures, and client numbers, and compare its performance to existing defenses.
2 Related Work
Backdoor Attack. Common backdoor attacks aim to mislead the model's predictions using a trigger (Liu et al., 2018). The trigger can be digital (Bagdasaryan et al., 2020), physical (Wenger et al., 2021), semantic (Wang et al., 2020), or invisible (Li et al., 2021a). Recent works extended backdoor attacks to the federated learning setting and proposed effective improvements such as gradient scaling (Bagdasaryan et al., 2020) or generating edge-case backdoor samples (Wang et al., 2020). The edge-case backdoor attack shows that backdoor samples with low probability density on benign clients (i.e., samples that are unlikely under the training distribution) are hard to detect and defend against in the federated learning setting.
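For illustration, a digital trigger of the kind referenced above can be as simple as a small pixel patch stamped onto training images whose labels are flipped to an attacker-chosen target class. The sketch below is a generic, hypothetical example (patch size, location, and target label are our own choices), not the specific attacks cited here.

```python
import numpy as np

def poison_with_trigger(images, labels, target_class=0, patch_size=3, value=1.0):
    """Stamp a small square 'trigger' patch into the bottom-right corner of
    each image and relabel the poisoned samples to the attacker's target class.

    images: float array of shape (N, H, W, C) with values in [0, 1].
    """
    poisoned = images.copy()
    poisoned[:, -patch_size:, -patch_size:, :] = value   # bright square trigger
    poisoned_labels = np.full_like(labels, target_class)
    return poisoned, poisoned_labels

# Example with CIFAR-10-shaped placeholder data.
images = np.random.rand(8, 32, 32, 3).astype(np.float32)
labels = np.random.randint(0, 10, size=8)
trigger_images, trigger_labels = poison_with_trigger(images, labels)
```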
Centralized Defense. There is a line of work proposing centralized defenses against backdoor attacks, where the main aim is either detecting the backdoor samples (Tran et al., 2018) or purifying the poisoned model parameters (Li et al., 2021b). However, applying such centralized defenses to federated learning systems is infeasible in practice due to the limited client data access in many implementations.
Federated Defenses. Recent works have attempted to defend against backdoor attacks in federated learning systems. Sun et al. (2019) show that weakly differentially private (weak-dp) federated averaging can mitigate the backdoor attack. However, the weak-dp defense
is circumvented by the improved edge-case federated