FedRecover: Recovering from Poisoning Attacks in
Federated Learning using Historical Information
Xiaoyu Cao*, Jinyuan Jia*, Zaixi Zhang+, Neil Zhenqiang Gong*
*Duke University   +University of Science and Technology of China
{xiaoyu.cao, jinyuan.jia, neil.gong}@duke.edu zaixi@mail.ustc.edu.cn
Abstract—Federated learning is vulnerable to poisoning at-
tacks in which malicious clients poison the global model via
sending malicious model updates to the server. Existing defenses
focus on preventing a small number of malicious clients from
poisoning the global model via robust federated learning methods
and detecting malicious clients when there are a large number
of them. However, it is still an open challenge how to recover
the global model from poisoning attacks after the malicious
clients are detected. A naive solution is to remove the detected
malicious clients and train a new global model from scratch using
the remaining clients. However, such a train-from-scratch recovery
method incurs a large computation and communication cost,
which may be intolerable for resource-constrained clients such
as smartphones and IoT devices.
In this work, we propose FedRecover, a method that can
recover an accurate global model from poisoning attacks with a
small computation and communication cost for the clients. Our
key idea is that the server estimates the clients’ model updates
instead of asking the clients to compute and communicate them
during the recovery process. In particular, the server stores the
historical information, including the global models and clients’
model updates in each round, when training the poisoned global
model before the malicious clients are detected. During the
recovery process, the server estimates a client’s model update
in each round using its stored historical information. Moreover,
we further optimize FedRecover to recover a more accurate
global model using warm-up, periodic correction, abnormality
fixing, and final tuning strategies, in which the server asks
the clients to compute and communicate their exact model
updates. Theoretically, we show that the global model recovered
by FedRecover is close to or the same as that recovered by
train-from-scratch under some assumptions. Empirically, our
evaluation on four datasets, three federated learning methods, as
well as untargeted and targeted poisoning attacks (e.g., backdoor
attacks) shows that FedRecover is both accurate and efficient.
I. INTRODUCTION
Federated learning (FL) [20], [24] is an emerging machine
learning paradigm that enables many clients (e.g., smart-
phones, IoT devices, and edge devices) to collaboratively
learn a shared machine learning model (called global model).
Specifically, training data are decentralized over the clients
in FL, and a server maintains the global model. Roughly
speaking, FL performs the following three steps in each round:
the server broadcasts the current global model to (a subset of)
the clients; each client fine-tunes the global model using its
local training data and reports its model update to the server;
and the server aggregates the clients’ model updates following
some aggregation rule and uses the aggregated model update
to update the global model. Different FL methods essentially
use different aggregation rules. FL has been deployed by tech
giants. For instance, Google uses FL on a virtual keyboard
app called Gboard [2] for next-word prediction; and WeBank
leverages FL for credit risk prediction [3].
However, due to its distributed setting, FL is vulnerable to
poisoning attacks [5], [18], [7], [12]. Specifically, an attacker
may have access to some malicious clients, which could be
fake clients injected into the system by the attacker [12] or
genuine clients compromised by the attacker [5], [18], [7].
The malicious clients poison the global model via sending
carefully crafted malicious model updates to the server. A
malicious client can craft its malicious model update by
poisoning its local training data and/or directly construct-
ing it without following the prescribed FL protocol. In an
untargeted poisoning attack [18], [12], the poisoned global
model indiscriminately misclassifies many test inputs, i.e., the
poisoned global model has a large test error rate. In a targeted
poisoning attack [5], [7], the poisoned global model predicts
an attacker-chosen target label for attacker-chosen target test
inputs but its predictions for other test inputs are unaffected.
For instance, in backdoor attacks (one category of targeted
poisoning attacks) [5], the target test inputs could be any input
embedded with an attacker-chosen trigger.
Existing defenses against poisoning attacks to FL prevent a
small number of malicious clients from poisoning the global
model and/or detect malicious clients. Specifically, some stud-
ies proposed Byzantine-robust [11], [25], [36], [8], [16], [26]
or provably robust [13] FL methods that can prevent a small
number of malicious clients from poisoning the global model,
i.e., they can guarantee the global model learnt with malicious
clients is close to the global model learnt without them [11],
[25], [36] or guarantee a lower bound of testing accuracy under
a bounded number of malicious clients [13]. However, these
FL methods are still vulnerable to poisoning attacks with a
large number of malicious clients [18], [29]. Therefore, some
studies [23], [30], [39] further proposed to detect malicious
clients during or after the training process, which can be used
together with the prevention methods in a defense-in-depth
strategy. For instance, the server may distinguish between
the malicious clients and benign ones via some statistical
differences in their model updates sent to the server. Since
such detection methods require enough model updates to make
confident decisions, the malicious clients often have already
poisoned the global model before being detected. Therefore,
the server needs to recover an accurate global model from the
poisoned one after detecting the malicious clients.
However, efficient model recovery in FL is largely un-
explored. Since the server does not know in which round
the attack happens, the server may not be able to simply
roll back to a clean global model in a prior round. A naive
recovery method (we call it train-from-scratch) is to remove
the detected malicious clients and train a new global model
from scratch using the remaining clients. Train-from-scratch
could recover an accurate global model. However, it introduces
substantial computation and communication cost to the clients
since it requires them to participate in the entire training
process once again. Such computation and communication cost
may be intolerable for resource-constrained clients such as
smartphones and IoT devices.
Our work: In this work, we propose FedRecover, a method
that can recover an accurate global model from a poisoned one
while introducing small computation and communication cost
for the clients. Like train-from-scratch, FedRecover removes
the detected malicious clients, re-initializes a global model,
and trains it iteratively in multiple rounds. However, unlike
train-from-scratch, FedRecover reduces the cost for the clients
by changing the way of obtaining their model updates. Our
intuition is that the historical information, including the global
models and clients’ model updates, which the server collected
when training the poisoned global model before the malicious
clients are detected, still carry valuable information for model
recovery. Based on the intuition, our key idea is that, during the
recovery process, the server estimates the remaining clients’
model updates using such historical information instead of ask-
ing the clients to compute and communicate them. FedRecover
is independent of the detection methods used to detect the
malicious clients and the aggregation rules of FL. In other
words, FedRecover can be used together with any detection
method and FL aggregation rule in a defense-in-depth strategy.
The key of FedRecover is that the server estimates the
clients’ model updates itself during the recovery process.
Specifically, the server stores the historical information when
training the poisoned global model before the malicious clients
are detected. During the recovery process, the server uses
the well-known Cauchy mean value theorem to estimate each
client’s model update in each round. However, the Cauchy
mean value theorem requires an integrated Hessian matrix for
each client, whose exact value is challenging to compute. To
address the challenge, we further leverage an L-BFGS based
algorithm to efficiently approximate the integrated Hessian
matrix. FedRecover introduces some storage and computation
cost to the server due to storing the historical information and
estimating the clients’ model updates. However, such cost is
acceptable since the server is powerful.
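To make the estimation step concrete, the following is a minimal sketch, in our own notation, of how a server might compute such an estimate. It assumes a Cauchy-mean-value-style expansion $\hat{g}_t^i \approx \bar{g}_t^i + \hat{H}_t^i (\hat{w}_t - \bar{w}_t)$, where $\bar{w}_t$ and $\bar{g}_t^i$ are the stored original global model and client update in round $t$, $\hat{w}_t$ is the recovered global model, and $\hat{H}_t^i$ is an L-BFGS approximation of the integrated Hessian built from buffers of global-model differences and the client's update differences. This is an illustration of the idea rather than the paper's exact algorithm, and all function and variable names are ours.

```python
import numpy as np

def lbfgs_hessian_vector_product(S, Y, v):
    """Approximate H @ v via the compact L-BFGS representation (Byrd et al., 1994).

    S: (d, m) buffer whose columns are differences of global models across rounds.
    Y: (d, m) buffer whose columns are differences of one client's stored model updates.
    v: (d,)   vector to multiply, e.g. the recovered-minus-original model difference.
    Assumes the usual curvature condition s^T y > 0 holds for the buffered pairs.
    """
    m = S.shape[1]
    sigma = float(Y[:, -1] @ Y[:, -1]) / float(Y[:, -1] @ S[:, -1])  # common scaling for B_0 = sigma*I
    StY = S.T @ Y
    L = np.tril(StY, k=-1)                      # strictly lower-triangular part of S^T Y
    D = np.diag(np.diag(StY))                   # diag(s_j^T y_j)
    M = np.block([[sigma * (S.T @ S), L],
                  [L.T, -D]])                   # middle matrix of the compact representation
    q = np.concatenate([sigma * (S.T @ v), Y.T @ v])
    p = np.linalg.solve(M, q)
    return sigma * v - (sigma * (S @ p[:m]) + Y @ p[m:])

def estimate_client_update(g_bar, w_bar, w_hat, S, Y):
    """Estimated update = stored original update + Hessian-vector correction term."""
    return g_bar + lbfgs_hessian_vector_product(S, Y, w_hat - w_bar)
```

In a full system, the buffers $S$ and $Y$ would be maintained per client from the stored historical information, and the estimated update would be fed into the aggregation rule in place of an exact client update.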
Since FedRecover estimates the clients’ model updates, the
estimation errors may accumulate over multiple rounds during
the recovery process, which eventually may result in a less ac-
curate recovered global model. We propose multiple strategies
to address the challenge. Specifically, the L-BFGS algorithm
requires the recovered global models in the previous several
rounds to estimate a client’s model update in the current round.
The accurately recovered global models in the first several
rounds of the recovery process will help reduce the estimation
errors in the future rounds. Therefore, we propose the warm-
up strategy, in which the server asks the clients to compute and
communicate their exact model updates in the first $T_w$ rounds
of the recovery process. Moreover, we propose the periodic
correction strategy, in which the server asks the clients to
compute and communicate their exact model updates in every
$T_c$ rounds. When an estimated model update for a client is
large, it has large influence on the recovered global model. To
reduce the impact of potentially incorrectly estimated large
model updates, we propose the abnormality fixing strategy,
in which the server asks a client to compute its exact model
update when at least one coordinate of the estimated model
update is larger than a threshold τ. Furthermore, we propose
final tuning strategy to reduce the estimation error before the
training terminates, in which the server asks the clients to
compute and communicate their exact model updates in the
last $T_f$ rounds. The parameters $T_w$, $T_c$, $\tau$, and $T_f$ control the
trade-off between accuracy of the recovered global model and
computation/communication cost for the clients. In particular,
a larger $T_w$, a smaller $T_c$, a smaller $\tau$, or a larger $T_f$ may
recover a more accurate global model but also introduces a
larger cost to the clients.
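The four strategies amount to a per-round, per-client decision of whether to use the estimated update or to request an exact one. Below is a minimal sketch of that decision logic; the default values and the exact periodicity rule are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def needs_exact_update(round_idx, total_rounds, estimated_update,
                       Tw=20, Tc=10, tau=10.0, Tf=5):
    """Return True if the server should ask a client for its exact model update.

    Warm-up: first Tw rounds. Periodic correction: every Tc rounds afterwards.
    Abnormality fixing: some coordinate of the estimate exceeds tau in magnitude.
    Final tuning: last Tf rounds before training terminates.
    """
    if round_idx < Tw:
        return True                                            # warm-up
    if (round_idx - Tw) % Tc == 0:
        return True                                            # periodic correction
    if estimated_update is not None and np.max(np.abs(estimated_update)) > tau:
        return True                                            # abnormality fixing
    if round_idx >= total_rounds - Tf:
        return True                                            # final tuning
    return False
```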
Theoretically, we show that the difference between the
global model recovered by FedRecover and the global model
recovered by train-from-scratch can be bounded under some
assumptions, e.g., the loss function used to learn the global
model is smooth and strongly convex. Empirically, we evaluate
FedRecover extensively using four datasets, three FL methods
(e.g., FedAvg [24], Median [36], and Trimmed-mean [36]),
as well as Trim attack (an untargeted poisoning attack) [18]
and backdoor attack (a targeted poisoning attack) [5]. Our
empirical results show that FedRecover can recover global
models that are as accurate as those recovered by train-from-
scratch while saving lots of computation/communication cost
for the clients. For instance, the backdoor attack with 40
malicious clients can achieve 1.00 attack success rate when the
dataset is MNIST and the FL method is Trimmed-mean. Both
FedRecover and train-from-scratch can recover global models
with 0.07 test error rate and 0.01 attack success rate, but Fe-
dRecover saves the clients’ computation/communication cost
by 88% on average compared to train-from-scratch. Moreover,
FedRecover can efficiently recover as accurate global models
as train-from-scratch even if the detection method incorrectly
detects some malicious clients as benign and/or some benign
clients as malicious.
In summary, our key contributions are as follows:
We perform the first systematic study on model recovery
from poisoning attacks in FL.
We propose FedRecover to recover a global model via
estimating clients’ model updates through historical in-
formation and multiple optimization strategies.
We evaluate FedRecover both theoretically and empiri-
cally. Our results show that FedRecover can recover a
global model both accurately and efficiently.
II. BACKGROUND AND RELATED WORK
A. Background on FL
Suppose the FL system has $n$ clients, each of which
has a local training dataset $D_i$, $i = 1, 2, \cdots, n$. We use
$D = \cup_{i=1}^{n} D_i$ to denote the joint training dataset, which
is the union of the clients' local training datasets. The $n$
clients aim to collaboratively train a shared machine learning
model (called global model) based on the joint training dataset.
To achieve the goal, the $n$ clients jointly minimize a loss
function on their training datasets, i.e., $\min_{w} L(D; w) =
\min_{w} \sum_{i=1}^{n} L(D_i; w)$, where $w$ represents the global model
parameters and $L$ is the empirical loss function (e.g., cross-
entropy loss). For simplicity, we let $L_i(w) = L(D_i; w)$ in
the rest of this work. A server provided by a service provider
(e.g., Google, Facebook, Apple) maintains the global model.
The global model is iteratively updated in multiple rounds,
and in the $t$-th round, FL takes the following three steps:
Step I: The server broadcasts the current global model
$w_t$ to the clients. The server may also broadcast the
global model to a subset of the clients. Our method is
also applicable in this scenario. However, for simplicity,
we assume all clients are involved in each round.
Step II: The $i$th client computes a model update $g_t^i =
\frac{\partial L_i(w_t)}{\partial w_t}$ based on the received global model $w_t$ and the
client's local training data $D_i$ using gradient descent. The
client may also use stochastic gradient descent with a
mini-batch of its local training dataset if it is large. For
simplicity, we assume gradient descent in the description
of our method, but we adopt stochastic gradient descent
in our experiments. Then, the client reports the model
update $g_t^i$ to the server. Note that the clients calculate
their model updates in parallel.
Step III: The server aggregates the clients' model updates
according to an aggregation rule $\mathcal{A}$. Then, the server
uses the aggregated model update to update the global
model with a learning rate $\eta$, i.e., $w_{t+1} = w_t - \eta \cdot
\mathcal{A}(g_t^1, g_t^2, \cdots, g_t^n)$.
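As a concrete illustration of Steps I-III, the snippet below simulates one synchronous round with a generic aggregation rule; the client gradient functions and the aggregation rule are placeholders, and the toy quadratic losses in the usage example are ours.

```python
import numpy as np

def fl_round(w_t, client_grads, aggregate, eta=0.1):
    """One round of FL: broadcast w_t, collect client updates, aggregate, update.

    client_grads: callables returning g_t^i = dL_i(w)/dw for the broadcast model.
    aggregate:    aggregation rule A (e.g., FedAvg, Median, or Trimmed-mean).
    """
    updates = [grad_fn(w_t) for grad_fn in client_grads]  # Steps I and II
    return w_t - eta * aggregate(updates)                 # Step III

# Toy usage with quadratic per-client losses L_i(w) = 0.5 * ||w - c_i||^2,
# whose gradients are w - c_i; the global model converges to the mean of the c_i.
centers = [np.array([1.0, 2.0]), np.array([3.0, -1.0]), np.array([0.0, 0.0])]
grads = [lambda w, c=c: w - c for c in centers]
w = np.zeros(2)
for t in range(200):
    w = fl_round(w, grads, aggregate=lambda us: np.mean(us, axis=0))
```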
In train-from-scratch, the server initializes a global model,
and then the server and the remaining clients follow the above
three steps in each round to iteratively update it. Different FL
methods essentially use different aggregation rules [8], [11],
[16], [24], [25], [36] in Step III. Next, we review several
popular aggregation rules.
FedAvg: FedAvg [24], developed by Google Inc., computes
the weighted average of the clients’ model updates as the
aggregated model update. Formally, given the model updates
$g_t^1, g_t^2, \cdots, g_t^n$ in the $t$-th round, the aggregated model update
is as follows:
$$\mathcal{A}(g_t^1, g_t^2, \cdots, g_t^n) = \sum_{i=1}^{n} \frac{|D_i|}{|D|} \cdot g_t^i, \quad (1)$$
where $|\cdot|$ represents the size of a dataset.
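A direct reading of Eq. (1) in code, assuming each model update is a flat numpy array and the server knows the clients' dataset sizes (function and variable names are ours):

```python
import numpy as np

def fedavg(updates, dataset_sizes):
    """Weighted average of client model updates, weights |D_i| / |D| as in Eq. (1)."""
    weights = np.asarray(dataset_sizes, dtype=float)
    weights = weights / weights.sum()
    return np.sum([wi * gi for wi, gi in zip(weights, updates)], axis=0)
```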
Median: Median [36] is a coordinate-wise aggregation rule
that aggregates each coordinate of the model update separately.
In particular, for each coordinate, Median calculates the me-
dian value of the corresponding coordinates in the $n$ model
updates and treats it as the corresponding coordinate of the
aggregated model update.
Trimmed-mean: Trimmed-mean [36] is also a coordinate-
wise aggregation rule. For each coordinate, Trimmed-mean
sorts the values of the corresponding coordinates in the $n$
model updates. Then, it removes the largest and the smallest
$k$ values. Finally, it computes the average of the remaining
values as the corresponding coordinate of the aggregated
model update. $k < \frac{n}{2}$ is a hyper-parameter for Trimmed-mean.
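Both coordinate-wise rules can be written compactly by stacking the $n$ model updates into an $n \times d$ matrix and operating along the client axis; the sketch below is an illustrative implementation, with $k$ the Trimmed-mean hyper-parameter satisfying $k < n/2$.

```python
import numpy as np

def median_aggregate(updates):
    """Coordinate-wise median of the n client model updates."""
    return np.median(np.stack(updates), axis=0)

def trimmed_mean_aggregate(updates, k):
    """For each coordinate, drop the k largest and k smallest values, then average."""
    stacked = np.sort(np.stack(updates), axis=0)  # sort each coordinate across clients
    n = stacked.shape[0]
    assert 2 * k < n, "Trimmed-mean requires k < n/2"
    return stacked[k:n - k].mean(axis=0)
```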
B. Poisoning Attacks to FL
Federated learning is vulnerable to poisoning attacks [5],
[6], [7], [18], [29], [12], in which malicious clients poison
the global model via sending malicious model updates to the
server in Step II of FL. The malicious clients can construct
their malicious model updates via poisoning their local training
data and/or directly manipulating the model updates without
following the prescribed FL protocol in Step II [5], [6], [7],
[18], [29], [12]. Based on the attacker’s goal, poisoning attacks
can be categorized into untargeted poisoning attacks [18], [29],
[12] and targeted poisoning attacks [5], [6], [7]. In untargeted
poisoning attacks, the poisoned global model has a large test
error rate for a large proportion of test inputs indiscriminately.
In targeted poisoning attacks, the poisoned global model
predicts an attacker-chosen target label for attacker-chosen
target test inputs; and to stay stealthy, the poisoned global
model’s test error rate for other test inputs is unaffected.
For instance, backdoor attacks [5], [6] are popular targeted
poisoning attacks, in which the attacker-chosen test inputs
are any inputs embedded with a trigger. Next, we review
Trim attack (a popular untargeted poisoning attack) [18] and
backdoor attack (a popular targeted poisoning attack) [5].
Trim attack: Fang et al. [18] formulated untargeted poisoning
attacks to FL as a general framework. Roughly speaking, the
framework aims to craft malicious model updates that max-
imize the difference between the aggregated model updates
before and after attack. The framework can be applied to dif-
ferent aggregation rules. The Trim attack is constructed based
on the Trimmed-mean aggregation rule under the framework,
and is also effective for other aggregation rules such as FedAvg
and Median.
Backdoor attack: In the backdoor attack [5], the attacker
poisons the malicious clients’ local training data via augment-
ing them with trigger-embedded duplicates. Specifically, for
each input in a malicious client’s local training dataset, the
attacker makes a copy of it and embeds a trigger into the
copy. Then, the attacker injects the trigger-embedded copy
into the malicious client’s local training dataset and relabels
it as the target label. In every round of FL, each malicious
client computes a model update based on its poisoned local
training data. To amplify the impact of the model updates, the
malicious clients further scale them up by a large factor before
reporting them to the server. We notice that some methods
[15], [32] have been proposed to detect and remove backdoors
in neural networks. However, they are insufficient for FL. For
instance, [32] assumes that a clean training dataset is available,
which usually does not hold for an FL server.
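The data-poisoning step of this backdoor attack can be sketched as follows: copy each local example, stamp a trigger pattern onto the copy, relabel the copy with the target label, and append it to the local training data; the malicious client later scales up its model update. The trigger shape and position and the scaling factor below are illustrative assumptions, not the attack's original settings.

```python
import numpy as np

def poison_local_dataset(images, labels, target_label, trigger_value=1.0):
    """Augment a client's local data with trigger-embedded copies relabeled as the target.

    Assumes 2-D image arrays; the 3x3 corner patch used as the trigger is an assumption.
    """
    poisoned_x, poisoned_y = [], []
    for x, y in zip(images, labels):
        x_trig = x.copy()
        x_trig[-3:, -3:] = trigger_value        # stamp the (assumed) trigger pattern
        poisoned_x.extend([x, x_trig])          # keep the clean copy and add the poisoned one
        poisoned_y.extend([y, target_label])
    return np.stack(poisoned_x), np.array(poisoned_y)

def scale_update(update, factor=10.0):
    """Malicious clients amplify their model updates before reporting them to the server."""
    return factor * update
```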
C. Detecting Malicious Clients
Malicious-client detection [23], [30], [39] aims to distin-
guish malicious clients from benign ones, which is essentially
a binary classification problem. Roughly speaking, the key
idea is to leverage some statistical difference between the
features (e.g., model updates [23]) of malicious clients and
those of benign clients. Different detection methods use dif-
ferent features and binary classifiers to perform the detection.
Specifically, for each client, these detection methods first
extract features from its model updates in one or multiple
rounds and then use a classifier to predict whether it is
malicious or not. For instance, Zhang et al. [39] proposed to
detect malicious clients via checking a client’s model-update
consistency. In particular, the server predicts a client’s model
update based on its historical model updates in each round. If
the received model updates are inconsistent with the predicted
ones in multiple rounds, then the server flags the client as
malicious. Zhang et al. also leveraged the Cauchy mean value
theorem and the L-BFGS algorithm to predict a client’s model
update, but they used the same approximate Hessian matrix for
all clients, which we experimentally found to be ineffective for
model recovery, e.g., accuracy of the recovered model may be
nearly random guessing.
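As a highly simplified illustration of such a consistency check (not the algorithm of [39]), the server could accumulate the distance between each client's predicted and received updates over rounds and flag clients whose accumulated inconsistency stands out; the smoothing and thresholding rules below are assumptions.

```python
import numpy as np

def update_suspicious_scores(scores, predicted, received, decay=0.9):
    """Exponentially averaged distance between predicted and received updates per client."""
    for cid in received:
        dist = np.linalg.norm(predicted[cid] - received[cid])
        scores[cid] = decay * scores.get(cid, 0.0) + (1 - decay) * dist
    return scores

def flag_inconsistent_clients(scores, z_threshold=2.0):
    """Flag clients whose score is far above the population mean (simple z-score rule)."""
    vals = np.array(list(scores.values()))
    mu, sigma = vals.mean(), vals.std() + 1e-12
    return [cid for cid, s in scores.items() if (s - mu) / sigma > z_threshold]
```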
Detecting malicious clients is also related to Sybil detection
in distributed systems [17]. Therefore, conventional Sybil de-
tection methods could also be used to detect malicious clients,
where malicious clients are treated as Sybils. In particular,
these Sybil detection methods (e.g., [33], [37], [38], [19], [31])
leverage the clients’ IPs, network behaviors, and social graphs
if available.
D. Machine Unlearning
Machine unlearning aims to make a machine learning model
“forget” some training examples. For instance, a user may
desire a model to forget its data for privacy concerns. Multiple
methods [9], [14], [34] have been proposed for efficient
machine unlearning. For instance, Cao et al. [14] proposed
to transform the learning algorithm used to train a machine
learning model into a summation form. Therefore, only a
small number of summations need to be updated to unlearn a
training example. Bourtoule et al. [9] broke the model training
into an aggregation of multiple constituent models and each
training example only contributes to one constituent model.
Therefore, only one constituent model needs to be retrained
when unlearning a training example. Wu et al. [34] proposed
DeltaGrad that estimates the gradient of the loss function on
the remaining training examples using the gradient on the
training examples to be unlearnt.
Model recovery from poisoning attacks in FL is related
to machine unlearning. In particular, model recovery can
be viewed as unlearning the detected malicious clients, i.e.,
making the global model forget the model updates from the
detected malicious clients. However, existing machine unlearn-
ing methods are insufficient for FL because 1) they require
changing the FL algorithm to train multiple constituent models
and are inefficient when multiple constituent models involve
detected malicious clients and thus require retraining [9],
and/or 2) they require access to the clients’ private local
training data [14], [34].
III. PROBLEM DEFINITION
A. Threat Model
We follow the threat model considered in previous studies
on poisoning attacks to FL [5], [7], [18], [12]. Specifically,
we discuss in detail the attacker’s goals, capabilities, and
background knowledge.
Attacker’s goals: In an untargeted poisoning attack, the
attacker’s goal is to increase the test error rate of the global
model indiscriminately for a large number of test inputs. In a
targeted poisoning attack, the attacker’s goal is to poison the
global model such that it predicts an attacker-chosen target
label for attacker-chosen target test inputs but the predictions
for other test inputs are unaffected. For instance, in a category
of targeted poisoning attacks also known as backdoor attacks,
the target test inputs include any input embedded with an
attacker-chosen trigger, e.g., a feature pattern.
Attacker’s capabilities: We assume the attacker controls
some malicious clients but does not compromise the server.
The malicious clients could be fake clients injected into the
FL system by the attacker or genuine clients in the FL system
compromised by the attacker. The malicious clients can send
arbitrary model updates to the server.
Attacker’s background knowledge: There are two common
settings for the attacker’s background knowledge about the FL
system [18], i.e., partial-knowledge setting and full-knowledge
setting. The partial-knowledge setting assumes the attacker
knows the global model, the loss function, as well as local
training data and model updates on the malicious clients. The
full-knowledge setting further assumes the attacker knows the
local training data and model updates on all clients as well
as the server’s aggregation rule. The poisoning attacks are
often stronger in the full-knowledge setting than in the partial-
knowledge setting. In this work, we consider strong poisoning
attacks in the full-knowledge setting.
B. Design Goals
We aim to design an accurate and efficient model recovery
method for FL. We use train-from-scratch as a baseline to
measure the accuracy and efficiency of a recovery method. Our
method should recover a global model as accurate as the one
recovered by train-from-scratch, while incurring less client-
side computation and communication cost. Specifically, our
design goals are as follows:
Accurate: The global model recovered by our recovery
method should be accurate. In particular, for untargeted poi-
soning attacks, the test error rate of the recovered global model