FedRecover: Recovering from Poisoning Attacks in
Federated Learning using Historical Information
Xiaoyu Cao*, Jinyuan Jia*, Zaixi Zhang+, Neil Zhenqiang Gong*
*Duke University   +University of Science and Technology of China
{xiaoyu.cao, jinyuan.jia, neil.gong}@duke.edu zaixi@mail.ustc.edu.cn
Abstract—Federated learning is vulnerable to poisoning at-
tacks in which malicious clients poison the global model via
sending malicious model updates to the server. Existing defenses
focus on preventing a small number of malicious clients from
poisoning the global model via robust federated learning methods
and detecting malicious clients when there are a large number
of them. However, it is still an open challenge how to recover
the global model from poisoning attacks after the malicious
clients are detected. A naive solution is to remove the detected
malicious clients and train a new global model from scratch using
the remaining clients. However, such a train-from-scratch recovery
method incurs a large computation and communication cost,
which may be intolerable for resource-constrained clients such
as smartphones and IoT devices.
In this work, we propose FedRecover, a method that can
recover an accurate global model from poisoning attacks with a
small computation and communication cost for the clients. Our
key idea is that the server estimates the clients’ model updates
instead of asking the clients to compute and communicate them
during the recovery process. In particular, the server stores the
historical information, including the global models and clients’
model updates in each round, when training the poisoned global
model before the malicious clients are detected. During the
recovery process, the server estimates a client’s model update
in each round using its stored historical information. Moreover,
we further optimize FedRecover to recover a more accurate
global model using warm-up, periodic correction, abnormality
fixing, and final tuning strategies, in which the server asks
the clients to compute and communicate their exact model
updates. Theoretically, we show that the global model recovered
by FedRecover is close to or the same as that recovered by
train-from-scratch under some assumptions. Empirically, our
evaluation on four datasets, three federated learning methods, as
well as untargeted and targeted poisoning attacks (e.g., backdoor
attacks) shows that FedRecover is both accurate and efficient.
I. INTRODUCTION
Federated learning (FL) [20], [24] is an emerging machine
learning paradigm that enables many clients (e.g., smart-
phones, IoT devices, and edge devices) to collaboratively
learn a shared machine learning model (called global model).
Specifically, training data are decentralized over the clients
in FL, and a server maintains the global model. Roughly
speaking, FL performs the following three steps in each round:
the server broadcasts the current global model to (a subset of)
the clients; each client fine-tunes the global model using its
local training data and reports its model update to the server;
and the server aggregates the clients’ model updates following
some aggregation rule and uses the aggregated model update
to update the global model. Different FL methods essentially
use different aggregation rules. FL has been deployed by tech
giants. For instance, Google uses FL on a virtual keyboard
app called Gboard [2] for next-word prediction; and WeBank
leverages FL for credit risk prediction [3].
However, due to its distributed setting, FL is vulnerable to
poisoning attacks [5], [18], [7], [12]. Specifically, an attacker
may have access to some malicious clients, which could be
fake clients injected into the system by the attacker [12] or
genuine clients compromised by the attacker [5], [18], [7].
The malicious clients poison the global model via sending
carefully crafted malicious model updates to the server. A
malicious client can craft its malicious model update by
poisoning its local training data and/or directly construct-
ing it without following the prescribed FL protocol. In an
untargeted poisoning attack [18], [12], the poisoned global
model indiscriminately misclassifies many test inputs, i.e., the
poisoned global model has a large test error rate. In a targeted
poisoning attack [5], [7], the poisoned global model predicts
an attacker-chosen target label for attacker-chosen target test
inputs but its predictions for other test inputs are unaffected.
For instance, in backdoor attacks (one category of targeted
poisoning attacks) [5], the target test inputs could be any input
embedded with an attacker-chosen trigger.
Existing defenses against poisoning attacks to FL prevent a
small number of malicious clients from poisoning the global
model and/or detect malicious clients. Specifically, some stud-
ies proposed Byzantine-robust [11], [25], [36], [8], [16], [26]
or provably robust [13] FL methods that can prevent a small
number of malicious clients from poisoning the global model,
i.e., they can guarantee the global model learnt with malicious
clients is close to the global model learnt without them [11],
[25], [36] or guarantee a lower bound of testing accuracy under
a bounded number of malicious clients [13]. However, these
FL methods are still vulnerable to poisoning attacks with a
large number of malicious clients [18], [29]. Therefore, some
studies [23], [30], [39] further proposed to detect malicious
clients during or after the training process, which can be used
together with the prevention methods in a defense-in-depth
strategy. For instance, the server may distinguish between
the malicious clients and benign ones via some statistical
differences in their model updates sent to the server. Since
such detection methods require enough model updates to make
confident decisions, the malicious clients often have already
poisoned the global model before being detected. Therefore,
the server needs to recover an accurate global model from the
poisoned one after detecting the malicious clients.
However, efficient model recovery in FL is largely un-
explored. Since the server does not know in which round
the attack happens, the server may not be able to simply
roll back to a clean global model in a prior round. A naive
recovery method (we call it train-from-scratch) is to remove
the detected malicious clients and train a new global model
from scratch using the remaining clients. Train-from-scratch
could recover an accurate global model. However, it introduces
substantial computation and communication cost to the clients
since it requires them to participate in the entire training
process once again. Such computation and communication cost
may be intolerable for resource-constrained clients such as
smartphones and IoT devices.
Our work: In this work, we propose FedRecover, a method
that can recover an accurate global model from a poisoned one
while introducing small computation and communication cost
for the clients. Like train-from-scratch, FedRecover removes
the detected malicious clients, re-initializes a global model,
and trains it iteratively in multiple rounds. However, unlike
train-from-scratch, FedRecover reduces the cost for the clients
by changing the way of obtaining their model updates. Our
intuition is that the historical information, including the global
models and clients’ model updates, which the server collected
when training the poisoned global model before the malicious
clients are detected, still carry valuable information for model
recovery. Based on the intuition, our key idea is that, during the
recovery process, the server estimates the remaining clients’
model updates using such historical information instead of ask-
ing the clients to compute and communicate them. FedRecover
is independent of the detection methods used to detect the
malicious clients and the aggregation rules of FL. In other
words, FedRecover can be used together with any detection
method and FL aggregation rule in a defense-in-depth strategy.
The key of FedRecover is that the server estimates the
clients’ model updates itself during the recovery process.
Specifically, the server stores the historical information when
training the poisoned global model before the malicious clients
are detected. During the recovery process, the server uses
the well-known Cauchy mean value theorem to estimate each
client’s model update in each round. However, the Cauchy
mean value theorem requires an integrated Hessian matrix for
each client, whose exact value is challenging to compute. To
address the challenge, we further leverage an L-BFGS based
algorithm to efficiently approximate the integrated Hessian
matrix. FedRecover introduces some storage and computation
cost to the server due to storing the historical information and
estimating the clients’ model updates. However, such cost is
acceptable since the server is powerful.
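To make the estimation step concrete, the following is a minimal sketch, in our own notation, of how a server might compute such an estimate. It assumes a Cauchy-mean-value-style expansion $\hat{g}_t^i \approx \bar{g}_t^i + \hat{H}_t^i (\hat{w}_t - \bar{w}_t)$, where $\bar{w}_t$ and $\bar{g}_t^i$ are the stored original global model and client update in round $t$, $\hat{w}_t$ is the recovered global model, and $\hat{H}_t^i$ is an L-BFGS approximation of the integrated Hessian built from buffers of global-model differences and the client's update differences. This is an illustration of the idea rather than the paper's exact algorithm, and all function and variable names are ours.

```python
import numpy as np

def lbfgs_hessian_vector_product(S, Y, v):
    """Approximate H @ v via the compact L-BFGS representation (Byrd et al., 1994).

    S: (d, m) buffer whose columns are differences of global models across rounds.
    Y: (d, m) buffer whose columns are differences of one client's stored model updates.
    v: (d,)   vector to multiply, e.g. the recovered-minus-original model difference.
    Assumes the usual curvature condition s^T y > 0 holds for the buffered pairs.
    """
    m = S.shape[1]
    sigma = float(Y[:, -1] @ Y[:, -1]) / float(Y[:, -1] @ S[:, -1])  # common scaling for B_0 = sigma*I
    StY = S.T @ Y
    L = np.tril(StY, k=-1)                      # strictly lower-triangular part of S^T Y
    D = np.diag(np.diag(StY))                   # diag(s_j^T y_j)
    M = np.block([[sigma * (S.T @ S), L],
                  [L.T, -D]])                   # middle matrix of the compact representation
    q = np.concatenate([sigma * (S.T @ v), Y.T @ v])
    p = np.linalg.solve(M, q)
    return sigma * v - (sigma * (S @ p[:m]) + Y @ p[m:])

def estimate_client_update(g_bar, w_bar, w_hat, S, Y):
    """Estimated update = stored original update + Hessian-vector correction term."""
    return g_bar + lbfgs_hessian_vector_product(S, Y, w_hat - w_bar)
```

In a full system, the buffers $S$ and $Y$ would be maintained per client from the stored historical information, and the estimated update would be fed into the aggregation rule in place of an exact client update.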
Since FedRecover estimates the clients’ model updates, the
estimation errors may accumulate over multiple rounds during
the recovery process, which eventually may result in a less ac-
curate recovered global model. We propose multiple strategies
to address the challenge. Specifically, the L-BFGS algorithm
requires the recovered global models in the previous several
rounds to estimate a client’s model update in the current round.
The accurately recovered global models in the first several
rounds of the recovery process will help reduce the estimation
errors in the future rounds. Therefore, we propose the warm-
up strategy, in which the server asks the clients to compute and
communicate their exact model updates in the first $T_w$ rounds
of the recovery process. Moreover, we propose the periodic
correction strategy, in which the server asks the clients to
compute and communicate their exact model updates in every
$T_c$ rounds. When an estimated model update for a client is
large, it has large influence on the recovered global model. To
reduce the impact of potentially incorrectly estimated large
model updates, we propose the abnormality fixing strategy,
in which the server asks a client to compute its exact model
update when at least one coordinate of the estimated model
update is larger than a threshold τ. Furthermore, we propose
final tuning strategy to reduce the estimation error before the
training terminates, in which the server asks the clients to
compute and communicate their exact model updates in the
last $T_f$ rounds. The parameters $T_w$, $T_c$, $\tau$, and $T_f$ control the
trade-off between accuracy of the recovered global model and
computation/communication cost for the clients. In particular,
a larger $T_w$, a smaller $T_c$, a smaller $\tau$, or a larger $T_f$ may
recover a more accurate global model but also introduces a
larger cost to the clients.
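The four strategies amount to a per-round, per-client decision of whether to use the estimated update or to request an exact one. Below is a minimal sketch of that decision logic; the default values and the exact periodicity rule are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def needs_exact_update(round_idx, total_rounds, estimated_update,
                       Tw=20, Tc=10, tau=10.0, Tf=5):
    """Return True if the server should ask a client for its exact model update.

    Warm-up: first Tw rounds. Periodic correction: every Tc rounds afterwards.
    Abnormality fixing: some coordinate of the estimate exceeds tau in magnitude.
    Final tuning: last Tf rounds before training terminates.
    """
    if round_idx < Tw:
        return True                                            # warm-up
    if (round_idx - Tw) % Tc == 0:
        return True                                            # periodic correction
    if estimated_update is not None and np.max(np.abs(estimated_update)) > tau:
        return True                                            # abnormality fixing
    if round_idx >= total_rounds - Tf:
        return True                                            # final tuning
    return False
```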
Theoretically, we show that the difference between the
global model recovered by FedRecover and the global model
recovered by train-from-scratch can be bounded under some
assumptions, e.g., the loss function used to learn the global
model is smooth and strongly convex. Empirically, we evaluate
FedRecover extensively using four datasets, three FL methods
(e.g., FedAvg [24], Median [36], and Trimmed-mean [36]),
as well as Trim attack (an untargeted poisoning attack) [18]
and backdoor attack (a targeted poisoning attack) [5]. Our
empirical results show that FedRecover can recover global
models that are as accurate as those recovered by train-from-
scratch while saving lots of computation/communication cost
for the clients. For instance, the backdoor attack with 40
malicious clients can achieve 1.00 attack success rate when the
dataset is MNIST and the FL method is Trimmed-mean. Both
FedRecover and train-from-scratch can recover global models
with 0.07 test error rate and 0.01 attack success rate, but Fe-
dRecover saves the clients’ computation/communication cost
by 88% on average compared to train-from-scratch. Moreover,
FedRecover can efficiently recover as accurate global models
as train-from-scratch even if the detection method incorrectly
detects some malicious clients as benign and/or some benign
clients as malicious.
In summary, our key contributions are as follows:
We perform the first systematic study on model recovery
from poisoning attacks in FL.
We propose FedRecover to recover a global model via
estimating clients’ model updates through historical in-
formation and multiple optimization strategies.
We evaluate FedRecover both theoretically and empiri-
cally. Our results show that FedRecover can recover a
global model both accurately and efficiently.
II. BACKGROUND AND RELATED WORK
A. Background on FL
Suppose the FL system has $n$ clients, each of which
has a local training dataset $D_i$, $i = 1, 2, \cdots, n$. We use
$D = \cup_{i=1}^{n} D_i$ to denote the joint training dataset, which
is the union of the clients' local training datasets. The $n$
clients aim to collaboratively train a shared machine learning
model (called global model) based on the joint training dataset.
To achieve the goal, the $n$ clients jointly minimize a loss
function on their training datasets, i.e., $\min_{w} L(D; w) =
\min_{w} \sum_{i=1}^{n} L(D_i; w)$, where $w$ represents the global model
parameters and $L$ is the empirical loss function (e.g., cross-
entropy loss). For simplicity, we let $L_i(w) = L(D_i; w)$ in
the rest of this work. A server provided by a service provider
(e.g., Google, Facebook, Apple) maintains the global model.
The global model is iteratively updated in multiple rounds,
and in the $t$-th round, FL takes the following three steps:
Step I: The server broadcasts the current global model
$w_t$ to the clients. The server may also broadcast the
global model to a subset of the clients. Our method is
also applicable in this scenario. However, for simplicity,
we assume all clients are involved in each round.
Step II: The $i$th client computes a model update $g_t^i =
\frac{\partial L_i(w_t)}{\partial w_t}$ based on the received global model $w_t$ and the
client's local training data $D_i$ using gradient descent. The
client may also use stochastic gradient descent with a
mini-batch of its local training dataset if it is large. For
simplicity, we assume gradient descent in the description
of our method, but we adopt stochastic gradient descent
in our experiments. Then, the client reports the model
update $g_t^i$ to the server. Note that the clients calculate
their model updates in parallel.
Step III: The server aggregates the clients' model updates
according to an aggregation rule $\mathcal{A}$. Then, the server
uses the aggregated model update to update the global
model with a learning rate $\eta$, i.e., $w_{t+1} = w_t - \eta \cdot
\mathcal{A}(g_t^1, g_t^2, \cdots, g_t^n)$.
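As a concrete illustration of Steps I-III, the snippet below simulates one synchronous round with a generic aggregation rule; the client gradient functions and the aggregation rule are placeholders, and the toy quadratic losses in the usage example are ours.

```python
import numpy as np

def fl_round(w_t, client_grads, aggregate, eta=0.1):
    """One round of FL: broadcast w_t, collect client updates, aggregate, update.

    client_grads: callables returning g_t^i = dL_i(w)/dw for the broadcast model.
    aggregate:    aggregation rule A (e.g., FedAvg, Median, or Trimmed-mean).
    """
    updates = [grad_fn(w_t) for grad_fn in client_grads]  # Steps I and II
    return w_t - eta * aggregate(updates)                 # Step III

# Toy usage with quadratic per-client losses L_i(w) = 0.5 * ||w - c_i||^2,
# whose gradients are w - c_i; the global model converges to the mean of the c_i.
centers = [np.array([1.0, 2.0]), np.array([3.0, -1.0]), np.array([0.0, 0.0])]
grads = [lambda w, c=c: w - c for c in centers]
w = np.zeros(2)
for t in range(200):
    w = fl_round(w, grads, aggregate=lambda us: np.mean(us, axis=0))
```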
In train-from-scratch, the server initializes a global model,
and then the server and the remaining clients follow the above
three steps in each round to iteratively update it. Different FL
methods essentially use different aggregation rules [8], [11],
[16], [24], [25], [36] in Step III. Next, we review several
popular aggregation rules.
FedAvg: FedAvg [24], developed by Google Inc., computes
the weighted average of the clients’ model updates as the
aggregated model update. Formally, given the model updates
$g_t^1, g_t^2, \cdots, g_t^n$ in the $t$-th round, the aggregated model update
is as follows:
$$\mathcal{A}(g_t^1, g_t^2, \cdots, g_t^n) = \sum_{i=1}^{n} \frac{|D_i|}{|D|} \cdot g_t^i, \quad (1)$$
where $|\cdot|$ represents the size of a dataset.
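A direct reading of Eq. (1) in code, assuming each model update is a flat numpy array and the server knows the clients' dataset sizes (function and variable names are ours):

```python
import numpy as np

def fedavg(updates, dataset_sizes):
    """Weighted average of client model updates, weights |D_i| / |D| as in Eq. (1)."""
    weights = np.asarray(dataset_sizes, dtype=float)
    weights = weights / weights.sum()
    return np.sum([wi * gi for wi, gi in zip(weights, updates)], axis=0)
```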
Median: Median [36] is a coordinate-wise aggregation rule
that aggregates each coordinate of the model update separately.
In particular, for each coordinate, Median calculates the me-
dian value of the corresponding coordinates in the $n$ model
updates and treats it as the corresponding coordinate of the
aggregated model update.
Trimmed-mean: Trimmed-mean [36] is also a coordinate-
wise aggregation rule. For each coordinate, Trimmed-mean
sorts the values of the corresponding coordinates in the $n$
model updates. Then, it removes the largest and the smallest
$k$ values. Finally, it computes the average of the remaining
values as the corresponding coordinate of the aggregated
model update. $k < \frac{n}{2}$ is a hyper-parameter for Trimmed-mean.
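Both coordinate-wise rules can be written compactly by stacking the $n$ model updates into an $n \times d$ matrix and operating along the client axis; the sketch below is an illustrative implementation, with $k$ the Trimmed-mean hyper-parameter satisfying $k < n/2$.

```python
import numpy as np

def median_aggregate(updates):
    """Coordinate-wise median of the n client model updates."""
    return np.median(np.stack(updates), axis=0)

def trimmed_mean_aggregate(updates, k):
    """For each coordinate, drop the k largest and k smallest values, then average."""
    stacked = np.sort(np.stack(updates), axis=0)  # sort each coordinate across clients
    n = stacked.shape[0]
    assert 2 * k < n, "Trimmed-mean requires k < n/2"
    return stacked[k:n - k].mean(axis=0)
```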
B. Poisoning Attacks to FL
Federated learning is vulnerable to poisoning attacks [5],
[6], [7], [18], [29], [12], in which malicious clients poison
the global model via sending malicious model updates to the
server in Step II of FL. The malicious clients can construct
their malicious model updates via poisoning their local training
data and/or directly manipulating the model updates without
following the prescribed FL protocol in Step II [5], [6], [7],
[18], [29], [12]. Based on the attacker’s goal, poisoning attacks
can be categorized into untargeted poisoning attacks [18], [29],
[12] and targeted poisoning attacks [5], [6], [7]. In untargeted
poisoning attacks, the poisoned global model has a large test
error rate for a large proportion of test inputs indiscriminately.
In targeted poisoning attacks, the poisoned global model
predicts an attacker-chosen target label for attacker-chosen
target test inputs; and to stay stealthy, the poisoned global
model’s test error rate for other test inputs is unaffected.
For instance, backdoor attacks [5], [6] are popular targeted
poisoning attacks, in which the attacker-chosen test inputs
are any inputs embedded with a trigger. Next, we review
Trim attack (a popular untargeted poisoning attack) [18] and
backdoor attack (a popular targeted poisoning attack) [5].
Trim attack: Fang et al. [18] formulated untargeted poisoning
attacks to FL as a general framework. Roughly speaking, the
framework aims to craft malicious model updates that max-
imize the difference between the aggregated model updates
before and after attack. The framework can be applied to dif-
ferent aggregation rules. The Trim attack is constructed based
on the Trimmed-mean aggregation rule under the framework,
and is also effective for other aggregation rules such as FedAvg
and Median.
Backdoor attack: In the backdoor attack [5], the attacker
poisons the malicious clients’ local training data via augment-
ing them with trigger-embedded duplicates. Specifically, for
each input in a malicious client’s local training dataset, the
attacker makes a copy of it and embeds a trigger into the
copy. Then, the attacker injects the trigger-embedded copy
into the malicious client’s local training dataset and relabels
it as the target label. In every round of FL, each malicious
client computes a model update based on its poisoned local
training data. To amplify the impact of the model updates, the
malicious clients further scale them up by a large factor before
reporting them to the server. We notice that some methods
[15], [32] have been proposed to detect and remove backdoors
in neural networks. However, they are insufficient for FL. For
instance, [32] assumes that a clean training dataset is available,
which usually does not hold for an FL server.
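The data-poisoning step of this backdoor attack can be sketched as follows: copy each local example, stamp a trigger pattern onto the copy, relabel the copy with the target label, and append it to the local training data; the malicious client later scales up its model update. The trigger shape and position and the scaling factor below are illustrative assumptions, not the attack's original settings.

```python
import numpy as np

def poison_local_dataset(images, labels, target_label, trigger_value=1.0):
    """Augment a client's local data with trigger-embedded copies relabeled as the target.

    Assumes 2-D image arrays; the 3x3 corner patch used as the trigger is an assumption.
    """
    poisoned_x, poisoned_y = [], []
    for x, y in zip(images, labels):
        x_trig = x.copy()
        x_trig[-3:, -3:] = trigger_value        # stamp the (assumed) trigger pattern
        poisoned_x.extend([x, x_trig])          # keep the clean copy and add the poisoned one
        poisoned_y.extend([y, target_label])
    return np.stack(poisoned_x), np.array(poisoned_y)

def scale_update(update, factor=10.0):
    """Malicious clients amplify their model updates before reporting them to the server."""
    return factor * update
```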
C. Detecting Malicious Clients
Malicious-client detection [23], [30], [39] aims to distin-
guish malicious clients from benign ones, which is essentially
a binary classification problem. Roughly speaking, the key
idea is to leverage some statistical difference between the
features (e.g., model updates [23]) of malicious clients and
those of benign clients. Different detection methods use dif-
ferent features and binary classifiers to perform the detection.
Specifically, for each client, these detection methods first
extract features from its model updates in one or multiple
rounds and then use a classifier to predict whether it is
malicious or not. For instance, Zhang et al. [39] proposed to
detect malicious clients via checking a client’s model-update
consistency. In particular, the server predicts a client’s model
update based on its historical model updates in each round. If
the received model updates are inconsistent with the predicted
ones in multiple rounds, then the server flags the client as
malicious. Zhang et al. also leveraged the Cauchy mean value
theorem and the L-BFGS algorithm to predict a client’s model
update, but they used the same approximate Hessian matrix for
all clients, which we experimentally found to be ineffective for
model recovery, e.g., accuracy of the recovered model may be
nearly random guessing.
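As a highly simplified illustration of such a consistency check (not the algorithm of [39]), the server could accumulate the distance between each client's predicted and received updates over rounds and flag clients whose accumulated inconsistency stands out; the smoothing and thresholding rules below are assumptions.

```python
import numpy as np

def update_suspicious_scores(scores, predicted, received, decay=0.9):
    """Exponentially averaged distance between predicted and received updates per client."""
    for cid in received:
        dist = np.linalg.norm(predicted[cid] - received[cid])
        scores[cid] = decay * scores.get(cid, 0.0) + (1 - decay) * dist
    return scores

def flag_inconsistent_clients(scores, z_threshold=2.0):
    """Flag clients whose score is far above the population mean (simple z-score rule)."""
    vals = np.array(list(scores.values()))
    mu, sigma = vals.mean(), vals.std() + 1e-12
    return [cid for cid, s in scores.items() if (s - mu) / sigma > z_threshold]
```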
Detecting malicious clients is also related to Sybil detection
in distributed systems [17]. Therefore, conventional Sybil de-
tection methods could also be used to detect malicious clients,
where malicious clients are treated as Sybils. In particular,
these Sybil detection methods (e.g., [33], [37], [38], [19], [31])
leverage the clients’ IPs, network behaviors, and social graphs
if available.
D. Machine Unlearning
Machine unlearning aims to make a machine learning model
“forget” some training examples. For instance, a user may
desire a model to forget its data for privacy concerns. Multiple
methods [9], [14], [34] have been proposed for efficient
machine unlearning. For instance, Cao et al. [14] proposed
to transform the learning algorithm used to train a machine
learning model into a summation form. Therefore, only a
small number of summations need to be updated to unlearn a
training example. Bourtoule et al. [9] broke the model training
into an aggregation of multiple constituent models and each
training example only contributes to one constituent model.
Therefore, only one constituent model needs to be retrained
when unlearning a training example. Wu et al. [34] proposed
DeltaGrad that estimates the gradient of the loss function on
the remaining training examples using the gradient on the
training examples to be unlearnt.
Model recovery from poisoning attacks in FL is related
to machine unlearning. In particular, model recovery can
be viewed as unlearning the detected malicious clients, i.e.,
making the global model forget the model updates from the
detected malicious clients. However, existing machine unlearn-
ing methods are insufficient for FL because 1) they require
changing the FL algorithm to train multiple constituent models
and are inefficient when multiple constituent models involve
detected malicious clients and thus require retraining [9],
and/or 2) they require access to the clients’ private local
training data [14], [34].
III. PROBLEM DEFINITION
A. Threat Model
We follow the threat model considered in previous studies
on poisoning attacks to FL [5], [7], [18], [12]. Specifically,
we discuss in detail the attacker’s goals, capabilities, and
background knowledge.
Attacker’s goals: In an untargeted poisoning attack, the
attacker’s goal is to increase the test error rate of the global
model indiscriminately for a large number of test inputs. In a
targeted poisoning attack, the attacker’s goal is to poison the
global model such that it predicts an attacker-chosen target
label for attacker-chosen target test inputs but the predictions
for other test inputs are unaffected. For instance, in a category
of targeted poisoning attacks also known as backdoor attacks,
the target test inputs include any input embedded with an
attacker-chosen trigger, e.g., a feature pattern.
Attacker’s capabilities: We assume the attacker controls
some malicious clients but does not compromise the server.
The malicious clients could be fake clients injected into the
FL system by the attacker or genuine clients in the FL system
compromised by the attacker. The malicious clients can send
arbitrary model updates to the server.
Attacker’s background knowledge: There are two common
settings for the attacker’s background knowledge about the FL
system [18], i.e., partial-knowledge setting and full-knowledge
setting. The partial-knowledge setting assumes the attacker
knows the global model, the loss function, as well as local
training data and model updates on the malicious clients. The
full-knowledge setting further assumes the attacker knows the
local training data and model updates on all clients as well
as the server’s aggregation rule. The poisoning attacks are
often stronger in the full-knowledge setting than in the partial-
knowledge setting. In this work, we consider strong poisoning
attacks in the full-knowledge setting.
B. Design Goals
We aim to design an accurate and efficient model recovery
method for FL. We use train-from-scratch as a baseline to
measure the accuracy and efficiency of a recovery method. Our
method should recover a global model as accurate as the one
recovered by train-from-scratch, while incurring less client-
side computation and communication cost. Specifically, our
design goals are as follows:
Accurate: The global model recovered by our recovery
method should be accurate. In particular, for untargeted poi-
soning attacks, the test error rate of the recovered global model