Preprint. Under review.
THINKING TWO MOVES AHEAD: ANTICIPATING
OTHER USERS IMPROVES BACKDOOR ATTACKS IN
FEDERATED LEARNING
Yuxin Wen∗ & Jonas Geiping∗
University of Maryland
{ywen,jgeiping}@umd.edu
Liam Fowl†
Google
Hossein Souri
Johns Hopkins University
Rama Chellappa
Johns Hopkins University
Micah Goldblum
New York University
Tom Goldstein
University of Maryland
ABSTRACT
Federated learning is particularly susceptible to model poisoning and backdoor
attacks because individual users have direct control over the training data and
model updates. At the same time, the attack power of an individual user is
limited because their updates are quickly drowned out by those of many other
users. Existing attacks do not account for future behaviors of other users, and
thus require many sequential updates and their effects are quickly erased. We
propose an attack that anticipates and accounts for the entire federated learning
pipeline, including behaviors of other clients, and ensures that backdoors are ef-
fective quickly and persist even after multiple rounds of community updates. We
show that this new attack is effective in realistic scenarios where the attacker
only contributes to a small fraction of randomly sampled rounds and demon-
strate this attack on image classification, next-word prediction, and sentiment
analysis. Code is available at https://github.com/YuxinWenRick/
thinking-two-moves-ahead.
1 INTRODUCTION
When training models on private information, it is desirable to choose a learning paradigm that does
not require stockpiling user data in a central location. Federated learning (Konečný et al., 2015;
McMahan et al., 2017b) achieves this goal by offloading the work of model training and storage
to remote devices that do not directly share data with the central server. Each user device instead
receives the current state of the model from the central server, computes local updates based on user
data, and then returns only the updated model to the server.
Unfortunately, by placing responsibility for model updates in the hands of many anonymous users,
federated learning also opens up model training to a range of malicious attacks (Bagdasaryan et al.,
2019; Kairouz et al., 2021). In model poisoning attacks (Biggio & Roli, 2018; Bhagoji et al., 2019),
a user sends malicious updates to the central server to alter the behavior of the model. For example, in language modeling, backdoor attacks could modify the behavior of the final model to misrepresent specific facts, attach negative sentiment to certain groups, change behavior in edge cases, or even attach false advertising and spam to certain key phrases.
In practical applications, however, the real threat posed by such attacks is debated (Sun et al., 2019b;
Wang et al., 2020; Shejwalkar et al., 2021). Usually only a small fraction of users are presumed to
be malicious, and their impact on the final model can be small, especially when the contributions of
each user are limited by norm-bounding (Sun et al., 2019b). Attacks as described in Bagdasaryan
& Shmatikov (2021) further require successive attacks over numerous sequential rounds of training.
This is not realistic in normal cross-device applications (Bonawitz et al., 2019; Hard et al., 2019)
where users are randomly selected in each round from a larger pool, making it exceedingly unlikely that any attacker or even group of attackers will be able to contribute to more than a fraction of the total rounds of training. Model updates that are limited in this way are immediately less effective, as even strong backdoor attacks can be wiped away and replaced by subsequent updates from many benign users (Sun et al., 2019b; Shejwalkar et al., 2021).

∗Equal contribution. †Work done at University of Maryland.

Figure 1: Our method, Anticipate, reaches 100% backdoor accuracy faster than the baseline in the setting of 100 random attacks in the first 500 rounds. Moreover, after the window of attack passes, the attack decays much slower than the baseline. At the end of federated training, our attack still has backdoor accuracy of 60%, while the baseline maintains just 20%. Overall, only 100 out of a total of 20k contributions are malicious.
In this work we set out to discover whether strong attacks are possible in these more realistic scenar-
ios. We make the key observation that previous attack algorithms, such as those described in Bagdasaryan et al. (2019); Wang et al. (2020); Zhou et al. (2021), only consider the immediate effects of a model
update, and ignore the downstream impacts of updates from benign users. We show that, by model-
ing these future updates, a savvy attacker can update model parameters in a way that is unlikely to
be over-written or undone by benign users. By backpropagating through simulated future updates,
our proposed attack directly optimizes a malicious update to maximize its permanence. Using both
vision and language tasks, and under a realistic threat model where attack opportunities are rare,
we see that these novel attacks become operational after fewer attack opportunities than baseline
methods, and remain active for much longer after the attack has passed, as shown in Figure 1.
2 BACKGROUND
Federated Learning systems have been described in a series of studies and a variety of protocols.
In this work, we focus mainly on federated averaging (fedAVG) as proposed in McMahan et al.
(2017b) and implemented in a range of recent system designs (Bonawitz et al., 2019; Paulik et al.,
2021; Dimitriadis et al., 2022), but the attack we describe can be extended to other algorithms. In
fedAVG, the server sends the current state of the model $\theta_i$ to all users selected for the next round of training. Each user then computes an updated local model through several iterations, for example via local SGD. The $u$-th local user has data $D$ which is partitioned into batches $D_u$ and then, starting from the global model, their local model is updated for $m$ steps based on the training objective $\mathcal{L}$:
$$\theta_{i+1,u} = \theta_{i,u} - \tau \nabla \mathcal{L}(D_u, \theta_{i,u}). \qquad (1)$$
The updated models $\theta_{i+1,u}$ from each user are returned to the server, which computes a new central state by averaging:
$$\theta_{i+1} = \frac{1}{n}\sum_{u=1}^{n} \theta_{i+1,u}. \qquad (2)$$
We will later summarize this procedure, which depends on a group of users $U_i$ in the $i$-th round, as $\theta_{i+1} = F_{\text{avg}}(U_i, \theta_i)$.
Optionally, the average can be reweighted based on the amount of data controlled by each user (Bonawitz et al., 2017); however, this is unsafe without further precautions, as an attacker could overweight their own contributions, so we only consider unweighted averages in this work.
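To make this procedure concrete, here is a minimal sketch of one fedAVG round in PyTorch. The helper names (`local_update`, `fedavg_round`) and the plain-SGD local solver are our own illustrative assumptions, not a reference implementation:

```python
import copy
from itertools import cycle, islice

import torch


def local_update(global_model, user_batches, loss_fn, lr=0.1, local_steps=4):
    """One user's contribution: copy the global model and take m local SGD steps (Eq. 1)."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for x, y in islice(cycle(user_batches), local_steps):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    return model


def fedavg_round(global_model, selected_users, loss_fn):
    """One server round: collect local models from the selected users and average them (Eq. 2)."""
    local_models = [local_update(global_model, batches, loss_fn)
                    for batches in selected_users]
    with torch.no_grad():
        for p_global, *p_local in zip(global_model.parameters(),
                                      *[m.parameters() for m in local_models]):
            p_global.copy_(torch.stack(p_local).mean(dim=0))
    return global_model
```

Here `selected_users` stands for the data of the users sampled in that round; the unweighted mean corresponds to the $F_{\text{avg}}(U_i, \theta_i)$ notation above.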
Federated Averaging is further safeguarded against malicious users by the use of norm-bounding.
Each updated model $\theta_{i,u}$ is projected onto the ball $\|\theta_{i,u}\|_p \leq C$, for some clip value $C$, so that no user update can dominate the average.
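As an illustration, server-side norm-bounding could look roughly like the sketch below, which rescales a user contribution (stored as a dict of tensors) onto an $\ell_p$ ball of radius $C$. Whether a deployed system clips the returned model or its difference to the current global model is a detail we leave open, and the function name is hypothetical:

```python
import torch


def norm_bound(update, clip_value, p=2):
    """Project an update onto the p-norm ball of radius C: scale by min(1, C / ||update||_p)."""
    flat = torch.cat([v.flatten().float() for v in update.values()])
    norm = torch.linalg.vector_norm(flat, ord=p).item()
    scale = min(1.0, clip_value / (norm + 1e-12))
    return {name: tensor * scale for name, tensor in update.items()}
```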
Norm-bounding is necessary to defend against model replacement attacks described in Bagdasaryan
et al. (2019) and Bhagoji et al. (2019) which send malicious updates with extreme magnitudes that
overpower updates from benign users. Once norm-bounding is in place as a defense though, the
potential threat posed by malicious attacks remains debated. We summarize a few related areas of
research, before returning to this question:
Adversarial Machine Learning
The attacks investigated in this paper are a special case of train-time adversarial attacks against
machine learning systems (Biggio et al., 2012; Cinà et al., 2022). The federated learning scenario is naturally an online, white-box scenario. The attack happens online, while the model is training, and
can adapt to the current state of training. The attack is also white-box as all users have knowledge
of model architecture and local training hyperparameters.
Train-time Attacks
In this work we are interested in backdoor attacks, also referred to as targeted attacks, which form
a subset of model integrity attacks (Barreno et al., 2010). These attacks generally attempt to incor-
porate malicious behavior into a model without modifying its apparent performance on test data.
In the simplest case, malicious behavior could be an image classification model that misclassifies
images marked with a special patch. These attacks are in contrast to model availability attacks
which aim to undermine model performance on all hold-out data. Availability attacks are generally
considered infeasible in large-scale federated learning systems when norm-bounding is employed
(Shejwalkar et al., 2021), given that malicious users likely form only a minority of all users.
Data Poisoning
The model poisoning attacks described above are closely related to data poisoning attacks against
centralized training (Goldblum et al., 2020). The idea of anticipating future updates has been inves-
tigated in some works on data poisoning (Muñoz-González et al., 2017; Huang et al., 2020), where it arises as an approximation of the bilevel data poisoning objective. These attacks optimize a set of poisoned datapoints by differentiating through several steps of the expected SGD update that the central server would perform on this data. However, for data poisoning, the attacker is unaware
of the model state used by the server, cannot optimize their attack for each round of training, and
has only approximate knowledge of model architecture and hyperparameters. These complications
lead Huang et al. (2020) to construct a large ensemble of model states trained to different stages to
approximate missing knowledge.
3 CAN YOU BACKDOOR FEDERATED LEARNING?
Backdoor attacks against federated learning have been described in Bagdasaryan et al. (2019). The
attacker uses local data and their malicious objective to create their own replacement model, scales
this replacement model to the largest scale allowed by the server’s norm-bounding rule and sends
it. However, as discussed in Sun et al. (2019b), for a more realistic number of malicious users and
randomly occurring attacks, backdoor success is much smaller, especially against stringent norm-
bounding. Wang et al. (2020) note that backdoor success is high in edge cases not seen in training
and that backdoors that attack “rare” samples (such as only airplanes in a specific color in images,
or a specific sentence in text) can be much more successful, as other users do not influence these
predictions significantly. A number of variants of this attack exist (Costa et al., 2021; Pang et al.,
2021; Fang et al., 2020; Baruch et al., 2019; Xie et al., 2019; Datta et al., 2021; Yoo & Kwak, 2022;
Zhang et al., 2019; Sun et al., 2022), for example allowing for collusion between multiple users or
generating additional data for the attacker. In this work we will focus broadly on the threat model of
Bagdasaryan et al. (2019); Wang et al. (2020).
Threat Model We assume a federated learning protocol running with multiple users, attacked
by online white-box model poisoning. The server orchestrates federated averaging with norm-
bounding. At each attack opportunity, the attacker controls only a single user and only has knowledge
about the local data from this user. The attacker has full control over the model update that will be
returned to the server and can optimize this model freely. As a participating user in FL, the attacker
is also aware of the number of local steps and local learning rate that users are expected to use. We
will discuss two variations of this threat model with different attack opportunities. 1) An attack
opportunity is provided every round during a limited time window as in Bagdasaryan et al. (2019).
2) Only a limited number of attack opportunities arise randomly during a limited time window as
discussed in Sun et al. (2019b).
We believe this threat model with random attack opportunities is a natural step towards the evalu-
ation of risks caused by backdoor attacks in more realistic systems. We do restrict the defense to
only norm-bounding and explore a worst-case attack against this scenario. As argued in Sun et al.
(2019b), norm-bounding is thought to be sufficient to prevent these attacks. We acknowledge that
other defenses exist, see overviews in Wang et al. (2022) and Qiu et al. (2022), yet the proposed
attack is designed to be used against norm-bounded FL systems and we verify in Appendix A.5 that
it does not break other defenses. We focus on norm bounding because it is a key defense that is
widely adopted in industrial implementations of federated learning (Bonawitz et al., 2019; Paulik
et al., 2021; Dimitriadis et al., 2022).
4 ATTACKS WITH END-TO-END OPTIMIZATION
4.1 BASELINE
As described by Gu et al. (2017); Bagdasaryan et al. (2019), suppose an attacker holds $N$ clean data points, $D^c = \{x^c_i, y^c_i\}_{i=1}^{N}$, and $M$ backdoored data points, $D^b = \{x^b_i, y^b\}_{i=1}^{M}$, where $x^b_i$ could be an input with a special patch or an edge-case example (Wang et al., 2020), and $y^b$ is an attacker-chosen prediction. The goal of the attacker is to train a malicious model that predicts $y^b$ when it sees a backdoored input, and to push this behavior to the central model. The attacker can optimize their malicious objective $\mathcal{L}_{adv}$ directly to identify backdoored parameters:
$$\theta = \operatorname*{arg\,min}_{\theta}\; \mathcal{L}_{adv}(D^b, \theta) \qquad (3)$$
where $\mathcal{L}_{adv}$ is the loss function of the task and $\theta$ are the weights of the local model. Some attacks, such as Bhagoji et al. (2019), also include an additional term that enforces that model performance on local clean data remains good, when measured by the original objective $\mathcal{L}$:
$$\theta = \operatorname*{arg\,min}_{\theta}\; \mathcal{L}_{adv}(D^b, \theta) + \mathcal{L}(D^c, \theta).$$
The update is then scaled to the maximal value allowed by norm-bounding and sent to the server.
This baseline encompasses the attacks proposed in Xie et al. (2019) and Bhagoji et al. (2019) in the
investigated setting.
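A rough sketch of this baseline in PyTorch, assuming lists of `(inputs, labels)` batches for the backdoored data $D^b$ and the clean data $D^c$; the helper name `baseline_attack` and the exact optimization schedule are illustrative, not the precise procedure of the cited works:

```python
import copy

import torch


def baseline_attack(global_model, backdoor_batches, clean_batches,
                    loss_fn, clip_value, lr=0.01, steps=100):
    """Greedy baseline: minimize L_adv(D^b, theta) + L(D^c, theta) (Eq. (3) plus the
    clean-data term), then scale the update to the maximal norm allowed by the server."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for step in range(steps):
        xb, yb = backdoor_batches[step % len(backdoor_batches)]
        xc, yc = clean_batches[step % len(clean_batches)]
        opt.zero_grad()
        (loss_fn(model(xb), yb) + loss_fn(model(xc), yc)).backward()
        opt.step()

    # The malicious update is the difference to the global model, rescaled to the clip value.
    with torch.no_grad():
        update = {name: p_mal - p_glob for (name, p_mal), (_, p_glob)
                  in zip(model.named_parameters(), global_model.named_parameters())}
        norm = torch.linalg.vector_norm(
            torch.cat([v.flatten() for v in update.values()])).item()
        update = {name: v * (clip_value / (norm + 1e-12))
                  for name, v in update.items()}
    return update
```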
4.2 ANTICIPATING OTHER USERS
This baseline attack can be understood as a greedy objective which optimizes the effect of the back-
door only for the current stage of training and assumes that the impact of other users is negligible
after scaling. We show that a stronger attack anticipates and involves the benign users’ contribu-
tions in the current and several future rounds during the backdoor optimization. The optimal malicious
update sent by the attacker should be chosen so that it is optimal even if the update is averaged
with the contributions of other users and then used for several further rounds of training to which
the attacker has no access. We pose this criterion as a loss function to be optimized. Intuitively, this
allows the attack to optimally select which parts of the model update to modify, and to estimate and
avoid which parts would be overwritten by other users.
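As a rough sketch of how such an objective can be evaluated (our own illustrative pseudocode, not the released implementation): the attacker simulates $k$ future fedAVG rounds, substitutes its own local data for the unknown benign users, keeps the simulated rounds differentiable via PyTorch's `torch.func.functional_call`, and accumulates the backdoor loss on each anticipated global model. The names (`simulate_local_sgd`, `anticipate_objective`) and the single-proxy approximation of the other users are assumptions made for illustration:

```python
import torch
from torch.func import functional_call


def simulate_local_sgd(model, params, batches, loss_fn, lr, local_steps):
    """Differentiably unroll one user's local SGD (Eq. 1) starting from `params`."""
    names = list(params.keys())
    for s in range(local_steps):
        x, y = batches[s % len(batches)]
        loss = loss_fn(functional_call(model, params, (x,)), y)
        grads = torch.autograd.grad(loss, [params[n] for n in names], create_graph=True)
        params = {n: params[n] - lr * g for n, g in zip(names, grads)}
    return params


def anticipate_objective(model, theta0, theta_mal, attacker_batches, backdoor_batch,
                         loss_fn, n_users, k, lr, local_steps):
    """Backdoor loss summed over k anticipated future global models theta_1 .. theta_k."""
    xb, yb = backdoor_batch
    # Start from a differentiable copy of the current global model theta_0.
    theta = {n: t.detach().clone().requires_grad_(True) for n, t in theta0.items()}
    adv_loss = 0.0
    for i in range(k):
        # Proxy for the benign users: simulate their round with the attacker's own data.
        benign = simulate_local_sgd(model, theta, attacker_batches, loss_fn, lr, local_steps)
        if i == 0:
            # Attack round: the malicious update is averaged with the benign contributions.
            theta = {n: (theta_mal[n] + (n_users - 1) * benign[n]) / n_users for n in theta}
        else:
            # Later rounds contain only (simulated) benign contributions.
            theta = benign
        adv_loss = adv_loss + loss_fn(functional_call(model, theta, (xb,)), yb)
    return adv_loss
```

The attacker would then take gradient steps on `theta_mal` (for example, initialized from the baseline attack) to minimize this objective, and finally scale the resulting update to the norm bound before sending it, as in Section 4.1.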
Formally, with $n$ users per round, suppose an attacker wants to anticipate $k$ steps (in the following we will use this keyword to denote the whole attack pipeline). Then, given the current local model, $\theta_0$, the objective of the attacker is simply to compute the adversarial objective in Equation (3), but optimize it not directly for $\theta$, but instead for the future models $\theta_i$, $1 \leq i \leq k$, which depend implicitly on the attacker's contribution.

To make this precise, we move through all steps now. Denote the model update that the attacker contributes by $\theta_{mal}$. In the next round following this contribution, the other users $U_0$ will themselves