
Federated Averaging is further safeguarded against malicious users by the use of norm-bounding.
Each updated model $\theta_{i,u}$ is projected onto the ball $\|\theta_{i,u}\|_p \leq C$ for some clip value $C$, so that no user
update can dominate the average.
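As an illustration, a minimal PyTorch-style sketch of such a server-side projection is given below; the choice of the $\ell_2$ norm, the helper names, and the aggregation by simple averaging are our own simplifying assumptions, not a prescription of any particular deployed system.

```python
import torch

def clip_update(update: torch.Tensor, C: float) -> torch.Tensor:
    """Project a flattened user update onto the L2 ball of radius C.

    If the update's norm exceeds C, it is rescaled to norm exactly C;
    otherwise it is returned unchanged.
    """
    norm = update.norm(p=2)
    scale = torch.clamp(C / (norm + 1e-12), max=1.0)
    return update * scale

def aggregate(user_updates: list[torch.Tensor], C: float) -> torch.Tensor:
    """Norm-bounded federated averaging: clip each update, then average."""
    clipped = [clip_update(u, C) for u in user_updates]
    return torch.stack(clipped).mean(dim=0)
```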
Norm-bounding is necessary to defend against the model replacement attacks described in Bagdasaryan
et al. (2019) and Bhagoji et al. (2019), which send malicious updates with extreme magnitudes that
overpower updates from benign users. However, once norm-bounding is in place as a defense, the
potential threat posed by malicious attacks remains debated. We summarize a few related areas of
research before returning to this question:
Adversarial Machine Learning
The attacks investigated in this paper are a special case of train-time adversarial attacks against
machine learning systems (Biggio et al., 2012; Cinà et al., 2022). The federated learning scenario is
naturally an online, white-box scenario. The attack happens online, while the model is training, and
can adapt to the current state of training. The attack is also white-box, as all users have knowledge
of the model architecture and local training hyperparameters.
Train-time Attacks
In this work we are interested in backdoor attacks, also referred to as targeted attacks, which form
a subset of model integrity attacks (Barreno et al., 2010). These attacks generally attempt to incorporate
malicious behavior into a model without modifying its apparent performance on test data.
In the simplest case, the malicious behavior could be an image classification model that misclassifies
images marked with a special patch. These attacks are in contrast to model availability attacks,
which aim to undermine model performance on all hold-out data. Availability attacks are generally
considered infeasible in large-scale federated learning systems when norm-bounding is employed
(Shejwalkar et al., 2021), given that malicious users likely form only a minority of all users.
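To make the patch example concrete, a minimal sketch of how such a trigger could be stamped onto a batch of images and relabeled toward an attacker-chosen class is shown below; the square patch, the target class, and the function names are hypothetical illustrations rather than the construction of any specific cited attack.

```python
import torch

def add_patch(images: torch.Tensor, patch_size: int = 4) -> torch.Tensor:
    """Stamp a small white square into the bottom-right corner of each image.

    images: batch of shape (N, C, H, W) with pixel values in [0, 1].
    """
    patched = images.clone()
    patched[:, :, -patch_size:, -patch_size:] = 1.0
    return patched

def poison_batch(images, labels, target_class: int = 0, fraction: float = 0.5):
    """Patch a fraction of the batch and relabel those images to target_class."""
    n = int(fraction * len(images))
    images, labels = images.clone(), labels.clone()
    images[:n] = add_patch(images[:n])
    labels[:n] = target_class
    return images, labels
```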
Data Poisoning
The model poisoning attacks described above are closely related to data poisoning attacks against
centralized training (Goldblum et al., 2020). The idea of anticipating future updates has been investigated
in some works on data poisoning (Muñoz-González et al., 2017; Huang et al., 2020), where
it arises as an approximation of the bilevel data poisoning objective. These attacks optimize a set of
poisoned datapoints by differentiating through several steps of the expected SGD update that the
central server would perform on this data. However, for data poisoning, the attacker is unaware
of the model state used by the server, cannot optimize their attack for each round of training, and
has only approximate knowledge of the model architecture and hyperparameters. These complications
lead Huang et al. (2020) to construct a large ensemble of model states trained to different stages to
approximate the missing knowledge.
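The sketch below illustrates this unrolling idea for a toy linear classifier: the attacker's loss is evaluated after a few simulated, differentiable SGD steps on the poisoned data, and its gradient is taken with respect to the poison inputs. The linear model, learning rate, number of unrolled steps, and the particular attacker objective are simplifying assumptions made only for illustration.

```python
import torch
import torch.nn.functional as F

def unrolled_poison_grad(w, x_poison, y_poison, x_target, y_target,
                         lr: float = 0.1, steps: int = 3):
    """Gradient of the attacker's loss w.r.t. the poison inputs, obtained by
    differentiating through `steps` simulated SGD updates on the poisoned data.

    w: weights of a toy linear classifier, shape (num_classes, dim).
    """
    x_poison = x_poison.clone().detach().requires_grad_(True)
    w_t = w.detach().clone().requires_grad_(True)
    for _ in range(steps):
        # Inner SGD step on the poisoned data, kept differentiable so that
        # gradients can later flow back to x_poison (create_graph=True).
        inner_loss = F.cross_entropy(x_poison @ w_t.t(), y_poison)
        (grad_w,) = torch.autograd.grad(inner_loss, w_t, create_graph=True)
        w_t = w_t - lr * grad_w
    # Attacker objective, evaluated on the unrolled weights.
    adv_loss = F.cross_entropy(x_target @ w_t.t(), y_target)
    return torch.autograd.grad(adv_loss, x_poison)[0]
```

A poisoning attack along these lines would then update `x_poison` by descending along the returned gradient, which is the unrolled approximation of the bilevel objective referred to above.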
3 CAN YOU BACKDOOR FEDERATED LEARNING?
Backdoor attacks against federated learning have been described in Bagdasaryan et al. (2019). The
attacker uses local data and their malicious objective to create their own replacement model, scales
this replacement model to the largest magnitude allowed by the server's norm-bounding rule, and
sends it. However, as discussed in Sun et al. (2019b), for a more realistic number of malicious users
and randomly occurring attacks, backdoor success is much lower, especially against stringent norm-bounding.
Wang et al. (2020) note that backdoor success is high in edge cases not seen during training
and that backdoors which attack “rare” samples (such as only airplanes of a specific color in images,
or a specific sentence in text) can be much more successful, as other users do not influence these
predictions significantly. A number of variants of this attack exist (Costa et al., 2021; Pang et al.,
2021; Fang et al., 2020; Baruch et al., 2019; Xie et al., 2019; Datta et al., 2021; Yoo & Kwak, 2022;
Zhang et al., 2019; Sun et al., 2022), for example allowing for collusion between multiple users or
generating additional data for the attacker. In this work we focus broadly on the threat model of
Bagdasaryan et al. (2019); Wang et al. (2020).
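As a rough sketch of the attacker's side of this attack under norm-bounding, the replacement update can simply be rescaled so that it exactly saturates the server's clip value; the flattened-parameter representation and the $\ell_2$ bound below are assumptions for illustration, not the exact procedure of the cited works.

```python
import torch

def scale_to_norm_bound(local_params: torch.Tensor,
                        global_params: torch.Tensor,
                        C: float) -> torch.Tensor:
    """Rescale the attacker's update (local - global) to L2 norm exactly C,
    the largest magnitude that survives the server's norm-bounding rule."""
    update = local_params - global_params
    scaled = update * (C / (update.norm(p=2) + 1e-12))
    return global_params + scaled
```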
Threat Model We assume a federated learning protocol running with multiple users, attacked
by online white-box model poisoning. The server orchestrates federated averaging with norm-bounding.
At each attack opportunity, the attacker controls only a single user and has knowledge
only of the local data of this user. The attacker has full control over the model update that will be
returned to the server and can optimize this model freely. As a participating user in FL, the attacker