
Federated Averaging is further safeguarded against malicious users by the use of norm-bounding.
Each updated model $\theta_{i,u}$ is projected onto the ball $\|\theta_{i,u}\|_p \leq C$ for some clip value $C$, so that no user
update can dominate the average.
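As an illustration, a minimal PyTorch-style sketch of such a server-side projection is given below; the choice of the $\ell_2$ norm, the helper names, and the aggregation by simple averaging are our own simplifying assumptions, not a prescription of any particular deployed system.

```python
import torch

def clip_update(update: torch.Tensor, C: float) -> torch.Tensor:
    """Project a flattened user update onto the L2 ball of radius C.

    If the update's norm exceeds C, it is rescaled to norm exactly C;
    otherwise it is returned unchanged.
    """
    norm = update.norm(p=2)
    scale = torch.clamp(C / (norm + 1e-12), max=1.0)
    return update * scale

def aggregate(user_updates: list[torch.Tensor], C: float) -> torch.Tensor:
    """Norm-bounded federated averaging: clip each update, then average."""
    clipped = [clip_update(u, C) for u in user_updates]
    return torch.stack(clipped).mean(dim=0)
```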
Norm-bounding is necessary to defend against the model replacement attacks described in Bagdasaryan
et al. (2019) and Bhagoji et al. (2019), which send malicious updates with extreme magnitudes that
overpower updates from benign users. However, once norm-bounding is in place as a defense, the
potential threat posed by malicious attacks remains debated. We summarize a few related areas of
research before returning to this question:
Adversarial Machine Learning
The attacks investigated in this paper are a special case of train-time adversarial attacks against
machine learning systems (Biggio et al., 2012; Cinà et al., 2022). The federated learning scenario is
naturally an online, white-box scenario. The attack happens online, while the model is training, and
can adapt to the current state of training. The attack is also white-box, as all users have knowledge
of the model architecture and local training hyperparameters.
Train-time Attacks
In this work we are interested in backdoor attacks, also referred to as targeted attacks, which form
a subset of model integrity attacks (Barreno et al., 2010). These attacks generally attempt to incorporate
malicious behavior into a model without modifying its apparent performance on test data.
In the simplest case, the malicious behavior could be an image classification model that misclassifies
images marked with a special patch. These attacks are in contrast to model availability attacks,
which aim to undermine model performance on all hold-out data. Availability attacks are generally
considered infeasible in large-scale federated learning systems when norm-bounding is employed
(Shejwalkar et al., 2021), given that malicious users likely form only a minority of all users.
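To make the patch example concrete, a minimal sketch of how such a trigger could be stamped onto a batch of images and relabeled toward an attacker-chosen class is shown below; the square patch, the target class, and the function names are hypothetical illustrations rather than the construction of any specific cited attack.

```python
import torch

def add_patch(images: torch.Tensor, patch_size: int = 4) -> torch.Tensor:
    """Stamp a small white square into the bottom-right corner of each image.

    images: batch of shape (N, C, H, W) with pixel values in [0, 1].
    """
    patched = images.clone()
    patched[:, :, -patch_size:, -patch_size:] = 1.0
    return patched

def poison_batch(images, labels, target_class: int = 0, fraction: float = 0.5):
    """Patch a fraction of the batch and relabel those images to target_class."""
    n = int(fraction * len(images))
    images, labels = images.clone(), labels.clone()
    images[:n] = add_patch(images[:n])
    labels[:n] = target_class
    return images, labels
```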
Data Poisoning
The model poisoning attacks described above are closely related to data poisoning attacks against
centralized training (Goldblum et al., 2020). The idea of anticipating future updates has been investigated
in some works on data poisoning (Muñoz-González et al., 2017; Huang et al., 2020), where
it arises as an approximation of the bilevel data poisoning objective. These attacks optimize a set of
poisoned datapoints by differentiating through several steps of the expected SGD update that the
central server would perform on this data. However, for data poisoning, the attacker is unaware
of the model state used by the server, cannot optimize their attack for each round of training, and
has only approximate knowledge of the model architecture and hyperparameters. These complications
lead Huang et al. (2020) to construct a large ensemble of model states trained to different stages to
approximate the missing knowledge.
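The sketch below illustrates this unrolling idea for a toy linear classifier: the attacker's loss is evaluated after a few simulated, differentiable SGD steps on the poisoned data, and its gradient is taken with respect to the poison inputs. The linear model, learning rate, number of unrolled steps, and the particular attacker objective are simplifying assumptions made only for illustration.

```python
import torch
import torch.nn.functional as F

def unrolled_poison_grad(w, x_poison, y_poison, x_target, y_target,
                         lr: float = 0.1, steps: int = 3):
    """Gradient of the attacker's loss w.r.t. the poison inputs, obtained by
    differentiating through `steps` simulated SGD updates on the poisoned data.

    w: weights of a toy linear classifier, shape (num_classes, dim).
    """
    x_poison = x_poison.clone().detach().requires_grad_(True)
    w_t = w.detach().clone().requires_grad_(True)
    for _ in range(steps):
        # Inner SGD step on the poisoned data, kept differentiable so that
        # gradients can later flow back to x_poison (create_graph=True).
        inner_loss = F.cross_entropy(x_poison @ w_t.t(), y_poison)
        (grad_w,) = torch.autograd.grad(inner_loss, w_t, create_graph=True)
        w_t = w_t - lr * grad_w
    # Attacker objective, evaluated on the unrolled weights.
    adv_loss = F.cross_entropy(x_target @ w_t.t(), y_target)
    return torch.autograd.grad(adv_loss, x_poison)[0]
```

A poisoning attack along these lines would then update `x_poison` by descending along the returned gradient, which is the unrolled approximation of the bilevel objective referred to above.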
3 CAN YOU BACKDOOR FEDERATED LEARNING?
Backdoor attacks against federated learning have been described in Bagdasaryan et al. (2019). The
attacker uses local data and their malicious objective to create their own replacement model, scales
this replacement model to the largest magnitude allowed by the server's norm-bounding rule, and
sends it. However, as discussed in Sun et al. (2019b), for a more realistic number of malicious users
and randomly occurring attacks, backdoor success is much lower, especially against stringent norm-bounding.
Wang et al. (2020) note that backdoor success is high in edge cases not seen during training
and that backdoors which attack “rare” samples (such as only airplanes of a specific color in images,
or a specific sentence in text) can be much more successful, as other users do not influence these
predictions significantly. A number of variants of this attack exist (Costa et al., 2021; Pang et al.,
2021; Fang et al., 2020; Baruch et al., 2019; Xie et al., 2019; Datta et al., 2021; Yoo & Kwak, 2022;
Zhang et al., 2019; Sun et al., 2022), for example allowing for collusion between multiple users or
generating additional data for the attacker. In this work we focus broadly on the threat model of
Bagdasaryan et al. (2019); Wang et al. (2020).
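As a rough sketch of the attacker's side of this attack under norm-bounding, the replacement update can simply be rescaled so that it exactly saturates the server's clip value; the flattened-parameter representation and the $\ell_2$ bound below are assumptions for illustration, not the exact procedure of the cited works.

```python
import torch

def scale_to_norm_bound(local_params: torch.Tensor,
                        global_params: torch.Tensor,
                        C: float) -> torch.Tensor:
    """Rescale the attacker's update (local - global) to L2 norm exactly C,
    the largest magnitude that survives the server's norm-bounding rule."""
    update = local_params - global_params
    scaled = update * (C / (update.norm(p=2) + 1e-12))
    return global_params + scaled
```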
Threat Model We assume a federated learning protocol running with multiple users, attacked
by online white-box model poisoning. The server orchestrates federated averaging with norm-bounding.
At each attack opportunity, the attacker controls only a single user and has knowledge
only of the local data of this user. The attacker has full control over the model update that will be
returned to the server and can optimize this model freely. As a participating user in FL, the attacker