
landscapes can succeed without incurring significant differences between benign and malicious client updates due to the diminished gradient magnitudes from benign clients. We further show that this phenomenon, combined with other factors such as the stochastic nature of the update, can help backdoor adversaries circumvent existing defenses. Our analysis also broadly includes data-centric approaches such as the edge-case attack (Wang et al., 2020) and the trigger inversion defense (Wang et al., 2019; Zhang et al., 2023).
Our methodology. To avoid the failure modes of existing defenses over flat loss landscapes, we propose an invariant aggregator to defend against federated backdoor attacks under a minority-adversary setting (Shejwalkar et al., 2022). Our defense examines each dimension of the (pseudo-)gradients¹ so that it does not overlook backdoor attacks that manipulate only a few elements without incurring much difference on the gradient vectors. For each dimension, we enforce that the aggregated update points in an invariant direction that is generally useful for most clients, instead of favoring a few, possibly malicious, clients. As a result, our defense remains effective over flat loss landscapes where the magnitudes of benign gradients can be small.
Our approach. We consider the gradient sign (positive, negative, or zero) as a magnitude-agnostic indicator of benefit: if two clients have consistent signs in a dimension, then moving along the direction pointed to by the gradient benefits both clients, and vice versa. Following this intuition, we employ an AND-mask (Parascandolo et al., 2021) that zeroes out every gradient dimension whose sign consistency falls below a given threshold, masking out gradient elements that benefit only a few clients. However, this alone is insufficient: malicious clients can still use outliers to mislead the aggregation result even when sign consistency is high. To address this issue, we further complement the AND-mask with a trimmed-mean estimator (Xie et al., 2020a; Lugosi and Mendelson, 2021) as a means to remove the outliers. We theoretically show that the combination of the AND-mask and the trimmed-mean estimator is necessary and sufficient for mitigating backdoor attacks.
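To make the aggregation rule concrete, the following is a minimal NumPy sketch of the dimension-wise defense described above. It is our own illustration rather than the authors' released implementation; the parameter names (`mask_threshold`, `trim_k`) and the exact way the mask is combined with the trimmed mean are assumptions.

```python
import numpy as np

def invariant_aggregate(updates, mask_threshold=0.8, trim_k=1):
    """Sketch of dimension-wise robust aggregation of client (pseudo-)gradients.

    updates: array of shape (n_clients, dim), one row per client.
    mask_threshold: minimum |average sign| per dimension for it to be kept
        (AND-mask-style sign-consistency test).
    trim_k: number of smallest and largest client values dropped per
        dimension before averaging (trimmed mean).
    """
    n, _ = updates.shape

    # AND-mask: keep a dimension only if clients largely agree on its sign.
    sign_consistency = np.abs(np.sign(updates).mean(axis=0))   # in [0, 1]
    mask = (sign_consistency >= mask_threshold).astype(updates.dtype)

    # Trimmed mean: per dimension, drop the trim_k extreme values on each
    # side to limit the influence of outliers from malicious clients.
    sorted_updates = np.sort(updates, axis=0)
    trimmed_mean = sorted_updates[trim_k:n - trim_k].mean(axis=0)

    return mask * trimmed_mean

# Toy usage: 5 clients, 4-dimensional pseudo-gradients.
rng = np.random.default_rng(0)
client_updates = rng.normal(size=(5, 4))
print(invariant_aggregate(client_updates, mask_threshold=0.6, trim_k=1))
```

With a high `mask_threshold`, a dimension survives only if it carries a consistent sign across almost all clients, which is the magnitude-agnostic invariance test; the trimmed mean then bounds the influence that any single client's value can have on the surviving dimensions.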
Our empirical evaluation employs a broad class of backdoor attacks, as detailed in Section 6.1, to test our defense. Empirical results on tabular (phishing emails), visual (CIFAR-10) (Krizhevsky, 2009; McMahan et al., 2017), and text (Twitter) (Caldas et al., 2018) datasets demonstrate that our method is effective in defending against backdoor attacks without degrading utility compared to prior works. On average, our approach decreases the backdoor attack success rate by 61.6% and loses only 1.2% accuracy on benign samples compared to the standard FedAvg aggregator (McMahan et al., 2017).

¹ We overload "gradient" to indicate any local model update communicated to the server in the federated setting; e.g., updates could be pseudo-gradients computed as differences between model parameters after several local steps.
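As a concrete example of the pseudo-gradient described in the footnote, one common construction is simply the difference between the global model a client received and its model after local training. The sketch below is a generic, hypothetical illustration with our own names; it is not taken from the paper's code.

```python
import numpy as np

def pseudo_gradient(global_params, local_params):
    """Pseudo-gradient a client sends to the server: the difference between
    the global parameters it received and its parameters after several
    local training steps, so the server can treat it like a gradient."""
    return np.asarray(global_params) - np.asarray(local_params)
```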
Contributions. Our contributions are as follows:
• We analyze the failure modes of multiple prominent defenses against federated backdoor attacks over a flat loss landscape.
• We develop a combined defense, using an AND-mask and the trimmed-mean estimator, against backdoor attacks by focusing on the dimension-wise invariant gradient directions.
• We theoretically analyze our strategy and demonstrate that the combination of an AND-mask and the trimmed-mean estimator is necessary and sufficient for mitigating backdoor attacks.
• We empirically evaluate our method on three datasets with varying modalities, trigger patterns, model architectures, and client numbers, and compare its performance to existing defenses.
2 Related Work
Backdoor Attack. Common backdoor attacks aim to mislead the model's predictions using a trigger (Liu et al., 2018). The trigger can be digital (Bagdasaryan et al., 2020), physical (Wenger et al., 2021), semantic (Wang et al., 2020), or invisible (Li et al., 2021a). Recent works extended backdoor attacks to the federated learning setting and proposed effective improvements such as gradient scaling (Bagdasaryan et al., 2020) or generating edge-case backdoor samples (Wang et al., 2020). The edge-case backdoor attack shows that backdoor samples with low probability density on benign clients (i.e., samples that are unlikely under the training distribution) are hard to detect and defend against in the federated learning setting.
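For illustration, a digital trigger of the kind referenced above can be as simple as a small pixel patch stamped onto training images whose labels are flipped to an attacker-chosen target class. The sketch below is a generic, hypothetical example (patch size, location, and target label are our own choices), not the specific attacks cited here.

```python
import numpy as np

def poison_with_trigger(images, labels, target_class=0, patch_size=3, value=1.0):
    """Stamp a small square 'trigger' patch into the bottom-right corner of
    each image and relabel the poisoned samples to the attacker's target class.

    images: float array of shape (N, H, W, C) with values in [0, 1].
    """
    poisoned = images.copy()
    poisoned[:, -patch_size:, -patch_size:, :] = value   # bright square trigger
    poisoned_labels = np.full_like(labels, target_class)
    return poisoned, poisoned_labels

# Example with CIFAR-10-shaped placeholder data.
images = np.random.rand(8, 32, 32, 3).astype(np.float32)
labels = np.random.randint(0, 10, size=8)
trigger_images, trigger_labels = poison_with_trigger(images, labels)
```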
Centralized Defense. There is a line of work proposing centralized defenses against backdoor attacks, where the main aim is either detecting the backdoor samples (Tran et al., 2018) or purifying the poisoned model parameters (Li et al., 2021b). However, applying such centralized defenses to federated learning systems is infeasible in practice due to the limited client data access in many implementations.
Federated Defenses. Recent works have attempted to defend against backdoor attacks in federated learning systems. Sun et al. (2019) show that weakly differentially private (weak-dp) federated averaging can mitigate the backdoor attack. However, the weak-dp defense
is circumvented by the improved edge-case federated