Backdoor Attacks in the Supply Chain of Masked Image Modeling
Xinyue Shen1, Xinlei He1, Zheng Li1, Yun Shen2, Michael Backes1, Yang Zhang1
1CISPA Helmholtz Center for Information Security 2NetApp
Abstract
Masked image modeling (MIM) revolutionizes self-supervised learning (SSL) for image pre-training. In contrast to the previously dominant self-supervised method, i.e., contrastive learning, MIM attains state-of-the-art performance by masking and reconstructing random patches of the input image. However, the associated security and privacy risks of this novel generative method remain unexplored. In this paper, we perform the first security risk quantification of MIM through the lens of backdoor attacks. Different from previous work, we are the first to systematically model the threats to SSL in every phase of the model supply chain, i.e., the pre-training, release, and downstream phases. Our evaluation shows that models built with MIM are vulnerable to existing backdoor attacks in the release and downstream phases and are compromised by our proposed method in the pre-training phase. For instance, on CIFAR10, the attack success rate can reach 99.62%, 96.48%, and 98.89% in the downstream, release, and pre-training phases, respectively. We also take the first step toward investigating the success factors of backdoor attacks in the pre-training phase and find that trigger number and trigger pattern play key roles in the success of backdoor attacks, while trigger location has only a minor effect. Finally, our empirical study of defense mechanisms across three detection levels on the model supply chain phases indicates that different defenses are suitable for backdoor attacks in different phases. However, backdoor attacks in the release phase cannot be detected by any of the three detection-level methods, calling for more effective defenses in future research.
1 Introduction
Self-supervised pre-training in the computer vision domain has been dominated by contrastive learning, a discriminative method, since 2018 [35]. Recently, with the advent of the Transformer architecture, masked image modeling (MIM), a generative method, has surpassed contrastive learning and reached state-of-the-art performance on self-supervised pre-training tasks [6,8,15,32]. Compared with contrastive learning, which aims to align different augmented views of the same image, MIM learns by predicting properties of masked patches from the unmasked parts. It serves as a milestone that bridges the gap between visual and linguistic self-supervised pre-training methods, and variants have quickly emerged in applications such as images [3,5], video [25,29], audio [4], and graphs [23]. However, as an iconic method in another branch of SSL, the associated security risks caused by the mask-and-predict mechanism and novel architectures of MIM remain unexplored.
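To make the mask-and-predict objective concrete, the following is a minimal sketch of MIM-style pre-training, assuming pixel-level reconstruction with an MSE loss. The patch size, mask ratio, and the stand-in model are illustrative only; real MIM implementations (e.g., MAE, SimMIM) use a ViT encoder and replace masked patches with a learned mask token rather than zeros.

import torch

def random_mask_patches(images, patch_size=4, mask_ratio=0.75):
    # Split square images into non-overlapping patches and zero out a
    # random subset of them. Zeroing is a simplification; MAE/SimMIM use
    # a learned mask token instead.
    b, c, h, w = images.shape
    grid = h // patch_size
    num_patches = grid * grid
    num_masked = int(mask_ratio * num_patches)

    # Randomly pick which patches to mask for each image.
    noise = torch.rand(b, num_patches)
    mask = torch.zeros(b, num_patches, dtype=torch.bool)
    mask.scatter_(1, noise.argsort(dim=1)[:, :num_masked], True)

    masked = images.clone()
    for i in range(b):
        for idx in mask[i].nonzero(as_tuple=True)[0]:
            r, col = divmod(idx.item(), grid)
            masked[i, :, r * patch_size:(r + 1) * patch_size,
                   col * patch_size:(col + 1) * patch_size] = 0.0
    return masked, mask

# Pre-training objective: reconstruct the original pixels from the
# partially masked input (a one-layer conv stands in for a ViT
# encoder-decoder here).
images = torch.rand(8, 3, 32, 32)
masked_images, mask = random_mask_patches(images)
model = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)
reconstruction = model(masked_images)
loss = ((reconstruction - images) ** 2).mean()  # in practice, MSE over masked patches only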
Our Contributions. In this paper, we perform the first security risk quantification of MIM through the lens of backdoor attacks. Different from previous work, we are the first to systematically categorize the threat models on MIM in every phase of the model supply chain, i.e., the pre-training, release, and downstream phases (see Section 3 for more details). Our evaluation shows that models built with MIM are vulnerable to existing backdoor attacks in the release and downstream phases. For instance, in the downstream phase, with only a 0.1% poisoning rate (e.g., only 50 training samples on CIFAR10) and a trigger occupying only 0.05% of the image area, the attacker can achieve an 89.37% attack success rate (ASR) on CIFAR10.
We also observe that the previous attack [21], which successfully backdoors contrastive learning in the pre-training phase, cannot achieve satisfactory attack performance on MIM: its ASR is only 2.83% and 13.78% higher than the baseline on CIFAR10 and STL10, respectively. To improve the attack performance in the pre-training phase, we propose a simple yet effective method: increasing the number of triggers and spreading them across the whole image. We observe that, with our method, the ASR rises to 98.89% and 97.74% on the CIFAR10 and STL10 datasets, respectively.
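The idea can be illustrated with the following sketch, which stamps several copies of a trigger patch, one per cell of a coarse grid, so that the triggers span the whole image. The function name, grid-based placement, and parameters are our illustrative assumptions, not the paper's exact implementation.

import math
import numpy as np

def add_triggers(image, trigger, num_triggers=4, seed=None):
    # Place one trigger copy per cell of a coarse grid so the triggers
    # are spread over the whole image. Assumes the trigger fits inside
    # a grid cell.
    rng = np.random.default_rng(seed)
    H, W = image.shape[:2]
    h, w = trigger.shape[:2]
    g = math.ceil(math.sqrt(num_triggers))  # grid side length
    cell_h, cell_w = H // g, W // g
    poisoned = image.copy()
    for k in range(num_triggers):
        r, c = divmod(k, g)  # which grid cell this trigger goes into
        y = r * cell_h + rng.integers(0, cell_h - h + 1)
        x = c * cell_w + rng.integers(0, cell_w - w + 1)
        poisoned[y:y + h, x:x + w] = trigger
    return poisoned

# Example: four 3x3 white patches spread over a 32x32 image.
image = np.zeros((32, 32, 3), dtype=np.uint8)
trigger = np.full((3, 3, 3), 255, dtype=np.uint8)
poisoned = add_triggers(image, trigger, num_triggers=4, seed=0)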
To further investigate the hardest yet rarely explored scenario, i.e., the pre-training phase, we conduct comprehensive ablation studies on the properties of triggers, i.e., pattern, location, number, size, and poisoning rate. We find that trigger pattern and trigger number are the key factors affecting attack performance on MIM, which differs from a previous study on contrastive learning [21]. To evaluate the effect of the trigger pattern, we use a white trigger and the publicly released triggers of Hidden Trigger Backdoor Attacks (HTBA) [20]. We observe that the white trigger achieves only 7.19% ASR on STL10, while the ASRs of triggers HTBA-10, HTBA-12, and HTBA-14 are 97.74%, 98.05%, and 62.74%, respectively.
Our fourth contribution is an empirical study of defense mechanisms. Concretely, we investigate the detection performance of three detection levels across all model supply chain phases. Our evaluation shows that both model-level [27] and input-level [12] defenses can detect backdoor