Backdoor Attacks in the Supply Chain of Masked Image Modeling
Xinyue Shen1, Xinlei He1, Zheng Li1, Yun Shen2, Michael Backes1, Yang Zhang1
1 CISPA Helmholtz Center for Information Security   2 NetApp
Abstract
Masked image modeling (MIM) revolutionizes self-supervised learning (SSL) for image pre-training. In contrast to the previously dominant self-supervised method, i.e., contrastive learning, MIM attains state-of-the-art performance by masking and reconstructing random patches of the input image. However, the associated security and privacy risks of this novel generative method are unexplored. In this paper, we perform the first security risk quantification of MIM through the lens of backdoor attacks. Different from previous work, we are the first to systematically perform threat modeling on SSL in every phase of the model supply chain, i.e., the pre-training, release, and downstream phases. Our evaluation shows that models built with MIM are vulnerable to existing backdoor attacks in the release and downstream phases and are compromised by our proposed method in the pre-training phase. For instance, on CIFAR10, the attack success rate can reach 99.62%, 96.48%, and 98.89% in the downstream, release, and pre-training phases, respectively. We also take the first step to investigate the success factors of backdoor attacks in the pre-training phase and find that trigger number and trigger pattern play key roles in the success of backdoor attacks, while trigger location has only a minor effect. Finally, our empirical study of defense mechanisms at three detection levels across the model supply chain phases indicates that different defenses are suitable for backdoor attacks in different phases. However, backdoor attacks in the release phase cannot be detected by any of the three detection-level methods, calling for more effective defenses in future research.
1 Introduction
The self-supervised pre-training task has been dominated by contrastive learning, a discriminative method, in the computer vision domain since 2018 [35]. Recently, with the advent of the Transformer architecture, masked image modeling (MIM), a generative method, has successfully surpassed contrastive learning and reached state-of-the-art performance on self-supervised pre-training tasks [6,8,15,32]. Compared with contrastive learning, which aims to align different augmented views of the same image, MIM learns by predicting properties of masked patches from the unmasked parts. It serves as a milestone that bridges the gap between visual and linguistic self-supervised pre-training methods, and variants have quickly emerged in applications such as images [3,5], video [25,29], audio [4], and graphs [23]. However, as an iconic method settling into another branch of SSL, the associated security risks caused by the mask-and-predict mechanism and the novel architectures of MIM remain unexplored.
Our Contributions. In this paper, we perform the first security risk quantification of MIM through the lens of backdoor attacks. Different from previous work, we are the first to systematically categorize the threat models on MIM in every phase of the model supply chain, i.e., the pre-training, release, and downstream phases (see Section 3 for more details). Our evaluation shows that models built with MIM are vulnerable to existing backdoor attacks in the release and downstream phases. For instance, in the downstream phase, with only a 0.1% poisoning rate (e.g., only 50 training samples on CIFAR10) and a trigger occupying 0.05% of the image area, the attacker can achieve 89.37% ASR on CIFAR10.
We also observe that the previous attack [21], which successfully backdoors contrastive learning in the pre-training phase, cannot achieve satisfactory attack performance on MIM. Its ASR is only 2.83% and 13.78% higher than the baseline on CIFAR10 and STL10, respectively. To improve the attack performance in the pre-training phase, we propose a simple yet effective method: increasing the number of triggers and spreading them across the whole image. With our method, the ASR rises to 98.89% and 97.74% on the CIFAR10 and STL10 datasets, respectively.
To further investigate the hardest yet rarely explored scenario, i.e., the pre-training phase, we conduct comprehensive ablation studies on the properties of triggers, i.e., pattern, location, number, size, and poisoning rate. We find that trigger pattern and trigger number are the key components that affect attack performance on MIM, which differs from a previous study on contrastive learning [21]. We utilize a white trigger and the publicly released triggers of Hidden Trigger Backdoor Attacks (HTBA) to evaluate the effect of trigger pattern [20]. We observe that the white trigger only achieves 7.19% ASR on STL10, while the ASRs of triggers HTBA-10, HTBA-12, and HTBA-14 are 97.74%, 98.05%, and 62.74%, respectively.
Our fourth contribution is an empirical study of defense mechanisms. Concretely, we investigate the detection performance at three detection levels across all model supply chain phases. Our evaluation shows that both model-level [27] and input-level [12] defenses can detect backdoor attacks in the downstream phase, while the dataset-level [26] defense works well in recognizing poisoned samples in the pre-training dataset. To our surprise, backdoor attacks in the release phase, called Type II attacks in our paper, cannot be detected by any of the three detection-level methods, which prompts the call for more effective defenses in future research.
2 Preliminary
2.1 Masked Image Modeling (MIM)
The core idea of MIM is to mask random parts of the image and then learn to reconstruct the missing parts. It follows the autoencoder design with the Transformer architecture as the building block to perform the task. The input image is first divided into patches, e.g., 16×16 patches, and MIM randomly masks a certain portion of the patches. The encoder then maps the unmasked patches to a latent representation, and the decoder predicts properties of the masked patches from this latent representation. The predicted property can be the original pixels [15], a latent representation [29], or visual tokens [6,8]. The objective of MIM is to minimize the difference between the predicted and real properties of the masked patches. Generally speaking, MIM methods can be grouped into two categories: tokenizer-based methods [8] and end-to-end methods [35].
Tokenizer-Based MIM. Inspired by the success of masked language modeling, tokenizer-based MIM mimics BERT [10] to reconstruct visual tokens. It consists of two steps: an image tokenizer first generates visual tokens for the masked patches, and the model is then optimized to predict the correct tokens of the masked patches from the visible patches.
End-to-End MIM. As the name implies, end-to-end MIM is a one-stage method that requires no pre-trained tokenizer. The method is straightforward and effective: by directly predicting a large portion of masked patches from the small portion of unmasked patches, it achieves impressive performance.
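To make the mask-and-predict objective concrete, the following is a minimal sketch of one end-to-end MIM training step in PyTorch. The encoder and decoder modules, patch size, and masking ratio are illustrative assumptions rather than the exact MAE or CAE implementation.

```python
import torch
import torch.nn as nn

def mim_loss(images, encoder, decoder, patch_size=16, mask_ratio=0.75):
    """One simplified MIM step: mask random patches and reconstruct their pixels.

    encoder: maps visible patches (B, N_vis, patch_dim) -> latents (B, N_vis, D)
    decoder: assumed to output predictions for all N patches (B, N, patch_dim);
             a real implementation would also feed mask tokens / positions.
    """
    B, C, H, W = images.shape
    # Split each image into non-overlapping patches: (B, N, C * patch_size**2)
    patches = images.unfold(2, patch_size, patch_size).unfold(3, patch_size, patch_size)
    patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * patch_size ** 2)
    N = patches.shape[1]

    # Randomly choose which patches to mask for every image in the batch
    num_masked = int(mask_ratio * N)
    perm = torch.rand(B, N).argsort(dim=1)
    masked_idx, visible_idx = perm[:, :num_masked], perm[:, num_masked:]
    batch_idx = torch.arange(B).unsqueeze(1)

    # Encode only the visible patches, then predict properties of all patches
    latent = encoder(patches[batch_idx, visible_idx])
    pred = decoder(latent)

    # The loss is computed on the masked patches only (here: raw pixel MSE)
    return nn.functional.mse_loss(pred[batch_idx, masked_idx],
                                  patches[batch_idx, masked_idx])
```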
2.2 Supply Chain of Self-Supervised Models
As Figure 1 displays, the supply chain of self-supervised models can generally be summarized into three phases. The first phase is the pre-training phase, where the model owner utilizes images collected by the data donor to train the self-supervised model. The second phase is the release phase, where the model owner makes the trained model available online via public platforms such as ModelZoo (https://modelzoo.co/) and Hugging Face (https://huggingface.co/). The third phase is the downstream phase. In this phase, the downstream model owner adopts the pre-trained encoder as the backbone and fine-tunes an extra classification layer, i.e., an MLP layer, to perform the downstream task. The new model (containing an encoder and a classifier) is called the downstream model.
Figure 1: Supply chain of self-supervised models.
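As a rough illustration of the downstream phase, the sketch below builds a downstream model from a pre-trained encoder by attaching a new MLP head and fine-tuning on labeled data. The module interfaces, feature dimension, and hyperparameters are assumptions, not the exact configuration used in the paper.

```python
import torch
import torch.nn as nn

class DownstreamModel(nn.Module):
    """Downstream model = pre-trained MIM encoder backbone + new MLP head."""

    def __init__(self, encoder, feat_dim, num_classes):
        super().__init__()
        self.encoder = encoder                    # pre-trained (possibly backdoored) encoder
        self.head = nn.Sequential(                # newly initialized classification layer
            nn.Linear(feat_dim, 512), nn.GELU(), nn.Linear(512, num_classes))

    def forward(self, x):
        # The encoder is assumed to return one feature vector per image,
        # e.g., the [CLS] token or mean-pooled patch features of a ViT backbone.
        return self.head(self.encoder(x))

def finetune(model, loader, epochs=10, lr=1e-3):
    """Fine-tune the downstream model (the head and, optionally, the encoder)."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            criterion(model(x), y).backward()
            opt.step()
    return model
```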
2.3 Backdoor Attacks
In general, backdoor attacks inject hidden backdoors into
machine learning models so that the infected models perform
well on clean images but misclassify images with a specific
trigger into a target class. As an emerging and rapidly grow-
ing research area, various backdoor attacks have been pro-
posed [7,14,16,18,20,21,28] and can be broadly summarized
into two categories, i.e., poisoning-based and non-poisoning-
based backdoor attacks [17].
Poisoning-based Backdoor Attack. Given a training set $D_{train} = (X, Y)$, we first denote a target model as $f: X \rightarrow Y$, where $X \subseteq \mathbb{R}^d$ is a set of data samples and $Y = \{1, 2, \ldots, K\}$ is a set of labels. Given a sample $x$ with its label $y$, we assume the adversary has a target label $\tilde{y}$ and a trigger patch $t$. The attacker constructs a poisoned pair $(\tilde{x}, \tilde{y})$ by replacing the label $y$ with $\tilde{y}$ and pasting the trigger $t$ on the image $x$ to obtain the patched image $\tilde{x}$. Then, the attacker injects a portion $p$ of poisoned pairs $(\tilde{x}, \tilde{y})$ into $D_{train}$ ($0 < p < 1$). Since the victim is not aware that the training set has been modified, the backdoor is successfully embedded in the model after the training process.
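A minimal sketch of this poisoning step, assuming images are stored as (C, H, W) tensors and the trigger is a small square patch; the trigger content, location, and poisoning rate below are placeholders:

```python
import random
import torch

def paste_trigger(img, trigger, loc=(0, 0)):
    """Paste a trigger patch onto an image tensor of shape (C, H, W)."""
    x0, y0 = loc
    _, th, tw = trigger.shape
    patched = img.clone()
    patched[:, y0:y0 + th, x0:x0 + tw] = trigger
    return patched

def poison_training_set(dataset, trigger, target_label, poison_rate=0.001):
    """Replace a fraction `poison_rate` of (x, y) pairs with poisoned pairs:
    the trigger is pasted on the image and the label is flipped to the target class."""
    n_poison = int(poison_rate * len(dataset))
    poison_idx = set(random.sample(range(len(dataset)), n_poison))
    poisoned = []
    for i, (x, y) in enumerate(dataset):
        if i in poison_idx:
            poisoned.append((paste_trigger(x, trigger), target_label))
        else:
            poisoned.append((x, y))
    return poisoned
```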
Non-poisoning-based Backdoor Attacks. Different from poisoning-based backdoor attacks, non-poisoning-based backdoor attacks [16,19] directly modify the model parameters to inject backdoors without poisoning the training set. Given a clean model $f$, the attacker aims to optimize it into a backdoored model $f'$. Concretely, the attacker collects a shadow dataset $D_{shadow}$ poisoned with the trigger $t$ and adopts a reference image $r$ from the target class $\tilde{y}$. The optimization problem aims to minimize the distance between $D_{shadow}$ and $r$ in the model's representation space.
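A hedged sketch of this optimization, assuming the attacker aligns the backdoored encoder's representation of triggered shadow images with the clean representation of the reference image while preserving clean behavior; the loss terms, weighting, and helper functions are assumptions rather than the exact procedure of [16,19]:

```python
import torch
import torch.nn.functional as F

def paste_trigger_batch(x, trigger, loc=(0, 0)):
    """Paste the trigger onto every image of a batch with shape (B, C, H, W)."""
    x0, y0 = loc
    _, th, tw = trigger.shape
    x = x.clone()
    x[:, :, y0:y0 + th, x0:x0 + tw] = trigger
    return x

def backdoor_encoder(encoder, clean_encoder, shadow_loader, reference_img, trigger,
                     epochs=5, lam=1.0, lr=1e-4):
    """Optimize the encoder so triggered inputs embed close to a reference image of
    the target class, while clean inputs keep their original representation."""
    opt = torch.optim.Adam(encoder.parameters(), lr=lr)
    with torch.no_grad():
        ref_feat = clean_encoder(reference_img.unsqueeze(0))      # (1, D)
    for _ in range(epochs):
        for x in shadow_loader:          # assumed to yield unlabeled image batches
            x_trig = paste_trigger_batch(x, trigger)
            # Backdoor term: triggered samples -> representation of the reference image
            loss_bd = F.mse_loss(encoder(x_trig), ref_feat.expand(x.size(0), -1))
            # Utility term: clean samples stay close to the clean encoder's output
            with torch.no_grad():
                clean_feat = clean_encoder(x)
            loss = loss_bd + lam * F.mse_loss(encoder(x), clean_feat)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return encoder
```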
3 Attack Taxonomy and Methodology
As we are the first to investigate backdoor attacks on masked image modeling, we begin by defining our adversary's goal together with a unified attack taxonomy covering all phases of the model supply chain. Note that the attack taxonomy can also be generally extended to other self-supervised models.
Adversary's Goal. Following previous work [14,16], we assume the adversary aims to backdoor the downstream model so that the model performs well on clean images but misclassifies images with a specific trigger into a target class. To achieve this goal, the adversary can perform backdoor attacks in different phases of the MIM model's supply chain.
Attack Taxonomy and Adversary's Capability. Different from previous work, we are the first to systematically perform threat modeling on MIM in every phase of the model supply chain, i.e., the pre-training, release, and downstream phases. Table 1 shows our proposed attack taxonomy and the attacker's corresponding capabilities.
Table 1: Attack Taxonomy. The attacks are increasingly harder in row order. For each attack (Type I, Type II, Type III), the table marks whether access to the pre-training set, the model, the downstream set, the downstream model, and the inference pipeline is applicable/necessary, partially applicable/necessary, or inapplicable/unnecessary.
We name the backdoor attacks in the downstream, release, and pre-training phases as Type I, Type II, and Type III attacks, respectively, and adopt three representative backdoor attacks [14,16,21] as well as our proposed method to quantify the security risk of each phase.
Type I attack is a poisoning-based backdoor attack that happens in the downstream phase. We assume that the adversary knows the downstream task and has the capability to inject a small number of labeled poisoned samples into the downstream training set. However, they have no knowledge of the pre-trained model or the pre-training dataset. Concretely, given a downstream training set $D_{down} = (X, Y)$ and a downstream classifier $F$, the Type I attack poisons a portion $p$ of samples in $D_{down}$ with the trigger $t$. The victim then uses the poisoned downstream dataset $\tilde{D}_{down}$ to optimize the downstream model.
Type II attack is a non-poisoning-based backdoor attack and takes place in the release phase. The attacker can be either an untrusted service provider who injects a backdoor into its pre-trained model or a malicious third party who downloads the released pre-trained model, injects a backdoor into it, and then re-publishes it online [16]. In this scenario, the attacker has full access to the pre-trained model but has no knowledge of the pre-training dataset, the downstream dataset, or the downstream training schedule. Specifically, given a clean MIM model $M$, we have $\hat{x} = M(x) = Dec(Enc(x))$, where $Enc$ is the encoder and $Dec$ is the decoder. To train a downstream task, the decoder $Dec$ is discarded and the victim builds a new model $F$ such that $\hat{y} = F(x) = MLP(Enc(x))$. The goal of the attacker is to optimize $Enc$ into a poisoned $\widetilde{Enc}$ so that $\tilde{y} = \tilde{F}(\tilde{x}) = MLP(\widetilde{Enc}(\tilde{x}))$, where $\tilde{y}$ is the target class and $\tilde{x}$ is a poisoned sample.
Type III attack is a poisoning-based backdoor attack. Similar to the Type I attack, the attacker has no knowledge of the model hyperparameters and can only poison a small fraction of the pre-training dataset. However, unlike the Type I attack, where the attacker can directly change the labels of poisoned samples in the downstream dataset, the pre-training dataset has no labels. To address this issue, the Type III attacker only poisons samples from the target class by adding triggers to them and expects the pre-trained model to recognize the triggers as part of the target class, establishing an inner connection between the trigger and the specific target class. In reality, the Type III attacker can be a malicious data donor who releases poisoned images on the Internet. Once the poisoned images are scraped by the model owner without censoring, they can inject backdoors into the pre-trained models.
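A minimal sketch of such pre-training-set poisoning, assuming the malicious data donor knows which of its own images belong to the target class and, following the multi-trigger idea proposed in this paper, pastes several copies of the trigger spread over the whole image; the grid layout and poisoning rate are illustrative assumptions:

```python
import torch

def paste_triggers_spread(img, trigger, grid=(3, 3)):
    """Paste several copies of the trigger, spread evenly over a (C, H, W) image."""
    _, H, W = img.shape
    _, th, tw = trigger.shape
    patched = img.clone()
    rows, cols = grid
    for r in range(rows):
        for c in range(cols):
            # Evenly spaced anchor points across the whole image
            y = int(r * (H - th) / max(rows - 1, 1))
            x = int(c * (W - tw) / max(cols - 1, 1))
            patched[:, y:y + th, x:x + tw] = trigger
    return patched

def poison_pretraining_set(images, classes, trigger, target_class, poison_rate=0.01):
    """Poison only images that the data donor knows belong to the target class.
    The pre-training set is unlabeled for the victim, so no labels are changed."""
    target_idx = [i for i, c in enumerate(classes) if c == target_class]
    budget = set(target_idx[: int(poison_rate * len(images))])
    return [paste_triggers_spread(img, trigger) if i in budget else img
            for i, img in enumerate(images)]
```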
4 Evaluation
4.1 Experimental Settings
Datasets. We utilize four datasets in our experiments. For the Type I and Type II attacks, we use publicly available ImageNet pre-trained MIM models and adopt CIFAR10, CIFAR100, and STL10 as the datasets for the downstream tasks. For the Type III attack, we use ImageNet20 to pre-train the MIM models and consider CIFAR10, STL10, and ImageNet20 as the downstream datasets. All images are resized to 224×224 to fit the input requirement of the models, which is also a common practice in related work [11,16].
Target Model. We consider two MIM architectures as the
target models, i.e., Masked Autoencoder (MAE) [15] for
end-to-end MIM and Contextual Autoencoder (CAE) [8] for
tokenizer-based MIM. For both target models, we adopt the base variant of ViT (ViT-B) with a 224×224 input image size and a 16×16 patch size.
Concretely, for the Type I and Type II attacks, as the adversary is not involved in the pre-training phase, we utilize the publicly released MAE (https://github.com/facebookresearch/mae) and CAE (https://github.com/lxtGH/CAE) as our target models. This aligns with the threat model in which attackers can only access the released models. For the Type III attack, we train the two target models from scratch on ImageNet. Note that the models contain around 89M and 149M parameters, so training them on the complete ImageNet dataset from scratch would require substantial time and computing resources. Therefore, we instead use a subset of ImageNet to perform a quick evaluation in the pre-training phase. The subset contains 20 randomly selected classes (see Table 10 in the Appendix), which is also a common evaluation setup [21,24]. Note that in the Type III attack, we replace CIFAR100 with ImageNet20 as the downstream dataset, since the pre-training dataset ImageNet20 does not cover all classes of CIFAR100, which yields less satisfying clean accuracy. Also, previous work [8,15] likewise leverages the pre-training dataset as the downstream dataset.
Metric. We consider four evaluation metrics. Test accuracy (TA) / clean accuracy (CA) measures the classification accuracy of the backdoored / clean model on clean testing images. Attack success rate (ASR) / attack success rate-baseline (ASR-B) denotes the fraction of triggered testing images that the backdoored / clean model classifies into the target class.
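For clarity, these metrics reduce to standard accuracy computations over a clean test loader and a triggered test loader; a minimal sketch assuming a PyTorch classifier:

```python
import torch

@torch.no_grad()
def accuracy(model, loader):
    """TA / CA: fraction of clean test images classified into their true label."""
    correct, total = 0, 0
    for x, y in loader:
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.size(0)
    return correct / total

@torch.no_grad()
def attack_success_rate(model, triggered_loader, target_class):
    """ASR / ASR-B: fraction of triggered test images classified into the target class."""
    correct, total = 0, 0
    for x, _ in triggered_loader:
        correct += (model(x).argmax(dim=1) == target_class).sum().item()
        total += x.size(0)
    return correct / total
```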
We refer the readers to Section A.1 for detailed descriptions of the datasets, triggers, and configurations of the pre-training tasks, downstream tasks, backdoor attacks, and defenses.