Backdoor Attacks in the Supply Chain of Masked Image Modeling
Xinyue Shen1, Xinlei He1, Zheng Li1, Yun Shen2, Michael Backes1, Yang Zhang1
1 CISPA Helmholtz Center for Information Security   2 NetApp
Abstract
Masked image modeling (MIM) revolutionizes self-supervised learning (SSL) for image pre-training. In contrast to the previously dominant self-supervised method, i.e., contrastive learning, MIM attains state-of-the-art performance by masking and reconstructing random patches of the input image. However, the associated security and privacy risks of this novel generative method are unexplored. In this paper, we perform the first security risk quantification of MIM through the lens of backdoor attacks. Different from previous work, we are the first to systematically perform threat modeling on SSL in every phase of the model supply chain, i.e., the pre-training, release, and downstream phases. Our evaluation shows that models built with MIM are vulnerable to existing backdoor attacks in the release and downstream phases and are compromised by our proposed method in the pre-training phase. For instance, on CIFAR10, the attack success rate can reach 99.62%, 96.48%, and 98.89% in the downstream, release, and pre-training phases, respectively. We also take the first step to investigate the success factors of backdoor attacks in the pre-training phase and find that trigger number and trigger pattern play key roles in the success of backdoor attacks, while trigger location has only a minor effect. Finally, our empirical study of defense mechanisms at three detection levels across the model supply chain phases indicates that different defenses are suitable for backdoor attacks in different phases. However, backdoor attacks in the release phase cannot be detected by any of the three detection-level methods, calling for more effective defenses in future research.
1 Introduction
The self-supervised pre-training task has been dominated by contrastive learning, a discriminative method, in the computer vision domain since 2018 [35]. Recently, with the advent of the Transformer architecture, masked image modeling (MIM), a generative method, has successfully surpassed contrastive learning and reached state-of-the-art performance on self-supervised pre-training tasks [6,8,15,32]. Compared with contrastive learning, which aims to align different augmented views of the same image, MIM learns by predicting properties of masked patches from the unmasked parts. It serves as a milestone that bridges the gap between visual and linguistic self-supervised pre-training methods, and variants have quickly emerged in applications such as images [3,5], video [25,29], audio [4], and graphs [23]. However, as an iconic method settling into another branch of SSL, the associated security risks caused by the mask-and-predict mechanism and the novel architectures of MIM remain unexplored.
Our Contributions. In this paper, we perform the first security risk quantification of MIM through the lens of backdoor attacks. Different from previous work, we are the first to systematically categorize the threat models on MIM in every phase of the model supply chain, i.e., the pre-training, release, and downstream phases (see Section 3 for more details). Our evaluation shows that models built with MIM are vulnerable to existing backdoor attacks in the release and downstream phases. For instance, in the downstream phase, with only a 0.1% poisoning rate (e.g., only 50 training samples on CIFAR10) and a trigger occupying 0.05% of the image area, the attacker can achieve 89.37% ASR on CIFAR10.
We also observe that the previous attack [21], which successfully backdoors contrastive learning in the pre-training phase, cannot achieve satisfactory attack performance on MIM. Its ASR is only 2.83% and 13.78% higher than the baseline on CIFAR10 and STL10, respectively. To improve the attack performance in the pre-training phase, we propose a simple yet effective method: increasing the number of triggers and spreading them across the whole image. With our method, the ASR rises to 98.89% and 97.74% on the CIFAR10 and STL10 datasets, respectively.
To further investigate the hardest yet rarely explored scenario, i.e., the pre-training phase, we conduct comprehensive ablation studies on the properties of triggers, i.e., pattern, location, number, size, and poisoning rate. We find that trigger pattern and trigger number are the key components that affect attack performance on MIM, which differs from a previous study on contrastive learning [21]. We utilize a white trigger and the publicly released triggers of Hidden Trigger Backdoor Attacks (HTBA) to evaluate the effect of trigger pattern [20]. We observe that the white trigger only achieves 7.19% ASR on STL10, while the ASRs of triggers HTBA-10, HTBA-12, and HTBA-14 are 97.74%, 98.05%, and 62.74%, respectively.
Our fourth contribution is an empirical study of defense mechanisms. Concretely, we investigate the detection performance at three detection levels across all model supply chain phases. Our evaluation shows that both model-level [27] and input-level [12] defenses can detect backdoor attacks in the downstream phase, while the dataset-level [26] defense works well in recognizing poisoned samples in the pre-training dataset. To our surprise, backdoor attacks in the release phase, called Type II attacks in our paper, cannot be detected by any of the three detection-level methods, which prompts the call for more effective defenses in future research.
2 Preliminary
2.1 Masked Image Modeling (MIM)
The core idea of MIM is to mask random parts of the image and then learn to reconstruct the missing parts. It follows the autoencoder design with the Transformer architecture as the building block to perform the task. The input image is first divided into patches, e.g., 16×16 patches, and MIM randomly masks a certain portion of the patches. The encoder then maps the unmasked patches to a latent representation, and the decoder predicts properties of the masked patches from this latent representation. The predicted property can be the original pixels [15], a latent representation [29], or visual tokens [6,8]. The objective of MIM is to minimize the difference between the predicted and real properties of the masked patches. Generally speaking, MIM methods can be grouped into two categories: tokenizer-based methods [8] and end-to-end methods [35].
Tokenizer-Based MIM. Inspired by the success of masked language modeling, tokenizer-based MIM mimics BERT [10] to reconstruct visual tokens. It consists of two steps: an image tokenizer first generates visual tokens for the masked patches, and the model is then optimized to predict the correct tokens of the masked patches from the visible patches.
End-to-End MIM. As the name implies, end-to-end MIM is a one-stage method that requires no pre-trained tokenizer. The method is straightforward and effective: by directly predicting a large portion of masked patches from the small portion of unmasked patches, it achieves impressive performance.
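To make the mask-and-predict objective concrete, the following is a minimal sketch of one end-to-end MIM training step in PyTorch. The encoder and decoder modules, patch size, and masking ratio are illustrative assumptions rather than the exact MAE or CAE implementation.

```python
import torch
import torch.nn as nn

def mim_loss(images, encoder, decoder, patch_size=16, mask_ratio=0.75):
    """One simplified MIM step: mask random patches and reconstruct their pixels.

    encoder: maps visible patches (B, N_vis, patch_dim) -> latents (B, N_vis, D)
    decoder: assumed to output predictions for all N patches (B, N, patch_dim);
             a real implementation would also feed mask tokens / positions.
    """
    B, C, H, W = images.shape
    # Split each image into non-overlapping patches: (B, N, C * patch_size**2)
    patches = images.unfold(2, patch_size, patch_size).unfold(3, patch_size, patch_size)
    patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * patch_size ** 2)
    N = patches.shape[1]

    # Randomly choose which patches to mask for every image in the batch
    num_masked = int(mask_ratio * N)
    perm = torch.rand(B, N).argsort(dim=1)
    masked_idx, visible_idx = perm[:, :num_masked], perm[:, num_masked:]
    batch_idx = torch.arange(B).unsqueeze(1)

    # Encode only the visible patches, then predict properties of all patches
    latent = encoder(patches[batch_idx, visible_idx])
    pred = decoder(latent)

    # The loss is computed on the masked patches only (here: raw pixel MSE)
    return nn.functional.mse_loss(pred[batch_idx, masked_idx],
                                  patches[batch_idx, masked_idx])
```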
2.2 Supply Chain of Self-Supervised Models
As Figure 1 displays, the supply chain of self-supervised models can generally be summarized into three phases. The first phase is the pre-training phase, where the model owner utilizes images collected by the data donor to train the self-supervised model. The second phase is the release phase, where the model owner makes the trained model available online via public platforms such as ModelZoo (https://modelzoo.co/) and Hugging Face (https://huggingface.co/). The third phase is the downstream phase. In this phase, the downstream model owner adopts the pre-trained encoder as the backbone and fine-tunes an extra classification layer, i.e., an MLP layer, to perform the downstream task. The new model (containing an encoder and a classifier) is called the downstream model.
Figure 1: Supply chain of self-supervised models.
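As a rough illustration of the downstream phase, the sketch below builds a downstream model from a pre-trained encoder by attaching a new MLP head and fine-tuning on labeled data. The module interfaces, feature dimension, and hyperparameters are assumptions, not the exact configuration used in the paper.

```python
import torch
import torch.nn as nn

class DownstreamModel(nn.Module):
    """Downstream model = pre-trained MIM encoder backbone + new MLP head."""

    def __init__(self, encoder, feat_dim, num_classes):
        super().__init__()
        self.encoder = encoder                    # pre-trained (possibly backdoored) encoder
        self.head = nn.Sequential(                # newly initialized classification layer
            nn.Linear(feat_dim, 512), nn.GELU(), nn.Linear(512, num_classes))

    def forward(self, x):
        # The encoder is assumed to return one feature vector per image,
        # e.g., the [CLS] token or mean-pooled patch features of a ViT backbone.
        return self.head(self.encoder(x))

def finetune(model, loader, epochs=10, lr=1e-3):
    """Fine-tune the downstream model (the head and, optionally, the encoder)."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            criterion(model(x), y).backward()
            opt.step()
    return model
```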
2.3 Backdoor Attacks
In general, backdoor attacks inject hidden backdoors into
machine learning models so that the infected models perform
well on clean images but misclassify images with a specific
trigger into a target class. As an emerging and rapidly grow-
ing research area, various backdoor attacks have been pro-
posed [7,14,16,18,20,21,28] and can be broadly summarized
into two categories, i.e., poisoning-based and non-poisoning-
based backdoor attacks [17].
Poisoning-based Backdoor Attack. Given a training set $D_{train} = (X, Y)$, we first denote a target model as $f: X \rightarrow Y$, where $X \subseteq \mathbb{R}^d$ is a set of data samples and $Y = \{1, 2, \ldots, K\}$ is a set of labels. Given a sample $x$ with its label $y$, we assume the adversary has a target label $\tilde{y}$ and a trigger patch $t$. The attacker constructs a poisoned pair $(\tilde{x}, \tilde{y})$ by replacing the label $y$ with $\tilde{y}$ and pasting the trigger $t$ on the image $x$ to obtain the patched image $\tilde{x}$. Then, the attacker injects a portion $p$ of poisoned pairs $(\tilde{x}, \tilde{y})$ into $D_{train}$ ($0 < p < 1$). Since the victim is not aware that the training set has been modified, the backdoor is successfully embedded in the model after the training process.
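A minimal sketch of this poisoning step, assuming images are stored as (C, H, W) tensors and the trigger is a small square patch; the trigger content, location, and poisoning rate below are placeholders:

```python
import random
import torch

def paste_trigger(img, trigger, loc=(0, 0)):
    """Paste a trigger patch onto an image tensor of shape (C, H, W)."""
    x0, y0 = loc
    _, th, tw = trigger.shape
    patched = img.clone()
    patched[:, y0:y0 + th, x0:x0 + tw] = trigger
    return patched

def poison_training_set(dataset, trigger, target_label, poison_rate=0.001):
    """Replace a fraction `poison_rate` of (x, y) pairs with poisoned pairs:
    the trigger is pasted on the image and the label is flipped to the target class."""
    n_poison = int(poison_rate * len(dataset))
    poison_idx = set(random.sample(range(len(dataset)), n_poison))
    poisoned = []
    for i, (x, y) in enumerate(dataset):
        if i in poison_idx:
            poisoned.append((paste_trigger(x, trigger), target_label))
        else:
            poisoned.append((x, y))
    return poisoned
```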
Non-poisoning-based Backdoor Attacks. Different from poisoning-based backdoor attacks, non-poisoning-based backdoor attacks [16,19] directly modify the model parameters to inject backdoors without poisoning the training set. Given a clean model $f$, the attacker aims to optimize it into a backdoored model $f'$. Concretely, the attacker collects a shadow dataset $D_{shadow}$ poisoned with the trigger $t$ and adopts a reference image $r$ from the target class $\tilde{y}$. The optimization problem aims to minimize the distance between $D_{shadow}$ and $r$ in the model's representation space.
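A hedged sketch of this optimization, assuming the attacker aligns the backdoored encoder's representation of triggered shadow images with the clean representation of the reference image while preserving clean behavior; the loss terms, weighting, and helper functions are assumptions rather than the exact procedure of [16,19]:

```python
import torch
import torch.nn.functional as F

def paste_trigger_batch(x, trigger, loc=(0, 0)):
    """Paste the trigger onto every image of a batch with shape (B, C, H, W)."""
    x0, y0 = loc
    _, th, tw = trigger.shape
    x = x.clone()
    x[:, :, y0:y0 + th, x0:x0 + tw] = trigger
    return x

def backdoor_encoder(encoder, clean_encoder, shadow_loader, reference_img, trigger,
                     epochs=5, lam=1.0, lr=1e-4):
    """Optimize the encoder so triggered inputs embed close to a reference image of
    the target class, while clean inputs keep their original representation."""
    opt = torch.optim.Adam(encoder.parameters(), lr=lr)
    with torch.no_grad():
        ref_feat = clean_encoder(reference_img.unsqueeze(0))      # (1, D)
    for _ in range(epochs):
        for x in shadow_loader:          # assumed to yield unlabeled image batches
            x_trig = paste_trigger_batch(x, trigger)
            # Backdoor term: triggered samples -> representation of the reference image
            loss_bd = F.mse_loss(encoder(x_trig), ref_feat.expand(x.size(0), -1))
            # Utility term: clean samples stay close to the clean encoder's output
            with torch.no_grad():
                clean_feat = clean_encoder(x)
            loss = loss_bd + lam * F.mse_loss(encoder(x), clean_feat)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return encoder
```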
3 Attack Taxonomy and Methodology
As we are the first to investigate backdoor attacks on masked image modeling, we begin by defining our adversary's goal together with a unified attack taxonomy covering all phases of the model supply chain. Note that the attack taxonomy can also be generally extended to other self-supervised models.
Adversary's Goal. Following previous work [14,16], we assume the adversary aims to backdoor the downstream model so that the model performs well on clean images but misclassifies images with a specific trigger into a target class. To achieve this goal, the adversary can perform backdoor attacks in different phases of the MIM model's supply chain.
Attack Taxonomy and Adversary's Capability. Different from previous work, we are the first to systematically perform threat modeling on MIM in every phase of the model supply chain, i.e., the pre-training, release, and downstream phases. Table 1 shows our proposed attack taxonomy and the attacker's corresponding capabilities.
Table 1: Attack Taxonomy. The attacks are increasingly harder in row order. For each attack (Type I, Type II, Type III), the table marks whether access to the pre-training set, the model, the downstream set, the downstream model, and the inference pipeline is applicable/necessary, partially applicable/necessary, or inapplicable/unnecessary.
We name the backdoor attacks in the downstream, release, and pre-training phases as Type I, Type II, and Type III attacks, respectively, and adopt three representative backdoor attacks [14,16,21] as well as our proposed method to quantify the security risk of each phase.
Type I attack is a poisoning-based backdoor attack that happens in the downstream phase. We assume that the adversary knows the downstream task and has the capability to inject a small number of labeled poisoned samples into the downstream training set. However, they have no knowledge of the pre-trained model or the pre-training dataset. Concretely, given a downstream training set $D_{down} = (X, Y)$ and a downstream classifier $F$, the Type I attack poisons a portion $p$ of samples in $D_{down}$ with the trigger $t$. The victim then uses the poisoned downstream dataset $\tilde{D}_{down}$ to optimize the downstream model.
Type II attack is a non-poisoning-based backdoor attack and takes place in the release phase. The attacker can be either an untrusted service provider who injects a backdoor into its pre-trained model or a malicious third party who downloads the released pre-trained model, injects a backdoor into it, and then re-publishes it online [16]. In this scenario, the attacker has full access to the pre-trained model but has no knowledge of the pre-training dataset, the downstream dataset, or the downstream training schedule. Specifically, given a clean MIM model $M$, we have $\hat{x} = M(x) = Dec(Enc(x))$, where $Enc$ is the encoder and $Dec$ is the decoder. To train a downstream task, the decoder $Dec$ is discarded and the victim builds a new model $F$ such that $\hat{y} = F(x) = MLP(Enc(x))$. The goal of the attacker is to optimize $Enc$ into a poisoned $\widetilde{Enc}$ so that $\tilde{y} = \tilde{F}(\tilde{x}) = MLP(\widetilde{Enc}(\tilde{x}))$, where $\tilde{y}$ is the target class and $\tilde{x}$ is a poisoned sample.
Type III attack is a poisoning-based backdoor attack. Similar to the Type I attack, the attacker has no knowledge of the model hyperparameters and can only poison a small fraction of the pre-training dataset. However, unlike the Type I attack, where the attacker can directly change the labels of poisoned samples in the downstream dataset, the pre-training dataset has no labels. To address this issue, the Type III attacker only poisons samples from the target class by adding triggers to them and expects the pre-trained model to recognize the triggers as part of the target class, establishing an inner connection between the trigger and the specific target class. In reality, the Type III attacker can be a malicious data donor who releases poisoned images on the Internet. Once the poisoned images are scraped by the model owner without censoring, they can inject backdoors into the pre-trained models.
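A minimal sketch of such pre-training-set poisoning, assuming the malicious data donor knows which of its own images belong to the target class and, following the multi-trigger idea proposed in this paper, pastes several copies of the trigger spread over the whole image; the grid layout and poisoning rate are illustrative assumptions:

```python
import torch

def paste_triggers_spread(img, trigger, grid=(3, 3)):
    """Paste several copies of the trigger, spread evenly over a (C, H, W) image."""
    _, H, W = img.shape
    _, th, tw = trigger.shape
    patched = img.clone()
    rows, cols = grid
    for r in range(rows):
        for c in range(cols):
            # Evenly spaced anchor points across the whole image
            y = int(r * (H - th) / max(rows - 1, 1))
            x = int(c * (W - tw) / max(cols - 1, 1))
            patched[:, y:y + th, x:x + tw] = trigger
    return patched

def poison_pretraining_set(images, classes, trigger, target_class, poison_rate=0.01):
    """Poison only images that the data donor knows belong to the target class.
    The pre-training set is unlabeled for the victim, so no labels are changed."""
    target_idx = [i for i, c in enumerate(classes) if c == target_class]
    budget = set(target_idx[: int(poison_rate * len(images))])
    return [paste_triggers_spread(img, trigger) if i in budget else img
            for i, img in enumerate(images)]
```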
4 Evaluation
4.1 Experimental Settings
Datasets. We utilize four datasets in our experiments. For the Type I and Type II attacks, we use publicly available ImageNet pre-trained MIM models and adopt CIFAR10, CIFAR100, and STL10 as the datasets for the downstream tasks. For the Type III attack, we use ImageNet20 to pre-train the MIM models and consider CIFAR10, STL10, and ImageNet20 as the downstream datasets. All images are resized to 224×224 to fit the input requirement of the models, which is also a common practice in related work [11,16].
Target Model. We consider two MIM architectures as the
target models, i.e., Masked Autoencoder (MAE) [15] for
end-to-end MIM and Contextual Autoencoder (CAE) [8] for
tokenizer-based MIM. For both target models, we adopt the base variant of ViT (ViT-B) with a 224×224 input image size and a 16×16 patch size.
Concretely, for the Type I and Type II attacks, as the adversary is not involved in the pre-training phase, we utilize the publicly released MAE (https://github.com/facebookresearch/mae) and CAE (https://github.com/lxtGH/CAE) as our target models. This aligns with the threat model in which attackers can only access the released models. For the Type III attack, we train the two target models from scratch on ImageNet. Note that the models contain around 89M and 149M parameters, so training them on the complete ImageNet dataset from scratch would require substantial time and computing resources. Therefore, we instead use a subset of ImageNet to perform a quick evaluation in the pre-training phase. The subset contains 20 randomly selected classes (see Table 10 in the Appendix), which is also a common evaluation setup [21,24]. Note that in the Type III attack, we replace CIFAR100 with ImageNet20 as the downstream dataset, since the pre-training dataset ImageNet20 does not cover all classes of CIFAR100, which yields less satisfying clean accuracy. Also, previous work [8,15] likewise leverages the pre-training dataset as the downstream dataset.
Metric. We consider four evaluation metrics. Test accuracy (TA) / clean accuracy (CA) measures the classification accuracy of the backdoored / clean model on clean testing images. Attack success rate (ASR) / attack success rate-baseline (ASR-B) denotes the fraction of triggered testing images that the backdoored / clean model classifies into the target class.
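For clarity, these metrics reduce to standard accuracy computations over a clean test loader and a triggered test loader; a minimal sketch assuming a PyTorch classifier:

```python
import torch

@torch.no_grad()
def accuracy(model, loader):
    """TA / CA: fraction of clean test images classified into their true label."""
    correct, total = 0, 0
    for x, y in loader:
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.size(0)
    return correct / total

@torch.no_grad()
def attack_success_rate(model, triggered_loader, target_class):
    """ASR / ASR-B: fraction of triggered test images classified into the target class."""
    correct, total = 0, 0
    for x, _ in triggered_loader:
        correct += (model(x).argmax(dim=1) == target_class).sum().item()
        total += x.size(0)
    return correct / total
```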
We refer the readers to Section A.1 for detailed descriptions of the datasets, triggers, and configurations of the pre-training tasks, downstream tasks, backdoor attacks, and defenses.