As model stealing has raised considerable concerns about model ownership, an increasing number
of model IP protection methods have been proposed in the last few years. Generally, there are two
categories of methods to validate and protect the source model's IP, i.e., watermarking methods [5, 12–19]
and fingerprinting methods [7, 20–23]. Watermarking methods use weight regularization [12–14] or
backdoor insertion [5, 16, 17] during model training and leave a specific watermark in the model.
However, they need to be involved in the model's training procedure and sacrifice the model's
performance on the main task. A typical example is EWE [5], which suffers a 4% classification
accuracy drop on CIFAR10. In contrast, fingerprinting methods exploit the transferability of adversarial
examples and identify stolen models by the adversarial examples' attack success rates on the suspect
model. Since they do not interfere with the training procedure, model fingerprinting does not
influence the model's accuracy. Nevertheless, these adversarial-example-based fingerprinting methods
remain sensitive to adversarial training [24]. They also require a large amount of time for the model
owner to train surrogate models (models that the model owner trains by model extraction) and to
generate adversarial examples. In addition, because the label space changes under transfer learning,
the adversarial examples' target labels disappear, so these methods cannot identify model stealing
attacks that use transfer learning techniques [25].
As stated above, existing fingerprinting methods, leveraging the suspect model’s output as a point-
wise indicator to detect the stolen models, are sensitive to adversarial training or transfer learning. To
address this problem, we focus on the pair-wise relationship between the outputs and develop a new
method called SAC. Intuitively, samples with similar outputs in the source model are more likely to
also have similar outputs in the stolen models. In particular, we employ the correlation difference
between the source model and the suspect model as the indicator to detect the stolen model. However,
calculating correlation using all the samples from the defender’s dataset will be influenced by the
common knowledge shared by most models trained for the same task, on which most models will
output the same label. To avoid it, we leverage the normal samples which are wrongly predicted by
both the source and the surrogate models as the model input and propose to fingerprint using sample
correlation with wrongly predicted samples (SAC-w). Furthermore, to reduce the needs for a large
number of normal samples and save time consumption for the defender, we use CutMix Augmented
samples directly to calculate the correlation difference (SAC-m), which does not need to train the
surrogate models or generate adversarial examples. To verify the effectiveness of SAC-w and SAC-m,
we investigate 5 types of attacks (i.e., fine-tuning, pruning, transfer learning, model extraction, and
adversarial training), and compare the performance against these attacks across different model
architectures and datasets.
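To make the core idea concrete, the following is a minimal sketch of the correlation-difference indicator, assuming PyTorch models and cosine similarity as one possible choice of correlation function; the function names and the threshold are illustrative and not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def cosine_correlation(outputs: torch.Tensor) -> torch.Tensor:
    """Pairwise cosine-similarity matrix over a batch of model outputs: (N, C) -> (N, N)."""
    normed = F.normalize(outputs, dim=1)
    return normed @ normed.t()

@torch.no_grad()
def correlation_distance(source_model, suspect_model, inputs: torch.Tensor) -> float:
    """Mean absolute difference between the two models' output-correlation matrices.
    A small distance means the suspect model relates the inputs to each other much
    like the source model does, which suggests it was stolen."""
    corr_source = cosine_correlation(source_model(inputs))
    corr_suspect = cosine_correlation(suspect_model(inputs))
    return (corr_source - corr_suspect).abs().mean().item()

# Usage sketch (threshold `d` is hypothetical, calibrated on independently trained models):
# if correlation_distance(source_model, suspect_model, fingerprint_inputs) < d:
#     print("suspect model is likely stolen")
```

In SAC-w the `fingerprint_inputs` would be the wrongly predicted normal samples, while in SAC-m they would be CutMix augmented samples, so no surrogate training or adversarial-example generation is needed.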
Our main contributions are summarized as follows:
• We introduce sample correlation into model IP protection, propose to leverage the correlation
difference as a robust indicator to identify model stealing attacks, and provide a new insight into
model IP protection.
• We introduce wrongly predicted normal samples and CutMix augmented samples to replace
adversarial examples as the model inputs, providing two robust correlation-based model fingerprinting
methods.
• Extensive results verify that SAC identifies different model stealing attacks across different
architectures and datasets with AUC = 1 in most cases, performing better than previous methods.
Besides, SAC-m takes only 4.45 seconds on the CIFAR10 dataset, greatly lowering the model
owner's computation burden.
2 Related Work
Model stealing attacks greatly threaten the rights of the model owner. In general, we can summarize
them in several categories as follows: (1) Fine-tuning [26]: The attacker updates the parameters of the
source model using the labeled training data for several epochs. (2) Pruning [8, 9, 27]: The attacker
prunes the less significant weights of the source model based on indicators such as activation. (3)
Transfer learning [25]: The attacker transfers the source model to a similar task and makes use of the
source model's knowledge. (4) Model extraction [10, 11]: Because data labeling is costly and
time-consuming, and there is a large amount of unlabeled data on the Internet, the attacker can steal
the functionality of the source model using only unlabeled same-distribution data. Different from the
above attacks, which need access to the inner parameters of the model, the model extraction attack