Are You Stealing My Model? Sample Correlation for
Fingerprinting Deep Neural Networks
Jiyang Guan1,2, Jian Liang1,2, Ran He1,2
1NLPR & CRIPAC, Institute of Automation, Chinese Academy of Sciences, China
2School of Artificial Intelligence, University of Chinese Academy of Sciences, China
guanjiyang2020@ia.ac.cn, liangjian92@gmail.com, rhe@nlpr.ia.ac.cn
Abstract
An off-the-shelf model offered as a commercial service can be stolen by model stealing attacks, posing great threats to the rights of the model owner. Model fingerprinting aims to verify whether a suspect model has been stolen from the victim model, and has gained increasing attention in recent years. Previous methods usually leverage transferable adversarial examples as the model fingerprint, which is sensitive to adversarial defenses and transfer learning. To address this issue, we instead consider the pairwise relationship between samples and propose a novel yet simple model stealing detection method based on SAmple Correlation (SAC). Specifically, we present SAC-w, which selects wrongly classified normal samples as model inputs and calculates the mean correlation among their model outputs. To reduce the training time, we further develop SAC-m, which selects CutMix Augmented samples as model inputs, without the need to train surrogate models or generate adversarial examples. Extensive results validate that SAC successfully defends against various model stealing attacks, even those involving adversarial training or transfer learning, and detects stolen models with the best performance in terms of AUC across different datasets and model architectures. The code is available at https://github.com/guanjiyang/SAC.
1 Introduction
Over the past few years, Deep Neural Networks (DNNs) have played an important role in many critical fields, e.g., face recognition [1], medical diagnosis [2, 3], and autonomous driving [4]. As a popular option, model owners often provide their models to clients as a cloud service or as client-side software. However, training a deep neural network is costly, involving expensive data collection and large computational resources; thus, models trained for inference constitute valuable intellectual property and should be protected [5, 6]. Unfortunately, model stealing attacks can steal such a valuable model with only API access to the model owner's well-performed model (the source model) [7], causing serious threats to the model owner's intellectual property (IP).
Model stealing attacks aim to illegally obtain functionally equivalent copies of the source model with white-box or even black-box access to the model. In the white-box case, the attacker can access all the inner parameters of the source model and evade the model owner's detection by modifying the source model, e.g., through pruning [8, 9] or fine-tuning. By contrast, the model extraction attack [10, 11], a more powerful attack, only requires black-box access to the model. That is to say, a model extraction attack needs only the model outputs rather than the inner parameters to steal the function of the source model, and is thus more threatening.
Corresponding Author
Preprint. Under review.
arXiv:2210.15427v1 [cs.CR] 21 Oct 2022
As model stealing has raised considerable concerns about model ownership, an increasing number of model IP protection methods have been proposed in the last few years. Generally, there are two categories of methods to validate and protect the source model's IP, i.e., watermarking methods [5, 12–19] and fingerprinting methods [7, 20–23]. Watermarking methods use weight regularization [12–14] or backdoor insertion [5, 16, 17] during model training to leave a specific watermark in the model. However, they need to be involved in the model's training procedure and sacrifice the model's performance on the main task; a typical example is EWE [5], which suffers a 4% classification accuracy drop on CIFAR10. On the contrary, fingerprinting methods make use of the transferability of adversarial examples and identify stolen models by the adversarial examples' attack success rates on the suspect model. Since they do not involve the training procedure, model fingerprinting does not influence the model's accuracy. Nevertheless, these adversarial-example-based fingerprinting methods are still sensitive to adversarial training [24]. They also require a large amount of time for the model owner to train surrogate models (models that the model owner trains using model extraction by themselves) and to generate adversarial examples. In addition, when the label space changes under transfer learning, the adversarial examples' target labels disappear, meaning that these methods also cannot identify model stealing attacks that use transfer learning techniques [25].
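For concreteness, a point-wise adversarial-example fingerprint of the kind discussed above typically reduces to measuring the attack success rate of the source model's adversarial examples on the suspect model. The sketch below illustrates that general decision rule only; the names and the threshold are hypothetical and do not correspond to any specific method's released code.

```python
import torch

def transfer_attack_success_rate(suspect_model: torch.nn.Module,
                                 adv_inputs: torch.Tensor,
                                 target_labels: torch.Tensor) -> float:
    """Fraction of the source model's adversarial examples that still fool the
    suspect model, i.e. are classified as the intended target label."""
    with torch.no_grad():
        preds = suspect_model(adv_inputs).argmax(dim=1)
    return (preds == target_labels).float().mean().item()

# Hypothetical decision rule (threshold calibrated on independently trained,
# "irrelevant" models):
# is_stolen = transfer_attack_success_rate(suspect, adv_x, adv_y) > threshold
```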
As stated above, existing fingerprinting methods, which leverage the suspect model's output as a point-wise indicator to detect stolen models, are sensitive to adversarial training or transfer learning. To address this problem, we focus on the pair-wise relationship between outputs and develop a new method called SAC. Intuitively, samples with similar outputs in the source model are more likely to also have similar outputs in the stolen models. In particular, we employ the correlation difference between the source model and the suspect model as the indicator to detect stolen models. However, calculating the correlation using all the samples from the defender's dataset is influenced by the common knowledge shared by most models trained for the same task, on which most models output the same label. To avoid this, we leverage normal samples that are wrongly predicted by both the source and the surrogate models as the model input, and propose to fingerprint using sample correlation with wrongly predicted samples (SAC-w). Furthermore, to reduce the need for a large number of normal samples and to save time for the defender, we use CutMix Augmented samples directly to calculate the correlation difference (SAC-m), which does not require training surrogate models or generating adversarial examples. To verify the effectiveness of SAC-w and SAC-m, we investigate 5 types of attacks (i.e., fine-tuning, pruning, transfer learning, model extraction, and adversarial training), and compare the performance against these attacks across different model architectures and datasets.
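Concretely, SAC compares how similarly two models relate pairs of inputs. The following is a minimal sketch, assuming cosine similarity between output vectors as the correlation measure and an element-wise L1 distance between the two correlation matrices; the function names and the calibrated threshold are illustrative rather than the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def correlation_matrix(outputs: torch.Tensor) -> torch.Tensor:
    """Pairwise cosine-similarity (Gram) matrix over a batch of model outputs (N x C)."""
    normed = F.normalize(outputs, dim=1)   # each row scaled to unit length
    return normed @ normed.t()             # (N, N) sample-correlation matrix

def correlation_distance(source_outputs: torch.Tensor,
                         suspect_outputs: torch.Tensor) -> float:
    """Mean absolute difference between the two models' correlation matrices.
    A small distance suggests the suspect model was stolen from the source."""
    diff = correlation_matrix(source_outputs) - correlation_matrix(suspect_outputs)
    return diff.abs().mean().item()

# Usage sketch: `inputs` would be wrongly predicted samples (SAC-w) or
# CutMix-augmented samples (SAC-m); `threshold` would be calibrated on
# irrelevant (independently trained) models.
# d = correlation_distance(source_model(inputs), suspect_model(inputs))
# is_stolen = d < threshold
```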
Our main contributions are summarized as follows:
• We introduce sample correlation into model IP protection and propose to leverage the correlation difference as a robust indicator to identify model stealing attacks, providing a new insight into model IP protection.
• We introduce wrongly-predicted normal samples and CutMix Augmented samples to replace adversarial examples as the model inputs, providing two robust correlation-based model fingerprinting methods.
• Extensive results verify that SAC is able to identify different model stealing attacks across different architectures and datasets with AUC = 1 in most cases, performing better than previous methods. Besides, SAC-m takes only 4.45 seconds on the CIFAR10 dataset, greatly lowering the model owner's computation burden.
2 Related Work
Model stealing attacks greatly threaten the rights of the model owner. In general, they can be summarized into the following categories: (1) Fine-tuning [26]: the attacker updates the parameters of the source model using the labeled training data for several epochs. (2) Pruning [8, 9, 27]: the attacker prunes less significant weights of the source model based on some indicator such as activation. (3) Transfer learning [25]: the attacker transfers the source model to a similar task and makes use of the source model's knowledge. (4) Model extraction [10, 11]: because data labeling is costly and time-consuming, and there is a large amount of unlabeled data on the Internet, the attacker can steal the function of the source model using only unlabeled same-distribution data. Different from the above attacks, which need access to the inner parameters of the model, the model extraction attack can steal the source model with only the source model's output. (5) Adversarial training [24]: the attacker trains the stolen model on both normal examples and adversarial examples, which helps evade most fingerprinting detection.

Figure 1: Correlation fingerprinting framework. We first generate CutMix Augmented samples or misclassified samples as model inputs, represented by colored balls. Then we calculate the correlation difference, and any suspect model with a similar correlation is recognized as a stolen model.
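The CutMix Augmented Sample Generation step in Figure 1 follows the standard CutMix recipe of pasting a random rectangular patch from one image onto another. Below is a minimal sketch under that assumption; the function name and the Beta parameter are illustrative and not the authors' exact implementation.

```python
import torch

def cutmix_sample(img_a: torch.Tensor, img_b: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    """Paste a random rectangular patch of img_b onto img_a (both shaped C x H x W)."""
    _, h, w = img_a.shape
    lam = torch.distributions.Beta(beta, beta).sample().item()   # area ratio kept from img_a
    cut_h, cut_w = int(h * (1 - lam) ** 0.5), int(w * (1 - lam) ** 0.5)
    cy, cx = torch.randint(h, (1,)).item(), torch.randint(w, (1,)).item()
    top, bottom = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    left, right = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    mixed = img_a.clone()
    mixed[:, top:bottom, left:right] = img_b[:, top:bottom, left:right]
    return mixed

# Usage sketch: pair random images from the defender's set, then feed the mixed
# samples to both the source and the suspect model to compute SAC-m correlations.
```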
Because of the threat of model stealing attacks, many model intellectual property (IP) protection methods have been proposed. Generally, there are two main categories of methods to validate and protect model IP: watermarking methods and fingerprinting methods. Watermarking methods usually leverage weight regularization [12–15] to embed a secret watermark in the model parameters, or train models on a trigger set to leave a backdoor [16, 17] in them. However, these methods cannot detect newer attacks such as model extraction, which trains a stolen model from scratch [7, 11]. There are also some watermarking methods, such as VEF [28] and EWE [5], which can survive model extraction. However, VEF needs white-box access to the suspect model, limiting the scope of its application. Moreover, all of these methods need to be involved in the training process, which sacrifices the model's accuracy [5, 18, 19], and in many critical domains even a 1% accuracy loss is intolerable [20].
Fingerprinting, on the contrary, utilizes the transferability of adversarial examples and can verify a model's ownership without participating in the model's training process, guaranteeing no accuracy loss. Lukas et al. [7] propose conferrable adversarial examples that maximize the adversarial examples' transferability to stolen models while minimizing their transferability to independently trained models (irrelevant models). Besides, ModelDiff [21], FUAP [22], and DFA [23] leverage different kinds of adversarial examples, such as DeepFool [29] and UAP [30], to fingerprint the source model. However, all these methods rely on adversarial examples and can easily be defeated by adversarial defenses such as adversarial training [24], or by transfer learning. Furthermore, these methods usually need to train many surrogate models and irrelevant models with different architectures to form well-established fingerprints, causing a great computation burden for the model owner. Besides, DeepJudge [31] proposes a unified framework that uses different indicators to detect model stealing attacks in both the white-box and the black-box settings. Unlike previous methods, our method makes use of the correlation between samples rather than only instance-level differences, and can fingerprint models much faster and more robustly with data-augmented samples instead of adversarial examples. Moreover, different from Teacher Model Fingerprinting [32], which uses paired samples generated by matching the representation layers to detect the transfer learning attack, our method makes use of the correlation of independent samples and can detect more categories of model stealing attacks.
3 Proposed Method
3.1 Problem Definition
There are two parties in model IP protection: the defender and the attacker. The defender is the model owner, who trains a well-performed model with a (proprietary) training dataset and algorithm [20]. The defender can deploy the well-trained model as a cloud service or as client-side software [20]. In the cloud-service setting, the attacker can only obtain the output of the model. On the contrary, in the client-side software setting, the attacker can access all the inner parameters of the model. The