Extracted BERT Model Leaks More Information than You Think!

Xuanli He1, Chen Chen2, Lingjuan Lyu3†, Qiongkai Xu4
1University College London, 2Zhejiang University, 3Sony AI, 4The University of Melbourne
h.xuanli@ucl.ac.uk, cc33@zju.edu.cn, Lingjuan.Lv@sony.com, Qiongkai.Xu@unimelb.edu.au

Equal contribution. Most of the work was finished when X.H. was at Monash University. Work done during C.C.'s internship at Sony AI.
†Corresponding author.
Abstract
The collection and availability of big data, combined with advances in pre-trained models (e.g., BERT), have revolutionized the predictive performance of natural language processing tasks. This allows corporations to provide machine learning as a service (MLaaS) by encapsulating fine-tuned BERT-based models as APIs. Due to significant commercial interest, there has been a surge of attempts to steal remote services via model extraction. Although previous works have made progress in defending against model extraction attacks, there has been little discussion on their performance in preventing privacy leakage. This work bridges this gap by launching an attribute inference attack against the extracted BERT model. Our extensive experiments reveal that model extraction can cause severe privacy leakage even when victim models are facilitated with advanced defensive strategies.
1 Introduction
The emergence of pre-trained language models (PLMs) has revolutionized natural language processing (NLP) research, leading to state-of-the-art (SOTA) performance on a wide range of tasks (Devlin et al., 2018; Yang et al., 2019). This breakthrough has enabled commercial companies to deploy machine learning models as black-box APIs on their cloud platforms to serve millions of users, such as Google Prediction API (https://cloud.google.com/prediction), Microsoft Azure Machine Learning (https://studio.azureml.net), and Amazon Machine Learning (https://aws.amazon.com/machine-learning).
However, recent works have shown that existing NLP APIs are vulnerable to model extraction attacks (MEA), which can reconstruct a copy of the remote NLP model based on carefully designed queries and the outputs of the target API (Krishna et al., 2019; Wallace et al., 2020), causing financial losses to the target API. Prior to our work, researchers have investigated the hazards of model extraction under various settings, including stealing commercial APIs (Wallace et al., 2020; Xu et al., 2022), ensemble model extraction (Xu et al., 2022), and adversarial example transfer (Wallace et al., 2020; He et al., 2021).
Previous works have indicated that an adversary can leverage the extracted model to conduct adversarial example transfer, such that these examples can corrupt the predictions of the victim model (Wallace et al., 2020; He et al., 2021). Given the success of MEA and adversarial example transfer, we conjecture that the predictions from a victim model could inadvertently reveal its private information, as victim models can memorize side information in addition to the task-related signal (Lyu and Chen, 2020; Lyu et al., 2020; Carlini et al., 2021). Thus, we are interested in examining whether the victim model can leak the private information of its data to the extracted model, a question that has received little attention in previous research. In addition, a list of defenses against MEA has been devised (Lee et al., 2019; Ma et al., 2021; Xu et al., 2022; He et al., 2022a,b). Although these technologies can alleviate the effects of MEA, it is unknown whether such defenses can prevent the leakage of private information, e.g., gender, age, or identity.
To study the privacy leakage from MEA, we first leverage MEA to obtain a white-box extracted model. Then, we demonstrate that from the extracted model it is possible to infer sensitive attributes of the data used by the victim model. To the best of our knowledge, this is the first attempt to investigate privacy leakage from the extracted model. Moreover, we demonstrate that the privacy leakage is resilient to advanced defense strategies even though the task utility of the extracted model is significantly diminished, which could motivate further investigation into defense technologies against MEA. Code and data are available at: https://github.com/xlhex/emnlp2022_aia.git
2 Related Work
MEA aims to steal a proprietary model from cloud services (Tramèr et al., 2016; Orekondy et al., 2019; Krishna et al., 2019; Wallace et al., 2020). It has been studied both empirically and theoretically, on simple classification tasks (Tramèr et al., 2016), vision tasks (Orekondy et al., 2019), and NLP tasks (Krishna et al., 2019; Wallace et al., 2020). MEA targets imitating the functionality of a black-box victim model (Krishna et al., 2019; Orekondy et al., 2019), i.e., producing a model that replicates the performance of the victim model.
Furthermore, the extracted model could be used as a reconnaissance step to facilitate later attacks (Krishna et al., 2019). For instance, the adversary could construct transferable adversarial examples over the extracted model to corrupt the predictions of the victim model (Wallace et al., 2020; He et al., 2021). Prior works (Coavoux et al., 2018; Lyu et al., 2020) have shown that malicious users can infer confidential attributes based on their interaction with a trained model. However, to the best of our knowledge, none of the previous works investigate whether the extracted model can facilitate privacy leakage of the data used by the black-box victim model.
Alongside work on MEA, a number of defenses against it have been proposed. These approaches focus on perturbing the posterior predictions. Orekondy et al. (2019) suggested revealing only the top-K posterior probabilities. Lee et al. (2019) demonstrated that API owners could increase the difficulty of MEA by softening the posterior probabilities and imposing random noise on the non-argmax probabilities. Ma et al. (2021) introduced an adversarial training process to discourage knowledge distillation from the victim model to the extracted model. However, these approaches are specific to model extraction and are not effective at defending against the attribute inference attack, as shown in Section 5.
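For concreteness, the sketch below illustrates the general flavour of these posterior-perturbation defenses: truncating to the top-K probabilities, softening the distribution with a temperature, and adding noise to the non-argmax entries. It is a generic, assumption-laden illustration, not a faithful reimplementation of any of the cited methods.

```python
# Generic posterior-perturbation defense (illustrative only; not the exact
# methods of Orekondy et al. 2019 or Lee et al. 2019).
import numpy as np


def perturb_posterior(probs, top_k=None, temperature=1.0, noise_scale=0.0):
    """Perturb a probability vector before returning it to API users."""
    p = np.asarray(probs, dtype=float)
    # Soften the distribution: a temperature > 1 flattens the probabilities,
    # hiding how confident the victim model really is.
    p = p ** (1.0 / temperature)
    # Add noise to every entry except the argmax; small noise typically keeps
    # the top-1 label intact, preserving utility for benign users.
    if noise_scale > 0.0:
        noise = np.abs(np.random.normal(0.0, noise_scale, size=p.shape))
        noise[p.argmax()] = 0.0
        p = p + noise
    # Reveal only the top-K probabilities, zeroing out the rest.
    if top_k is not None:
        cutoff = np.sort(p)[-top_k]
        p = np.where(p >= cutoff, p, 0.0)
    return p / p.sum()
```

Leaving the argmax untouched is what lets such defenses trade off extraction difficulty against accuracy for benign users: the served label stays correct while the full posterior becomes less informative.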
3 Attacking BERT-based API
We first describe the process of MEA. Then we detail the proposed attack: the attribute inference attack (AIA). Throughout this paper, we mainly focus on the BERT-based API as the victim model, which is widely used in commercial black-box APIs.

Figure 1: The workflow of the attribute inference attack against an extracted BERT model. We use an auxiliary attribute inference model to infer the demographic information of a text.
Model Extraction Attack (MEA). To conduct MEA, attackers craft a set of inputs as queries (the transfer set) and send them to the target victim model (the BERT-based API) to obtain the predicted posterior probabilities, i.e., the outputs of the softmax layer. Attackers can then reconstruct a copy of the victim model as an "extracted model" by training on the query-prediction pairs.
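As a concrete illustration, the sketch below shows how such an extraction could be carried out with standard tooling. It is a minimal example under our own assumptions, not the authors' released implementation: query_victim_api is a hypothetical stand-in for the black-box BERT-based API, the transfer set is assumed to be a plain list of query strings, and the extracted model is trained with a simple KL-divergence loss against the victim's posteriors.

```python
# Minimal MEA sketch (illustrative; not the paper's released code).
# A hypothetical black-box API is queried for posterior probabilities,
# and an "extracted model" is trained on the query-prediction pairs.
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer


def query_victim_api(text):
    """Hypothetical stand-in for the remote BERT-based API.
    Returns the victim's softmax output as a list of probabilities."""
    raise NotImplementedError("replace with a real API call")


def extract_model(transfer_set, num_labels, epochs=3, lr=2e-5):
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    extracted = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=num_labels)
    optimizer = torch.optim.AdamW(extracted.parameters(), lr=lr)

    # Step 1: query the victim with the transfer set and record its posteriors.
    pairs = [(text, torch.tensor(query_victim_api(text))) for text in transfer_set]

    # Step 2: train the extracted model to imitate the victim's predictions.
    extracted.train()
    for _ in range(epochs):
        for text, victim_probs in pairs:
            enc = tokenizer(text, return_tensors="pt", truncation=True)
            logits = extracted(**enc).logits
            loss = F.kl_div(F.log_softmax(logits, dim=-1),
                            victim_probs.unsqueeze(0), reduction="batchmean")
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return extracted, tokenizer
```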
Attribute Inference Attack (AIA). After we derive an extracted model, we investigate how to infer sensitive information from it by conducting AIA against the extracted model. Given any record $x = [x^{ns}, x^{s}]$, AIA aims to reconstruct the sensitive component $x^{s}$ based on the hidden representation of $x^{ns}$, where $x^{ns}$ and $x^{s}$ represent the non-sensitive information and the target sensitive attribute, respectively. The intuition behind AIA is that the representation generated by the extracted model can be used to facilitate the inference of the sensitive information of the data used by the victim model (Coavoux et al., 2018). Note that the only explicit information accessible to the attacker is the predictions output by the victim model, rather than the raw BERT representations.
Given an extracted model $g_V'$, we first feed a limited amount of auxiliary data $D_{aux}$ with labelled attributes into $g_V'$ to collect the BERT representation $h(x^{ns}_i)$ for each $x_i \in D_{aux}$. Then, we train an inference model $f(\cdot)$, which takes the BERT representation from the extracted model as input and outputs the sensitive attribute of the input, i.e., it is trained on pairs $\{h(x^{ns}_i), x^{s}_i\}$. The trained inference model is then used to infer the sensitive attributes of the data used by the victim model.
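Assuming that $h(x^{ns})$ is the final-layer [CLS] vector of the extracted BERT (one natural choice; the excerpt above does not fix this detail) and that $f(\cdot)$ is a small MLP, the attribute inference step could look as follows. The function names and hyperparameters are illustrative, not taken from the paper.

```python
# Sketch of AIA against the extracted model (assumptions: h(x_ns) is the
# final-layer [CLS] vector of the extracted BERT; f(.) is a small MLP).
import torch
import torch.nn as nn


@torch.no_grad()
def bert_representation(extracted, tokenizer, text):
    """h(x_ns): hidden representation of the (non-sensitive) text, read
    off the extracted model's encoder at the [CLS] position."""
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    out = extracted(**enc, output_hidden_states=True)
    return out.hidden_states[-1][:, 0, :].squeeze(0)


def train_attribute_inferrer(extracted, tokenizer, aux_texts, aux_attrs,
                             num_attr_classes, epochs=50, lr=1e-3):
    """Train f(.) on the pairs {h(x_ns_i), x_s_i} from the auxiliary data."""
    feats = torch.stack([bert_representation(extracted, tokenizer, t)
                         for t in aux_texts])
    labels = torch.tensor(aux_attrs)
    f = nn.Sequential(nn.Linear(feats.size(1), 256), nn.ReLU(),
                      nn.Linear(256, num_attr_classes))
    optimizer = torch.optim.Adam(f.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(f(feats), labels)
        loss.backward()
        optimizer.step()
    return f  # predicts the sensitive attribute from h(x_ns)
```

At attack time, the adversary would feed new texts through the extracted model and apply $f(\cdot)$ to the resulting representations to recover demographic attributes, as depicted in Figure 1.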