Extracted BERT Model Leaks More Information than You Think!

Xuanli He1, Chen Chen2, Lingjuan Lyu3†, Qiongkai Xu4
1University College London, 2Zhejiang University, 3Sony AI, 4The University of Melbourne
h.xuanli@ucl.ac.uk, cc33@zju.edu.cn, Lingjuan.Lv@sony.com, Qiongkai.Xu@unimelb.edu.au

Equal contribution. Most of the work was finished when X.H. was at Monash University. Work done during C.C.'s internship at Sony AI.
†Corresponding author.
Abstract
The collection and availability of big data, combined with advances in pre-trained models (e.g., BERT), have revolutionized the predictive performance of natural language processing tasks. This allows corporations to provide machine learning as a service (MLaaS) by encapsulating fine-tuned BERT-based models as APIs. Due to significant commercial interest, there has been a surge of attempts to steal remote services via model extraction. Although previous works have made progress in defending against model extraction attacks, there has been little discussion on their performance in preventing privacy leakage. This work bridges this gap by launching an attribute inference attack against the extracted BERT model. Our extensive experiments reveal that model extraction can cause severe privacy leakage even when victim models are facilitated with advanced defensive strategies.
1 Introduction
The emergence of pre-trained language models (PLMs) has revolutionized natural language processing (NLP) research, leading to state-of-the-art (SOTA) performance on a wide range of tasks (Devlin et al., 2018; Yang et al., 2019). This breakthrough has enabled commercial companies to deploy machine learning models as black-box APIs on their cloud platforms to serve millions of users, such as Google Prediction API (https://cloud.google.com/prediction), Microsoft Azure Machine Learning (https://studio.azureml.net), and Amazon Machine Learning (https://aws.amazon.com/machine-learning).
However, recent works have shown that existing NLP APIs are vulnerable to model extraction attacks (MEA), which can reconstruct a copy of the remote NLP model based on carefully designed queries and the outputs of the target API (Krishna et al., 2019; Wallace et al., 2020), causing financial losses to the target API. Prior to our work, researchers have investigated the hazards of model extraction under various settings, including stealing commercial APIs (Wallace et al., 2020; Xu et al., 2022), ensemble model extraction (Xu et al., 2022), and adversarial example transfer (Wallace et al., 2020; He et al., 2021).
Previous works have indicated that an adversary can leverage the extracted model to conduct adversarial example transfer, such that these examples can corrupt the predictions of the victim model (Wallace et al., 2020; He et al., 2021). Given the success of MEA and adversarial example transfer, we conjecture that the predictions from a victim model could inadvertently reveal its private information, as victim models can memorize side information in addition to the task-related signal (Lyu and Chen, 2020; Lyu et al., 2020; Carlini et al., 2021). Thus, we are interested in examining whether the victim model can leak the private information of its data to the extracted model, a question that has received little attention in previous research. In addition, a list of defenses against MEA has been devised (Lee et al., 2019; Ma et al., 2021; Xu et al., 2022; He et al., 2022a,b). Although these technologies can alleviate the effects of MEA, it is unknown whether such defenses can prevent the leakage of private information, e.g., gender, age, or identity.
To study the privacy leakage from MEA, we first leverage MEA to obtain a white-box extracted model. Then, we demonstrate that from the extracted model it is possible to infer sensitive attributes of the data used by the victim model. To the best of our knowledge, this is the first attempt to investigate privacy leakage from the extracted model. Moreover, we demonstrate that the privacy leakage is resilient to advanced defense strategies even though the task utility of the extracted model is significantly diminished, which could motivate further investigation into defense technologies against MEA. Code and data are available at: https://github.com/xlhex/emnlp2022_aia.git
2 Related Work
MEA aims to steal a proprietary model from cloud services (Tramèr et al., 2016; Orekondy et al., 2019; Krishna et al., 2019; Wallace et al., 2020). It has been studied both empirically and theoretically, on simple classification tasks (Tramèr et al., 2016), vision tasks (Orekondy et al., 2019), and NLP tasks (Krishna et al., 2019; Wallace et al., 2020). MEA targets imitating the functionality of a black-box victim model (Krishna et al., 2019; Orekondy et al., 2019), i.e., producing a model that replicates the performance of the victim model.
Furthermore, the extracted model could be used as a reconnaissance step to facilitate later attacks (Krishna et al., 2019). For instance, the adversary could construct transferable adversarial examples over the extracted model to corrupt the predictions of the victim model (Wallace et al., 2020; He et al., 2021). Prior works (Coavoux et al., 2018; Lyu et al., 2020) have shown that malicious users can infer confidential attributes based on their interaction with a trained model. However, to the best of our knowledge, none of the previous works investigate whether the extracted model can facilitate privacy leakage of the data used by the black-box victim model.
Alongside work on MEA, a number of defenses against it have been proposed. These approaches focus on perturbing the posterior predictions. Orekondy et al. (2019) suggested revealing only the top-K posterior probabilities. Lee et al. (2019) demonstrated that API owners could increase the difficulty of MEA by softening the posterior probabilities and imposing random noise on the non-argmax probabilities. Ma et al. (2021) introduced an adversarial training process to discourage knowledge distillation from the victim model to the extracted model. However, these approaches are specific to model extraction and are not effective at defending against the attribute inference attack, as shown in Section 5.
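For concreteness, the sketch below illustrates the general flavour of these posterior-perturbation defenses: truncating to the top-K probabilities, softening the distribution with a temperature, and adding noise to the non-argmax entries. It is a generic, assumption-laden illustration, not a faithful reimplementation of any of the cited methods.

```python
# Generic posterior-perturbation defense (illustrative only; not the exact
# methods of Orekondy et al. 2019 or Lee et al. 2019).
import numpy as np


def perturb_posterior(probs, top_k=None, temperature=1.0, noise_scale=0.0):
    """Perturb a probability vector before returning it to API users."""
    p = np.asarray(probs, dtype=float)
    # Soften the distribution: a temperature > 1 flattens the probabilities,
    # hiding how confident the victim model really is.
    p = p ** (1.0 / temperature)
    # Add noise to every entry except the argmax; small noise typically keeps
    # the top-1 label intact, preserving utility for benign users.
    if noise_scale > 0.0:
        noise = np.abs(np.random.normal(0.0, noise_scale, size=p.shape))
        noise[p.argmax()] = 0.0
        p = p + noise
    # Reveal only the top-K probabilities, zeroing out the rest.
    if top_k is not None:
        cutoff = np.sort(p)[-top_k]
        p = np.where(p >= cutoff, p, 0.0)
    return p / p.sum()
```

Leaving the argmax untouched is what lets such defenses trade off extraction difficulty against accuracy for benign users: the served label stays correct while the full posterior becomes less informative.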
3 Attacking BERT-based API
We first describe the process of MEA. Then we detail the proposed attack: the attribute inference attack (AIA). Throughout this paper, we mainly focus on the BERT-based API as the victim model, which is widely used in commercial black-box APIs.

Figure 1: The workflow of the attribute inference attack against an extracted BERT model. We use an auxiliary attribute inference model to infer the demographic information of a text.
Model Extraction Attack (MEA). To conduct MEA, attackers craft a set of inputs as queries (the transfer set) and send them to the target victim model (the BERT-based API) to obtain the predicted posterior probabilities, i.e., the outputs of the softmax layer. Attackers can then reconstruct a copy of the victim model as an "extracted model" by training on the query-prediction pairs.
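As a concrete illustration, the sketch below shows how such an extraction could be carried out with standard tooling. It is a minimal example under our own assumptions, not the authors' released implementation: query_victim_api is a hypothetical stand-in for the black-box BERT-based API, the transfer set is assumed to be a plain list of query strings, and the extracted model is trained with a simple KL-divergence loss against the victim's posteriors.

```python
# Minimal MEA sketch (illustrative; not the paper's released code).
# A hypothetical black-box API is queried for posterior probabilities,
# and an "extracted model" is trained on the query-prediction pairs.
import torch
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification, AutoTokenizer


def query_victim_api(text):
    """Hypothetical stand-in for the remote BERT-based API.
    Returns the victim's softmax output as a list of probabilities."""
    raise NotImplementedError("replace with a real API call")


def extract_model(transfer_set, num_labels, epochs=3, lr=2e-5):
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    extracted = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=num_labels)
    optimizer = torch.optim.AdamW(extracted.parameters(), lr=lr)

    # Step 1: query the victim with the transfer set and record its posteriors.
    pairs = [(text, torch.tensor(query_victim_api(text))) for text in transfer_set]

    # Step 2: train the extracted model to imitate the victim's predictions.
    extracted.train()
    for _ in range(epochs):
        for text, victim_probs in pairs:
            enc = tokenizer(text, return_tensors="pt", truncation=True)
            logits = extracted(**enc).logits
            loss = F.kl_div(F.log_softmax(logits, dim=-1),
                            victim_probs.unsqueeze(0), reduction="batchmean")
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return extracted, tokenizer
```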
Attribute Inference Attack (AIA). After we derive an extracted model, we investigate how to infer sensitive information from it by conducting AIA against the extracted model. Given any record $x = [x^{ns}, x^{s}]$, AIA aims to reconstruct the sensitive component $x^{s}$ based on the hidden representation of $x^{ns}$, where $x^{ns}$ and $x^{s}$ represent the non-sensitive information and the target sensitive attribute, respectively. The intuition behind AIA is that the representation generated by the extracted model can be used to facilitate the inference of the sensitive information of the data used by the victim model (Coavoux et al., 2018). Note that the only explicit information accessible to the attacker is the predictions output by the victim model, rather than the raw BERT representations.
Given an extracted model $g_V'$, we first feed a limited amount of auxiliary data $D_{aux}$ with labelled attributes into $g_V'$ to collect the BERT representation $h(x^{ns}_i)$ for each $x_i \in D_{aux}$. Then, we train an inference model $f(\cdot)$, which takes the BERT representation from the extracted model as input and outputs the sensitive attribute of the input, i.e., it is trained on pairs $\{h(x^{ns}_i), x^{s}_i\}$. The trained inference model is then used to infer the sensitive attributes of the data used by the victim model.
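Assuming that $h(x^{ns})$ is the final-layer [CLS] vector of the extracted BERT (one natural choice; the excerpt above does not fix this detail) and that $f(\cdot)$ is a small MLP, the attribute inference step could look as follows. The function names and hyperparameters are illustrative, not taken from the paper.

```python
# Sketch of AIA against the extracted model (assumptions: h(x_ns) is the
# final-layer [CLS] vector of the extracted BERT; f(.) is a small MLP).
import torch
import torch.nn as nn


@torch.no_grad()
def bert_representation(extracted, tokenizer, text):
    """h(x_ns): hidden representation of the (non-sensitive) text, read
    off the extracted model's encoder at the [CLS] position."""
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    out = extracted(**enc, output_hidden_states=True)
    return out.hidden_states[-1][:, 0, :].squeeze(0)


def train_attribute_inferrer(extracted, tokenizer, aux_texts, aux_attrs,
                             num_attr_classes, epochs=50, lr=1e-3):
    """Train f(.) on the pairs {h(x_ns_i), x_s_i} from the auxiliary data."""
    feats = torch.stack([bert_representation(extracted, tokenizer, t)
                         for t in aux_texts])
    labels = torch.tensor(aux_attrs)
    f = nn.Sequential(nn.Linear(feats.size(1), 256), nn.ReLU(),
                      nn.Linear(256, num_attr_classes))
    optimizer = torch.optim.Adam(f.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(f(feats), labels)
        loss.backward()
        optimizer.step()
    return f  # predicts the sensitive attribute from h(x_ns)
```

At attack time, the adversary would feed new texts through the extracted model and apply $f(\cdot)$ to the resulting representations to recover demographic attributes, as depicted in Figure 1.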