Unsupervised Non-transferable Text Classification
Guangtao Zeng and Wei Lu
StatNLP Research Group
Singapore University of Technology and Design
guangtao_zeng@mymail.sutd.edu.sg, luwei@sutd.edu.sg
Abstract
Training a good deep learning model requires
substantial data and computing resources,
which makes the resulting neural model a valu-
able intellectual property. To prevent the neu-
ral network from being undesirably exploited,
non-transferable learning has been proposed
to reduce the model generalization ability in
specific target domains. However, existing ap-
proaches require labeled data for the target do-
main which can be difficult to obtain. Fur-
thermore, they do not have the mechanism to
still recover the model’s ability to access the
target domain. In this paper, we propose a
novel unsupervised non-transferable learning
method for the text classification task that does
not require annotated target domain data. We
further introduce a secret key component in our
approach for recovering access to the target do-
main, where we design both an explicit and an
implicit method for doing so. Extensive exper-
iments demonstrate the effectiveness of our ap-
proach.
1 Introduction
Deep learning has achieved remarkable success
over the past decade and is active in various fields,
including computer vision, natural language pro-
cessing (NLP), and data mining. Although neural
models can perform well in most tasks, they re-
quire a huge amount of data and a high computation
cost to train, making the trained model a valuable
intellectual property. As a result, it is essential
for us to prevent neural models from being used
without authorization. In the last few years, many
methods have been proposed to safeguard deep
neural networks and they can be roughly divided
into two types: watermarking (Adi et al., 2018) and secure authorization (Alam et al., 2020). In the watermarking approaches, the owners can verify ownership of the neural model based on a unique watermark. However, due to the catastrophic forgetting problem (Kemker et al., 2018),

Figure 1: Overview of our unsupervised non-transferable learning method with secret keys.
the watermark-based neural models (Kuribayashi et al., 2020; Song et al., 2017) are known to be vulnerable to certain malicious attacks (Wang and Kerschbaum, 2019), which may lead to the loss
of their watermarks. On the other hand, in the se-
cure authorization approaches, the owners of the
neural network want to ensure that users can only
access the model with authorization. Recently,
Wang et al. (2022) proposed a new perspective with non-transferable learning (NTL) to protect the model from being illegally applied to unauthorized data. The method trains the model to perform well only in the authorized domain while performing badly in the unauthorized domain. However, such an approach has some limitations: 1) it relies on a significant amount of labeled data from the target domain, while such labels are usually not easy to acquire; 2) access to the unauthorized domain can no longer be regained, if required, after the model is learned.
In this work, we propose our new NTL method named Unsupervised Non-Transferable Learning (UNTL) for text classification tasks. As Figure 1 shows, our model can perform well in the source domain while performing badly in the target domain. In addition, we propose secret key
modules, which can help recover the ability of the
model in the target domain. Our contributions in-
clude:
We propose a novel unsupervised non-transferable learning approach for text classification tasks. Different from existing approaches, our model can still perform well without the need for label information in the target domain.

arXiv:2210.12651v2 [cs.CL] 19 Feb 2023
We introduce two different methods, namely
Prompt-based Secret Key and Adapter-based
Secret Key, that allow us to recover the ability
of the model to perform classification on the
target domain.
Extensive experiments show that our proposed
models can perform well in the source domain
but badly in the target domain. Moreover, ac-
cess to the target domain can still be regained
using the secret key.
To the best of our knowledge, our work is the first approach for learning under the unsupervised non-transferable learning setup, which also comes with the ability to recover access to the target domain.¹
2 Related Work
In this section, we briefly survey ideas that are
related to our work from two fields: domain adap-
tation and intellectual property protection. Further-
more, we discuss some limitations in the existing
methods which we will tackle with our approach.
In domain adaptation, given a source domain and a target domain with unlabeled data or only a few labeled examples, the goal is to improve performance on the target task using knowledge from the source domain. Ghifary et al. (2014), Tzeng et al. (2014), and Zhu et al. (2021) applied a Maximum Mean Discrepancy regularization method (Gretton et al., 2012) to encourage invariance between the representations of different domains. Ganin et al. (2016) and Schoenauer-Sebag et al. (2019) matched the feature space distributions of the two domains with adversarial learning. In contrast to the methods above, Wang et al. (2022) analyzed domain adaptation from the opposite direction and proposed non-transferable learning (NTL) to prevent knowledge transfer from the source to the target domain by enlarging the discrepancy between the representations of the different domains.
¹ Our code and data are released at https://github.com/ChaosCodes/UNTL.

In intellectual property protection, due to the significant value of learned deep neural networks and their vulnerability to malicious attacks, it is crucial to develop intellectual property protection methods that defend the owners of deep neural networks (DNNs) from loss. Recently,
two different approaches to safeguard DNNs have been proposed: watermarking (Adi et al., 2018) and secure authorization (Alam et al., 2020). In the watermarking approaches, researchers designed digital watermarks that can be embedded into data such as videos and images. By detecting the unique watermark, one can verify ownership of the copyright of the data. Based on these ideas, Song et al. (2017) and Kuribayashi et al. (2020) embedded digital watermarks into the parameters of neural networks. Zhang et al. (2020) and Wu et al. (2021) proposed frameworks to generate images with an invisible but extractable watermark. However, these methods are vulnerable to active attack algorithms (Wang and Kerschbaum, 2019; Chen et al., 2021) that first detect the watermark and then rewrite or remove it. On the other
hand, the secure authorization approach seeks to
train a model that generates inaccurate results with-
out authorization. Alam et al. (2020) proposed a
key-based framework that ensures correct model
functioning only with the correct secret key. In ad-
dition, Wang et al. (2022) were inspired by domain
generalization and proposed non-transferable learn-
ing (NTL), which achieves secure authorization by
reducing the model’s generalization ability in the
specified unauthorized domain.
Although the NTL model can effectively prevent
access to the unauthorized domain, it requires tar-
get labels during training, which may not always
be easy to obtain. Furthermore, there is no mecha-
nism to recover access to the unauthorized domain
when needed. In this paper, we present a new NTL
model and show that our model can still have good
performance even in the absence of the target labels
which are, however, indispensable in the work of
Wang et al. (2022). Besides, we extend it to a secret
key-based version. With our method, authorized
users can still access the target domain with the
provided keys.
3 Approach
In this section, we first introduce our proposed
Unsupervised Non-Transferable Learning (UNTL)
approach in Sec. 3.1, followed by a discussion
on its practical limitation – it lacks the ability to regain access to the target domain. Next, we
discuss our secret key-based methods in Sec. 3.2
to address this limitation.
3.1 UNTL Text Classification
Problem Description
First of all, we present our definition of the unsupervised non-transferable learning task without labeled data from the target domain. Following Farahani et al. (2020), we consider that a domain consists of three parts: input space $\mathcal{X}$, label space $\mathcal{Y}$, and the joint probability distribution $p(X, Y)$. We are given a source domain $\mathcal{D}_s = \{x_i, y_i\}_{i=1}^{N_s}$ and a target domain $\mathcal{D}_t = \{x_j\}_{j=1}^{N_t}$ with unlabeled samples, where $y_i \in \mathbb{R}^C$ is a one-hot vector indicating the label of $x_i$, $C$ is the number of classes, and $N_s, N_t$ refer to the numbers of examples in the source and target domains respectively. The goal of our UNTL method is to prevent knowledge transfer from the source to the target domain, i.e., to train the model so that it performs well on the source domain data but poorly on the target domain data, without requiring access to the label information of the target data.
Text Classification
In our work, we use a BERT-based model (Devlin et al., 2019) $\psi$ as our feature extractor for the input sentence and consider the final hidden state $\mathbf{h}$ of the token [CLS] as the feature representation, where we denote $\mathbf{h}$ as $\psi(x)$. A simple feed-forward network $\mathrm{FFN}(\cdot)$ is added on top of BERT as a classifier to predict the label. The loss function is:

$$\mathcal{L}_{CE} = \mathbb{E}_{(x,y)\sim \mathcal{D}_s}\left[\mathrm{CE}\left(\mathrm{FFN}(\psi(x)), y\right)\right] \quad (1)$$

where $\mathcal{D}_s$ is the source domain dataset and $\mathrm{CE}$ indicates the cross-entropy function.
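As an illustration of Eq. 1, the sketch below computes the batched cross-entropy loss in plain NumPy. The random matrix `h` stands in for BERT's [CLS] features $\psi(x)$, and a single linear layer stands in for the FFN classifier; all names and shapes here are illustrative, not the paper's implementation.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ce_loss(logits, y_onehot):
    """Eq. 1 with the expectation over D_s replaced by a batch average."""
    probs = softmax(logits)
    return -np.mean(np.sum(y_onehot * np.log(probs + 1e-12), axis=-1))

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))        # stand-in for psi(x): [CLS] features, dim 8
W = 0.1 * rng.normal(size=(8, 3))  # stand-in FFN: one linear layer, C = 3 classes
logits = h @ W
y = np.eye(3)[[0, 1, 2, 0]]        # one-hot labels for the batch of 4
loss = ce_loss(logits, y)
```

With near-zero logits the loss is close to $\ln C$, and it decreases toward zero as the classifier fits the source labels.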
Maximum Mean Discrepancy
To enlarge the distance between the representations of the source and target domains, we follow Wang et al. (2022) and use Maximum Mean Discrepancy (MMD) (Gretton et al., 2012). MMD is a kernel two-sample test and can be used as a metric to determine whether two data distributions $p$ and $q$ are similar. MMD defines the metric as follows:

$$d_{p,q} = \left\| \mathbb{E}_{x\sim p}[\psi(x)] - \mathbb{E}_{x'\sim q}[\psi(x')] \right\|^2_{\mathcal{H}_k} \quad (2)$$

where $\mathcal{H}_k$ is the reproducing kernel Hilbert space (RKHS) with a kernel $k$ defined as $k(z, z') = e^{-\|z - z'\|^2}$, and the function $\psi$ maps the sentence input into the RKHS. The smaller the distance $d_{p,q}$, the more similar the two distributions $p$ and $q$.

In our work, we use MMD to increase the distance between the feature representations of the source and the target domain, forcing the feature extractor $\psi$ to extract domain-dependent representations rather than maximizing the inter-domain invariance. To prevent a high MMD from dominating the entire loss, we follow Wang et al. (2022) and set an upper bound for it. Therefore, based on Equation 2, our MMD loss can be formulated as:

$$\mathcal{L}_{MMD}(S, T) = \min(c, d_{S,T}) \quad (3)$$

where $c$ is the upper bound for MMD, and $S, T$ are the data distributions of the source and target domains respectively. With this loss, we only maximize $d_{S,T}$ when it is smaller than the upper bound $c$.
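The following NumPy sketch (our own, not the authors' code) shows a biased empirical estimate of the squared MMD in Eq. 2 with the Gaussian kernel $k(z,z') = e^{-\|z-z'\|^2}$, plus the clamped loss of Eq. 3; the function names and the toy feature batches are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(a, b):
    # k(z, z') = exp(-||z - z'||^2), the kernel stated in the paper
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2)

def mmd2(s, t):
    """Biased empirical estimate of d_{p,q} in Eq. 2 from two feature batches."""
    return (gaussian_kernel(s, s).mean()
            + gaussian_kernel(t, t).mean()
            - 2.0 * gaussian_kernel(s, t).mean())

def mmd_loss(s, t, c=1.0):
    # Eq. 3: clamp the distance at the upper bound c so that a large MMD
    # cannot dominate the overall training objective
    return min(c, mmd2(s, t))

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(32, 8))  # toy source-domain features
tgt = rng.normal(3.0, 1.0, size=(32, 8))  # toy target features with shifted mean
```

Identical batches give a distance of zero, while well-separated batches give a positive distance that Eq. 3 caps at $c$.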
Domain Classifier
Despite being able to enlarge
the gap between the source and target domains to
some extent, the MMD loss lacks the explicit ability
to clearly draw the boundary between the represen-
tations of different domains, especially when the
knowledge between domains is similar. Therefore,
we hypothesize that using MMD alone may not be
sufficient to yield optimal empirical performance.
To mitigate this issue, we draw inspiration from
the Domain-Adversarial Neural Networks (Ganin
et al.,2016) and propose an additional domain clas-
sifier added on top of the feature extractor. This
classifier is trained to predict the domain with the
feature representations. We employ a cross-entropy
loss to train the domain classifier. By optimizing
this loss, the representations of different domains
are encouraged to be more distinct. Specifically,
we use 0 to indicate the source domain and 1 to
indicate the target domain. We can formulate the
domain classification (DC) loss as:
$$\mathcal{L}_{DC}(S, T) = \mathbb{E}_{x^S \sim S}\left[\mathrm{CE}\left(\mathrm{FFN}_{dc}\left(\psi(x^S)\right), 0\right)\right] + \mathbb{E}_{x^T \sim T}\left[\mathrm{CE}\left(\mathrm{FFN}_{dc}\left(\psi(x^T)\right), 1\right)\right] \quad (4)$$

where $\mathrm{FFN}_{dc}$ is the domain classifier. With this DC loss as a regularization term, the boundary between the feature representations of the source and the target domains becomes clearer, facilitating better non-transferable learning.
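A minimal sketch of the DC loss in Eq. 4, assuming a logistic-regression stand-in for $\mathrm{FFN}_{dc}$ (the paper's classifier is a feed-forward network): with labels 0 for source features and 1 for target features, the two expectations reduce to batch means of binary cross-entropy. The weights `w`, `b` and the toy batches are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dc_loss(feat_src, feat_tgt, w, b):
    """Eq. 4 with a logistic-regression stand-in for FFN_dc.

    Source features are pushed toward domain label 0 and target features
    toward domain label 1; each expectation becomes a batch mean of
    binary cross-entropy."""
    p_src = sigmoid(feat_src @ w + b)  # predicted probability of "target"
    p_tgt = sigmoid(feat_tgt @ w + b)
    loss_src = -np.log(1.0 - p_src + 1e-12).mean()  # CE against label 0
    loss_tgt = -np.log(p_tgt + 1e-12).mean()        # CE against label 1
    return loss_src + loss_tgt

# Toy check: features on opposite sides of a hyperplane are easy to separate.
feat_s = -np.ones((4, 8))
feat_t = np.ones((4, 8))
loss_sep = dc_loss(feat_s, feat_t, np.ones(8), 0.0)      # small: separable domains
loss_chance = dc_loss(feat_s, feat_t, np.zeros(8), 0.0)  # 2*ln(2): chance level
```

Minimizing this loss pushes the two domains' feature clusters apart, which is why it acts as the boundary-sharpening regularizer described above.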
Objective Function
In this task, our goal is to
train a model that can perform well on the source
domain while performing badly on the target do-
main. To achieve this goal, we propose a loss
function for unsupervised non-transferable learn-
ing, which contains three terms. The first term is