Unsupervised Non-transferable Text Classification
Guangtao Zeng and Wei Lu
StatNLP Research Group
Singapore University of Technology and Design
guangtao_zeng@mymail.sutd.edu.sg, luwei@sutd.edu.sg
Abstract
Training a good deep learning model requires
substantial data and computing resources,
which makes the resulting neural model a valu-
able intellectual property. To prevent the neu-
ral network from being undesirably exploited,
non-transferable learning has been proposed
to reduce the model generalization ability in
specific target domains. However, existing ap-
proaches require labeled data for the target do-
main which can be difficult to obtain. Fur-
thermore, they do not have the mechanism to
still recover the model’s ability to access the
target domain. In this paper, we propose a
novel unsupervised non-transferable learning
method for the text classification task that does
not require annotated target domain data. We
further introduce a secret key component in our
approach for recovering access to the target do-
main, where we design both an explicit and an
implicit method for doing so. Extensive exper-
iments demonstrate the effectiveness of our ap-
proach.
1 Introduction
Deep learning has achieved remarkable success
over the past decade and is active in various fields,
including computer vision, natural language pro-
cessing (NLP), and data mining. Although neural
models can perform well in most tasks, they re-
quire a huge amount of data and a high computation
cost to train, making the trained model a valuable
intellectual property. As a result, it is essential
for us to prevent neural models from being used
without authorization. In the last few years, many
methods have been proposed to safeguard deep
neural networks and they can be roughly divided
into two types: watermarking (Adi et al., 2018) and secure authorization (Alam et al., 2020). In the watermarking approaches, the owners can verify ownership of the neural model based on a unique watermark. However, due to the catastrophic forgetting problem (Kemker et al., 2018),

Figure 1: Overview of our unsupervised non-transferable learning method with secret keys.
the watermark-based neural models (Kuribayashi et al., 2020; Song et al., 2017) are known to be vulnerable to certain malicious attacks (Wang and Kerschbaum, 2019), which may lead to the loss
of their watermarks. On the other hand, in the se-
cure authorization approaches, the owners of the
neural network want to ensure that users can only
access the model with authorization. Recently,
Wang et al. (2022) proposed a new perspective with non-transferable learning (NTL) to protect the model from being illegally applied to unauthorized data. The method trains the model to perform well only in the authorized domain while performing badly in the unauthorized domain. However, such an approach has some limitations: 1) it relies on a significant amount of labeled data from the target domain, while such labels are usually not easy to acquire; 2) access to the unauthorized domain can no longer be regained, if required, after the model is learned.
In this work, we propose our new NTL method named Unsupervised Non-Transferable Learning (UNTL) for text classification tasks. As Figure 1 shows, our model can perform well in the source domain while performing badly in the target domain. In addition, we propose secret key
modules, which can help recover the ability of the
model in the target domain. Our contributions in-
clude:
We propose a novel unsupervised non-transferable learning approach for text classification tasks. Different from existing approaches, our model can still perform well without the need for label information in the target domain.

arXiv:2210.12651v2 [cs.CL] 19 Feb 2023
We introduce two different methods, namely
Prompt-based Secret Key and Adapter-based
Secret Key, that allow us to recover the ability
of the model to perform classification on the
target domain.
Extensive experiments show that our proposed
models can perform well in the source domain
but badly in the target domain. Moreover, ac-
cess to the target domain can still be regained
using the secret key.
To the best of our knowledge, our work is the first approach for learning under the unsupervised non-transferable learning setup, which also comes with the ability to recover access to the target domain.¹
2 Related Work
In this section, we briefly survey ideas that are
related to our work from two fields: domain adap-
tation and intellectual property protection. Further-
more, we discuss some limitations in the existing
methods which we will tackle with our approach.
In domain adaptation, given a source domain and a target domain with unlabeled data or only a few labeled examples, the goal is to improve performance on the target task using knowledge from the source domain. Ghifary et al. (2014), Tzeng et al. (2014), and Zhu et al. (2021) applied a Maximum Mean Discrepancy regularization method (Gretton et al., 2012) to encourage invariance between the representations of different domains. Ganin et al. (2016) and Schoenauer-Sebag et al. (2019) matched the feature space distributions of the two domains with adversarial learning. In contrast to the methods above, Wang et al. (2022) analyzed domain adaptation from the opposite direction and proposed non-transferable learning (NTL) to prevent knowledge transfer from the source to the target domain by enlarging the discrepancy between the representations of the different domains.
¹ Our code and data are released at https://github.com/ChaosCodes/UNTL.

In intellectual property protection, due to the significant value of learned deep neural networks and their vulnerability to malicious attacks, it is crucial to develop intellectual property protection methods that defend the owners of deep neural networks (DNNs) from loss. Recently,
two different approaches to safeguard DNNs have been proposed: watermarking (Adi et al., 2018) and secure authorization (Alam et al., 2020). In the watermarking approaches, researchers designed digital watermarks that can be embedded into data such as videos and images. By detecting the unique watermark, one can verify ownership of the copyright of the data. Based on these ideas, Song et al. (2017) and Kuribayashi et al. (2020) embedded digital watermarks into the parameters of neural networks. Zhang et al. (2020) and Wu et al. (2021) proposed frameworks to generate images with an invisible but extractable watermark. However, these methods are vulnerable to active attack algorithms (Wang and Kerschbaum, 2019; Chen et al., 2021) that first detect the watermark and then rewrite or remove it. On the other
hand, the secure authorization approach seeks to
train a model that generates inaccurate results with-
out authorization. Alam et al. (2020) proposed a
key-based framework that ensures correct model
functioning only with the correct secret key. In ad-
dition, Wang et al. (2022) were inspired by domain
generalization and proposed non-transferable learn-
ing (NTL), which achieves secure authorization by
reducing the model’s generalization ability in the
specified unauthorized domain.
Although the NTL model can effectively prevent
access to the unauthorized domain, it requires tar-
get labels during training, which may not always
be easy to obtain. Furthermore, there is no mecha-
nism to recover access to the unauthorized domain
when needed. In this paper, we present a new NTL
model and show that our model can still have good
performance even in the absence of the target labels
which are, however, indispensable in the work of
Wang et al. (2022). Besides, we extend it to a secret
key-based version. With our method, authorized
users can still access the target domain with the
provided keys.
3 Approach
In this section, we first introduce our proposed
Unsupervised Non-Transferable Learning (UNTL)
approach in Sec. 3.1, followed by a discussion
on its practical limitation – it lacks the ability to regain access to the target domain. Next, we
discuss our secret key-based methods in Sec. 3.2
to address this limitation.
3.1 UNTL Text Classification
Problem Description
First of all, we present our definition of the unsupervised non-transferable learning task without labeled data from the target domain. Following Farahani et al. (2020), we consider that a domain consists of three parts: input space $\mathcal{X}$, label space $\mathcal{Y}$, and the joint probability distribution $p(X, Y)$. We are given a source domain $\mathcal{D}_s = \{x_i, y_i\}_{i=1}^{N_s}$ and a target domain $\mathcal{D}_t = \{x_j\}_{j=1}^{N_t}$ with unlabeled samples, where $y_i \in \mathbb{R}^C$ is a one-hot vector indicating the label of $x_i$, $C$ is the number of classes, and $N_s, N_t$ refer to the numbers of examples in the source and target domains respectively. The goal of our UNTL method is to prevent knowledge transfer from the source to the target domain, i.e., to train the model so that it performs well on the source domain data but poorly on the target domain data, without requiring access to the label information of the target data.
Text Classification
In our work, we use a BERT-based model (Devlin et al., 2019) $\psi$ as our feature extractor for the input sentence and consider the final hidden state $\mathbf{h}$ of the token [CLS] as the feature representation, where we denote $\mathbf{h}$ as $\psi(x)$. A simple feed-forward network $\mathrm{FFN}(\cdot)$ is added on top of BERT as a classifier to predict the label. The loss function is:

$$\mathcal{L}_{CE} = \mathbb{E}_{(x,y)\sim \mathcal{D}_s}\left[\mathrm{CE}\left(\mathrm{FFN}(\psi(x)), y\right)\right] \quad (1)$$

where $\mathcal{D}_s$ is the source domain dataset and $\mathrm{CE}$ indicates the cross-entropy function.
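As an illustration of Eq. 1, the sketch below computes the batched cross-entropy loss in plain NumPy. The random matrix `h` stands in for BERT's [CLS] features $\psi(x)$, and a single linear layer stands in for the FFN classifier; all names and shapes here are illustrative, not the paper's implementation.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ce_loss(logits, y_onehot):
    """Eq. 1 with the expectation over D_s replaced by a batch average."""
    probs = softmax(logits)
    return -np.mean(np.sum(y_onehot * np.log(probs + 1e-12), axis=-1))

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))        # stand-in for psi(x): [CLS] features, dim 8
W = 0.1 * rng.normal(size=(8, 3))  # stand-in FFN: one linear layer, C = 3 classes
logits = h @ W
y = np.eye(3)[[0, 1, 2, 0]]        # one-hot labels for the batch of 4
loss = ce_loss(logits, y)
```

With near-zero logits the loss is close to $\ln C$, and it decreases toward zero as the classifier fits the source labels.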
Maximum Mean Discrepancy
To enlarge the distance between the representations of the source and target domains, we follow Wang et al. (2022) and use Maximum Mean Discrepancy (MMD) (Gretton et al., 2012). MMD is a kernel two-sample test and can be used as a metric to determine whether two data distributions $p$ and $q$ are similar. MMD defines the metric as follows:

$$d_{p,q} = \left\| \mathbb{E}_{x\sim p}[\psi(x)] - \mathbb{E}_{x'\sim q}[\psi(x')] \right\|^2_{\mathcal{H}_k} \quad (2)$$

where $\mathcal{H}_k$ is the reproducing kernel Hilbert space (RKHS) with a kernel $k$ defined as $k(z, z') = e^{-\|z - z'\|^2}$, and the function $\psi$ maps the sentence input into the RKHS. The smaller the distance $d_{p,q}$, the more similar the two distributions $p$ and $q$.

In our work, we use MMD to increase the distance between the feature representations of the source and the target domain, forcing the feature extractor $\psi$ to extract domain-dependent representations rather than maximizing the inter-domain invariance. To prevent a high MMD from dominating the entire loss, we follow Wang et al. (2022) and set an upper bound for it. Therefore, based on Equation 2, our MMD loss can be formulated as:

$$\mathcal{L}_{MMD}(S, T) = \min(c, d_{S,T}) \quad (3)$$

where $c$ is the upper bound for MMD, and $S, T$ are the data distributions of the source and target domains respectively. With this loss, we only maximize $d_{S,T}$ when it is smaller than the upper bound $c$.
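The following NumPy sketch (our own, not the authors' code) shows a biased empirical estimate of the squared MMD in Eq. 2 with the Gaussian kernel $k(z,z') = e^{-\|z-z'\|^2}$, plus the clamped loss of Eq. 3; the function names and the toy feature batches are illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(a, b):
    # k(z, z') = exp(-||z - z'||^2), the kernel stated in the paper
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2)

def mmd2(s, t):
    """Biased empirical estimate of d_{p,q} in Eq. 2 from two feature batches."""
    return (gaussian_kernel(s, s).mean()
            + gaussian_kernel(t, t).mean()
            - 2.0 * gaussian_kernel(s, t).mean())

def mmd_loss(s, t, c=1.0):
    # Eq. 3: clamp the distance at the upper bound c so that a large MMD
    # cannot dominate the overall training objective
    return min(c, mmd2(s, t))

rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(32, 8))  # toy source-domain features
tgt = rng.normal(3.0, 1.0, size=(32, 8))  # toy target features with shifted mean
```

Identical batches give a distance of zero, while well-separated batches give a positive distance that Eq. 3 caps at $c$.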
Domain Classifier
Despite being able to enlarge
the gap between the source and target domains to
some extent, the MMD loss lacks the explicit ability
to clearly draw the boundary between the represen-
tations of different domains, especially when the
knowledge between domains is similar. Therefore,
we hypothesize that using MMD alone may not be
sufficient to yield optimal empirical performance.
To mitigate this issue, we draw inspiration from
the Domain-Adversarial Neural Networks (Ganin
et al.,2016) and propose an additional domain clas-
sifier added on top of the feature extractor. This
classifier is trained to predict the domain with the
feature representations. We employ a cross-entropy
loss to train the domain classifier. By optimizing
this loss, the representations of different domains
are encouraged to be more distinct. Specifically,
we use 0 to indicate the source domain and 1 to
indicate the target domain. We can formulate the
domain classification (DC) loss as:
$$\mathcal{L}_{DC}(S, T) = \mathbb{E}_{x^S \sim S}\left[\mathrm{CE}\left(\mathrm{FFN}_{dc}\left(\psi(x^S)\right), 0\right)\right] + \mathbb{E}_{x^T \sim T}\left[\mathrm{CE}\left(\mathrm{FFN}_{dc}\left(\psi(x^T)\right), 1\right)\right] \quad (4)$$

where $\mathrm{FFN}_{dc}$ is the domain classifier. With this DC loss as a regularization term, the boundary between the feature representations of the source and the target domains becomes clearer, facilitating better non-transferable learning.
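A minimal sketch of the DC loss in Eq. 4, assuming a logistic-regression stand-in for $\mathrm{FFN}_{dc}$ (the paper's classifier is a feed-forward network): with labels 0 for source features and 1 for target features, the two expectations reduce to batch means of binary cross-entropy. The weights `w`, `b` and the toy batches are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dc_loss(feat_src, feat_tgt, w, b):
    """Eq. 4 with a logistic-regression stand-in for FFN_dc.

    Source features are pushed toward domain label 0 and target features
    toward domain label 1; each expectation becomes a batch mean of
    binary cross-entropy."""
    p_src = sigmoid(feat_src @ w + b)  # predicted probability of "target"
    p_tgt = sigmoid(feat_tgt @ w + b)
    loss_src = -np.log(1.0 - p_src + 1e-12).mean()  # CE against label 0
    loss_tgt = -np.log(p_tgt + 1e-12).mean()        # CE against label 1
    return loss_src + loss_tgt

# Toy check: features on opposite sides of a hyperplane are easy to separate.
feat_s = -np.ones((4, 8))
feat_t = np.ones((4, 8))
loss_sep = dc_loss(feat_s, feat_t, np.ones(8), 0.0)      # small: separable domains
loss_chance = dc_loss(feat_s, feat_t, np.zeros(8), 0.0)  # 2*ln(2): chance level
```

Minimizing this loss pushes the two domains' feature clusters apart, which is why it acts as the boundary-sharpening regularizer described above.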
Objective Function
In this task, our goal is to
train a model that can perform well on the source
domain while performing badly on the target do-
main. To achieve this goal, we propose a loss
function for unsupervised non-transferable learn-
ing, which contains three terms. The first term is