A CURRICULUM LEARNING APPROACH FOR MULTI-DOMAIN TEXT CLASSIFICATION
USING KEYWORD WEIGHT RANKING
Zilin Yuan1, Yinghui Li1, Yangning Li1, Rui Xie2, Wei Wu2, Hai-Tao Zheng1,3
1Shenzhen International Graduate School, Tsinghua University
2Meituan, 3Peng Cheng Laboratory
ABSTRACT
Text classification is a classic NLP task, but it has two prominent shortcomings. On the one hand, text classification is deeply domain-dependent: a classifier trained on the corpus of one domain may not perform well in another. On the other hand, text classification models require large amounts of annotated data for training, and some domains lack sufficient annotated data. It is therefore valuable to investigate how to efficiently utilize text data from different domains to improve model performance across domains. Some multi-domain text classification models are trained by adversarial training to extract features shared among all domains along with the specific features of each domain. We observe that the distinctness of the domain-specific features varies across domains, so in this paper we propose a curriculum learning strategy based on keyword weight ranking to improve the performance of multi-domain text classification models. Experimental results on the Amazon review and FDU-MTL datasets show that our curriculum learning strategy effectively improves the performance of adversarial-learning-based multi-domain text classification models and outperforms state-of-the-art methods.
Index Terms— Multi-Domain Text Classification, Curriculum Learning, Keyword Weight Ranking
1. INTRODUCTION
Text classification is one of the fundamental NLP tasks, with a wide range of applications such as spam detection [1], news classification [2], and evaluation of e-commerce products [3]. Research on text classification methods can be traced back to expert-rule-based approaches in the 1950s. In the 1990s, machine learning methods combining feature engineering with classifiers began to appear [4]; today, the more popular approach is to classify with deep learning methods such as CNNs [5], RNNs [6, 7], and attention mechanisms [8].
However, regardless of the method, two main problems remain: high domain dependence and the need for large amounts of annotated data. Domain dependence means that a classifier trained on one domain may not perform equally well in other domains, because the meaning of vocabulary differs across domains, and even the same word can express different meanings in different domains. As shown in Figure 1, "infantile" [9] often expresses a negative meaning in the Movie Review domain (e.g., "The idea of the movie is infantile"), but usually carries no obvious emotional color in evaluations of Infant Products (e.g., "The infantile toy was sold out yesterday"). Therefore, to train classifiers on texts from different domains, we need enough labeled data in each domain, but not all domains have a sufficient corpus for training. It is thus necessary to make full use of corpora from different domains to classify texts in a specific domain, a task known as Multi-Domain Text Classification (MDTC) [10, 11]. However, traditional MDTC methods [11, 12] all ignore an important piece of information: the classification difficulty differs across domains.
* Corresponding author. (E-mail: zheng.haitao@sz.tsinghua.edu.cn)
[Figure 1: "The idea of the movie is infantile." → Negative (Movie Review); "The infantile toy was sold out yesterday." → Neutral (Infant Products)]
Fig. 1. The different sentiments of "infantile" in different domains.
Since the difficulty of text classification differs across domains, this property can be exploited to make the model learn the data from easy to difficult. This way of learning resembles human learning, in which simple lessons are learned first, followed by complex ones. This learning mode is called curriculum learning [13], and it has shown notable improvements in NLP tasks such as dialog state tracking [14], few-shot text classification [15], Chinese Spell Checking [16], and so on. The core of curriculum learning lies in the difficulty measurer of data samples and the data scheduler. Combined with the extraction of private and shared features in multi-domain text classification, we propose that the sum of the weights of domain keywords can serve as a measure of the difficulty of domain-specific feature extraction, used to adjust the order in which the corpus of each domain is fed into the model.
Based on the above motivations, we propose a framework called Keyword-weight-aware Curriculum Learning (KCL) for MDTC, which includes the following two features:
1) By calculating the word weights of texts, we take the top-N words as the domain keywords and compute the sum of the weights of these N keywords to measure the difficulty of extracting each domain's domain-specific features. The higher the sum, the more distinct the domain-specific features, the easier they are to extract, and the earlier that domain's corpus should enter the model for training.
2) We use different keyword-extraction methods and test different numbers of keywords to find the best ordering of domains.
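To make the difficulty measure concrete, the sketch below implements one plausible instantiation: TF-IDF as the word-weighting method (the paper experiments with several extractors and values of N, so this is only one configuration), with each domain's corpus pooled into a single pseudo-document so that IDF is computed across domains. The function name and the default N = 3 are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter

def rank_domains_by_keyword_weight(domains, top_n=3):
    """Order domains easiest-first by the summed weight of their top-N keywords.

    `domains` maps a domain name to a list of raw-text documents. TF-IDF is
    assumed as the word-weighting method; each domain's corpus is pooled into
    one pseudo-document so IDF is computed across domains.
    """
    tokenized = {d: [w for doc in docs for w in doc.lower().split()]
                 for d, docs in domains.items()}
    n_domains = len(domains)

    # Document frequency: in how many domains does each word appear?
    df = Counter()
    for words in tokenized.values():
        df.update(set(words))

    scores = {}
    for d, words in tokenized.items():
        tf = Counter(words)
        total = len(words)
        # TF-IDF weight of each word within this domain.
        weights = {w: (c / total) * math.log(n_domains / df[w])
                   for w, c in tf.items()}
        # Sum of the top-N keyword weights: a higher sum means more
        # distinct domain-specific features, i.e. easier extraction.
        scores[d] = sum(sorted(weights.values(), reverse=True)[:top_n])

    # Easiest domains (highest keyword-weight sums) enter training first.
    return sorted(domains, key=lambda d: scores[d], reverse=True)
```

A domain dominated by words unique to it (high TF-IDF keywords) is ranked ahead of one whose vocabulary is shared with other domains, matching the easy-to-hard scheduling described above.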
The experimental results show that our proposed approach improves MDTC performance and achieves new state-of-the-art results on the Amazon review and FDU-MTL datasets.
arXiv:2210.15147v1 [cs.CL] 27 Oct 2022