
A CURRICULUM LEARNING APPROACH FOR MULTI-DOMAIN TEXT CLASSIFICATION
USING KEYWORD WEIGHT RANKING
Zilin Yuan1, Yinghui Li1, Yangning Li1, Rui Xie2, Wei Wu2, Hai-Tao Zheng1,3∗
1Shenzhen International Graduate School, Tsinghua University
2Meituan, 3Peng Cheng Laboratory
* Corresponding author. (E-mail: zheng.haitao@sz.tsinghua.edu.cn)
ABSTRACT
Text classification is a classic NLP task, but it has two prominent
shortcomings. On the one hand, text classification is deeply
domain-dependent: a classifier trained on the corpus of one domain
may not perform well in another. On the other hand, text
classification models require large amounts of annotated data for
training, and some domains simply lack enough annotated data. It is
therefore valuable to investigate how to efficiently utilize text
data from different domains to improve model performance across
domains. Some multi-domain text classification models are trained
with adversarial training to extract the features shared among all
domains and the specific features of each domain. We observe that
the distinctness of the domain-specific features varies across
domains, so in this paper we propose a curriculum learning strategy
based on keyword weight ranking to improve the performance of
multi-domain text classification models. Experimental results on the
Amazon review and FDU-MTL datasets show that our curriculum learning
strategy effectively improves the performance of multi-domain text
classification models based on adversarial learning and outperforms
state-of-the-art methods.
Index Terms—Multi-Domain Text Classification, Curriculum
Learning, Keyword Weight Ranking
1. INTRODUCTION
Text classification is one of the fundamental NLP tasks, with a wide
range of applications such as spam detection [1], news classification
[2], and the evaluation of e-commerce products [3]. Research on text
classification can be traced back to methods based on expert rules in
the 1950s. In the 1990s, machine learning methods combining feature
engineering with classifiers began to appear [4], and today the more
popular approach is to classify with deep learning methods such as
CNNs [5], RNNs [6, 7], and attention mechanisms [8].
However, regardless of the method, two main problems remain: heavy
domain dependence and the need for large amounts of annotated corpus.
Domain dependence means that a classifier trained on one domain may
not perform equally well on other domains, because the meaning of
vocabulary can differ across domains; even the same word may express
different meanings in different domains. As shown in Fig. 1, the word
"infantile" [9] often carries a negative meaning in the domain of
Movie Review (e.g., "The idea of the movie is infantile"), but
usually has no obvious emotional connotation in the evaluation of
Infant Products (e.g., "The infantile toy was sold out yesterday").
[Figure: "The idea of the movie is infantile" → Negative (Movie Review); "The infantile toy was sold out yesterday" → Neutral (Infant Products)]
Fig. 1. The different sentiments of “infantile” in different domains.
Therefore, when we want to train classifiers on texts from different
domains, we need enough labeled data in each domain, yet not every
domain has a sufficient corpus for training. It is thus necessary to
make full use of the corpora of different domains when classifying
the texts of a specific domain, a task known as Multi-Domain Text
Classification (MDTC) [10, 11]. However, traditional MDTC methods
[11, 12] all ignore an important piece of information: the
classification difficulty differs from domain to domain.
Since the classification difficulty is inconsistent across domains,
this property can be exploited to make the model learn the data from
easy to difficult. This way of learning resembles human learning, in
which simple lessons are learned first, followed by complex ones.
This learning paradigm is called curriculum learning [13], and it has
shown notable improvements on NLP tasks such as dialog state tracking
[14], few-shot text classification [15], Chinese Spell Checking [16],
and so on. The core of curriculum learning lies in the difficulty
measurer for data samples and the data scheduler. Combined with the
extraction of private and shared features in multi-domain text
classification, we propose that the sum of the weights of a domain's
keywords can serve as a measurer of how difficult that domain's
specific features are to extract, and thus adjust the order in which
each domain's corpus is fed into the model.
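To make the data-scheduler side of the curriculum concrete, the
following is a minimal sketch of an easy-to-hard schedule, assuming a
precomputed domain ordering produced by a difficulty measurer; the
names baby_step_schedule and train_epoch are illustrative and not
from the paper.

def baby_step_schedule(domain_order, corpora, train_epoch, epochs_per_step=1):
    """Cumulatively feed domains to the model, easiest first.

    domain_order: domain names sorted from easy to hard by the measurer.
    corpora: dict mapping domain name -> list of labeled examples.
    train_epoch: callable that trains the model for one epoch on a list.
    """
    pool = []  # examples from all domains introduced so far
    for domain in domain_order:
        pool.extend(corpora[domain])      # add the next-easiest domain
        for _ in range(epochs_per_step):
            train_epoch(pool)             # retrain on the growing pool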
Based on the above motivations, we propose a framework called
Keyword-weight-aware Curriculum Learning (KCL) for MDTC,
which includes the following two features:
1) We calculate the word weights of the texts in each domain, take
the Top-N words as the domain keywords, and use the sum of the
weights of these N keywords to measure the difficulty of extracting
each domain's domain-specific features. The higher the sum, the more
distinct the domain-specific features and the easier they are to
extract, so that domain should enter the model for training earlier
(see the sketch after this list).
2) We apply different keyword extraction methods and test different
numbers of keywords to find the best ordering of domains.
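As a hedged illustration of feature 1), the sketch below scores each
domain by the sum of its Top-N keyword weights, assuming TF-IDF as
the weighting method; the function name and parameters are
hypothetical, and other extractors could be swapped in. A domain with
a larger sum has more distinct domain-specific features and is
scheduled earlier.

from sklearn.feature_extraction.text import TfidfVectorizer

def rank_domains_by_keyword_weight(domain_texts, top_n=20):
    """domain_texts: dict mapping domain name -> list of raw documents.
    Returns domain names sorted from easiest (largest Top-N weight sum)
    to hardest, i.e., the order in which they enter training."""
    domains = list(domain_texts)
    # Treat each domain's concatenated corpus as one "document" so that
    # the IDF term reflects how distinctive a word is to that domain.
    corpus = [" ".join(docs) for docs in domain_texts.values()]
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(corpus)

    scores = {}
    for i, domain in enumerate(domains):
        weights = tfidf[i].toarray().ravel()
        top = sorted(weights, reverse=True)[:top_n]  # Top-N keyword weights
        scores[domain] = float(sum(top))             # higher sum = easier
    return sorted(domains, key=scores.get, reverse=True)

The returned ordering can then be passed directly as the
domain_order argument of the scheduler sketched above.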
The experimental results show that our proposed approach improves
MDTC performance and achieves new state-of-the-art results on the
Amazon review and FDU-MTL datasets.