
Revisiting and Advancing Chinese Natural Language Understanding with Accelerated Heterogeneous Knowledge Pre-training
Taolin Zhang1,2, Junwei Dong2,3, Jianing Wang1,2, Chengyu Wang2∗, Ang Wang2, Yinghui Liu2, Jun Huang2, Yong Li2, Xiaofeng He1
1East China Normal University, Shanghai, China
2Alibaba Group, Hangzhou, China
3Chongqing University, Chongqing, China
zhangtl0519@gmail.com, chengyu.wcy@alibaba-inc.com
∗Corresponding author.
Abstract
Recently, knowledge-enhanced pre-trained language models (KEPLMs) improve context-aware representations via learning from structured relations in knowledge graphs and/or linguistic knowledge from syntactic or dependency analysis. Unlike English, there is a lack of high-performing open-source Chinese KEPLMs in the natural language processing (NLP) community to support various language understanding applications. In this paper, we revisit and advance the development of Chinese natural language understanding with a series of novel Chinese KEPLMs released in various parameter sizes, namely CKBERT (Chinese knowledge-enhanced BERT). Specifically, both relational and linguistic knowledge is effectively injected into CKBERT based on two novel pre-training tasks, i.e., linguistic-aware masked language modeling and contrastive multi-hop relation modeling. Based on the above two pre-training paradigms and our in-house implemented TorchAccelerator, we have pre-trained base (110M), large (345M) and huge (1.3B) versions of CKBERT efficiently on GPU clusters. Experiments demonstrate that CKBERT outperforms strong baselines for Chinese over various benchmark NLP tasks and across different model sizes.1
1All the code and model checkpoints have been released to the public in the EasyNLP framework (Wang et al., 2022). URL: https://github.com/alibaba/EasyNLP.
1 Introduction
Pre-trained Language Models (PLMs) such as BERT (Devlin et al., 2019) are pre-trained via self-supervised learning on large-scale text corpora to capture the rich semantic knowledge of words (Li et al., 2021; Gong et al., 2022), significantly improving various downstream NLP tasks (He et al., 2020; Xu et al., 2021; Chang et al., 2021). Although these PLMs store a large amount of internal knowledge (Petroni et al., 2019, 2020), they can hardly understand external background knowledge about the world, such as factual and linguistic knowledge (Colon-Hernandez et al., 2021; Cui et al., 2021; Lai et al., 2021).
In the literature, most knowledge injection approaches fall into two categories, based on relational knowledge and linguistic knowledge, respectively. (1) Relational knowledge-based approaches inject entity and relation representations from Knowledge Graphs (KGs) trained by knowledge embedding algorithms (Zhang et al., 2019; Peters et al., 2019), or convert triples into sentences for joint pre-training (Liu et al., 2020; Sun et al., 2020). (2) Linguistic knowledge-based approaches extract semantic units from pre-training sentences, such as part-of-speech tags and constituency or dependency parses, and feed all the linguistic information into various transformer-based architectures (Zhou et al., 2020; Lai et al., 2021). We observe three potential drawbacks. (1) These approaches generally utilize a single source of knowledge (e.g., inherent linguistic knowledge only), ignoring important knowledge from other sources such as relational knowledge from KGs (Su et al., 2021). (2) Training large-scale KEPLMs from scratch requires high-memory computing devices and is time-consuming, which imposes significant computational burdens on users (Zhang et al., 2021, 2022). (3) Most of these models are pre-trained in English only; there is a lack of powerful KEPLMs for understanding other languages (Lee et al., 2020; Pérez et al., 2021).
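To make the two knowledge sources more concrete, the minimal Python sketch below illustrates both ideas in plain code: verbalizing a KG triple into a pre-training sentence (relational knowledge) and using parser-produced POS tags to decide which tokens to mask (linguistic knowledge). The templates, the POS-tag input format, and the [MASK] convention are illustrative assumptions only, not the implementation used for CKBERT.

from typing import List, Tuple

MASK = "[MASK]"  # BERT-style mask token (an assumed convention)

def verbalize_triple(head: str, relation: str, tail: str) -> str:
    # Relational knowledge: turn a KG triple into a natural-language
    # sentence that can be mixed into the pre-training corpus.
    # The templates below are hypothetical examples, not the paper's own.
    templates = {
        "founder_of": "{h} is the founder of {t}.",
        "capital_of": "{h} is the capital of {t}.",
    }
    template = templates.get(relation, "{h} {r} {t}.")
    return template.format(h=head, r=relation.replace("_", " "), t=tail)

def mask_linguistic_units(tokens: List[str], pos_tags: List[str],
                          maskable: Tuple[str, ...] = ("NOUN", "PROPN", "VERB")) -> List[str]:
    # Linguistic knowledge: given tokens and POS tags from an external parser,
    # mask whole semantic units instead of random subwords.
    return [MASK if tag in maskable else tok for tok, tag in zip(tokens, pos_tags)]

if __name__ == "__main__":
    # Triple -> sentence for joint pre-training.
    print(verbalize_triple("Jack Ma", "founder_of", "Alibaba Group"))
    # -> "Jack Ma is the founder of Alibaba Group."

    # POS-guided masking of semantic units.
    tokens = ["Hangzhou", "is", "a", "city", "in", "China"]
    pos = ["PROPN", "AUX", "DET", "NOUN", "ADP", "PROPN"]
    print(mask_linguistic_units(tokens, pos))
    # -> ['[MASK]', 'is', 'a', '[MASK]', 'in', '[MASK]']

In practice, the verbalized triples and the linguistically informed masks would feed a standard masked language modeling objective; the sketch only shows how the two heterogeneous knowledge sources differ in form.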
To overcome the above problems, we release a series of Chinese KEPLMs named CKBERT (Chinese knowledge-enhanced BERT), with heterogeneous knowledge sources injected. We particularly focus on Chinese, as it is one of the most widely spoken languages other than English. The CKBERT models are pre-trained with two well-designed pre-training tasks, as follows:
•Linguistic-aware Masked Language Mod-