
all current advanced CSC approaches have actually
exploited, either explicitly or implicitly, character
pronunciation. The implicit use takes into account
phonological similarities between pairs of characters,
e.g., by increasing the decoding probability of
characters with similar pronunciation (Cheng et al., 2020)
or integrating such similarities into the encoding
process via graph convolutional networks (GCNs)
(Cheng et al., 2020). The explicit use considers
directly the pronunciation, or more specifically,
pinyin (the official phonetic system of Mandarin
Chinese, which literally means "spelled sounds"),
of individual characters, encoding the pinyin of
input characters to produce extra phonetic features
(Xu et al., 2021; Huang et al., 2021) or decoding the
pinyin of target correct characters to serve as an
auxiliary prediction task (Liu et al., 2021; Ji et al., 2021).
This paper also considers improving CSC with an
auxiliary character pronunciation prediction (CPP)
task, but focuses specifically on the adaptivity and
granularity of the auxiliary task, which have never
been systematically studied before. First, all prior
attempts in a similar spirit simply assigned a universal
trade-off between the primary and auxiliary tasks
for all instances during training, ignoring the fact
that the auxiliary task might provide different levels
of benefit for different instances. Take for example
the instances shown in Table 1. Compared to the
misspelled character "蓝" and its correction "监" in
the 4th instance, the two characters "完" and "玩"
in the 1st instance are much more similar in
pronunciation, suggesting that the spelling error there
is more likely caused by phonological similarity, a
case where the pronunciation-related auxiliary task
might provide greater benefits and hence should be
assigned a larger weight. Second, prior efforts mainly
explored predicting the whole pinyin of a character,
e.g., "gao1" for "高". Nevertheless, a syllable in
Chinese is inherently composed of an initial, a final,
and a tone, e.g., "g", "ao", and "1" for "高". This
fine-grained phonetic representation can better reflect
not only the intrinsic regularities of Chinese
pronunciation, but also the phonological similarities
between Chinese characters. Consider for example
the "高" and "告" case from the 2nd instance in
Table 1. These two characters show no similarity in
terms of their whole pinyin, but they actually share
the same initial and final, differing solely in their tones.
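
To make the initial/final/tone decomposition concrete, here is a minimal sketch of splitting a toned pinyin string such as "gao1" into its three components. The split_pinyin helper and its initial inventory are our own illustrative choices (conventions differ, e.g., on whether "y" and "w" count as initials); the paper defines the CPP targets themselves, not this parser.

# Two-letter initials must be matched before single-letter ones.
INITIALS = sorted(
    ["b", "p", "m", "f", "d", "t", "n", "l", "g", "k", "h", "j", "q", "x",
     "zh", "ch", "sh", "r", "z", "c", "s", "y", "w"],
    key=len, reverse=True,
)

def split_pinyin(syllable: str) -> tuple[str, str, str]:
    # Split e.g. "gao1" into ("g", "ao", "1"); zero-initial syllables
    # such as "an1" yield an empty initial.
    tone = syllable[-1] if syllable[-1].isdigit() else ""
    base = syllable[:-1] if tone else syllable
    for ini in INITIALS:
        if base.startswith(ini):
            return ini, base[len(ini):], tone
    return "", base, tone

assert split_pinyin("gao1") == ("g", "ao", "1")   # 高
assert split_pinyin("gao4") == ("g", "ao", "4")   # 告: same initial and final
assert split_pinyin("zhong1") == ("zh", "ong", "1")

Under this decomposition, "gao1" (高) and "gao4" (告) agree on two of their three components, a similarity that whole-pinyin prediction cannot expose.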
Based on the above intuitions, we devise SCOPE
(i.e., Spelling Check by prOnunciation PrEdiction),
which introduces a fine-grained CPP task with an
adaptive task weighting scheme to improve CSC.
Figure 1 provides an overview of
SCOPE. Given a sentence with spelling errors as
input, we encode it using ChineseBERT (Sun et al.,
2021) to produce semantic and phonetic features.
Then we build on top of the encoder two parallel
decoders, one to generate target correct characters,
i.e., the primary CSC task, and the other to predict
the initial, final and tone of the pinyin of each target
character, i.e., the auxiliary fine-grained CPP task.
The trade-off between the two tasks can be further
adjusted adaptively for each instance, according
to the phonological similarity between input and
target characters therein. In addition, we design an
iterative correction strategy during inference to ad-
dress the over-correction issue and tackle difficult
instances with consecutive errors.
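
As a rough sketch of the adaptive weighting idea (our own simplification, using a single CPP head instead of the paper's separate initial, final, and tone predictions, and an assumed per-token similarity score sim), the auxiliary loss can be scaled per instance by phonological similarity:

import torch.nn.functional as F

def scope_style_loss(csc_logits, cpp_logits, char_targets, pinyin_targets, sim):
    # csc_logits:     (batch, seq, vocab) character-correction logits
    # cpp_logits:     (batch, seq, n_pinyin) pinyin-prediction logits
    # char_targets:   (batch, seq) gold character ids
    # pinyin_targets: (batch, seq) gold pinyin-unit ids
    # sim:            (batch, seq) phonological similarity in [0, 1] between
    #                 each input character and its target character
    csc_loss = F.cross_entropy(csc_logits.transpose(1, 2), char_targets,
                               reduction="none")
    cpp_loss = F.cross_entropy(cpp_logits.transpose(1, 2), pinyin_targets,
                               reduction="none")
    # Tokens whose errors look phonologically motivated get a larger
    # auxiliary weight; this weighting rule is an assumption for
    # illustration, not the paper's exact scheme.
    return (csc_loss + sim * cpp_loss).mean()

This matches the 完/玩 example above: the more alike an input character and its correction sound, the larger the CPP weight, since such errors are more likely caused by phonological similarity.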
We empirically evaluate SCOPE on three shared
benchmarks and achieve substantial, consistent
improvements over the previous state of the art on
all of them, demonstrating the effectiveness and
superiority of our auxiliary CPP task. Comprehensive
ablation studies further verify the positive effects
of the adaptivity and granularity of the task.
The main contributions of this paper are summa-
rized as follows: (1) We investigate the possibility
of introducing an auxiliary CPP task to improve
CSC and, for the first time, systematically discuss
the adaptivity and granularity of this auxiliary task.
(2) We propose SCOPE, which builds two parallel
decoders upon a shared encoder for CSC and CPP,
with a novel adaptive weighting scheme to balance
the two tasks. (3) We establish a new state of the
art on three CSC benchmark datasets.
2 Related Work
CSC is a fundamental NLP task that has received
wide attention over the past decades. Early work on
this topic was mainly based on manually designed
rules (Mangu and Brill, 1997; Jiang et al., 2012).
After that, statistical language models became the
mainstream for CSC (Chen et al., 2013; Yu and Li,
2014; Tseng et al., 2015). Methods of this kind in
general followed a pipeline of error detection,
candidate generation, and candidate selection. Given
a sentence, error positions are first detected based
on the perplexity of a language model. Candidates
for correction can then be generated according to
the similarity between characters, typically by using