Instance Segmentation for Chinese Character Stroke Extraction,
Datasets and Benchmarks
Lizhao Liu1*, Kunyang Lin1, Shangxin Huang1, Zhongli Li2,
Chao Li3, Yunbo Cao2, and Qingyu Zhou2†
1South China University of Technology, 2Tencent Cloud Xiaowei, 3Xiaomi Group,
selizhaoliu@mail.scut.edu.cn, qingyuzhou@tencent.com
Abstract
The stroke is the basic element of Chinese characters, and stroke extraction has been an important and long-standing endeavor.
Existing stroke extraction methods are often handcrafted and
highly depend on domain expertise due to the limited train-
ing data. Moreover, there are no standardized benchmarks
to provide a fair comparison between different stroke ex-
traction methods, which, we believe, is a major impediment
to the development of Chinese character stroke understand-
ing and related tasks. In this work, we present the first publicly available Chinese Character Stroke Extraction (CCSE)
benchmark, with two new large-scale datasets: Kaiti CCSE
(CCSE-Kai) and Handwritten CCSE (CCSE-HW). With the
large-scale datasets, we hope to leverage the representation
power of deep models such as CNNs to solve the stroke ex-
traction task, which, however, remains an open question. To
this end, we turn the stroke extraction problem into a stroke
instance segmentation problem. Using the proposed datasets
to train a stroke instance segmentation model, we surpass
previous methods by a large margin. Moreover, the models
trained with the proposed datasets benefit the downstream
font generation and handwritten aesthetic assessment tasks.
We hope these benchmark results can facilitate further re-
search. The source code and datasets are publicly available
at: https://github.com/lizhaoliu-Lec/CCSE.
Introduction
The stroke is the basic element of Chinese characters, and stroke extraction has been an important and long-standing endeavor (Lee and Wu 1998). Given an image of a Chinese
character, stroke extraction aims to decompose it into in-
dividual strokes (see Figure 1). It serves as a bedrock for
many Chinese character-related applications such as hand-
written synthesis (Liu and Lian 2021), font generation (Jiang
et al. 2019; Zeng et al. 2021; Xie et al. 2021), character style
transfer (Huang et al. 2020), handwritten aesthetic evalua-
tion (Xu et al. 2007; Sun et al. 2015), etc. Recently, it has
been shown that explicitly incorporating the stroke infor-
mation boosts the performance of Chinese character-related
tasks (Gao and Wu 2020; Huang et al. 2020; Zeng et al.
2021). Though various tasks that leverage the stroke infor-
*This work was partially done while the author was an intern at
Tencent Cloud Xiaowei.
†Corresponding author.
Figure 1: (a) Illustration of 25 kinds of Chinese character strokes considered in this paper, which serve as the building blocks of Chinese characters. (b) Illustration of the Chinese
character stroke extraction task. Given a Chinese character,
the stroke extraction task requires the model to decompose
the character into individual strokes.
mation have gained a large amount of attention from the com-
munity and made substantial progress by applying the state-
of-the-art deep models, the understanding of the Chinese
character stroke alone has fallen behind.
Generally, there are two lines of works: stroke extraction
from skeleton images (Fan and Wu 2000; Liu, Kim, and Kim
2001; Liu, Jia, and Tan 2006; Su, Cao, and Wang 2009;
Zeng et al. 2010) and from original images (Lee and Wu
1998; Yu, Wu, and Yuan 2012). For skeleton-based methods,
the thinning algorithm (Arcelli and Di Baja 1985) is often
used as a preprocessing step, which introduces stroke dis-
tortion and the loss of short strokes. Stroke extraction from
the original image is thereby proposed to address these is-
sues. This kind of approach typically enjoys rich informa-
tion such as stroke width and curvature, obtaining good per-
formance. The latest research (Xu et al. 2016) proposes to
combine merits from both worlds by finding the cross points
on the skeleton and combining stroke segments on original
images. However, due to the lack of a large-scale dataset to
develop learning-based methods, most previous approaches
are rule-based and require in-depth expertise during algo-
rithm design. Thus, they inherently suffer from the follow-
ing limitations: First, to decompose the character into stroke
segments, handcrafted rules are required to find the partition
arXiv:2210.13826v1 [cs.CV] 25 Oct 2022
points, which inevitably contain fork points due to the com-
plex character structure. Second, these methods are typically
tailored to the regular and highly structural standard fonts
and may not perform well on handwritten characters due to
the large intra-class variance of strokes caused by different
handwriting habits. Last, they aim to optimize the stroke ex-
traction task only and may not produce transferable features
to benefit downstream tasks.
Moreover, there are no standardized benchmarks to pro-
vide a fair comparison between different stroke extraction
methods, which is of great importance to guide and facilitate
further research. Moreover, the lack of publicly available datasets leads to inconsistent evaluation protocols. Specifically, (Cao and Tan 2000; Qiguang 2004; Xu et al. 2016) consider accuracy as the main evaluation metric for the stroke extraction task, which ignores the spatial location of the extracted strokes and thereby cannot comprehensively measure the performance of stroke extraction algorithms. (Chen
et al. 2016, 2017) leverage Hamming distance and cut dis-
crepancy to measure the consistency of stroke interiors and
the similarity of stroke boundaries, respectively. They require the extracted strokes and the ground-truth strokes to be strictly aligned in spatial location and category, making it hard to evaluate missed and false extractions. Thus, how
to effectively evaluate stroke extraction algorithms with a reasonable protocol remains an open question.
To facilitate stroke extraction research, we present a Chi-
nese Character Stroke Extraction (CCSE) benchmark, with
two new large-scale datasets and evaluation methods. As
the foundation of the CCSE benchmark, the datasets have
two requirements: character-level diversity and stroke-level diversity. Specifically, the datasets should cover as many Chinese characters as possible to capture the structure between strokes, whose relationships can be very complex (see the left of Figure 2). Moreover, since humans with different writing
habits will produce very different appearances even for the
same stroke (see the right of Figure 2), the datasets should
cover this kind of diversity for models to achieve effective
extraction. To this end, we harvested a large set of Kai Ti
(a kind of Chinese font) Chinese character images and hand-
written Chinese character images to achieve character-level
diversity and stroke-level diversity, respectively.
With the large-scale datasets, we hope to leverage the rep-
resentation power of deep models such as CNNs to solve
the stroke extraction task, which, however, remains an open
question. To this end, we turn the stroke extraction problem
into the stroke instance segmentation problem. This change
of view not only allows us to take advantage of the state-
of-the-art instance segmentation models but also the well-
defined evaluation metrics (i.e., box AP and mask AP). We
perform experiments with state-of-the-art instance segmen-
tation models to produce benchmark results that facilitate
further research. Compared to previous methods of stroke
extraction, our approach does not require reference images
and in-depth domain expertise. Moreover, the deep models
trained on our dataset are able to produce transferable fea-
tures that consistently benefit the downstream tasks.
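Under this instance segmentation view, a predicted stroke is scored by mask IoU against the ground-truth strokes, so missed and false extractions are both penalized. The sketch below shows the IoU computation and a simplified greedy matching; full COCO-style AP additionally sweeps confidence and IoU thresholds, and the function names and toy strokes here are our own:

```python
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union between two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0

def match_strokes(preds, gts, iou_thr=0.5):
    """Greedily match predicted strokes to ground-truth strokes.

    Returns (tp, fp, fn): unmatched predictions count as false positives
    and unmatched ground truths as false negatives, so both false and
    missed extractions lower the score.
    """
    unmatched = list(range(len(gts)))
    tp = 0
    for p in preds:
        scored = [(mask_iou(p, gts[g]), g) for g in unmatched]
        if scored:
            best_iou, best_g = max(scored)
            if best_iou >= iou_thr:
                tp += 1
                unmatched.remove(best_g)
    return tp, len(preds) - tp, len(unmatched)

# Toy example: the model recovers the vertical stroke but misses the horizontal one.
gt_v = np.zeros((6, 6), dtype=bool); gt_v[:, 1] = True
gt_h = np.zeros((6, 6), dtype=bool); gt_h[2, :] = True
tp, fp, fn = match_strokes([gt_v.copy()], [gt_v, gt_h])
print(tp, fp, fn)  # 1 0 1
```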
We summarize our contributions as follows:
• We propose the first benchmark containing two high-quality large-scale datasets that satisfy the requirements of character-level and stroke-level diversity for building promising stroke extraction models.
• We cast the stroke extraction problem as a stroke instance segmentation problem. In this way, we build deep stroke extraction models that scale to scenarios with highly diverse characters and stroke variance while producing transferable features that benefit downstream tasks.
• By leveraging state-of-the-art instance segmentation models and well-defined evaluation metrics, we build standardized benchmarks to facilitate further research.
Related Work
Stroke Extraction
Stroke extraction aims to extract strokes from handwritten
image (Lee and Wu 1998), which is very difficult to solve
due to the complex character structure (Cao and Tan 2000)
and the large intra-class variances (Xu et al. 2016). Exist-
ing methods mainly follow stroke extraction from skele-
tonized character or from original character paradigms. For
the first kind of approach, efforts have been put into explor-
ing the relations between strokes by resolving the fork points
issues (Fan and Wu 2000), applying affine transformation
to strokes (Liu, Jia, and Tan 2006), detecting ambiguous
zone (Su, Cao, and Wang 2009) and using additional ref-
erence image (Zeng et al. 2010). However, these approaches
are limited by the thinning step that introduces stroke dis-
tortion and the loss of short strokes. Therefore, stroke ex-
traction from the original image is proposed to conquer this
limitation. These approaches focus on leveraging the rich in-
formation in characters such as stroke width and curvature
by combining multiple contour information in strokes (Lee
and Wu 1998), exploring pixel-stroke relationships (Cao and
Tan 2000), detecting strokes in multiple directions (Su and
Wang 2004) and using corner points (Yu, Wu, and Yuan
2012). The latest approach (Xu et al. 2016) considers the
advantages from both worlds to further improve the per-
formance. Nonetheless, these methods typically rely on handcrafted rules designed solely for the stroke extraction task. Therefore, they inherently struggle to extract strokes from complex characters or strokes with highly irregular shapes. Moreover, they cannot be trivially employed for downstream tasks such as font generation, limiting their further application.
Instance Segmentation
The goal of instance segmentation is to segment every in-
stance (countable objects) in an image by assigning it a pixel-wise class label. Existing approaches can be broadly
divided into two categories: two-stage (He et al. 2017;
Hsieh et al. 2021) and one-stage (Bolya et al. 2019). Two-
stage methods consist of instance detection and segmenta-
tion steps. In Mask R-CNN (He et al. 2017), one of the most
important milestones in computer vision, the segmentation
head is applied to the detected instances from the Faster
R-CNN (Ren et al. 2015) detector to acquire the instance-
wise segmentation mask. Approaches based on Mask R-
CNN typically demand dense prior proposals or anchors to
obtain decent results, leading to complicated label assign-
ment and post-processing steps. To tackle this issue, one-
stage methods such as YOLACT (Bolya et al. 2019) produce
instance masks by linearly combining the prototypes with
the mask coefficients and do not depend on pre-detection
step. In this paper, we benefit from the rapid development
of instance segmentation algorithms and focus on applying
the instance segmentation models to tackle the stroke ex-
traction task, thus we mainly consider the well-studied two-
stage methods such as Mask R-CNN as our baselines.
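The prototype-combination step that distinguishes YOLACT from two-stage pipelines can be sketched as a single tensor contraction. The shapes and names below are illustrative only; the real model additionally crops each soft mask by its predicted box and thresholds it:

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def assemble_masks(prototypes: np.ndarray, coefficients: np.ndarray) -> np.ndarray:
    """Combine k prototype maps (H, W, k) with per-instance coefficients
    (n, k) into n soft instance masks (n, H, W)."""
    # Linear combination over the prototype dimension k, then squash to (0, 1).
    combined = np.einsum('hwk,nk->nhw', prototypes, coefficients)
    return sigmoid(combined)

# Toy sizes: 2 prototype maps of 8x8 and 3 stroke instances.
rng = np.random.default_rng(0)
protos = rng.standard_normal((8, 8, 2))
coeffs = rng.standard_normal((3, 2))
masks = assemble_masks(protos, coeffs)
print(masks.shape)  # (3, 8, 8)
```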
Figure 2: From left to right, samples of annotated Chinese
characters in CCSE-Kai dataset and CCSE-HW dataset.
Proposed Datasets
Image Collection and Annotation
To achieve promising stroke extraction performance, we har-
vest a large number of samples that cover the complex struc-
tures of Chinese characters and different styles of stroke,
which are character-level and stroke-level diversity, respec-
tively. Since the frequently used Chinese characters are restricted to a small range, there may not be enough handwritten characters with complex stroke structures. Thus, we
collect the frequently used standard font (e.g., Kai Ti) to
meet the character-level diversity requirement. Then, to sat-
isfy the stroke-level diversity, we gather handwritten Chi-
nese character images from different writers. We detail the
process of collection and annotation below.
Kai Ti Image Collection and Annotation Labeling
every stroke in an image is time-consuming and labor-
intensive. Since Kai Ti is a standard Chinese font com-
monly used in daily life, our first thought is to collect an
annotation-free Kai Ti dataset by retrieving the spatial in-
formation from its font design database. However, the coordinates of each stroke are not preserved during the font design
process. Thus, we browse the web resources extensively and
discover an open source project Make Me A Hanzi1, which
has constructed a stroke database for Kai Ti. This project was further extended by cnchar2, which provides more
1https://github.com/skishore/makemeahanzi
2https://github.com/theajack/cnchar
user-friendly interfaces to access the Kai Ti image stroke-
by-stroke. As shown in Figure 3, the results from cnchar
have a clear stroke-wise mark with light brown denoting the
spatial mask and category of the current stroke. Regarding
the stroke category, the database of cnchar contains the most
frequently used 25 categories (see Figure 1 (a) for details).
Figure 3: Illustration of the Kai Ti image collection pro-
cess. We use the open source character rendering library cn-
char to generate the images of a Chinese character in a stroke
incremental manner. The character ya is written stroke by
stroke with the stroke highlighted by light brown in the im-
age and the stroke class denoted underneath.
Figure 4: Comparison between the stroke-separable and
stroke-inseparable handwriting. The corresponding Kai Ti characters are shown on the left for reference.
With the assistance of cnchar, we harvest stroke-wise
images from 9,523 unique Kai Ti Chinese characters.
Then, we use OpenCV3 to produce the bounding box and
mask annotation from the light brown area, resulting in our
Kaiti CCSE (CCSE-Kai) dataset. The visualization results of
CCSE-Kai are depicted on the left of Figure 2. We can see
that CCSE-Kai provides samples with complex stroke struc-
tures. There are more than 1M stroke instances in CCSE-
Kai and the detailed statistics will be elaborated later. The
merits of our CCSE-Kai are as follows: 1) We discover an
automated method to effectively produce a stroke instance
dataset without extensive human labor. 2) CCSE-Kai sat-
isfies the character-level diversity by covering most of the
Chinese characters regardless of their usage frequency. However, its
shortcoming is obvious: lack of stroke-level diversity since
the stroke in the standard font library is relatively fixed. In
this sense, the model trained with CCSE-Kai may not deliver
satisfactory results in some application scenarios, where ex-
tracting strokes from handwritten Chinese is desired.
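The annotation extraction described above boils down to color-thresholding the light-brown highlight and taking the tight box around it. A plain-NumPy sketch follows; the RGB range is a placeholder assumption rather than the exact color used by cnchar, and our actual pipeline used OpenCV:

```python
import numpy as np

# Placeholder RGB range for the light-brown highlight; the exact values
# are an assumption and should be read off real cnchar renders.
LOW = np.array([170, 120, 60])
HIGH = np.array([255, 200, 160])

def stroke_mask_and_box(image: np.ndarray):
    """Binary mask of the highlighted stroke and its tight bounding box
    (x_min, y_min, x_max, y_max) from an (H, W, 3) uint8 render."""
    mask = np.all((image >= LOW) & (image <= HIGH), axis=-1)
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return mask, None
    return mask, (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))

# Toy render: white canvas with a light-brown horizontal "heng" stroke.
img = np.full((10, 10, 3), 255, dtype=np.uint8)
img[4, 2:8] = (200, 160, 100)  # six stroke pixels inside the color range
mask, box = stroke_mask_and_box(img)
print(box)  # (2, 4, 7, 4)
```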
Handwritten Image Collection and Annotation Since
CCSE-Kai only meets character-level diversity, we aim to improve the stroke-level diversity of our dataset by
leveraging the handwritten character with various styles. To
this end, we further harvest handwritten Chinese charac-
ters and label them in a stroke instance manner. Specifi-
cally, we leverage the CASIA Offline Chinese Handwrit-
3https://opencv.org/