points, which inevitably contain fork points due to the com-
plex character structure. Second, these methods are typically tailored to regular and highly structured standard fonts and may not perform well on handwritten characters due to
the large intra-class variance of strokes caused by different
handwriting habits. Last, they aim to optimize the stroke ex-
traction task only and may not produce transferable features
to benefit downstream tasks.
Moreover, there are no standardized benchmarks to pro-
vide a fair comparison between different stroke extraction
methods, which is of great importance to guide and facilitate
further research. The lack of publicly available datasets also leads to inconsistent evaluation protocols. Specifically, (Cao and Tan 2000; Qiguang 2004; Xu et al. 2016) adopt accuracy as the main evaluation metric for stroke extraction; this metric ignores the spatial location of the extracted strokes and therefore cannot comprehensively measure the performance of a stroke extraction algorithm. (Chen et al. 2016, 2017) leverage Hamming distance and cut discrepancy to measure the consistency of stroke interiors and the similarity of stroke boundaries, respectively. These metrics require the extracted strokes and the ground-truth strokes to be strictly aligned in spatial location and category, making it hard to account for missed and false extractions. Thus, how to effectively evaluate stroke extraction algorithms with a reasonable protocol remains an unsolved question.
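For concreteness, the sketch below (using toy binary stroke masks and hypothetical helper names, not the exact protocol of any cited work) shows how an interior-consistency metric such as Hamming distance behaves, and why it presupposes strict spatial alignment between a predicted stroke and its ground truth:

```python
import numpy as np

def hamming_distance(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Fraction of pixels where two binary stroke masks disagree."""
    return float(np.mean(mask_a.astype(bool) != mask_b.astype(bool)))

def mask_iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Intersection-over-union of two binary stroke masks."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 1.0

# Two toy 4x4 "stroke" masks: a horizontal bar vs. the same bar shifted down.
pred = np.zeros((4, 4), dtype=np.uint8)
pred[1, :] = 1
gt = np.zeros((4, 4), dtype=np.uint8)
gt[2, :] = 1

print(hamming_distance(pred, gt))  # 0.5: 8 of 16 pixels disagree
print(mask_iou(pred, gt))          # 0.0: no spatial overlap at all
```

A one-pixel misalignment already yields a large Hamming distance and zero overlap, even though the two strokes are otherwise identical; a stroke that is missed entirely or falsely extracted has no aligned counterpart to compare against in the first place.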
To facilitate stroke extraction research, we present a Chi-
nese Character Stroke Extraction (CCSE) benchmark, with
two new large-scale datasets and evaluation methods. As
the foundation of the CCSE benchmark, the datasets must satisfy two requirements: character-level diversity and stroke-level diversity. Specifically, the datasets should cover as many Chinese characters as possible to represent the structural relationships between strokes, which can be very complex (see the left of Figure 2). Moreover, since humans with different writing
habits will produce very different appearances even for the
same stroke (see the right of Figure 2), the datasets should
cover this kind of diversity for models to achieve effective
extraction. To this end, we harvested a large set of Kai Ti
(a kind of Chinese font) Chinese character images and hand-
written Chinese character images to achieve character-level
diversity and stroke-level diversity, respectively.
With the large-scale datasets, we hope to leverage the rep-
resentation power of deep models such as CNNs to solve
the stroke extraction task, which, however, remains an open
question. To this end, we turn the stroke extraction problem into a stroke instance segmentation problem. This change of view allows us to take advantage not only of state-of-the-art instance segmentation models but also of their well-defined evaluation metrics (i.e., box AP and mask AP). We
perform experiments with state-of-the-art instance segmen-
tation models to produce benchmark results that facilitate
further research. Compared to previous stroke extraction methods, our approach requires neither reference images nor in-depth domain expertise. Moreover, the deep models
trained on our dataset are able to produce transferable fea-
tures that consistently benefit the downstream tasks.
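To illustrate this reformulation (a minimal sketch with a hypothetical helper, not our actual data pipeline), each annotated stroke can be encoded as one instance record in the COCO annotation style, after which off-the-shelf instance segmentation models and the standard box AP / mask AP metrics apply unchanged:

```python
import numpy as np

def mask_to_coco_annotation(mask: np.ndarray, stroke_class: int) -> dict:
    """Convert one non-empty binary stroke mask into a COCO-style instance
    record: a bounding box [x, y, w, h] plus the mask itself, so that
    standard box AP / mask AP tooling can evaluate stroke extraction."""
    ys, xs = np.nonzero(mask)
    x0, y0 = int(xs.min()), int(ys.min())
    w, h = int(xs.max() - x0 + 1), int(ys.max() - y0 + 1)
    return {"category_id": stroke_class,
            "bbox": [x0, y0, w, h],
            "segmentation": mask.astype(np.uint8)}

# Toy 5x5 character image containing one diagonal "stroke".
mask = np.zeros((5, 5), dtype=np.uint8)
for i in range(1, 4):
    mask[i, i] = 1

ann = mask_to_coco_annotation(mask, stroke_class=3)
print(ann["bbox"])  # [1, 1, 3, 3]
```

Under this encoding, a missed stroke is simply a false negative and a spurious stroke a false positive, so the AP metrics penalize both without requiring any bespoke alignment procedure.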
We summarize our contributions as follows:
• We propose the first benchmark containing two high-
quality large-scale datasets that satisfy the requirements
of the character-level and stroke-level diversities for
building promising stroke extraction models.
• We cast the stroke extraction problem as a stroke instance segmentation problem. In this way, we build deep stroke extraction models that scale to scenarios with highly diverse characters and large stroke variance while producing transferable features that benefit downstream tasks.
• By leveraging the state-of-the-art instance segmentation
models and well-defined evaluation metrics, we build
standardized benchmarks to facilitate further research.
Related Work
Stroke Extraction
Stroke extraction aims to extract strokes from a handwritten character image (Lee and Wu 1998), which is difficult due to the complex character structure (Cao and Tan 2000) and the large intra-class variance (Xu et al. 2016). Existing methods mainly follow one of two paradigms: stroke extraction from the skeletonized character or from the original character. For the first paradigm, efforts have been put into exploring the relations between strokes by resolving the fork-point issue (Fan and Wu 2000), applying affine transformations to strokes (Liu, Jia, and Tan 2006), detecting ambiguous zones (Su, Cao, and Wang 2009), and using an additional reference image (Zeng et al. 2010). However, these approaches are limited by the thinning step, which introduces stroke distortion and loses short strokes. Stroke extraction from the original image was therefore proposed to overcome this limitation. These approaches focus on leveraging the rich information in characters such as stroke width and curvature
by combining multiple contour information in strokes (Lee
and Wu 1998), exploring pixel-stroke relationships (Cao and
Tan 2000), detecting strokes in multiple directions (Su and
Wang 2004) and using corner points (Yu, Wu, and Yuan
2012). The latest approach (Xu et al. 2016) combines the advantages of both paradigms to further improve performance. Nonetheless, these methods typically rely on hand-crafted rules designed solely for the stroke extraction task. As a result, they inherently struggle with complex characters and highly irregular stroke shapes. Moreover, they cannot be trivially employed for downstream tasks such as font generation, limiting their further application.
Instance Segmentation
The goal of instance segmentation is to segment every instance (countable object) in an image by assigning it a pixel-wise class label. Existing approaches can be broadly
divided into two categories: two-stage (He et al. 2017;
Hsieh et al. 2021) and one-stage (Bolya et al. 2019). Two-
stage methods consist of instance detection and segmenta-
tion steps. In Mask R-CNN (He et al. 2017), one of the most
important milestones in computer vision, the segmentation
head is applied to the detected instances from the Faster
R-CNN (Ren et al. 2015) detector to acquire the instance-
wise segmentation mask. Approaches based on Mask R-
CNN typically demand dense prior proposals or anchors to