
FontTransformer: Few-shot High-resolution Chinese Glyph Image
Synthesis via Stacked Transformers
Yitian Liu (a), Zhouhui Lian (a, *)
(a) Wangxuan Institute of Computer Technology, Peking University, Beijing, 100871, China
ARTICLE INFO
Keywords:
font generation
style transfer
Transformers
ABSTRACT
Automatic generation of high-quality Chinese fonts from a few online training samples is a
challenging task, especially when the number of samples is very small. Existing few-shot font
generation methods can only synthesize low-resolution glyph images that often possess incorrect
topological structures and/or incomplete strokes. To address this problem, this paper proposes
FontTransformer, a novel few-shot learning model for high-resolution Chinese glyph image
synthesis using stacked Transformers. The key idea is to apply a parallel Transformer to
avoid the accumulation of prediction errors and a serial Transformer to enhance the
quality of synthesized strokes. We also design a novel encoding scheme that feeds more
glyph information and prior knowledge to our model, which further enables the generation of
high-resolution and visually pleasing glyph images. Both qualitative and quantitative experimental
results demonstrate the superiority of our method over existing approaches
in the few-shot Chinese font synthesis task.
1. Introduction
Computer fonts are widely used in our daily lives. The legibility and aesthetics of the fonts adopted in books, posters,
advertisements, etc., are critical to their producers during the design process. Consequently, the demand for high-quality
fonts in various styles has increased rapidly. However, font design is a creative and time-consuming task,
especially for font libraries consisting of large numbers of characters (e.g., Chinese). For example, the official character
set GB18030-2000 consists of 27533 Chinese characters, most of which have complicated structures and contain dozens
of strokes [1]. Designing or writing out such a large number of complex glyphs in a consistent style is time-consuming
and costly. Thus, more and more researchers and companies are interested in developing systems that can automatically
generate high-quality Chinese fonts from a few input samples.
With the help of various neural network architectures (e.g., CNNs and RNNs), researchers have proposed many
deep learning (DL)-based methods for Chinese font synthesis. These methods aim to model the relationship between input and
output data (outlines, glyph images, or writing trajectories). Most of them are CNN-based models, such as zi2zi [2],
EMD [3], and SCFont [4]. Intuitively, a glyph can be represented as the combination of a writing trajectory and a stroke
rendering style. Thus, there are also some RNN-based methods (e.g., FontRNN [5]) that synthesize the writing trajectory
for each Chinese character. Despite the great progress made in the last few years, most existing approaches still need
∗Corresponding author
lsflyt@pku.edu.cn (Y. Liu); lianzhouhui@pku.edu.cn (Z. Lian*)
https://www.icst.pku.edu.cn/zlian/ (Z. Lian*)
ORCID(s): 0000-0002-2683-7170 (Z. Lian*)
Yitian Liu, Zhouhui Lian: Preprint submitted to Elsevier Page 1 of 21
arXiv:2210.06301v2 [cs.CV] 13 Oct 2022