Efficient few-shot learning for pixel-precise handwritten document layout analysis

Axel De Nardin¹, Silvia Zottin¹, Matteo Paier¹, Gian Luca Foresti¹, Emanuela Colombi¹,
Claudio Piciarelli¹
¹University of Udine
{denardin.axel, zottin.silvia, paier.matteo}@spes.uniud.it
{gianluca.foresti, emanuela.colombi, claudio.piciarelli}@uniud.it
Abstract
Layout analysis is a task of utmost importance in
ancient handwritten document analysis and represents a
fundamental step toward the simplification of subsequent
tasks such as optical character recognition and automatic
transcription. However, many of the approaches adopted
to solve this problem rely on a fully supervised learning
paradigm. While these systems achieve very good
performance on this task, the drawback is that pixel-precise
text labeling of the entire training set is a very
time-consuming process, which makes this type of information
rarely available in a real-world scenario. In the present
paper, we address this problem by proposing an efficient
few-shot learning framework that achieves performance
comparable to current state-of-the-art fully supervised
methods on the publicly available DIVA-HisDB dataset.
1. Introduction
Document image layout analysis is a very important
task for the humanities community in the study of ancient
manuscripts [21]. In particular, the segmentation
of a given document image into semantically meaningful
regions (e.g. main text, comments, decorations and
background) allows scholars to study the document more
easily and quickly, and represents a fundamental step
toward the simplification of subsequent tasks such as
optical character recognition [16] and automatic
transcription [11].
Document layout analysis is a particularly challenging
task when applied to historical manuscripts. Compared to
machine-printed documents [19], ancient texts exhibit many
variations in layout structure, decorations and writing
styles. For example, in many manuscripts, the
main text body is entwined with additions, corrections and
marginal or interlinear glosses [21], often made by different
authors at different times. Furthermore, historical document
pages frequently suffer from heavy degradation due to
aging, ink stains, noise, scratches and poor conservation [9].
In addition to all these factors, the image acquisition of
ancient texts may itself be flawed, with illumination issues
or inconsistencies and scan curvature problems [2].
Because of the non-uniformity and compromised integrity of
these images, many of the approaches adopted to solve this
problem rely on a fully supervised learning paradigm
[15, 24, 18]. While these systems achieve very good
performance on this task, they usually need a large number
of annotated images for training. The Ground Truth (GT)
represented by these annotations is critical for training and
evaluating document analysis methods, especially for complex
historical manuscripts that exhibit challenging layouts with
interfering and overlapping handwriting [12]. The drawback is
that pixel-precise annotation of an entire dataset of
historical document pages requires specific domain knowledge
and is a very time-consuming process, making this
type of information rarely available in a real-world scenario.
Nonetheless, few-shot learning approaches are still
underexplored in the literature for this task. This paper
addresses these issues by proposing a novel few-shot learning
framework for efficient pixel-precise layout segmentation of
historical documents. In particular, we propose two original
contributions: first, a dynamic instance generation process
that provides a way of efficiently leveraging the
limited data available in this scenario; and second, a
segmentation map refinement process that improves the
precision of the annotation predictions provided
by the adopted model. By combining these two components
with a powerful DeepCNN backbone network, we are able
to achieve performance comparable to that obtained
by current state-of-the-art fully supervised approaches.
The rest of this paper is organized as follows. Section 2
gives an overview of related work on page segmentation
for historical document images. Section 3 then describes
the three components defining the proposed framework.
Section 4 reports the details of our experimental setup
and provides an overview of the obtained results. Finally,
Section 5 draws the conclusions of this work
and discusses ideas for future work.
Figure 1: Samples from the 3 manuscripts (CSG18, CSG863 and CB55) present in the DIVA-HisDB dataset [22]. Fig. 1a-1c
show a full page of each manuscript, while Fig. 1d-1f show a detail extracted from each of them.
Panels: (a) CSG18 page, (b) CSG863 page, (c) CB55 page, (d) CSG18 detail, (e) CSG863 detail, (f) CB55 detail.
arXiv:2210.15570v1 [cs.CV] 27 Oct 2022
2. Related work
Many different approaches have been proposed to tackle
layout analysis, especially for handwritten historical
documents. This section reviews some representative
state-of-the-art methods for historical document image
segmentation. In general, the techniques employed for document
layout analysis are divided into three categories:
bottom-up, top-down and hybrid [3].
The bottom-up strategy builds the document analysis
dynamically from data at the finest levels of granularity,
such as pixels and connected components. The analysis then
grows to form larger document regions and stops once it
reaches a segmentation of the page into different regions
with uniform elements. These techniques are flexible and do
not require any prior knowledge of the layout structure.
However, they usually demand a large amount of labeled
training data, which is often not available, especially in
the domain of historical documents, where highly specialized
expertise is needed to label the data.
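As a rough illustration of the bottom-up idea, the sketch below starts from pixels, groups them into connected components, and then merges nearby components into larger regions. It is a classical rule-based toy, not a method from any of the cited works and not a learned model; the function names (connected_components, box_gap, merge_regions), the 4-connectivity choice and the max_gap threshold are all assumptions made for this example.

```python
from collections import deque


def connected_components(img):
    """Bounding boxes (top, left, bottom, right), inclusive, of the
    4-connected foreground components of a binary image (list of rows)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if not img[r][c] or seen[r][c]:
                continue
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first flood fill from the seed pixel
                y, x = queue.popleft()
                top, left = min(top, y), min(left, x)
                bottom, right = max(bottom, y), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def box_gap(a, b):
    """Number of empty pixels separating two boxes along the axis where
    they are farthest apart (0 if they touch or overlap)."""
    dy = max(a[0] - b[2] - 1, b[0] - a[2] - 1, 0)
    dx = max(a[1] - b[3] - 1, b[1] - a[3] - 1, 0)
    return max(dy, dx)


def merge_regions(boxes, max_gap):
    """Greedily merge boxes whose gap is at most max_gap (hypothetical
    threshold) until no pair can be merged any more."""
    regions = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(regions)):
            for j in range(i + 1, len(regions)):
                if box_gap(regions[i], regions[j]) <= max_gap:
                    a, b = regions[i], regions[j]
                    regions[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                  max(a[2], b[2]), max(a[3], b[3]))
                    del regions[j]
                    merged = True
                    break
            if merged:
                break
    return regions
```

Calling merge_regions(connected_components(page), max_gap=1) on a small binary page groups strokes that are one pixel apart into a single region while leaving distant components separate. No layout prior is needed, which matches the flexibility noted above, but the threshold has to be tuned per document.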
In contrast, top-down approaches assume that pages
have a well-defined structure and layout. Various
characteristics of the document page structure are then
considered, such as white space between text regions, size of
text blocks and the measures between main texts and paratext
[9]. The page segmentation process then starts from the whole
page and cuts it into areas to produce small homogeneous
regions. In general, top-down methods are easy to apply
but not suitable for complex layouts such as those of
handwritten historical documents. In addition, these methods
depend on the layout structure of the document, so they have
a low generalization capability.
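The top-down procedure described above can be sketched with a recursive whitespace cut, in the spirit of the classical XY-cut family of algorithms. This is only an illustrative toy under stated assumptions: the names widest_gap, trim and xy_cut, the half-open box convention and the min_gap parameter are hypothetical, and the sketch is not taken from any of the cited works.

```python
def widest_gap(filled, min_gap):
    """Given per-row (or per-column) ink flags whose first and last entries
    are True, return (start, end) of the widest empty run of length
    >= min_gap, or None if there is no such run."""
    best, run_start = None, None
    for i, has_ink in enumerate(filled):
        if not has_ink and run_start is None:
            run_start = i
        elif has_ink and run_start is not None:
            if i - run_start >= min_gap and (
                best is None or i - run_start > best[1] - best[0]
            ):
                best = (run_start, i)
            run_start = None
    return best


def trim(filled):
    """Indices (first, last + 1) of the span between the first and last True."""
    first = filled.index(True)
    last = len(filled) - 1 - filled[::-1].index(True)
    return first, last + 1


def xy_cut(img, top, left, bottom, right, min_gap=2):
    """Recursively split the half-open region [top:bottom, left:right) of a
    binary image at whitespace gaps; return boxes (top, left, bottom, right)."""
    rows = [any(img[r][left:right]) for r in range(top, bottom)]
    if not any(rows):
        return []  # blank region: nothing to report
    first, end = trim(rows)
    top, bottom = top + first, top + end
    rows = rows[first:end]
    cols = [any(img[r][c] for r in range(top, bottom)) for c in range(left, right)]
    first, end = trim(cols)
    left, right = left + first, left + end
    cols = cols[first:end]

    gap = widest_gap(rows, min_gap)  # try a horizontal cut first
    if gap:
        return (xy_cut(img, top, left, top + gap[0], right, min_gap)
                + xy_cut(img, top + gap[1], left, bottom, right, min_gap))
    gap = widest_gap(cols, min_gap)  # otherwise try a vertical cut
    if gap:
        return (xy_cut(img, top, left, bottom, left + gap[0], min_gap)
                + xy_cut(img, top, left + gap[1], bottom, right, min_gap))
    return [(top, left, bottom, right)]  # homogeneous block: stop cutting
```

The sketch only works when regions are separated by straight whitespace bands, which is the structural assumption that makes such methods ill-suited to manuscripts with interlinear glosses or skewed, overlapping text.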
Even though research on these techniques is well
established, there are still many challenging issues that
neither bottom-up nor top-down strategies can address
appropriately. For this reason, a hybrid strategy has been
identified, which derives from the integration of the other
two main categories [3]. Over the years, many techniques have
been used to address this task, from classical computer vision
algorithms to deep learning methods.
Chen et al. [5] used a convolutional autoencoder to learn
the features directly from the pixel intensity values. Then,
by using these features to train a Support Vector Machine
(SVM), this method obtained high-quality segmentation without