Efficient few-shot learning for pixel-precise handwritten document layout analysis

Axel De Nardin1, Silvia Zottin1, Matteo Paier1, Gian Luca Foresti1, Emanuela Colombi1, Claudio Piciarelli1

1University of Udine

{denardin.axel, zottin.silvia, paier.matteo}@spes.uniud.it

{gianluca.foresti, emanuela.colombi, claudio.piciarelli}@uniud.it

Abstract

Layout analysis is a task of utmost importance in ancient handwritten document analysis and represents a fundamental step toward the simplification of subsequent tasks such as optical character recognition and automatic transcription. However, many of the approaches adopted to solve this problem rely on a fully supervised learning paradigm. While these systems achieve very good performance on this task, the drawback is that pixel-precise text labeling of the entire training set is a very time-consuming process, which makes this type of information rarely available in a real-world scenario. In the present paper, we address this problem by proposing an efficient few-shot learning framework that achieves performance comparable to current state-of-the-art fully supervised methods on the publicly available DIVA-HisDB dataset.

1. Introduction

Document image layout analysis is a very important task for the humanities community in the study of ancient manuscripts [21]. In particular, segmenting a given document page image into semantically meaningful regions (e.g. main text, comments, decorations and background) allows scholars to study the document more easily and quickly, and represents a fundamental step toward the simplification of subsequent tasks such as optical character recognition [16] and automatic transcription [11].

Document layout analysis is a particularly challenging task when applied to historical manuscripts. Compared to machine-printed documents [19], ancient texts exhibit many variations in layout structure, decorations and writing styles. For example, in many manuscripts, the main text body is entwined with additions, corrections and marginal or interlinear glosses [21], often made by different authors at different times. Furthermore, historical document pages frequently suffer from severe degradation due to aging, ink stains, noise, scratches and poor conservation [9]. In addition to all these factors, the image acquisition of ancient texts may itself be flawed, with illumination issues or inconsistencies and scan curvature problems [2].

Because of the non-uniformity and compromised integrity of these images, many of the approaches adopted to solve this problem rely on a fully supervised learning paradigm [15, 24, 18]. While these systems achieve very good performance on this task, they usually need a large number of annotated images for training. The Ground Truth (GT) represented by these annotations is critical for training and evaluating document analysis methods, especially for complex historical manuscripts that exhibit challenging layouts with interfering and overlapping handwriting [12]. The drawback is that pixel-precise annotation of an entire dataset of historical document pages requires specific domain knowledge and is a very time-consuming process, making this type of information rarely available in a real-world scenario. Nonetheless, few-shot learning approaches are still underexplored in the literature for this task. This paper tackles all the above issues by proposing a novel few-shot learning framework for efficient pixel-precise layout segmentation of historical documents. In particular, we propose two original contributions: first, a dynamic instance generation process that efficiently leverages the limited data available in this scenario, and second, a segmentation map refinement process that improves the precision of the annotation predictions provided by the adopted model. By combining these two components with a powerful deep CNN backbone network, we are able to achieve performance comparable to that obtained by current state-of-the-art fully supervised approaches.

The rest of this paper is organized as follows. Section 2 gives an overview of some related work in page segmentation for historical document images. Section 3 then describes the three components defining the proposed framework. Section 4 reports the details of our experimental setup and provides an overview of the obtained results. Finally, Section 5 draws the conclusions of this work and discusses ideas for future work.

arXiv:2210.15570v1 [cs.CV] 27 Oct 2022

Figure 1: Samples from the three manuscripts (CSG18, CSG863 and CB55) present in the DIVA-HisDB dataset [22]. Fig. 1a-1c show a full page from CSG18, CSG863 and CB55, respectively, while Fig. 1d-1f show a detail extracted from each of them.

2. Related work

Many different approaches have been proposed to tackle layout analysis, especially for handwritten historical documents. This section reviews some representative state-of-the-art methods for historical document image segmentation. The techniques employed for document layout analysis are usually divided into three categories: bottom-up, top-down and hybrid [3].

The bottom-up strategy builds the document analysis dynamically from fine-grained data such as pixels and connected components. The analysis then grows to form larger document regions and stops once the page has been segmented into regions with uniform elements. These techniques are flexible and do not require any prior knowledge of the layout structure. However, they usually demand a large amount of labeled training data, which is often not available, especially in the domain of historical documents, where highly specialized expertise is needed to label the data.
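As a purely illustrative sketch of the bottom-up idea (it is not the method proposed in this paper, nor any specific method from the literature), the following Python code starts from the pixels of a toy binary page, groups them into 4-connected components, and then greedily merges nearby components into larger regions. The function names, the `max_gap` parameter and the toy `PAGE` are assumptions introduced only for this example.

```python
from collections import deque


def connected_components(grid):
    """Return one bounding box (top, left, bottom, right), inclusive,
    per 4-connected component of ink pixels (value 1)."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def merge_into_regions(boxes, max_gap=1):
    """Greedy single pass: merge a box into the first region that overlaps it
    vertically and is separated by at most max_gap empty columns."""
    merged = []
    for box in sorted(boxes, key=lambda b: (b[0], b[1])):
        for i, m in enumerate(merged):
            rows_overlap = box[0] <= m[2] and m[0] <= box[2]
            close = box[1] - m[3] - 1 <= max_gap and m[1] - box[3] - 1 <= max_gap
            if rows_overlap and close:
                merged[i] = (min(m[0], box[0]), min(m[1], box[1]),
                             max(m[2], box[2]), max(m[3], box[3]))
                break
        else:
            merged.append(box)
    return merged


# Toy page: two small components on the first two rows, one on the last row.
PAGE = [
    [1, 1, 0, 1, 0, 0, 0],
    [1, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1],
]

components = connected_components(PAGE)
regions = merge_into_regions(components, max_gap=1)
```

On this toy page the two components in the upper rows are merged into a single region, while the component on the last row stays separate. The single greedy pass is a simplification: a practical system would iterate until no further merge is possible and would typically use learned features rather than a fixed gap threshold.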

In contrast, top-down approaches assume that pages have a well-defined structure and layout. Various characteristics of the document page structure are then considered, such as the white space between text regions, the size of text blocks and the measures between main text and paratext [9]. The page segmentation process starts from the whole page and cuts it into areas to produce small homogeneous regions. In general, top-down methods are easy to apply but are not suitable for complex layouts such as those of handwritten historical documents. In addition, these methods depend on the layout structure of the document, so they have a low generalization capability.
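To make the top-down idea concrete, the sketch below implements a recursive XY-cut, a classical whitespace-based splitting scheme, on a toy binary page. It is offered only as an illustration of "start from the whole page and cut along white space"; it is not taken from this paper, and the function name `xy_cut`, the `min_gap` parameter and the toy `PAGE` are assumptions made for the example.

```python
def xy_cut(grid, top=0, left=0, bottom=None, right=None, min_gap=1):
    """Recursively split a binary page (1 = ink) along empty rows or columns.

    Returns the leaf regions as (top, left, bottom, right), inclusive.
    A cut is made only where at least min_gap consecutive rows/columns are empty.
    """
    if bottom is None:
        bottom, right = len(grid) - 1, len(grid[0]) - 1

    # Rows and columns of the current box that contain at least one ink pixel.
    ink_rows = [r for r in range(top, bottom + 1) if any(grid[r][left:right + 1])]
    ink_cols = [c for c in range(left, right + 1)
                if any(grid[r][c] for r in range(top, bottom + 1))]
    if not ink_rows:
        return []  # an empty box yields no region
    # Shrink the box so that it tightly encloses its ink.
    top, bottom, left, right = ink_rows[0], ink_rows[-1], ink_cols[0], ink_cols[-1]

    # Try a horizontal cut (between rows) first, then a vertical one.
    for occupied, horizontal in ((ink_rows, True), (ink_cols, False)):
        for a, b in zip(occupied, occupied[1:]):
            if b - a - 1 >= min_gap:  # enough white space between a and b
                if horizontal:
                    return (xy_cut(grid, top, left, a, right, min_gap)
                            + xy_cut(grid, b, left, bottom, right, min_gap))
                return (xy_cut(grid, top, left, bottom, a, min_gap)
                        + xy_cut(grid, top, b, bottom, right, min_gap))
    return [(top, left, bottom, right)]  # no valid cut: this is a leaf region


# Toy page: two blocks side by side, an empty row, then a full-width line.
PAGE = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
]

regions = xy_cut(PAGE, min_gap=1)
```

With `min_gap=1` the page is first cut at the empty row and the upper part is then cut at the empty columns, giving three regions; with `min_gap=2` the single empty row is too narrow to cut and the whole page is returned as one region. This sensitivity to a fixed whitespace threshold illustrates why such methods struggle on layouts where glosses and main text are tightly interleaved.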

Even though research on these techniques is well established, there are still many challenging issues that neither bottom-up nor top-down strategies can address appropriately. For this reason, a hybrid strategy has been identified, which derives from the integration of the other two main categories [3]. Over the years, many techniques have been used to address this task, from classical computer vision algorithms to deep learning methods.

Chen et al. [5] used a convolutional autoencoder to learn features directly from the pixel intensity values. Then, by using these features to train a Support Vector Machine (SVM), this method achieved high-quality segmentation without
