
Contrastive Training Improves Zero-Shot Classification of
Semi-structured Documents
Muhammad Khalifa†∗, Yogarshi Vyas‡, Shuai Wang‡,
Graham Horwood‡, Sunil Mallya§∗, Miguel Ballesteros‡
University of Michigan†, AWS AI Labs‡, Flip.ai§
khalifam@umich.edu,
{yogarshi,wshui,graham.horwood,ballemig}@amazon.com
Abstract
We investigate semi-structured document clas-
sification in a zero-shot setting. Classification
of semi-structured documents is more chal-
lenging than that of standard unstructured doc-
uments, as positional, layout, and style infor-
mation play a vital role in interpreting such
documents. The standard classification setting
where categories are fixed during both train-
ing and testing falls short in dynamic environ-
ments where new document categories could
potentially emerge. We focus exclusively on
the zero-shot setting where inference is done
on new unseen classes. To address this task,
we propose a matching-based approach that re-
lies on a pairwise contrastive objective for both
pretraining and fine-tuning. Our results show
a significant boost in Macro F1 from the pro-
posed pretraining step in both supervised and
unsupervised zero-shot settings.
1 Introduction
Textual information assumes many forms ranging
from unstructured (e.g., text messages) to semi-
structured (e.g., forms, invoices, letters), all the
way to fully structured (e.g., databases or spread-
sheets). Our focus in this work is the classification
of semi-structured documents. A semi-structured
document organizes its information in a regular
visual layout, such as tables, forms, multi-column
text, and (nested) bulleted lists. Its content is either
understandable only in the context of that layout or
requires substantially more work to understand
without it. Automatic processing of semi-structured
documents comes with a unique set of challenges
including a non-linear text flow (Wang et al., 2021),
layout inconsistencies, and low-accuracy optical
character recognition. Prior work has shown that
integrating the two-dimensional layout informa-
tion of such documents is critical for models that
analyze them (Xu et al., 2020, 2021;
Huang et al., 2022; Appalaraju et al., 2021). Due
to these challenges, methods for unstructured doc-
ument classification, such as static word vectors
(Socher et al., 2013) and standard pretrained lan-
guage models (Devlin et al., 2019; Reimers and
Gurevych, 2019; Liu et al., 2019), perform poorly
with semi-structured inputs as they model text in
a one-dimensional space and ignore information
about document layout and style (Xu et al., 2020).
∗Work done while at AWS AI Labs.
Past work on semi-structured document classi-
fication (Harley et al., 2015; Iwana et al., 2016;
Tensmeyer and Martinez, 2017; Xu et al., 2020,
2021) has focused exclusively on the full-shot set-
ting, where the target classes are fixed and iden-
tical across training and inference, neglecting the
zero-shot setting (Xian et al., 2018), which requires
generalization to unseen classes during inference.
Our work addresses zero-shot classification of
semi-structured documents in English using the
matching framework, which has been used for
many tasks on unstructured text (Dauphin et al.,
2014; Nam et al., 2016; Pappas and Henderson,
2019; Vyas and Ballesteros, 2021; Ma et al., 2022).
Under this framework, a matching (similarity) met-
ric between documents and their assigned classes is
maximized in a joint embedding space. We extend
this matching framework with two enhancements.
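The matching idea above can be illustrated with a minimal sketch. It assumes cosine similarity as the matching metric and uses hand-written toy vectors in place of learned document and class encoders; the function names `cosine` and `predict_zero_shot` and all values are hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def predict_zero_shot(doc_vec, class_vecs):
    """Pick the class whose embedding best matches the document embedding.

    class_vecs maps class name -> embedding. The classes need not have been
    seen during training, as long as they can be embedded in the same space.
    """
    scores = {name: cosine(doc_vec, vec) for name, vec in class_vecs.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Toy embeddings (assumption: a real system would produce these with encoders).
class_vecs = {
    "invoice": [0.9, 0.1, 0.0],
    "letter": [0.1, 0.9, 0.1],
    "resume": [0.0, 0.2, 0.9],
}
doc_vec = [0.8, 0.2, 0.1]

label, scores = predict_zero_shot(doc_vec, class_vecs)
print(label)
```

Because prediction only compares embeddings, adding a new class at inference time amounts to adding one more entry to `class_vecs`, which is what makes a matching formulation a natural fit for the zero-shot setting.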
First, we use a pairwise contrastive objective (Reth-
meier and Augenstein, 2020; Radford et al., 2021;
Gunel et al., 2021) that increases the similarity be-
tween documents and their ground-truth labels, and
decreases it for incorrect pairs of documents and
labels. We augment the textual representations of
documents with layout features representing the
positions of tokens on the page to capture the two-
dimensional nature of the documents. Second, we
propose an unsupervised contrastive pretraining
procedure to warm up the representations of doc-
uments and classes. In summary, (i) we study the
zero-shot classification of semi-structured docu-
arXiv:2210.05613v1 [cs.CL] 11 Oct 2022