
KALM: Knowledge-Aware Integration of Local, Document, and Global
Contexts for Long Document Understanding
Shangbin Feng1  Zhaoxuan Tan2  Wenqian Zhang2  Zhenyu Lei2  Yulia Tsvetkov1
1University of Washington 2Xi’an Jiaotong University
{shangbin, yuliats}@cs.washington.edu {tanzhaoxuan, 2194510944, fischer}@stu.xjtu.edu.cn
Abstract
With the advent of pretrained language models (LMs), increasing research efforts have been focusing on infusing commonsense and domain-specific knowledge to prepare LMs for downstream tasks. These works attempt to leverage knowledge graphs, the de facto standard of symbolic knowledge representation, along with pretrained LMs. While existing approaches have leveraged external knowledge, it remains an open question how to jointly incorporate knowledge graphs representing varying contexts—from local (e.g., sentence), to document-level, to global knowledge—to enable knowledge-rich exchange across these contexts. Such rich contextualization can be especially beneficial for long document understanding tasks, since standard pretrained LMs are typically bounded by the input sequence length. In light of these challenges, we propose KALM, a Knowledge-Aware Language Model that jointly leverages knowledge in local, document-level, and global contexts for long document understanding. KALM first encodes long documents and knowledge graphs into the three knowledge-aware context representations. It then processes each context with context-specific layers, followed by a “context fusion” layer that facilitates knowledge exchange to derive an overarching document representation. Extensive experiments demonstrate that KALM achieves state-of-the-art performance on six long document understanding tasks and datasets. Further analyses reveal that the three knowledge-aware contexts are complementary and all contribute to model performance, while the importance and information exchange patterns of different contexts vary with respect to different tasks and datasets.1
1Code and data are publicly available at https://github.com/BunsenFeng/KALM.
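To make the pipeline described in the abstract concrete, below is a minimal illustrative sketch (not the authors' released implementation) of a three-context architecture in PyTorch: each knowledge-aware context representation is processed by its own stack of context-specific layers, and a simple attention-based "context fusion" layer lets the three contexts exchange information before pooling into a single document representation. The specific fusion mechanism (concatenation followed by multi-head self-attention), the layer counts, and the mean pooling are assumptions chosen for illustration only, not KALM's actual design.

import torch
import torch.nn as nn

class ContextFusionLayer(nn.Module):
    """Hypothetical fusion layer: tokens from all three contexts attend to each other."""
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, local, doc, glob):
        # Concatenate the three context sequences and apply self-attention so that
        # every token can exchange information with tokens from the other contexts.
        fused = torch.cat([local, doc, glob], dim=1)
        out, _ = self.attn(fused, fused, fused)
        out = self.norm(fused + out)
        n_l, n_d = local.size(1), doc.size(1)
        return out[:, :n_l], out[:, n_l:n_l + n_d], out[:, n_l + n_d:]

class ThreeContextModel(nn.Module):
    """Context-specific layers for local / document-level / global contexts, interleaved with fusion."""
    def __init__(self, dim: int = 768, num_layers: int = 2):
        super().__init__()
        make_layer = lambda: nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.local_layers = nn.ModuleList(make_layer() for _ in range(num_layers))
        self.doc_layers = nn.ModuleList(make_layer() for _ in range(num_layers))
        self.global_layers = nn.ModuleList(make_layer() for _ in range(num_layers))
        self.fusions = nn.ModuleList(ContextFusionLayer(dim) for _ in range(num_layers))

    def forward(self, local, doc, glob):
        # local, doc, glob: pre-computed knowledge-aware context representations,
        # each of shape (batch, context_length, dim).
        for l_layer, d_layer, g_layer, fusion in zip(
            self.local_layers, self.doc_layers, self.global_layers, self.fusions
        ):
            local, doc, glob = l_layer(local), d_layer(doc), g_layer(glob)
            local, doc, glob = fusion(local, doc, glob)
        # Mean-pool all context tokens into one overarching document representation.
        return torch.cat([local, doc, glob], dim=1).mean(dim=1)

For example, calling ThreeContextModel()(local, doc, glob) on three tensors of shape (batch, length, 768) yields a single (batch, 768) document embedding that could feed a downstream classification head.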
1 Introduction
Large language models (LMs) have become the dominant paradigm in NLP research, while knowledge graphs (KGs) are the de facto standard of symbolic knowledge representation. Recent advances in knowledge-aware NLP focus on combining the two paradigms (Wang et al., 2021b; Zhang et al., 2021; He et al., 2021), infusing encyclopedic (Vrandečić and Krötzsch, 2014; Pellissier Tanon et al., 2020), commonsense (Speer et al., 2017), and domain-specific (Feng et al., 2021; Chang et al., 2020) knowledge into LMs. Knowledge-grounded models have achieved state-of-the-art performance on tasks including question answering (Sun et al., 2022), commonsense reasoning (Kim et al., 2022; Liu et al., 2021), and social text analysis (Zhang et al., 2022; Hu et al., 2021).
Prior approaches to infusing LMs with knowledge typically focused on three hitherto orthogonal directions: incorporating knowledge related to local (e.g., sentence-level), document-level, or global context. Local context approaches argue that sentences mention entities, and that external knowledge about these entities, such as textual descriptions (Balachandran et al., 2021; Wang et al., 2021b) and metadata (Ostapenko et al., 2022), helps LMs recognize that entities are more than mere tokens. Document-level approaches argue that core idea entities are repeatedly mentioned throughout the document, while related concepts might be discussed in different paragraphs; these methods attempt to leverage entities and knowledge across paragraphs with document graphs (Feng et al., 2021; Zhang et al., 2022; Hu et al., 2021). Global context approaches argue that unmentioned yet connecting entities help connect the dots for knowledge-based reasoning; thus, knowledge graph subgraphs are encoded with graph neural networks alongside the textual content (Zhang et al., 2021; Yasunaga et al., 2021). However, despite their individual pros and cons, how to integrate the three