KALM: Knowledge-Aware Integration of Local, Document, and Global
Contexts for Long Document Understanding
Shangbin Feng¹, Zhaoxuan Tan², Wenqian Zhang², Zhenyu Lei², Yulia Tsvetkov¹
¹University of Washington  ²Xi'an Jiaotong University
{shangbin, yuliats}@cs.washington.edu {tanzhaoxuan, 2194510944, fischer}@stu.xjtu.edu.cn
Abstract
With the advent of pretrained language mod-
els (LMs), increasing research efforts have
been focusing on infusing commonsense and
domain-specific knowledge to prepare LMs
for downstream tasks. These works attempt to
leverage knowledge graphs, the de facto stan-
dard of symbolic knowledge representation,
along with pretrained LMs. While existing ap-
proaches have leveraged external knowledge,
it remains an open question how to jointly in-
corporate knowledge graphs representing vary-
ing contexts—from local (e.g., sentence), to
document-level, to global knowledge—to en-
able knowledge-rich exchange across these
contexts. Such rich contextualization can be
especially beneficial for long document under-
standing tasks since standard pretrained LMs
are typically bounded by the input sequence
length. In light of these challenges, we pro-
pose KALM, a Knowledge-Aware Language
Model that jointly leverages knowledge in lo-
cal, document-level, and global contexts for
long document understanding. KALM first en-
codes long documents and knowledge graphs
into the three knowledge-aware context repre-
sentations. It then processes each context with
context-specific layers, followed by a “con-
text fusion” layer that facilitates knowledge
exchange to derive an overarching document
representation. Extensive experiments demon-
strate that KALM achieves state-of-the-art per-
formance on six long document understand-
ing tasks and datasets. Further analyses re-
veal that the three knowledge-aware contexts
are complementary and they all contribute to
model performance, while the importance and
information exchange patterns of different con-
texts vary with respect to different tasks and
datasets.¹
¹ Code and data are publicly available at https://github.com/BunsenFeng/KALM.
1 Introduction
Large language models (LMs) have become the
dominant paradigm in NLP research, while knowl-
edge graphs (KGs) are the de facto standard of
symbolic knowledge representation. Recent ad-
vances in knowledge-aware NLP focus on combin-
ing the two paradigms (Wang et al., 2021b; Zhang et al., 2021; He et al., 2021), infusing encyclopedic (Vrandečić and Krötzsch, 2014; Pellissier Tanon et al., 2020), commonsense (Speer et al., 2017), and domain-specific (Feng et al., 2021; Chang et al., 2020) knowledge with LMs. Knowledge-
grounded models achieved state-of-the-art perfor-
mance in tasks including question answering (Sun
et al.,2022), commonsense reasoning (Kim et al.,
2022;Liu et al.,2021), and social text analysis
(Zhang et al.,2022;Hu et al.,2021).
Prior approaches to infusing LMs with knowl-
edge typically focused on three hitherto orthogonal
directions: incorporating knowledge related to lo-
cal (e.g., sentence-level), document-level, or global
context.
Local context approaches argue that sentences mention entities, and that external knowledge about these entities, such as textual descriptions (Balachandran et al., 2021; Wang et al., 2021b) and metadata (Ostapenko et al., 2022), helps LMs realize they are more than tokens.
Document-level
approaches ar-
gue that core idea entities are repeatedly mentioned
throughout the document, while related concepts
might be discussed in different paragraphs. These
methods attempt to leverage entities and knowledge
across paragraphs with document graphs (Feng
et al.,2021;Zhang et al.,2022;Hu et al.,2021).
Global
context approaches argue that unmentioned
yet connecting entities help connect the dots for
knowledge-based reasoning, thus knowledge graph
subgraphs are encoded with graph neural networks
alongside textual content (Zhang et al.,2021;Ya-
sunaga et al.,2021). However, despite their indi-
vidual pros and cons, how to integrate the three
document contexts in a knowledge-aware way re-
mains an open problem.
Controlling for varying scopes of knowledge and
context representations could benefit numerous lan-
guage understanding tasks, especially those cen-
tered around long documents. Bounded by the
inherent limitation of input sequence length, exist-
ing knowledge-aware LMs are mostly designed to
handle short texts (Wang et al.,2021b;Zhang et al.,
2021). However, processing long documents con-
taining thousands of tokens (Beltagy et al.,2021)
requires attending to varying document contexts,
disambiguating long-distance co-referring entities
and events, and more.
In light of these challenges, we propose KALM, a Knowledge-Aware Language Model for long
document understanding. Specifically, KALM first
derives three context- and knowledge-aware rep-
resentations from the long input document and
an external knowledge graph: the local context
represented as raw text, the document-level con-
text represented as a document graph, and the
global context represented as a knowledge graph
subgraph. KALM layers then encode each con-
text with context-specific layers, followed by our
proposed novel ContextFusion layers to enable
knowledge-rich information exchange across the
three knowledge-aware contexts. A unified docu-
ment representation is then derived from context-
specific representations that also interact with other
contexts. An illustration of the proposed KALM is
presented in Figure 1.
While KALM is a general method for long doc-
ument understanding, we evaluate the model on
six tasks and datasets that are particularly sensi-
tive to broader contexts and external knowledge:
political perspective detection, misinformation de-
tection, and roll call vote prediction. Extensive
experiments demonstrate that KALM outperforms
pretrained LMs, task-agnostic knowledge-aware
baselines, and strong task-specific baselines on all
six datasets. In ablation experiments, we further
establish KALM’s ability to enable information
exchange, better handle long documents, and im-
prove data efficiency. In addition, KALM and the
proposed ContextFusion layers reveal and help in-
terpret the roles and information exchange patterns
of different contexts.
2 KALM Methodology
2.1 Problem Definition
Let $d = \{d_1, \ldots, d_n\}$ denote a document with $n$ paragraphs, where each paragraph $d_i = \{w_{i1}, \ldots, w_{in_i}\}$ contains a sequence of $n_i$ tokens. Knowledge-aware long document understanding assumes access to an external knowledge graph $\mathrm{KG} = (\mathcal{E}, \mathcal{R}, A, \psi, \phi)$, where $\mathcal{E} = \{e_1, \ldots, e_N\}$ denotes the entity set, $\mathcal{R} = \{r_1, \ldots, r_M\}$ denotes the relation set, $A$ is the adjacency matrix where $a_{ij} = k$ indicates $(e_i, r_k, e_j) \in \mathrm{KG}$, and $\psi(\cdot): \mathcal{E} \rightarrow \mathrm{str}$ and $\phi(\cdot): \mathcal{R} \rightarrow \mathrm{str}$ map entities and relations to their textual descriptions.
Given pre-defined document labels, knowledge-aware natural language understanding aims to learn document representations and classify $d$ into its corresponding label with the help of the KG.
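For concreteness, the following minimal Python sketch shows one way the inputs assumed by this formulation could be represented; the class and field names are illustrative rather than part of KALM.

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeGraph:
    entities: list[str]                        # E = {e_1, ..., e_N}
    relations: list[str]                       # R = {r_1, ..., r_M}
    # adjacency: (head entity index i, tail entity index j) -> relation index k,
    # i.e. a_ij = k  <=>  (e_i, r_k, e_j) is a triple in the KG
    adjacency: dict[tuple[int, int], int] = field(default_factory=dict)
    entity_desc: dict[int, str] = field(default_factory=dict)    # psi: E -> str
    relation_desc: dict[int, str] = field(default_factory=dict)  # phi: R -> str

@dataclass
class Document:
    paragraphs: list[str]    # d = {d_1, ..., d_n}
    label: int | None = None
```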
2.2 Knowledge-Aware Contexts
We hypothesize that a holistic representation of
long documents should incorporate contexts and
relevant knowledge at three levels: the local context
(e.g., a sentence with descriptions of mentioned en-
tities), the broader document context (e.g., a long
document with cross-paragraph entity reference
structure), and the global/external context repre-
sented as external knowledge (e.g., relevant knowl-
edge base subgraphs). Each of the three contexts
uses different granularities of external knowledge,
while existing works fall short of jointly integrat-
ing the three types of representations. To this end,
KALM first employs a different mechanism to introduce knowledge at each level of context.
Local context.
Represented as the raw text of
sentences and paragraphs, the local context models
the smallest unit in long document understanding.
Prior works attempted to add sentence metadata
(e.g., tense, sentiment, topic) (Zhang et al.,2022),
adopt sentence-level pretraining tasks based on KG
triples (Wang et al.,2021b), or leverage knowledge
graph embeddings along with textual representa-
tions (Hu et al.,2021). While these methods were
effective, in the face of LM-centered NLP research,
they are ad-hoc add-ons and not fully compatible
with existing pretrained LMs. As a result, KALM proposes to directly concatenate the textual descriptions $\psi(e_i)$ of mentioned entities $e_i$ to the paragraph. In this way, the original text is directly augmented with the entity descriptions, informing the LM that entities such as "Kepler" are more than mere tokens and helping to combat the spurious correlations of pretrained LMs (McMilin).
[Figure 1: Overview of KALM, which encodes long documents and knowledge graphs into local, document, and global contexts while enabling information exchange across contexts.]
For each augmented paragraph $d'_i$, we adopt $\mathrm{LM}(\cdot)$ with mean pooling to extract a paragraph representation. We use the pretrained BART encoder (Lewis et al., 2020) as $\mathrm{LM}(\cdot)$ unless otherwise noted. We also add a fusion token at the beginning of the paragraph sequence for information exchange across contexts.
After processing all $n$ paragraphs, we obtain the local context representation $T^{(0)}$ as follows:
$$T^{(0)} = \{t_0^{(0)}, \ldots, t_n^{(0)}\} = \{\theta_{\mathrm{rand}}, \mathrm{LM}(d'_1), \ldots, \mathrm{LM}(d'_n)\}$$
where $\theta_{\mathrm{rand}}$ denotes a randomly initialized vector for the fusion token in the local context and the superscript $(0)$ indicates the 0-th layer.
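For concreteness, a minimal sketch of this step is given below. It assumes a HuggingFace BART encoder as $\mathrm{LM}(\cdot)$, an upstream entity linker that provides per-paragraph mention sets, and illustrative helper names (encode_paragraph, build_local_context) that are not part of the released code.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
encoder = AutoModel.from_pretrained("facebook/bart-base").get_encoder()
HIDDEN = encoder.config.d_model  # 768 for bart-base

def encode_paragraph(text: str) -> torch.Tensor:
    """Mean-pooled BART encoder representation of one (augmented) paragraph."""
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        states = encoder(**batch).last_hidden_state     # (1, seq_len, hidden)
    return states.mean(dim=1).squeeze(0)                 # (hidden,)

def build_local_context(paragraphs, mentions, entity_desc):
    """mentions[i]: entity ids mentioned in paragraph i (assumed given by an
    entity linker); entity_desc maps entity id -> description psi(e)."""
    fusion = torch.nn.Parameter(torch.randn(HIDDEN))      # theta_rand for the fusion token
    reps = [fusion]
    for i, para in enumerate(paragraphs):
        descs = " ".join(entity_desc[e] for e in mentions[i])
        reps.append(encode_paragraph(para + " " + descs))  # d'_i = paragraph + entity descriptions
    return torch.stack(reps)                               # T^(0): (n + 1, hidden)
```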
Document-level context.
Represented as the
structure of the full document, the document-
level context is responsible for modeling cross-
paragraph entities and knowledge on a document
level. While existing works attempted to incorpo-
rate external knowledge in documents via docu-
ment graphs (Feng et al.,2021;Hu et al.,2021),
they fall short of leveraging the overlapping entities
and concepts between paragraphs that underpin the
reasoning of long documents. To this end, we pro-
pose knowledge coreference, a simple and effective
mechanism for modeling text-knowledge interac-
tion on the document level. Specifically, a document graph with $n+1$ nodes is constructed, consisting of one fusion node and $n$ paragraph nodes. If paragraphs $i$ and $j$ both mention entity $e_k$ in the external KB, nodes $i$ and $j$ in the document graph are connected with relation type $k$. In addition, the fusion node is connected to every paragraph node with a super-relation. As a result, we obtain the adjacency matrix $A^g$ of the document graph. Paired with the knowledge-guided GNN introduced in Section 2.3, knowledge coreference enables information flow across paragraphs guided by external knowledge. Node feature initialization of the document graph is as follows:
$$G^{(0)} = \{g_0^{(0)}, \ldots, g_n^{(0)}\} = \{\theta_{\mathrm{rand}}, \mathrm{LM}(d_1), \ldots, \mathrm{LM}(d_n)\}$$
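A possible implementation of knowledge coreference is sketched below. It assumes per-paragraph entity-mention sets from an entity linker and, as a simplification, keeps a single shared entity per paragraph pair when several overlap.

```python
import numpy as np

def build_document_graph(mentions: list[set], num_entities: int) -> np.ndarray:
    """mentions[i]: ids of entities mentioned in paragraph i.
    Returns A^g of shape (n+1, n+1); entry (i, j) holds an edge type, -1 means no edge."""
    n = len(mentions)
    NO_EDGE, SUPER_REL = -1, num_entities          # reserve one extra type for the super-relation
    adj = np.full((n + 1, n + 1), NO_EDGE, dtype=np.int64)

    # knowledge coreference: paragraphs i and j are linked with type k if both mention entity e_k
    for i in range(n):
        for j in range(i + 1, n):
            shared = mentions[i] & mentions[j]
            if shared:
                k = min(shared)                    # simplification: keep one shared entity per pair
                adj[i + 1, j + 1] = adj[j + 1, i + 1] = k
    # the fusion node (index 0) is connected to every paragraph node with the super-relation
    for i in range(1, n + 1):
        adj[0, i] = adj[i, 0] = SUPER_REL
    return adj
```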
Global context.
Represented as external knowl-
edge graphs, the global context is responsible for
leveraging unseen entities and facilitating KG-
based reasoning. Existing works mainly focused on
extracting knowledge graph subgraphs (Yasunaga
et al.,2021;Zhang et al.,2021) and encoding them
alongside document content. Though many tricks
are proposed to extract and prune KG subgraphs,
in KALM, we employ a straightforward approach:
for all mentioned entities in the long document, KALM merges their $k$-hop neighborhoods to obtain a knowledge graph subgraph. We use $k = 2$ following previous works (Zhang et al., 2021; Vashishth et al., 2019), striking a balance between KB structure and computational efficiency, while KALM could support any $k$ setting. A fusion entity is then introduced and connected with every other entity, resulting in a connected graph. In this way, KALM cuts back on the preprocessing for modeling global knowledge and better preserves the information in the KG. Knowledge graph embedding methods (Bordes et al., 2013) are then adopted to initialize the node features of the KG subgraph:
$$K^{(0)} = \{k_0^{(0)}, \ldots, k_{|\rho(d)|}^{(0)}\} = \{\theta_{\mathrm{rand}}, \mathrm{KGE}(e_1), \ldots, \mathrm{KGE}(e_{|\rho(d)|})\}$$
where $\mathrm{KGE}(\cdot)$ denotes the knowledge graph embeddings trained on the original KG and $|\rho(d)|$ indicates the number of mentioned entities identified in document $d$. We use TransE (Bordes et al., 2013) to learn the KB embeddings used as $\mathrm{KGE}(\cdot)$; these embeddings are kept frozen during KALM training.
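The subgraph extraction itself can be a plain breadth-first expansion. The sketch below assumes an undirected adjacency view of the KG and a frozen TransE embedding table; it is illustrative rather than the exact preprocessing used in KALM.

```python
import torch

def k_hop_subgraph(mentioned: set[int], neighbors: dict[int, set[int]], k: int = 2) -> set[int]:
    """neighbors[e]: entities adjacent to e in the KG (undirected view).
    Returns the union of the k-hop neighborhoods of all mentioned entities."""
    nodes, frontier = set(mentioned), set(mentioned)
    for _ in range(k):
        frontier = {v for u in frontier for v in neighbors.get(u, set())} - nodes
        nodes |= frontier
    return nodes

def init_global_context(mentioned, neighbors, transe_emb: torch.Tensor, k: int = 2):
    """transe_emb: frozen (num_entities, dim) TransE embedding table.
    Returns the subgraph node ids and K^(0) with a random fusion entity at index 0."""
    nodes = sorted(k_hop_subgraph(mentioned, neighbors, k))
    fusion = torch.randn(1, transe_emb.size(1))             # theta_rand for the fusion entity
    feats = torch.cat([fusion, transe_emb[nodes]], dim=0)    # (|subgraph| + 1, dim)
    return nodes, feats
```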
2.3 KALM Layers
After obtaining the local, document-level, and
global context representations of long documents,
we employ KALM layers to learn document repre-
sentations. Specifically, each KALM layer consists
of three context-specific layers to process each con-
text. A ContextFusion layer is then adopted to
enable the knowledge-rich information exchange
across the three contexts.
2.3.1 Context-Specific Layers
Local context layer.
The local context is represented as a sequence of vectors extracted from the knowledge-enriched text with the help of pretrained LMs. We adopt transformer encoder layers (Vaswani et al., 2017) to encode the local context:
$$\tilde{T}^{(\ell)} = \{\tilde{t}_0^{(\ell)}, \ldots, \tilde{t}_n^{(\ell)}\} = \phi\big(\mathrm{TrmEnc}\big(\{t_0^{(\ell)}, \ldots, t_n^{(\ell)}\}\big)\big)$$
where $\phi(\cdot)$ denotes a non-linearity, $\mathrm{TrmEnc}$ denotes the transformer encoder layer, and $\tilde{t}_0^{(\ell)}$ denotes the transformed representation of the fusion token. We omit the layer superscript $(\ell)$ for brevity in what follows.
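In PyTorch, this corresponds to a single self-attention encoder layer applied over the $n+1$ paragraph vectors (fusion token included); a minimal sketch, with ReLU assumed as the non-linearity $\phi$ and illustrative sizes:

```python
import torch
import torch.nn as nn

hidden = 768
local_layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True)

T = torch.randn(1, 12, hidden)          # T^(l): one document, 11 paragraphs + 1 fusion token
T_tilde = torch.relu(local_layer(T))     # phi(TrmEnc(.)); position 0 is the transformed fusion token
```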
Document-level context layer.
The document-
level context is represented as a document graph
based on knowledge coreference. To better exploit
the entity-based relations in the document graph,
we propose a knowledge-aware GNN architecture
to enable
knowledge-guided message passing
on
the document graph:
$$\tilde{G} = \{\tilde{g}_0, \ldots, \tilde{g}_n\} = \mathrm{GNN}\big(\{g_0, \ldots, g_n\}\big)$$
where $\mathrm{GNN}(\cdot)$ denotes the proposed knowledge-guided graph neural network:
$$\tilde{g}_i = \phi\Big(\alpha_{i,i}\,\Theta g_i + \sum_{j \in \mathcal{N}(i)} \alpha_{i,j}\,\Theta g_j\Big)$$
where $\alpha_{i,j}$ denotes the knowledge-guided attention weight, defined as:
$$\alpha_{i,j} = \frac{\exp\Big(\mathrm{ELU}\big(a^\top \big[\Theta g_i \,\|\, \Theta g_j \,\|\, \Theta f(\mathrm{KGE}(a^g_{ij}))\big]\big)\Big)}{\sum_{k \in \mathcal{N}(i)} \exp\Big(\mathrm{ELU}\big(a^\top \big[\Theta g_i \,\|\, \Theta g_k \,\|\, \Theta f(\mathrm{KGE}(a^g_{ik}))\big]\big)\Big)}$$
where $\tilde{g}_0$ denotes the transformed representation of the fusion node, $a$ and $\Theta$ are learnable parameters, $a^g_{ij}$ is the value in the $i$-th row and $j$-th column of the adjacency matrix $A^g$ of the document graph, $\mathrm{ELU}$ denotes the exponential linear unit activation function (Clevert et al., 2015), and $f(\cdot)$ is a learnable linear layer. The term $\Theta f(\mathrm{KGE}(a^g_{ij}))$ is responsible for enabling knowledge-guided message passing on the document graph, enabling KALM to incorporate the entity and concept patterns in different paragraphs and their document-level interactions.
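A dense-adjacency sketch of this layer is shown below. It assumes edge types in $A^g$ are the shared-entity ids from knowledge coreference (with $-1$ marking missing edges), uses a frozen KG embedding table for edge features, and, as a simplification, reuses the embedding of type 0 for self-loops.

```python
import torch, torch.nn as nn, torch.nn.functional as F

class KnowledgeGuidedGNN(nn.Module):
    def __init__(self, hidden: int, kge_dim: int):
        super().__init__()
        self.theta = nn.Linear(hidden, hidden, bias=False)    # Θ
        self.f = nn.Linear(kge_dim, hidden)                    # f(.), maps KGE(a^g_ij) to hidden size
        self.a = nn.Linear(3 * hidden, 1, bias=False)          # attention vector a

    def forward(self, g: torch.Tensor, adj: torch.Tensor, kge: torch.Tensor) -> torch.Tensor:
        # g: (n+1, hidden) node features; adj: (n+1, n+1) long tensor of edge-type ids, -1 = no edge
        # kge: frozen (num_edge_types, kge_dim) embedding table indexed by edge type
        h = self.theta(g)                                       # Θ g_j for every node
        edge_feat = self.theta(self.f(kge[adj.clamp(min=0)]))   # Θ f(KGE(a^g_ij)), (n+1, n+1, hidden)
        n = h.size(0)
        hi = h.unsqueeze(1).expand(n, n, -1)                    # Θ g_i broadcast over j
        hj = h.unsqueeze(0).expand(n, n, -1)                    # Θ g_j broadcast over i
        logits = F.elu(self.a(torch.cat([hi, hj, edge_feat], dim=-1)).squeeze(-1))
        mask = adj.ge(0) | torch.eye(n, dtype=torch.bool)       # self-loops participate via α_ii
        logits = logits.masked_fill(~mask, float("-inf"))
        alpha = torch.softmax(logits, dim=-1)                   # knowledge-guided attention α_ij
        return torch.relu(alpha @ h)                            # φ(α_ii Θ g_i + Σ_j α_ij Θ g_j)
```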
Global context layer.
The global context is
represented as a relevant knowledge graph sub-
graph. We follow previous works and adopt GATs
(Veliˇ
ckovi´
c et al.,2018) to encode the global con-
text:
˜
K={˜
k0,...,˜
k|ρ(d)|}
= GAT{k0,...,k|ρ(d)|}
where
˜
k0
denotes the transformed representation
of the fusion entity.
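As one possible realization (an assumption, not necessarily the paper's exact configuration), the global layer can be a standard GATConv from PyTorch Geometric applied to the subgraph, with the fusion entity as node 0:

```python
import torch
from torch_geometric.nn import GATConv

kge_dim = 100                                    # assumed TransE embedding dimension
gat = GATConv(in_channels=kge_dim, out_channels=kge_dim, heads=4, concat=False)

K0 = torch.randn(6, kge_dim)                     # fusion entity (node 0) + 5 subgraph entities
# toy edge list in COO format; for an undirected view, edges would also be added in reverse
edge_index = torch.tensor([[0, 0, 0, 0, 0, 1, 2],
                           [1, 2, 3, 4, 5, 3, 4]])
K_tilde = gat(K0, edge_index)                    # K_tilde[0] is the transformed fusion entity
```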
2.3.2 ContextFusion Layer
The local, document, and global contexts model
external knowledge within sentences, across the
document, and beyond the document. These con-
texts are closely connected and a robust long doc-
ument understanding method should reflect their
interactions. Existing approaches mostly leverage
only one or two of the contexts (Wang et al.,2021b;
Feng et al.,2021;Zhang et al.,2022), falling short
of jointly leveraging the three knowledge-aware
contexts. In addition, they mostly adopted direct
concatenation or MLP layers (Zhang et al.,2022,
2021;Hu et al.,2021), falling short of enabling
context-specific information to flow across con-
texts in a knowledge-rich manner. As a result, we
propose the ContextFusion layer to tackle these
challenges. We firstly take a local perspective and
extract the representations of the fusion tokens,
nodes, and entities in each context:
$$\langle t_L, g_L, k_L \rangle = \langle \tilde{t}_0, \tilde{g}_0, \tilde{k}_0 \rangle$$
We then take a global perspective and use the fusion token/node/entity as the query to conduct attentive pooling $\mathrm{ap}(\cdot, \cdot)$ across all other tokens/nodes/entities in each context:
$$\langle t_G, g_G, k_G \rangle = \big\langle \mathrm{ap}\big(\tilde{t}_0, \{\tilde{t}_i\}_{i=1}^{n}\big),\ \mathrm{ap}\big(\tilde{g}_0, \{\tilde{g}_i\}_{i=1}^{n}\big),\ \mathrm{ap}\big(\tilde{k}_0, \{\tilde{k}_i\}_{i=1}^{|\rho(d)|}\big) \big\rangle$$
where attentive pooling $\mathrm{ap}(\cdot, \cdot)$ is defined as:
$$\mathrm{ap}\big(q, \{k_i\}_{i=1}^{n}\big) = \sum_{i=1}^{n} \frac{\exp(q \cdot k_i)}{\sum_{j=1}^{n} \exp(q \cdot k_j)}\, k_i$$
In this way, the fusion token/node/entity in each
context serves as the information exchange portal.
We then use a transformer encoder layer to enable
information exchange across the contexts:
$$\langle \tilde{t}_L, \tilde{g}_L, \tilde{k}_L, \tilde{t}_G, \tilde{g}_G, \tilde{k}_G \rangle = \phi\big(\mathrm{TrmEnc}\big(\langle t_L, g_L, k_L, t_G, g_G, k_G \rangle\big)\big)$$
As a result, $\tilde{t}_L$, $\tilde{g}_L$, and $\tilde{k}_L$ are the representations of the fusion token/node/entity that incorporate information from the other contexts. We formulate the output of the $\ell$-th layer as follows:
$$T^{(\ell+1)} = \{\tilde{t}_L^{(\ell)}, \tilde{t}_1^{(\ell)}, \ldots, \tilde{t}_n^{(\ell)}\}, \quad G^{(\ell+1)} = \{\tilde{g}_L^{(\ell)}, \tilde{g}_1^{(\ell)}, \ldots, \tilde{g}_n^{(\ell)}\}, \quad K^{(\ell+1)} = \{\tilde{k}_L^{(\ell)}, \tilde{k}_1^{(\ell)}, \ldots, \tilde{k}_{|\rho(d)|}^{(\ell)}\}$$
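Putting these pieces together, a minimal sketch of one ContextFusion step is given below. It assumes a shared hidden size across contexts, ReLU as $\phi$, and writes the updated fusion representations back into each context sequence for the next KALM layer.

```python
import torch, torch.nn as nn

hidden = 768
fusion_enc = nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True)

def ap(q, ks):                                    # attentive pooling, as defined above
    return torch.softmax(ks @ q, dim=0) @ ks

def context_fusion(T, G, K):
    """T: (n+1, hidden) local, G: (n+1, hidden) document, K: (m+1, hidden) global context;
    index 0 of each sequence is its fusion token/node/entity."""
    local_views = [T[0], G[0], K[0]]                                 # t_L, g_L, k_L
    global_views = [ap(T[0], T[1:]), ap(G[0], G[1:]), ap(K[0], K[1:])]  # t_G, g_G, k_G
    seq = torch.stack(local_views + global_views).unsqueeze(0)       # (1, 6, hidden)
    fused = torch.relu(fusion_enc(seq)).squeeze(0)                   # phi(TrmEnc(.))
    # the first three outputs are the updated fusion representations; write them back
    T_next = torch.cat([fused[0:1], T[1:]], dim=0)
    G_next = torch.cat([fused[1:2], G[1:]], dim=0)
    K_next = torch.cat([fused[2:3], K[1:]], dim=0)
    return T_next, G_next, K_next
```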
Our proposed ContextFusion layer is interactive
since it enables the information to flow across dif-
ferent document contexts, instead of direct concate-
nation or hierarchical processing. The attention
weights in
TrmEnc(·)
of the ContextFusion layer
could also provide insights into the roles and im-
portance of each document context, which will be
further explored in Section 3.3. To the best of
our knowledge, KALM is the first work to jointly
consider the three levels of document context and
enable information exchange across document con-
texts.
2.4 Learning and Inference
After a total of $P$ KALM layers, we obtain the final document representation $\langle \tilde{t}_L^{(P)}, \tilde{g}_L^{(P)}, \tilde{k}_L^{(P)} \rangle$. Given the document label $a \in \mathcal{A}$, the label probability is formulated as $p(a \mid d) \propto \exp\big(\mathrm{MLP}_a\big([\tilde{t}_L^{(P)}, \tilde{g}_L^{(P)}, \tilde{k}_L^{(P)}]\big)\big)$. We then optimize KALM with the cross-entropy loss function. At inference time, the predicted label is $\arg\max_a p(a \mid d)$.
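A minimal sketch of the classification head and the corresponding training/inference steps is given below; the hidden size, label count, optimizer, and learning rate are illustrative assumptions, not the paper's reported hyperparameters.

```python
import torch, torch.nn as nn

hidden, num_labels = 768, 2
mlp = nn.Sequential(nn.Linear(3 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, num_labels))
optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-4)

def train_step(tL, gL, kL, label: int) -> float:
    """tL, gL, kL: final-layer fusion representations, each of shape (hidden,)."""
    logits = mlp(torch.cat([tL, gL, kL], dim=-1)).unsqueeze(0)   # p(a|d) ∝ exp(MLP_a([...]))
    loss = nn.functional.cross_entropy(logits, torch.tensor([label]))
    loss.backward(); optimizer.step(); optimizer.zero_grad()
    return loss.item()

def predict(tL, gL, kL) -> int:
    logits = mlp(torch.cat([tL, gL, kL], dim=-1))
    return int(logits.argmax())                                   # argmax_a p(a|d)
```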
3 Experiment
3.1 Experiment Settings
Tasks and Datasets.
We propose KALM, a gen-
eral method for knowledge-aware long document
understanding. We evaluate KALM on three tasks
that especially benefit from external knowledge
and broader context: political perspective detec-
tion, misinformation detection, and roll call vote
prediction. We follow previous works to adopt Se-
mEval (Kiesel et al.,2019) and Allsides (Li and
Goldwasser,2019) for political perspective detec-
tion, LUN (Rashkin et al.,2017) and SLN (Rubin
et al.,2016) for misinformation detection, and the
2 datasets proposed in Mou et al. (2021) for roll
call vote prediction. For external KGs, we follow
existing works to adopt the KGs in KGAP (Feng
et al.,2021), CompareNet (Hu et al.,2021), and
ConceptNet (Speer et al.,2017) for the three tasks.
Baseline methods.
We compare KALM with
three types of baseline methods for holistic evalu-
ation: pretrained LMs, task-agnostic knowledge-
aware methods, and task-specific models. For pre-
trained LMs, we evaluate RoBERTa (Liu et al.,
2019b), Electra (Clark et al.,2019), DeBERTa (He
et al.,2020), BART (Lewis et al.,2020), and Long-
Former (Beltagy et al.,2020) on the three tasks.
For task-agnostic baselines, we evaluate KGAP
(Feng et al.,2021), GreaseLM (Zhang et al.,2021),
and GreaseLM+ on the three tasks. Task-specific
models are introduced in the following sections.
For pretrained LMs, task-agnostic methods, and
KALM, we run each method five times and report
the average performance and standard deviation.
For task-specific models, we compare with the re-
sults originally reported since we follow the exact
same experiment settings and data splits.
3.2 Model Performance
We present the performance of task-specific meth-
ods, pretrained LMs, task-agnostic knowledge-