
network. First, the local graph module takes a segment of the document (a segment is a series of paragraphs in the document) as input and applies graph attention among tokens, sentences, paragraphs, and the segment itself. Second, the global graph module selectively receives information from the local graph and compresses it with the stored information via multi-head attention; graph attention is then applied to the global graph to integrate the global structural information, which is written back to the local graph nodes to enhance the representations of local nodes for evidence selection. Third, the evidence memory network receives and summarizes the evidence selection results and feeds them into the global network to alleviate the evidence redundancy problem.
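To make this three-step flow concrete, the sketch below traces one segment through the pipeline. It is a minimal, hypothetical PyTorch rendering: the names (CGSNStep, compress, selector), the dimensions, and the use of nn.MultiheadAttention with a mean-pooled write-back are our illustrative assumptions rather than the authors' implementation, and the local and global graph attention layers are elided.

```python
import torch
import torch.nn as nn


class CGSNStep(nn.Module):
    """Minimal sketch of one CGSN segment step (illustrative, not the official code)."""

    def __init__(self, d_model=768, n_heads=8, n_global=64):
        super().__init__()
        # Learned initial global-graph nodes, persisting across segments.
        self.init_global = nn.Parameter(torch.randn(n_global, d_model))
        # Step 2: compress new local information into the stored global nodes.
        self.compress = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Score each globally enhanced local node for evidence selection.
        self.selector = nn.Linear(2 * d_model, 1)

    def forward(self, local_nodes, global_nodes=None):
        # local_nodes: (1, n_local, d_model), outputs of local graph attention
        # over token/sentence/paragraph/segment nodes (omitted here).
        if global_nodes is None:
            global_nodes = self.init_global.unsqueeze(0)  # first segment
        # Global nodes attend to the local nodes, merging new information
        # with the stored representation (the "compression" step).
        global_nodes, _ = self.compress(global_nodes, local_nodes, local_nodes)
        # (Graph attention over global_nodes would integrate structure here.)
        # Write global context back to enhance the local nodes.
        ctx = global_nodes.mean(dim=1, keepdim=True).expand_as(local_nodes)
        scores = self.selector(torch.cat([local_nodes, ctx], dim=-1)).squeeze(-1)
        return global_nodes, scores
```

A driver would iterate this step over consecutive segments, threading the returned global_nodes through so that each segment sees a compressed summary of all earlier ones.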
Extensive experiments on two datasets show that CGSN outperforms previous methods in the evidence selection phase. Using the same answer generator as previous methods, CGSN also achieves the best results in the answer generation phase.
Our contributions are as follows:
• To the best of our knowledge, we are the first to consider the global structure in the long document QA task.
• With the enhancement of global structural information, the proposed model, CGSN, outperforms previous methods.
2 Related Work
Long Document Question Answering.
Long document question answering aims to answer a question by comprehending a long document, applying multi-hop reasoning over retrieved evidence paragraphs. Dasigi et al. (2021) leverage the pre-trained model LED (Beltagy et al., 2020) and treat the input as a single long sequence to predict the evidence paragraphs and generate the answer. Zheng et al. (2020a) and Ainslie et al. (2020) model the structure of the chunked document to select evidence paragraphs. Although Ainslie et al. (2020) claim to explicitly model the structure of long documents, the input of their model is limited to 4K tokens, which can be regarded as a relatively long chunk. Gong et al. (2020) use a recurrent mechanism to enable information flow across chunks for evidence selection. Karpukhin et al. (2020) and Zhu et al. (2021) search for relevant evidence among individual paragraphs of the long document. However, most of these works model the long document as a flat sequence or consider only the local structure within document segments, while the global structural information of the document is largely neglected.
Graph Neural Networks.
Graph neural networks (GNNs) are popular in various tasks (Yao et al., 2019; Schlemper et al., 2019) due to their effectiveness in modeling structural information. Among the variants of GNNs, the Graph Attention Network (GAT) (Velickovic et al., 2018) exploits the attention mechanism on a graph, aggregating the features of neighboring nodes into each node with different attention weights. Zheng et al. (2020b) use a graph multi-attention network to predict traffic conditions. Abu-El-Haija et al. (2018) use graph attention to automatically guide the random walk in graph generation. In natural language tasks, due to limited memory, GAT is usually applied to short sequences; modeling the graph structure of long sequences therefore remains nearly unexplored.
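Concretely, for a node $i$ with neighborhood $\mathcal{N}(i)$, GAT computes (following the formulation of Velickovic et al., 2018):

$$
e_{ij} = \mathrm{LeakyReLU}\!\left(\mathbf{a}^{\top}\big[\mathbf{W}h_i \,\Vert\, \mathbf{W}h_j\big]\right),
\qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}(i)} \exp(e_{ik})},
\qquad
h_i' = \sigma\!\Big(\sum_{j \in \mathcal{N}(i)} \alpha_{ij}\,\mathbf{W}h_j\Big),
$$

where $\Vert$ denotes concatenation and $\mathbf{a}$, $\mathbf{W}$ are learned parameters, so each neighbor $j$ contributes to node $i$ in proportion to its attention weight $\alpha_{ij}$.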
Memory Networks.
Memory networks (Weston et al., 2015) memorize long-term information via learnable reading/writing components. They were first applied to the QA task for knowledge base reasoning and have since achieved progress in summarization (Cui and Hu, 2021) and visual question answering. To memorize large amounts of information, a memory network learns to read from and recurrently write to an external memory via attention. Miller et al. (2016) propose the Key-Value Memory Network to flexibly access knowledge for question answering. Lu et al. (2020) design a context memory for cross-passage evidence reasoning. However, these methods only consider memory at a single level, while structural information is disregarded.
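As an illustration of such an attentive read, the minimal Python function below (a hypothetical helper, not code from any cited system) addresses a key-value memory in the style of Miller et al. (2016):

```python
import torch
import torch.nn.functional as F


def kv_memory_read(query, keys, values):
    """One attention-based read from a key-value memory.

    query:  (d,)         current question/state representation
    keys:   (n_slots, d) addressing keys of the memory slots
    values: (n_slots, d) stored contents to be combined
    """
    # Addressing: softmax over query-key similarities.
    weights = F.softmax(keys @ query, dim=0)  # (n_slots,)
    # Reading: weighted sum of the stored values.
    return weights @ values  # (d,)
```

A write step would update the slots from the current state in a similarly attention-weighted fashion, which is what makes the memory recurrent.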
3 Compressive Graph Selector Network
In this section, we first formalize the long document question answering (LDQA) task and then introduce the proposed evidence selection model, i.e., the Compressive Graph Selector Network (CGSN). For answer generation, we use a vanilla LED model and describe the implementation details in Appendix C. Finally, we discuss the advantages of select-then-read methods over end-to-end methods.