Capturing Global Structural Information in Long Document Question Answering with Compressive Graph Selector Network
Yuxiang Nie1,2,3, Heyan Huang1,2,3*, Wei Wei4, Xian-Ling Mao1,2,3
1School of Computer Science and Technology, Beijing Institute of Technology
2Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications
3Beijing Institute of Technology Southeast Academy of Information Technology
4Huazhong University of Science and Technology
{nieyx,hhy63,maoxl}@bit.edu.cn, weiw@hust.edu.cn
* Corresponding author.
Abstract
Long document question answering is a challenging task due to its demands for complex reasoning over long text. Previous works usually take long documents as non-structured flat texts or only consider the local structure in long documents. However, these methods usually ignore the global structure of the long document, which is essential for long-range understanding. To tackle this problem, we propose Compressive Graph Selector Network (CGSN) to capture the global structure in a compressive and iterative manner. The proposed model mainly focuses on the evidence selection phase of long document question answering. Specifically, it consists of three modules: a local graph network, a global graph network and an evidence memory network. Firstly, the local graph network builds the graph structure of the chunked segment at the token, sentence, paragraph and segment levels to capture the short-term dependencies of the text. Secondly, the global graph network selectively receives the information of each level from the local graph, compresses it into the global graph nodes, and applies graph attention over the global graph nodes to build long-range reasoning over the entire text in an iterative way. Thirdly, the evidence memory network is designed to alleviate the redundancy problem in evidence selection by saving the selection results of previous steps. Extensive experiments show that the proposed model outperforms previous methods on two datasets.¹

¹ We have released our code and data at https://github.com/JerrryNie/CGSN.
1 Introduction
Long document question answering (LDQA) is the task of selecting relevant evidence and answering questions over long text (Dasigi et al., 2021). Compared to traditional QA tasks, whose input is often under 512 tokens², the input of LDQA can be more than 20K tokens.

² In this paper, 'token' means sub-tokens split from a text sequence by a specific pre-trained tokenizer.
LDQA methods can be divided into two categories: end-to-end methods and select-then-read methods. End-to-end methods usually take a question and a long text as input to select evidence and produce the answer in one step. For example, Dasigi et al. (2021) use the Longformer-Encoder-Decoder (LED) model to select evidence in the encoder part and generate answers in the decoder part. Select-then-read methods first apply an evidence selection model to obtain evidence pieces from a long document and then use an answer generation model to generate answers given the evidence pieces and the question. These methods mainly focus on the evidence selection phase. For example, Karpukhin et al. (2020) and Zhu et al. (2021) select paragraphs in an open-domain retrieval manner. Zheng et al. (2020a) and Ainslie et al. (2020) build structure on chunked documents for evidence selection. Gong et al. (2020) model information flows among chunks to enhance the model's ability to select evidence. However, most works of both kinds ignore the global structure of a long document when selecting evidence pieces, which is crucial to long-range understanding. Therefore, improvement in the evidence selection phase is needed.
Motivated by the human reading process of selectively memorizing important pieces of information and integrating them, we propose an evidence selection model within the select-then-read framework, named Compressive Graph Selector Network (CGSN). It aims to capture the global structural information in a compressive and iterative manner. Specifically, the model is composed of three modules: the local graph network, the global graph network and the evidence memory network. Firstly, the local graph takes a segment³ of the document as input and implements graph attention among tokens, sentences, paragraphs and the segment itself. Secondly, the global graph module selectively receives the information from the local graph and compresses it with the stored information via multi-head attention. Then, graph attention is applied to the global graph to integrate the global structural information, which is written back to the local graph nodes to enhance the expression of the local nodes for evidence selection. Thirdly, the evidence memory network receives and summarizes the evidence selection results and feeds them into the global network to alleviate the evidence redundancy problem.

³ A 'segment' is a series of paragraphs in a document.
Extensive experiments on two datasets show that CGSN outperforms previous methods in the evidence selection phase. Using the same answer generator as previous methods, CGSN further achieves the best results in the answer generation phase.
Our contributions are as follows:

• To the best of our knowledge, we are the first to consider the global structure in the long document QA task.
• With the enhancement of global structural information, the proposed model, CGSN, outperforms previous methods.
2 Related Works
Long Document Question Answering. Long document question answering aims to answer a question based on comprehension of a long document, applying multi-hop reasoning over retrieved evidence paragraphs. Dasigi et al. (2021) take advantage of the pre-trained LED model (Beltagy et al., 2020) and treat the input as a single long sequence to predict the evidence paragraphs and generate the answer. Zheng et al. (2020a) and Ainslie et al. (2020) model the structure of the chunked document to select the evidence paragraphs. Although Ainslie et al. (2020) claim that they explicitly model the structure of long documents, the input of their model is limited to 4K tokens, which can be regarded as a relatively long chunk. Gong et al. (2020) use a recurrent mechanism to enable information flow through different chunks for evidence selection. Karpukhin et al. (2020) and Zhu et al. (2021) search for relevant evidence from individual paragraphs in the long document. However, most of these works model the long document as a flat sequence or only consider the local structure of document segments, while the global structural information of the document is largely neglected.
Graph Neural Networks. Graph neural networks (GNNs) are popular in various tasks (Yao et al., 2019; Schlemper et al., 2019) due to their effectiveness in modeling structural information. Among the variants of GNNs, the Graph Attention Network (GAT) (Velickovic et al., 2018) applies the attention mechanism to a graph, aggregating neighborhood node features into each node with different attention weights. Zheng et al. (2020b) make use of a graph multi-attention network to predict traffic conditions. Abu-El-Haija et al. (2018) take advantage of graph attention to automatically guide the random walk in graph generation. In natural language tasks, due to the limit of memory usage, GAT is often used to model short sequences. Therefore, modeling the graph structure of long sequences remains nearly unexplored.
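Concretely, a single GAT layer (Velickovic et al., 2018) computes, for node features $h_i$ and neighborhood $\mathcal{N}_i$:

$$e_{ij} = \mathrm{LeakyReLU}\left(\mathbf{a}^{\top}\left[\mathbf{W}h_i \,\Vert\, \mathbf{W}h_j\right]\right), \qquad \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})}, \qquad h_i' = \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij}\mathbf{W}h_j\Big)$$

where $\mathbf{W}$ is a shared linear map, $\mathbf{a}$ is a learnable attention vector and $\Vert$ denotes concatenation.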
Memory Networks. The memory network (Weston et al., 2015) is designed to memorize long-term information via learnable reading/writing components. It was first applied to the QA task for knowledge base reasoning, and has also achieved much progress in summarization (Cui and Hu, 2021) and visual question answering. To memorize large amounts of information, the memory network learns to read from and recurrently write into an external memory via attention. Miller et al. (2016) propose the Key-Value Memory Network to flexibly access knowledge for question answering. Lu et al. (2020) design a context memory for cross-passage evidence reasoning. However, these methods only consider the memory at a single level, while structural information is disregarded.
3 Compressive Graph Selector Network
In this section, we first formalize the long document question answering (LDQA) task, and then introduce the proposed evidence selection model, i.e., Compressive Graph Selector Network (CGSN). For answer generation, we use a vanilla LED model and describe the implementation details in Appendix C. Finally, we discuss the advantages of the select-then-read methods over the end-to-end methods.
[Figure 1: The architecture of CGSN. For time steps T = t-1, t, t+1, the local graph builds token, sentence, paragraph and segment nodes over inputs of the form "[CLS] Question [SEP] Paragraph [SEP]"; the global graph holds sentence, paragraph and doc nodes. The flow is: local graph attention, receive and compress, global graph attention, local enhancement, and caching of evidence into the evidence memory via a gate and FFNN.]
3.1 Problem Formulation
The input to LDQA is a question $q = [q_1, q_2, \ldots, q_m]$ coupled with a document $d = [p_1, p_2, \ldots, p_n]$, $p_i = [t_{i,1}, t_{i,2}, \ldots, t_{i,k_i}]$ $(1 \le i \le n)$, where $m$ denotes the length of the question, $n$ denotes the number of paragraphs in the document and $k_i$ denotes the length of paragraph $i$. The length of the document is defined as the sum of the lengths of the paragraphs: $c = \sum_{i}^{n} k_i$. In the LDQA setting, the length $c$ is often unlimited and can be larger than 20K. The goal of LDQA is to produce the evidence paragraphs $\{p_{e_i}\}_{i=1}^{e_q}$ and to generate the free-form answer $a = [a_1, a_2, \ldots, a_r]$ based on $q$ and $d$, where $p_{e_i}$ denotes the $e_i$-th paragraph in the document (the $i$-th paragraph in the evidence set) and $e_q$ is the number of evidence paragraphs for question $q$.
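To make the notation concrete, here is a minimal sketch of an LDQA instance as a data structure; the class and field names are illustrative assumptions, not part of the paper:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class LDQAInstance:
    """One long-document QA example, mirroring the notation above."""
    question: List[str]          # q = [q_1, ..., q_m]
    paragraphs: List[List[str]]  # d = [p_1, ..., p_n], p_i = [t_{i,1}, ..., t_{i,k_i}]
    evidence_ids: List[int]      # e_1, ..., e_{e_q}: indices of evidence paragraphs
    answer: List[str]            # a = [a_1, ..., a_r], free-form

    @property
    def length(self) -> int:
        # c = sum_i k_i; often larger than 20K in the LDQA setting
        return sum(len(p) for p in self.paragraphs)
```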
3.2 Overview of the Model
To explore the global graph structure of long sequences, we propose the Compressive Graph Selector Network (CGSN), which operates in an iterative and compressive way. CGSN is composed of three modules: the local graph network, the global graph network and the evidence memory network. As shown in Figure 1, firstly, at time step⁴ $T = t$, the local graph network takes the $t$-th segment of a document as input and models the graph structure at the token, sentence, paragraph and segment levels. Secondly, the global graph selectively receives the information from each granularity, compresses it into the corresponding global graph nodes, implements graph attention among the global graph nodes, and sends the globally-attended information back to the local graph to enhance the expression of the local graph nodes for evidence selection. Thirdly, the evidence memory network receives the enhanced paragraph nodes from the local graph, then summarizes and caches them via the prediction logits. At the beginning of time step $T = t + 1$, the stored memory is sent to and fused with the global graph nodes in order to alleviate the redundant evidence selection problem. The detailed architecture is described in Appendix A.

⁴ The 'time step' is the order of the segment to be processed.
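The following runnable sketch illustrates one plausible reading of this loop. The granularity (paragraph nodes only), the use of nn.MultiheadAttention for the receive-and-compress, global-attention and enhancement steps, and the sigmoid-weighted memory summary are all simplifying assumptions for illustration, not the paper's actual architecture:

```python
import torch
import torch.nn as nn

class ToyCGSNLoop(nn.Module):
    """Caricature of CGSN's iterative flow. Each segment is reduced to a
    (num_paragraphs, d) tensor of paragraph nodes; the multi-level local
    graph of the paper is collapsed for brevity."""

    def __init__(self, d: int = 64, n_global: int = 8, heads: int = 4):
        super().__init__()
        self.global_nodes = nn.Parameter(torch.randn(n_global, d))
        self.compress = nn.MultiheadAttention(d, heads, batch_first=True)
        self.global_attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.enhance = nn.MultiheadAttention(d, heads, batch_first=True)
        self.mem_proj = nn.Linear(d, d)  # fuses the cached evidence memory
        self.scorer = nn.Linear(d, 1)    # paragraph-level evidence logits

    def forward(self, segments):
        g = self.global_nodes.unsqueeze(0)      # (1, n_global, d)
        mem = torch.zeros(1, 1, g.size(-1))     # evidence memory cache
        all_logits = []
        for local in segments:                  # time step T = t
            local = local.unsqueeze(0)          # (1, n_para, d)
            g = g + self.mem_proj(mem)          # fuse memory at step start
            g, _ = self.compress(g, local, local)    # receive and compress
            g, _ = self.global_attn(g, g, g)         # global graph attention
            enhanced, _ = self.enhance(local, g, g)  # local enhancement
            logits = self.scorer(enhanced)           # (1, n_para, 1)
            # Cache a summary of the selection result for the next step.
            mem = (torch.sigmoid(logits) * enhanced).sum(1, keepdim=True)
            all_logits.append(logits.squeeze(-1).squeeze(0))
        return all_logits

# Example: three segments of five paragraphs each, hidden size 64.
scores = ToyCGSNLoop()(torch.randn(3, 5, 64))
```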
3.3 Local Graph Network
Input Format. Let $Seg_k = [p_{k,1}, \ldots, p_{k,N_{seg}}]$ be the $k$-th segment in a document, which is composed of $N_{seg}$ paragraphs. To build the local graph network, we first encode the $N_{seg}$ paragraphs paired with the question. For each question-paragraph pair, the input format is "[CLS] $q$ [SEP] $p_i$ [SEP]", where $1 \le i \le N_{seg}$. We set the embeddings of each input pair as $E \in \mathbb{R}^{\ell \times d_w}$, where $\ell$ is the length of the input and $d_w$ is the dimension of the embedding. The $N_{seg}$ embedding sequences are stacked as $E_k \in \mathbb{R}^{N_{seg} \times \ell \times d_w}$ and sent into the encoder $f_e$ as follows:

$$H_k = f_e(E_k) \quad (1)$$

where $H_k \in \mathbb{R}^{N_{seg} \times \ell \times d_h}$ is the contextual encoding and $d_h$ denotes its dimension. In general, we use
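As an illustration of the encoding step in Eq. (1), the following minimal sketch uses a RoBERTa-base encoder from HuggingFace Transformers as a stand-in for $f_e$; the model choice and the toy inputs are assumptions, not the paper's stated configuration:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
encoder = AutoModel.from_pretrained("roberta-base")

question = "What datasets are used?"                 # q
segment = ["We evaluate on two datasets.",           # p_1, ..., p_{N_seg}
           "Both contain long scientific documents."]

# One "[CLS] q [SEP] p_i [SEP]" pair per paragraph (RoBERTa renders CLS/SEP
# as <s> and </s>), padded to a common length l.
batch = tokenizer([question] * len(segment), segment,
                  padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    H_k = encoder(**batch).last_hidden_state         # (N_seg, l, d_h)
print(H_k.shape)                                     # e.g. torch.Size([2, l, 768])
```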