
network. First, the local graph module takes a segment of the document (a segment is a series of paragraphs in the document) as input and applies graph attention among tokens, sentences, paragraphs, and the segment itself. Second, the global graph module selectively receives information from the local graph and compresses it with the stored information via multi-head attention; graph attention is then applied to the global graph to integrate the global structural information, which is written back to the local graph nodes to enhance the representations of local nodes for evidence selection. Third, the evidence memory network receives and summarizes the evidence selection results and feeds them into the global network to alleviate the evidence redundancy problem.
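To make this three-step flow concrete, the sketch below traces one segment through the pipeline. It is a minimal, hypothetical PyTorch rendering: the names (CGSNStep, compress, selector), the dimensions, and the use of nn.MultiheadAttention with a mean-pooled write-back are our illustrative assumptions rather than the authors' implementation, and the local and global graph attention layers are elided.

```python
import torch
import torch.nn as nn


class CGSNStep(nn.Module):
    """Minimal sketch of one CGSN segment step (illustrative, not the official code)."""

    def __init__(self, d_model=768, n_heads=8, n_global=64):
        super().__init__()
        # Learned initial global-graph nodes, persisting across segments.
        self.init_global = nn.Parameter(torch.randn(n_global, d_model))
        # Step 2: compress new local information into the stored global nodes.
        self.compress = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Score each globally enhanced local node for evidence selection.
        self.selector = nn.Linear(2 * d_model, 1)

    def forward(self, local_nodes, global_nodes=None):
        # local_nodes: (1, n_local, d_model), outputs of local graph attention
        # over token/sentence/paragraph/segment nodes (omitted here).
        if global_nodes is None:
            global_nodes = self.init_global.unsqueeze(0)  # first segment
        # Global nodes attend to the local nodes, merging new information
        # with the stored representation (the "compression" step).
        global_nodes, _ = self.compress(global_nodes, local_nodes, local_nodes)
        # (Graph attention over global_nodes would integrate structure here.)
        # Write global context back to enhance the local nodes.
        ctx = global_nodes.mean(dim=1, keepdim=True).expand_as(local_nodes)
        scores = self.selector(torch.cat([local_nodes, ctx], dim=-1)).squeeze(-1)
        return global_nodes, scores
```

A driver would iterate this step over consecutive segments, threading the returned global_nodes through so that each segment sees a compressed summary of all earlier ones.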
Extensive experiments on two datasets show that CGSN outperforms previous methods in the evidence selection phase. Using the same answer generator as previous methods, CGSN also achieves the best results in the answer generation phase.
Our contributions are as follows:
• To the best of our knowledge, we are the first to consider the global structure in the long document QA task.
• With the enhancement of global structural information, the proposed model, CGSN, outperforms previous methods.
2 Related Work
Long Document Question Answering.
Long document question answering aims to answer a question by comprehending a long document, applying multi-hop reasoning over retrieved evidence paragraphs. Dasigi et al. (2021) leverage the pre-trained model LED (Beltagy et al., 2020) and treat the input as a single long sequence to predict the evidence paragraphs and generate the answer. Zheng et al. (2020a) and Ainslie et al. (2020) model the structure of the chunked document to select evidence paragraphs. Although Ainslie et al. (2020) claim to explicitly model the structure of long documents, the input of their model is limited to 4K tokens, which can be regarded as a relatively long chunk. Gong et al. (2020) use a recurrent mechanism to enable information flow across chunks for evidence selection. Karpukhin et al. (2020) and Zhu et al. (2021) search for relevant evidence among individual paragraphs of the long document. However, most of these works model the long document as a flat sequence or consider only the local structure within document segments, while the global structural information of the document is largely neglected.
Graph Neural Networks.
Graph neural networks (GNNs) are popular in various tasks (Yao et al., 2019; Schlemper et al., 2019) due to their effectiveness in modeling structural information. Among the variants of GNNs, the Graph Attention Network (GAT) (Velickovic et al., 2018) exploits the attention mechanism on a graph, aggregating the features of neighboring nodes into each node with different attention weights. Zheng et al. (2020b) use a graph multi-attention network to predict traffic conditions. Abu-El-Haija et al. (2018) use graph attention to automatically guide the random walk in graph generation. In natural language tasks, due to limited memory, GAT is usually applied to short sequences; modeling the graph structure of long sequences therefore remains nearly unexplored.
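Concretely, for a node $i$ with neighborhood $\mathcal{N}(i)$, GAT computes (following the formulation of Velickovic et al., 2018):

$$
e_{ij} = \mathrm{LeakyReLU}\!\left(\mathbf{a}^{\top}\big[\mathbf{W}h_i \,\Vert\, \mathbf{W}h_j\big]\right),
\qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}(i)} \exp(e_{ik})},
\qquad
h_i' = \sigma\!\Big(\sum_{j \in \mathcal{N}(i)} \alpha_{ij}\,\mathbf{W}h_j\Big),
$$

where $\Vert$ denotes concatenation and $\mathbf{a}$, $\mathbf{W}$ are learned parameters, so each neighbor $j$ contributes to node $i$ in proportion to its attention weight $\alpha_{ij}$.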
Memory Networks.
Memory networks (Weston et al., 2015) memorize long-term information via learnable reading/writing components. They were first applied to the QA task for knowledge base reasoning and have since achieved progress in summarization (Cui and Hu, 2021) and visual question answering. To memorize large amounts of information, a memory network learns to read from and recurrently write to an external memory via attention. Miller et al. (2016) propose the Key-Value Memory Network to flexibly access knowledge for question answering. Lu et al. (2020) design a context memory for cross-passage evidence reasoning. However, these methods only consider memory at a single level, while structural information is disregarded.
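As an illustration of such an attentive read, the minimal Python function below (a hypothetical helper, not code from any cited system) addresses a key-value memory in the style of Miller et al. (2016):

```python
import torch
import torch.nn.functional as F


def kv_memory_read(query, keys, values):
    """One attention-based read from a key-value memory.

    query:  (d,)         current question/state representation
    keys:   (n_slots, d) addressing keys of the memory slots
    values: (n_slots, d) stored contents to be combined
    """
    # Addressing: softmax over query-key similarities.
    weights = F.softmax(keys @ query, dim=0)  # (n_slots,)
    # Reading: weighted sum of the stored values.
    return weights @ values  # (d,)
```

A write step would update the slots from the current state in a similarly attention-weighted fashion, which is what makes the memory recurrent.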
3 Compressive Graph Selector Network
In this section, we first formalize the long document question answering (LDQA) task and then introduce the proposed evidence selection model, i.e., the Compressive Graph Selector Network (CGSN). For answer generation, we use a vanilla LED model and describe the implementation details in Appendix C. Finally, we discuss the advantages of select-then-read methods over end-to-end methods.