is limited in the following two aspects: (1) These methods only model pairwise interactions between sentences, whereas sentence interactions in natural language can be triadic, tetradic, or of even higher order (Ding et al., 2020). How to capture such high-order cross-sentence relations for extractive summarization remains an open question. (2) These
graph-based approaches rely on either semantic or discourse-structure cross-sentence relations and are incapable of fusing sentence interactions from different perspectives. Sentences within a document can interact in various ways, such as embedding similarity, keyword coreference, and topic modeling from the semantic perspective, and section or rhetorical structure from the discourse perspective. Capturing such multi-type cross-sentence relations can benefit both sentence representation learning and sentence salience modeling. Figure 1 illustrates how different types of sentence interactions provide different connectivity for document graph construction, covering both local and global context information.
To address the above issues, we propose HEGEL (HypErGraph transformer for Extractive Long document summarization), a graph-based model de-
signed for summarizing long documents with rich
discourse information. To better model high-order
cross-sentence relations, we represent a document
as a hypergraph, a generalization of graph struc-
ture, in which an edge can join any number of ver-
tices. We then introduce three types of hyperedges
that model sentence relations from different per-
spectives, including section structure, latent topic,
and keyword coreference. We also
propose hypergraph transformer layers to update
and learn effective sentence embeddings on hyper-
graphs. We validate HEGEL by conducting exten-
sive experiments and analyses on two benchmark
datasets, and experimental results demonstrate the
effectiveness and efficiency of HEGEL. We high-
light our contributions as follows:
(i)
We propose a hypergraph neural model,
HEGEL, for long document summarization. To
the best of our knowledge, we are the first to
model high-order cross-sentence relations with hy-
pergraphs for extractive document summarization.
(ii)
We propose three types of hyperedges (sec-
tion, topic, and keyword) that capture sentence dependencies from different perspectives. Hypergraph transformer layers are then designed to update and learn effective sentence representations by message passing on the hypergraph (a minimal sketch follows the contribution list).
(iii)
We validate HEGEL on two benchmark
datasets (arXiv and PubMed), and the experimental
results demonstrate its effectiveness over state-of-
the-art baselines. We also conduct ablation studies
and qualitative analysis to investigate the model
performance further.
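To make the hyperedge construction and message passing described above concrete, the following minimal Python sketch is our own illustration under simplifying assumptions, not HEGEL's actual implementation. The inputs section_of, topic_of, and keywords_of are hypothetical sentence-to-section, sentence-to-topic, and sentence-to-keyword assignments; each such group becomes one hyperedge in a binary incidence matrix, and a single mean-aggregation pass moves information from sentences to hyperedges and back.

import numpy as np

def build_incidence(num_sents, section_of, topic_of, keywords_of):
    """Binary incidence matrix H of shape (num_sents, num_hyperedges).
    Each section, each latent topic, and each keyword forms one hyperedge
    joining all sentences assigned to it, so a hyperedge may connect any
    number of sentences (the high-order relations discussed above)."""
    edges = []
    for labels in (section_of, topic_of):
        for label in sorted(set(labels)):
            edges.append({i for i in range(num_sents) if labels[i] == label})
    for kw in sorted({k for kws in keywords_of for k in kws}):
        edges.append({i for i in range(num_sents) if kw in keywords_of[i]})
    H = np.zeros((num_sents, len(edges)))
    for e, members in enumerate(edges):
        for i in members:
            H[i, e] = 1.0
    return H

def message_passing(X, H):
    """One plain two-phase pass: sentences -> hyperedges -> sentences.
    X holds sentence embeddings of shape (num_sents, dim). A transformer-style
    hypergraph layer would replace these mean aggregations with attention;
    this sketch keeps plain means for brevity."""
    edge_feats = (H.T @ X) / np.maximum(H.sum(axis=0, keepdims=True).T, 1.0)
    return (H @ edge_feats) / np.maximum(H.sum(axis=1, keepdims=True), 1.0)

# Toy usage: four sentences, two sections, two topics, overlapping keywords.
X = np.random.randn(4, 8)
H = build_incidence(4,
                    section_of=[0, 0, 1, 1],
                    topic_of=[0, 1, 1, 0],
                    keywords_of=[{"graph"}, {"graph", "topic"}, {"topic"}, set()])
X_updated = message_passing(X, H)

A hyperedge built this way joins arbitrarily many sentences at once, which is exactly the kind of high-order relation that ordinary pairwise edges cannot express.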
2 Related Work
2.1 Scientific Paper Summarization
With promising progress on short news summarization, research interest in long-form documents such as academic papers has grown. Cohan et al. (2018) proposed the benchmark datasets arXiv and PubMed, and employed a pointer-generator network with a hierarchical encoder and a discourse-aware decoder.
Xiao and Carenini (2019) proposed an encoder-decoder model that incorporates both global and local context. Ju et al. (2021) introduced an unsuper-
vised extractive approach to summarize long sci-
entific documents based on the Information Bottle-
neck principle. Dong et al. (2020) developed an unsupervised ranking model that incorporates hierarchical graph representations and asymmetrical positional cues. Recently, Ruan et al. (2022) proposed applying a pre-trained language model with hierarchical structure information.
2.2 Graph-based Summarization
Graph-based models have been exploited for ex-
tractive summarization to capture cross-sentence
dependencies. Unsupervised graph summarization
methods rely on graph connectivity to score and
rank sentences (Radev et al., 2004; Zheng and Lapata, 2019; Dong et al., 2020). Researchers also
explore supervised graph neural networks for sum-
marization. Yasunaga et al. (2017) applied Graph
Convolutional Network (GCN) on the approximate
discourse graph. Xu et al. (2019) proposed to apply
GCN on structural discourse graphs based on RST
trees and coreference mentions. Cui et al. (2020)
leveraged topical information by building topic-
sentence graphs. Recently, Wang et al. (2020) pro-
posed to construct word-document heterogeneous
graphs and use word nodes as the intermediary be-
tween sentences. Jing et al. (2021) proposed to
use a multiplex graph to consider different sentence
relations. Our paper follows this line of work on
developing novel graph neural networks for sin-
gle document extractive summarization. The main
difference is that we construct a hypergraph from