HEGEL: Hypergraph Transformer for Long Document Summarization
Haopeng Zhang and Xiao Liu and Jiawei Zhang
IFM Lab, Department of Computer Science, University of California, Davis, CA, USA
{haopeng,xiao,jiawei}@ifmlab.org
Abstract
Extractive summarization for long documents is challenging due to the extended structured input context. The long-distance sentence dependency hinders cross-sentence relation modeling, the critical step of extractive summarization. This paper proposes HEGEL, a hypergraph neural network for long document summarization that captures high-order cross-sentence relations. HEGEL updates and learns effective sentence representations with hypergraph transformer layers and fuses different types of sentence dependencies, including latent topics, keyword coreference, and section structure. We validate HEGEL with extensive experiments on two benchmark datasets, and the experimental results demonstrate the effectiveness and efficiency of HEGEL.
1 Introduction
Extractive summarization aims to generate a shorter version of a document while preserving the most salient information by directly extracting relevant sentences from the original document. With recent advances in neural networks and large pre-trained language models (Devlin et al., 2018; Lewis et al., 2019), researchers have achieved promising results in news summarization (around 650 words per document) (Nallapati et al., 2016a; Cheng and Lapata, 2016; See et al., 2017; Zhang et al., 2022; Narayan et al., 2018; Liu and Lapata, 2019). However, these models struggle when applied to long documents like scientific papers. The input length of a scientific paper can range from 2,000 to 7,000 words, and the expected summary (abstract) is more than 200 words, compared to around 40 words in news headlines.
Scientific paper extractive summarization is highly challenging due to the long structured input. The extended context hinders sequential models like RNNs from capturing sentence-level long-distance dependencies and cross-sentence relations, which are essential for extractive summarization. In addition, the quadratic computational complexity of attention with respect to input length makes Transformer-based models (Vaswani et al., 2017) inapplicable. Moreover, long documents typically cover diverse topics and contain richer structural information than short news articles, which is difficult for sequential models to capture.

Figure 1: An illustration of modeling cross-sentence relations from section structure, latent topic, and keyword coreference perspectives.
As a result, researchers have turned to graph neural network (GNN) approaches to model cross-sentence relations. They generally represent a document with a sentence-level graph and turn extractive summarization into a node classification problem. These works construct graphs from documents in different ways, such as the inter-sentence cosine similarity graph (Erkan and Radev, 2004; Dong et al., 2020), the Rhetorical Structure Theory (RST) tree relation graph (Xu et al., 2019), the approximate discourse graph (Yasunaga et al., 2017), the topic-sentence graph (Cui and Hu, 2021), and the word-document heterogeneous graph (Wang et al., 2020). However, the usability of these approaches is limited in two aspects: (1) These methods only model pairwise interactions between sentences, while sentence interactions in natural language can be triadic, tetradic, or of even higher order (Ding et al., 2020). How to capture high-order cross-sentence relations for extractive summarization remains an open question. (2) These graph-based approaches rely on either semantic or discourse-structure cross-sentence relations but are incapable of fusing sentence interactions from different perspectives. Sentences within a document can interact in various ways, such as embedding similarity, keyword coreference, and topic modeling from the semantic perspective, and section or rhetorical structure from the discourse perspective. Capturing multi-type cross-sentence relations can benefit both sentence representation learning and sentence salience modeling. Figure 1 illustrates how different types of sentence interactions provide different connectivity for document graph construction, covering both local and global context information.
To address the above issues, we propose HEGEL (HypErGraph transformer for Extractive Long document summarization), a graph-based model designed for summarizing long documents with rich discourse information. To better model high-order cross-sentence relations, we represent a document as a hypergraph, a generalization of the graph structure in which an edge can join any number of vertices. We then introduce three types of hyperedges that model sentence relations from different perspectives: section structure, latent topic, and keyword coreference. We also propose hypergraph transformer layers to update and learn effective sentence embeddings on hypergraphs. We validate HEGEL with extensive experiments and analyses on two benchmark datasets, and the experimental results demonstrate the effectiveness and efficiency of HEGEL. We highlight our contributions as follows:

(i) We propose a hypergraph neural model, HEGEL, for long document summarization. To the best of our knowledge, we are the first to model high-order cross-sentence relations with hypergraphs for extractive document summarization.

(ii) We propose three types of hyperedges (section, topic, and keyword) that capture sentence dependencies from different perspectives. Hypergraph transformer layers are then designed to update and learn effective sentence representations by message passing on the hypergraph.

(iii) We validate HEGEL on two benchmark datasets (arXiv and PubMed), and the experimental results demonstrate its effectiveness over state-of-the-art baselines. We also conduct ablation studies and qualitative analysis to further investigate model performance.
2 Related Work
2.1 Scientific Paper Summarization
With the promising progress on short news summarization, research interest in long-form documents like academic papers has arisen. Cohan et al. (2018) proposed the benchmark datasets ArXiv and PubMed, and employed a pointer-generator network with a hierarchical encoder and a discourse-aware decoder. Xiao and Carenini (2019) proposed an encoder-decoder model incorporating global and local contexts. Ju et al. (2021) introduced an unsupervised extractive approach to summarize long scientific documents based on the Information Bottleneck principle. Dong et al. (2020) proposed an unsupervised ranking model incorporating hierarchical graph representations and asymmetrical positional cues. Recently, Ruan et al. (2022) proposed applying pre-trained language models with hierarchical structure information.
2.2 Graph-based Summarization
Graph-based models have been exploited for extractive summarization to capture cross-sentence dependencies. Unsupervised graph summarization methods rely on graph connectivity to score and rank sentences (Radev et al., 2004; Zheng and Lapata, 2019; Dong et al., 2020). Researchers have also explored supervised graph neural networks for summarization. Yasunaga et al. (2017) applied a Graph Convolutional Network (GCN) to the approximate discourse graph. Xu et al. (2019) proposed applying a GCN to structural discourse graphs based on RST trees and coreference mentions. Cui et al. (2020) leveraged topical information by building topic-sentence graphs. Recently, Wang et al. (2020) proposed constructing word-document heterogeneous graphs and using word nodes as intermediaries between sentences. Jing et al. (2021) proposed using a multiplex graph to consider different sentence relations. Our paper follows this line of work on developing novel graph neural networks for single-document extractive summarization. The main difference is that we construct a hypergraph from a document, which can capture high-order cross-sentence relations instead of only pairwise relations, and fuse different types of sentence dependencies, including section structure, latent topics, and keyword coreference.

Figure 2: (a) The overall architecture of HEGEL. (b) Two-phase message passing in the hypergraph transformer layer.
3 Method
In this section, we introduce HEGEL in detail. We first present how to construct a hypergraph for a given long document. After encoding sentences into contextualized representations, we extract their section, latent topic, and keyword coreference relations and fuse them into a hypergraph. Then, our hypergraph transformer layers update and learn sentence representations according to the hypergraph. Finally, HEGEL scores the salience of each sentence based on its updated representation to determine whether it should be included in the summary. The overall architecture of our model is shown in Figure 2(a), and a minimal code sketch of the pipeline follows.
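To make this flow concrete, the sketch below mirrors the pipeline just described: pre-computed sentence embeddings and an incidence matrix pass through stacked message-passing layers, and a linear scorer outputs per-sentence salience. This is a simplified illustration, not the authors' implementation: mean aggregation stands in for the attention-based two-phase message passing of Figure 2(b), and all class names are hypothetical.

```python
import torch
import torch.nn as nn


class TwoPhaseMessagePassing(nn.Module):
    """Simplified stand-in for a hypergraph transformer layer (Figure 2(b)):
    phase 1 aggregates node features into hyperedges, phase 2 aggregates
    hyperedge features back into nodes. Mean pooling replaces attention."""

    def __init__(self, dim: int):
        super().__init__()
        self.node_to_edge = nn.Linear(dim, dim)
        self.edge_to_node = nn.Linear(dim, dim)

    def forward(self, h: torch.Tensor, A: torch.Tensor) -> torch.Tensor:
        # h: (n, d) sentence embeddings; A: (n, m) incidence matrix of Eq. (1).
        edge_h = (A.t() @ h) / A.sum(dim=0).clamp(min=1).unsqueeze(1)
        edge_h = torch.relu(self.node_to_edge(edge_h))   # phase 1: nodes -> edges
        node_h = (A @ edge_h) / A.sum(dim=1).clamp(min=1).unsqueeze(1)
        return h + torch.relu(self.edge_to_node(node_h))  # phase 2 + residual


class HEGELSketch(nn.Module):
    """Skeleton of the overall HEGEL pipeline (hypothetical names)."""

    def __init__(self, dim: int, num_layers: int = 2):
        super().__init__()
        self.layers = nn.ModuleList(
            [TwoPhaseMessagePassing(dim) for _ in range(num_layers)]
        )
        self.scorer = nn.Linear(dim, 1)

    def forward(self, sent_emb: torch.Tensor, A: torch.Tensor) -> torch.Tensor:
        h = sent_emb
        for layer in self.layers:
            h = layer(h, A)
        # Sigmoid salience score per sentence for extractive selection.
        return torch.sigmoid(self.scorer(h)).squeeze(-1)
```

At inference time, the highest-scoring sentences would be extracted to form the summary.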
3.1 Document as a Hypergraph
A hypergraph is defined as a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V} = \{v_1, \ldots, v_n\}$ represents the set of nodes and $\mathcal{E} = \{e_1, \ldots, e_m\}$ represents the set of hyperedges in the graph. Each hyperedge $e$ connects two or more nodes (i.e., $\sigma(e) \geq 2$). Specifically, we use the notations $v \in e$ and $v \notin e$ to denote that node $v$ is or is not connected to hyperedge $e$ in the graph $\mathcal{G}$, respectively. The topological structure of a hypergraph can also be represented by its incidence matrix $\mathbf{A} \in \mathbb{R}^{n \times m}$:

$$A_{ij} = \begin{cases} 1, & \text{if } v_i \in e_j \\ 0, & \text{if } v_i \notin e_j \end{cases} \quad (1)$$

Given a document $D = \{s_1, s_2, \ldots, s_n\}$, each sentence $s_i$ is represented by a corresponding node $v_i \in \mathcal{V}$. A hyperedge $e_j$ is created if a subset of nodes $\mathcal{V}_j \subset \mathcal{V}$ shares common semantic or structural information.
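As a minimal illustration of Eq. (1), the sketch below assembles the incidence matrix for a toy six-sentence document from the three hyperedge types introduced above. The groupings are invented for illustration, not derived from a real document.

```python
import numpy as np


def build_incidence_matrix(n_sentences: int,
                           hyperedges: list[set[int]]) -> np.ndarray:
    """Assemble the binary incidence matrix A of Eq. (1):
    A[i, j] = 1 iff sentence node v_i belongs to hyperedge e_j."""
    A = np.zeros((n_sentences, len(hyperedges)), dtype=np.float32)
    for j, members in enumerate(hyperedges):
        for i in members:
            A[i, j] = 1.0
    return A


# Toy 6-sentence document; the groupings below are invented for illustration.
section_edges = [{0, 1, 2}, {3, 4, 5}]   # sentences in the same section
topic_edges   = [{0, 3}, {1, 4, 5}]      # sentences sharing a latent topic
keyword_edges = [{2, 5}]                 # sentences mentioning the same keyword

A = build_incidence_matrix(6, section_edges + topic_edges + keyword_edges)
print(A.shape)  # (6, 5): six sentence nodes, five fused hyperedges
```

Concatenating the three edge sets column-wise is one simple way to fuse the different dependency types into a single hypergraph.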
3.1.1 Node Representation
We first adopt Sentence-BERT (Reimers and Gurevych, 2019) as the sentence encoder to embed the semantic meanings of sentences as $\mathbf{X} = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\}$. Note that Sentence-BERT is only used for the initial sentence embeddings and is not updated in HEGEL.

To preserve sequential information, we also add positional encoding following the Transformer (Vaswani et al., 2017). We adopt the hierarchical position embedding (Ruan et al., 2022), where the position of each sentence $s_i$ is represented in two parts: the section index of the sentence, $p^{sec}_i$, and the sentence index within its section, $p^{sen}_i$. The hierarchical position embedding (HPE) of sentence $s_i$ is calculated as:

$$\mathrm{HPE}(s_i) = \gamma_1 \mathrm{PE}(p^{sec}_i) + \gamma_2 \mathrm{PE}(p^{sen}_i), \quad (2)$$

where $\gamma_1, \gamma_2$ are two hyperparameters that adjust the scale of the positional encoding and $\mathrm{PE}(\cdot)$ is the sinusoidal position encoding function of Vaswani et al. (2017).
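The following is a minimal sketch of Eq. (2), assuming the standard sinusoidal encoding of Vaswani et al. (2017) for $\mathrm{PE}(\cdot)$; the $\gamma$ values shown are illustrative defaults, not the paper's tuned hyperparameters.

```python
import numpy as np


def sinusoidal_pe(pos: int, d_model: int) -> np.ndarray:
    """Standard sinusoidal position encoding (Vaswani et al., 2017)."""
    assert d_model % 2 == 0, "assumes an even embedding dimension"
    i = np.arange(0, d_model, 2)
    angle = pos / np.power(10000.0, i / d_model)
    pe = np.zeros(d_model)
    pe[0::2] = np.sin(angle)  # even dimensions
    pe[1::2] = np.cos(angle)  # odd dimensions
    return pe


def hierarchical_pe(p_sec: int, p_sen: int, d_model: int,
                    gamma1: float = 1.0, gamma2: float = 1.0) -> np.ndarray:
    """Eq. (2): HPE(s_i) = gamma_1 * PE(p_i^sec) + gamma_2 * PE(p_i^sen).
    The gamma scales here are illustrative, not the paper's tuned values."""
    return (gamma1 * sinusoidal_pe(p_sec, d_model)
            + gamma2 * sinusoidal_pe(p_sen, d_model))


# Example: the 3rd sentence (index 2) of the 2nd section (index 1).
hpe = hierarchical_pe(p_sec=1, p_sen=2, d_model=384)
print(hpe.shape)  # (384,)
```

This embedding is added to each sentence's Sentence-BERT vector so that both the section-level and sentence-level positions are reflected in the node representation.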