HEGEL: Hypergraph Transformer for Long Document Summarization
Haopeng Zhang and Xiao Liu and Jiawei Zhang
IFM Lab, Department of Computer Science, University of California, Davis, CA, USA
{haopeng,xiao,jiawei}@ifmlab.org
Abstract
Extractive summarization for long documents is challenging due to the extended structured input context. The long-distance sentence dependency hinders cross-sentence relation modeling, the critical step of extractive summarization. This paper proposes HEGEL, a hypergraph neural network for long document summarization that captures high-order cross-sentence relations. HEGEL updates and learns effective sentence representations with hypergraph transformer layers and fuses different types of sentence dependencies, including latent topics, keyword coreference, and section structure. We validate HEGEL with extensive experiments on two benchmark datasets, and the experimental results demonstrate the effectiveness and efficiency of HEGEL.
1 Introduction
Extractive summarization aims to generate a shorter version of a document while preserving the most salient information by directly extracting relevant sentences from the original document. With recent advances in neural networks and large pre-trained language models (Devlin et al., 2018; Lewis et al., 2019), researchers have achieved promising results in news summarization (around 650 words per document) (Nallapati et al., 2016a; Cheng and Lapata, 2016; See et al., 2017; Zhang et al., 2022; Narayan et al., 2018; Liu and Lapata, 2019). However, these models struggle when applied to long documents like scientific papers. The input length of a scientific paper can range from 2,000 to 7,000 words, and the expected summary (abstract) is more than 200 words, compared to around 40 words in news headlines.
Scientific paper extractive summarization is highly challenging due to the long structured input. The extended context hinders sequential models like RNNs from capturing sentence-level long-distance dependencies and cross-sentence relations, which are essential for extractive summarization. In addition, the quadratic computational complexity of attention with respect to input length makes Transformer-based models (Vaswani et al., 2017) inapplicable. Moreover, long documents typically cover diverse topics and contain richer structural information than short news articles, which is difficult for sequential models to capture.

Figure 1: An illustration of modeling cross-sentence relations from section structure, latent topic, and keyword coreference perspectives.
As a result, researchers have turned to graph neural network (GNN) approaches to model cross-sentence relations. They generally represent a document with a sentence-level graph and turn extractive summarization into a node classification problem. These works construct graphs from documents in different ways, such as the inter-sentence cosine similarity graph (Erkan and Radev, 2004; Dong et al., 2020), the Rhetorical Structure Theory (RST) tree relation graph (Xu et al., 2019), the approximate discourse graph (Yasunaga et al., 2017), the topic-sentence graph (Cui and Hu, 2021), and the word-document heterogeneous graph (Wang et al., 2020). However, the usability of these approaches is limited in two aspects: (1) These methods only model pairwise interactions between sentences, while sentence interactions in natural language can be triadic, tetradic, or of even higher order (Ding et al., 2020). How to capture high-order cross-sentence relations for extractive summarization remains an open question. (2) These graph-based approaches rely on either semantic or discourse-structure cross-sentence relations but are incapable of fusing sentence interactions from different perspectives. Sentences within a document can interact in various ways, such as embedding similarity, keyword coreference, and topic modeling from the semantic perspective, and section or rhetorical structure from the discourse perspective. Capturing multi-type cross-sentence relations can benefit both sentence representation learning and sentence salience modeling. Figure 1 illustrates how different types of sentence interactions provide different connectivity for document graph construction, covering both local and global context information.
To address the above issues, we propose HEGEL (HypErGraph transformer for Extractive Long document summarization), a graph-based model designed for summarizing long documents with rich discourse information. To better model high-order cross-sentence relations, we represent a document as a hypergraph, a generalization of the graph structure in which an edge can join any number of vertices. We then introduce three types of hyperedges that model sentence relations from different perspectives: section structure, latent topic, and keyword coreference. We also propose hypergraph transformer layers to update and learn effective sentence embeddings on hypergraphs. We validate HEGEL with extensive experiments and analyses on two benchmark datasets, and the experimental results demonstrate the effectiveness and efficiency of HEGEL. We highlight our contributions as follows:

(i) We propose a hypergraph neural model, HEGEL, for long document summarization. To the best of our knowledge, we are the first to model high-order cross-sentence relations with hypergraphs for extractive document summarization.

(ii) We propose three types of hyperedges (section, topic, and keyword) that capture sentence dependencies from different perspectives. Hypergraph transformer layers are then designed to update and learn effective sentence representations by message passing on the hypergraph.

(iii) We validate HEGEL on two benchmark datasets (arXiv and PubMed), and the experimental results demonstrate its effectiveness over state-of-the-art baselines. We also conduct ablation studies and qualitative analysis to further investigate model performance.
2 Related Work
2.1 Scientific Paper Summarization
With the promising progress on short news summarization, research interest in long-form documents like academic papers has arisen. Cohan et al. (2018) proposed the benchmark datasets ArXiv and PubMed, and employed a pointer-generator network with a hierarchical encoder and a discourse-aware decoder. Xiao and Carenini (2019) proposed an encoder-decoder model incorporating global and local contexts. Ju et al. (2021) introduced an unsupervised extractive approach to summarize long scientific documents based on the Information Bottleneck principle. Dong et al. (2020) proposed an unsupervised ranking model incorporating hierarchical graph representations and asymmetrical positional cues. Recently, Ruan et al. (2022) proposed applying pre-trained language models with hierarchical structure information.
2.2 Graph-based Summarization
Graph-based models have been exploited for extractive summarization to capture cross-sentence dependencies. Unsupervised graph summarization methods rely on graph connectivity to score and rank sentences (Radev et al., 2004; Zheng and Lapata, 2019; Dong et al., 2020). Researchers have also explored supervised graph neural networks for summarization. Yasunaga et al. (2017) applied a Graph Convolutional Network (GCN) to the approximate discourse graph. Xu et al. (2019) proposed applying a GCN to structural discourse graphs based on RST trees and coreference mentions. Cui et al. (2020) leveraged topical information by building topic-sentence graphs. Recently, Wang et al. (2020) proposed constructing word-document heterogeneous graphs and using word nodes as intermediaries between sentences. Jing et al. (2021) proposed using a multiplex graph to consider different sentence relations. Our paper follows this line of work on developing novel graph neural networks for single-document extractive summarization. The main difference is that we construct a hypergraph from a document, which can capture high-order cross-sentence relations instead of only pairwise relations, and fuse different types of sentence dependencies, including section structure, latent topics, and keyword coreference.

Figure 2: (a) The overall architecture of HEGEL. (b) Two-phase message passing in the hypergraph transformer layer.
3 Method
In this section, we introduce HEGEL in detail. We first present how to construct a hypergraph for a given long document. After encoding sentences into contextualized representations, we extract their section, latent topic, and keyword coreference relations and fuse them into a hypergraph. Then, our hypergraph transformer layers update and learn sentence representations according to the hypergraph. Finally, HEGEL scores the salience of each sentence based on its updated representation to determine whether it should be included in the summary. The overall architecture of our model is shown in Figure 2(a), and a minimal code sketch of the pipeline follows.
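To make this flow concrete, the sketch below mirrors the pipeline just described: pre-computed sentence embeddings and an incidence matrix pass through stacked message-passing layers, and a linear scorer outputs per-sentence salience. This is a simplified illustration, not the authors' implementation: mean aggregation stands in for the attention-based two-phase message passing of Figure 2(b), and all class names are hypothetical.

```python
import torch
import torch.nn as nn


class TwoPhaseMessagePassing(nn.Module):
    """Simplified stand-in for a hypergraph transformer layer (Figure 2(b)):
    phase 1 aggregates node features into hyperedges, phase 2 aggregates
    hyperedge features back into nodes. Mean pooling replaces attention."""

    def __init__(self, dim: int):
        super().__init__()
        self.node_to_edge = nn.Linear(dim, dim)
        self.edge_to_node = nn.Linear(dim, dim)

    def forward(self, h: torch.Tensor, A: torch.Tensor) -> torch.Tensor:
        # h: (n, d) sentence embeddings; A: (n, m) incidence matrix of Eq. (1).
        edge_h = (A.t() @ h) / A.sum(dim=0).clamp(min=1).unsqueeze(1)
        edge_h = torch.relu(self.node_to_edge(edge_h))   # phase 1: nodes -> edges
        node_h = (A @ edge_h) / A.sum(dim=1).clamp(min=1).unsqueeze(1)
        return h + torch.relu(self.edge_to_node(node_h))  # phase 2 + residual


class HEGELSketch(nn.Module):
    """Skeleton of the overall HEGEL pipeline (hypothetical names)."""

    def __init__(self, dim: int, num_layers: int = 2):
        super().__init__()
        self.layers = nn.ModuleList(
            [TwoPhaseMessagePassing(dim) for _ in range(num_layers)]
        )
        self.scorer = nn.Linear(dim, 1)

    def forward(self, sent_emb: torch.Tensor, A: torch.Tensor) -> torch.Tensor:
        h = sent_emb
        for layer in self.layers:
            h = layer(h, A)
        # Sigmoid salience score per sentence for extractive selection.
        return torch.sigmoid(self.scorer(h)).squeeze(-1)
```

At inference time, the highest-scoring sentences would be extracted to form the summary.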
3.1 Document as a Hypergraph
A hypergraph is defined as a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V} = \{v_1, \ldots, v_n\}$ represents the set of nodes and $\mathcal{E} = \{e_1, \ldots, e_m\}$ represents the set of hyperedges in the graph. Each hyperedge $e$ connects two or more nodes (i.e., $\sigma(e) \geq 2$). Specifically, we use the notations $v \in e$ and $v \notin e$ to denote that node $v$ is or is not connected to hyperedge $e$ in the graph $\mathcal{G}$, respectively. The topological structure of a hypergraph can also be represented by its incidence matrix $\mathbf{A} \in \mathbb{R}^{n \times m}$:

$$A_{ij} = \begin{cases} 1, & \text{if } v_i \in e_j \\ 0, & \text{if } v_i \notin e_j \end{cases} \quad (1)$$

Given a document $D = \{s_1, s_2, \ldots, s_n\}$, each sentence $s_i$ is represented by a corresponding node $v_i \in \mathcal{V}$. A hyperedge $e_j$ is created if a subset of nodes $\mathcal{V}_j \subset \mathcal{V}$ shares common semantic or structural information.
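As a minimal illustration of Eq. (1), the sketch below assembles the incidence matrix for a toy six-sentence document from the three hyperedge types introduced above. The groupings are invented for illustration, not derived from a real document.

```python
import numpy as np


def build_incidence_matrix(n_sentences: int,
                           hyperedges: list[set[int]]) -> np.ndarray:
    """Assemble the binary incidence matrix A of Eq. (1):
    A[i, j] = 1 iff sentence node v_i belongs to hyperedge e_j."""
    A = np.zeros((n_sentences, len(hyperedges)), dtype=np.float32)
    for j, members in enumerate(hyperedges):
        for i in members:
            A[i, j] = 1.0
    return A


# Toy 6-sentence document; the groupings below are invented for illustration.
section_edges = [{0, 1, 2}, {3, 4, 5}]   # sentences in the same section
topic_edges   = [{0, 3}, {1, 4, 5}]      # sentences sharing a latent topic
keyword_edges = [{2, 5}]                 # sentences mentioning the same keyword

A = build_incidence_matrix(6, section_edges + topic_edges + keyword_edges)
print(A.shape)  # (6, 5): six sentence nodes, five fused hyperedges
```

Concatenating the three edge sets column-wise is one simple way to fuse the different dependency types into a single hypergraph.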
3.1.1 Node Representation
We first adopt Sentence-BERT (Reimers and Gurevych, 2019) as the sentence encoder to embed the semantic meanings of sentences as $\mathbf{X} = \{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\}$. Note that Sentence-BERT is only used for the initial sentence embeddings and is not updated in HEGEL.

To preserve sequential information, we also add positional encoding following the Transformer (Vaswani et al., 2017). We adopt the hierarchical position embedding (Ruan et al., 2022), where the position of each sentence $s_i$ is represented in two parts: the section index of the sentence, $p^{sec}_i$, and the sentence index within its section, $p^{sen}_i$. The hierarchical position embedding (HPE) of sentence $s_i$ is calculated as:

$$\mathrm{HPE}(s_i) = \gamma_1 \mathrm{PE}(p^{sec}_i) + \gamma_2 \mathrm{PE}(p^{sen}_i), \quad (2)$$

where $\gamma_1, \gamma_2$ are two hyperparameters that adjust the scale of the positional encoding and $\mathrm{PE}(\cdot)$ is the sinusoidal position encoding function of Vaswani et al. (2017).
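The following is a minimal sketch of Eq. (2), assuming the standard sinusoidal encoding of Vaswani et al. (2017) for $\mathrm{PE}(\cdot)$; the $\gamma$ values shown are illustrative defaults, not the paper's tuned hyperparameters.

```python
import numpy as np


def sinusoidal_pe(pos: int, d_model: int) -> np.ndarray:
    """Standard sinusoidal position encoding (Vaswani et al., 2017)."""
    assert d_model % 2 == 0, "assumes an even embedding dimension"
    i = np.arange(0, d_model, 2)
    angle = pos / np.power(10000.0, i / d_model)
    pe = np.zeros(d_model)
    pe[0::2] = np.sin(angle)  # even dimensions
    pe[1::2] = np.cos(angle)  # odd dimensions
    return pe


def hierarchical_pe(p_sec: int, p_sen: int, d_model: int,
                    gamma1: float = 1.0, gamma2: float = 1.0) -> np.ndarray:
    """Eq. (2): HPE(s_i) = gamma_1 * PE(p_i^sec) + gamma_2 * PE(p_i^sen).
    The gamma scales here are illustrative, not the paper's tuned values."""
    return (gamma1 * sinusoidal_pe(p_sec, d_model)
            + gamma2 * sinusoidal_pe(p_sen, d_model))


# Example: the 3rd sentence (index 2) of the 2nd section (index 1).
hpe = hierarchical_pe(p_sec=1, p_sen=2, d_model=384)
print(hpe.shape)  # (384,)
```

This embedding is added to each sentence's Sentence-BERT vector so that both the section-level and sentence-level positions are reflected in the node representation.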