A Unified Encoder-Decoder Framework with Entity Memory
Zhihan Zhang1, Wenhao Yu1, Chenguang Zhu2, Meng Jiang1
1University of Notre Dame, Notre Dame, IN, USA
2Microsoft Cognitive Services Research, Redmond, WA, USA
1{zzhang23, wyu1, mjiang2}@nd.edu;2chezhu@microsoft.com
Abstract
Entities, as important carriers of real-world knowledge, play a key role in many NLP tasks. We focus on incorporating entity knowledge into an encoder-decoder framework for informative text generation. Existing approaches tried to index, retrieve, and read external documents as evidence, but they suffered from a large computational overhead. In this work, we propose an Encoder-Decoder framework with an entity Memory, namely EDMem. The entity knowledge is stored in the memory as latent representations, and the memory is pre-trained on Wikipedia along with encoder-decoder parameters. To precisely generate entity names, we design three decoding methods to constrain entity generation by linking entities in the memory. EDMem is a unified framework that can be used on various entity-intensive question answering and generation tasks. Extensive experimental results show that EDMem outperforms both memory-based auto-encoder models and non-memory encoder-decoder models.¹
1 Introduction
A large amount of real-world knowledge is related to entities, e.g., persons, nations, and events. Entity knowledge is the information describing facts and attributes related to entities. Many entity-intensive NLP tasks require models to obtain entity knowledge in order to generate informative outputs, such as answering factual questions (Kwiatkowski et al., 2019), explaining claims (Onoe et al., 2021), or making informative conversations (Dinan et al., 2019). Pre-trained encoder-decoder models can be directly applied to such entity-intensive tasks (Ye et al., 2020; Roberts et al., 2020), but their ability to store and use knowledge is still questionable (Lewis et al., 2021; Wang et al., 2021).
¹Code will be available at https://github.com/DM2-ND/EDMem
Figure 1: An overview of the EDMem framework. The encoder (Transformer layers) reads a question such as "Who wrote [Es] Evening Class [Ee]?"; both the encoder and the decoder access the Entity Memory from their lower layers; the decoder takes "[BOS] [Es] Maeve Binchy" and produces "[Es] Maeve Binchy [Ee]" through a Language Modeling head, with an Entity Linking head attached. H denotes the final hidden states of the encoder.
A popular approach to incorporating knowledge into the generation process is retrieving evidence documents from external sources (Lewis et al., 2020b; Izacard and Grave, 2021; Oguz et al., 2020; Yu et al., 2022c). However, such models suffer from significant computational overheads in indexing, retrieving, and reading a large number of extra documents (Lee et al., 2021; de Jong et al., 2022). Therefore, it is important to give encoder-decoder models access to entity knowledge without sacrificing too much efficiency.

Recently, it has been proposed to use an in-model memory to augment auto-encoder models with entity knowledge on entity linking tasks (Févry et al., 2020; Verga et al., 2021; Sun et al., 2021). The entity memory stores entity knowledge as dense vectors which can be directly incorporated into the hidden states of Transformer models (Vaswani et al., 2017), with no need to encode extra text. However, the auto-encoder framework in previous approaches can only select entities from a pre-defined entity vocabulary. Hence, these models are not able to produce an entity outside the vocabulary, nor to generate answers or text beyond a single entity.
In this paper, we propose a novel Encoder-Decoder framework with an entity Memory (EDMem), as shown in Figure 1. EDMem is a unified framework for various entity-intensive QA and generation tasks, in which we train an entity memory for efficient knowledge incorporation. First, EDMem is pre-trained on Wikipedia documents, where it learns entity embeddings in the memory along with an encoder-decoder model. EDMem learns to select relevant entities from the memory via an entity linking objective, and learns to generate answers using entity knowledge via a language modeling objective. Second, to precisely generate entity names, we design three decoding methods that utilize the entity linking ability of EDMem in its generation process when we fine-tune it on downstream tasks. These include (1) free-form generation: left-to-right generation with entity identifiers; (2) static entity linking: first select entities by entity linking, build prefix trees for the selected entities, and then perform constrained entity generation using the trees; (3) dynamic entity linking: select entities on-the-fly for constrained entity generation.
We conduct experiments on two popular testbeds of entity knowledge: open-domain QA and entity-intensive generation. With the incorporation of entity knowledge, EDMem outperforms non-memory encoder-decoder models on both tasks, and it retains the efficiency advantage of closed-book (i.e., non-retrieval) models. Compared to memory-based auto-encoders, EDMem achieves both higher overall accuracy (+9%) and better entity precision (+8%) on open-domain QA datasets, and it generates high-quality text from the memory-supported decoder on generation datasets where auto-encoders fail to do so. To summarize, EDMem is the first knowledge-augmented closed-book framework to perform both tasks in a unified manner.
2 Related Work
Closed-Book Models
Closed-book models are pre-trained models that store knowledge in their own parameters. For example, COMET (Bosselut et al., 2019) fine-tuned GPT2 (Radford et al., 2018) to construct knowledge graphs by generating commonsense triples. More recently, fine-tuned BART (Lewis et al., 2020a) and T5 (Raffel et al., 2020) models have proven to be competitive on open-domain QA (Ye et al., 2020; Roberts et al., 2020). Therefore, closed-book models are able to memorize some entity knowledge after being pre-trained on massive data. However, studies showed that closed-book models merely recalled similar inputs and answers from their pre-training corpus (Wang et al., 2021), and their performance lagged behind open-book models.
Open-Book Models
Open-book models first retrieve evidence documents from external corpora and then read these documents to predict an answer (Chen et al., 2017). REALM (Guu et al., 2020) proposed a self-supervised approach to pre-train a retriever-reader model. DPR (Karpukhin et al., 2020) devised a contrastive objective to train a dense bi-encoder retriever on open-domain QA. Subsequent approaches combined DPR with a generative objective to build large, powerful models for open-domain QA and generation tasks (Lewis et al., 2020b; Izacard and Grave, 2021; Sachan et al., 2021; Yu et al., 2022a). However, open-book models have to process the raw text of all retrieved documents, which leads to extremely long inference time. In addition, further overhead comes from loading the document index and retrieving evidence documents for each example.
Entity Memory
EaE (Févry et al., 2020) was the first to pre-train an entity memory with an auto-encoder framework to perform entity prediction on open-domain QA. FILM (Verga et al., 2021) followed EaE and added a fact memory containing representations of Wikidata triples. To better encode relational knowledge, OPQL (Sun et al., 2021) learned latent relational representations for arbitrary entity pairs. Recent work focused on learning a huge mention-level memory (~150M entries) with extensive pre-training (de Jong et al., 2022) or on leveraging the entity memory in domain-adaptive training (Kang et al., 2022). These models are all based on an auto-encoder framework; thus, they can predict entity IDs but fail to generate any non-entity answers or sentences. A contemporaneous preprint trained a memory with an encoder-decoder model (Chen et al., 2022); however, it used QA pairs as memory entries instead of entities, limiting its application to QA tasks. Besides, its memory is much heavier (60M entries) than ours (1M).
3 Proposed Framework
Suppose we have a pre-defined vocabulary of $N$ entities $\mathcal{E} = \{e_1, \ldots, e_N\}$. A mention is the actual span of tokens in context which refers to an entity. The set of all mentions in the corpus is denoted as $\mathcal{M}$. Thus, there is a global alias table $\mathcal{T}: \mathcal{E} \rightarrow 2^{\mathcal{M}}$, where each entity is mapped to all its mentions. The input of EDMem is a sequence of tokens $x$ of length $S$, and the target output is another sequence $y = [y_1, \cdots, y_T]$ of length $T$. Both sequences contain a pre-labeled set of mentions. Each mention refers to an entity in $\mathcal{E}$. We add two special tokens [Es] and [Ee] to represent the "entity start" and "entity end" boundaries of a mention, e.g., "[Es] Brett Hart [Ee] is the president of the [Es] United Airlines [Ee]". These special tokens come from either Wikipedia hyperlinks (in pre-training, §3.3) or an entity linking model (in fine-tuning, §3.4).
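To make the notation concrete, here is a minimal Python sketch of an alias table $\mathcal{T}$ and the [Es]/[Ee] mention markup. The entity names, aliases, and the naive matching heuristic are purely illustrative and are not taken from the actual 1M-entity vocabulary or labeling pipeline.

```python
# Illustrative only: a toy entity vocabulary, an alias table T: E -> 2^M,
# and the [Es] ... [Ee] mention markup described above.
entity_vocab = ["Brett_Hart", "United_Airlines"]

# Each entity maps to the set of surface forms (mentions) that may refer to it.
alias_table = {
    "Brett_Hart": {"Brett Hart"},
    "United_Airlines": {"United Airlines"},
}

def mark_mentions(text: str, alias_table: dict) -> str:
    """Wrap every known alias in [Es]/[Ee] boundary tokens (naive string matching;
    the paper labels mentions via Wikipedia hyperlinks or an entity linking model)."""
    for aliases in alias_table.values():
        for alias in sorted(aliases, key=len, reverse=True):
            text = text.replace(alias, f"[Es] {alias} [Ee]")
    return text

print(mark_mentions("Brett Hart is the president of the United Airlines.", alias_table))
# [Es] Brett Hart [Ee] is the president of the [Es] United Airlines [Ee].
```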
3.1 Architecture
An overview of EDMem is presented in Figure 1. The framework has a transformer encoder, a transformer decoder, an entity memory, and two prediction heads. Both the encoder and the decoder have two parts: $L_1$ lower layers and $L_2$ upper layers. Transformer layers in EDMem have the same architecture as BART (Lewis et al., 2020a). At the end of the lower layers, EDMem is allowed to use the hidden states as a query to access the entity memory. The knowledge representation obtained by each memory access is summed and normalized with the hidden states before further reasoning is performed in the upper layers. Two prediction heads use the final hidden states of the decoder for prediction: an LM head for token prediction and an entity linking head for entity prediction (details are in §3.3). In practice, we follow EaE (Févry et al., 2020) and set $L_1 = 4$ and $L_2 = 8$.
3.2 Entity Memory
The entity memory contains a large embedding table, which stores the embeddings of the entities in $\mathcal{E}$. Intuitively, an entity embedding contains the contextual information around all mentions of the entity in Wikipedia documents. During encoding and decoding, EDMem queries the entity memory whenever it encounters a mention. It recognizes mentions by identifying the [Es] token. EDMem takes the hidden state of the [Es] token as the query to retrieve relevant knowledge from the entity memory by attending to the entity embedding table (bias terms are omitted):
$$h^{ent}_s = W_{out}\Big(\sum_{i=1}^{N} \alpha_i \cdot \mathbf{e}_i\Big), \quad (1)$$
$$\text{where} \quad \alpha_i = \frac{\exp\big(\mathbf{e}_i^{\top} W_{in} h^{low}_s\big)}{\sum_{j=1}^{N} \exp\big(\mathbf{e}_j^{\top} W_{in} h^{low}_s\big)}. \quad (2)$$
Here $\mathbf{e}_i$ is the embedding of entity $e_i$, and $h^{low}_s$ denotes the hidden state of the [Es] token (from the lower encoder/decoder layers). $h^{ent}_s$ is the aggregated entity representation, which is summed and normalized with $h^{low}_s$ before being passed into the upper layers. $W_{in}$ and $W_{out}$ are linear projection layers for dimension matching. Following EaE, during inference we aggregate the entity representations of the top 100 entities (sorted by $\alpha_i$) instead of attending to all $N$ entities.
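The memory access in Equations (1)–(2) is a single attention step over the entity embedding table. Below is a minimal PyTorch sketch of that step; the module name, dimension arguments, and the top-k shortcut are our own illustration rather than the released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EntityMemory(nn.Module):
    """Sketch of the memory access in Eq. (1)-(2): attend over the N entity
    embeddings with the hidden state of an [Es] token as the query."""

    def __init__(self, num_entities: int, d_entity: int, d_model: int, top_k: int = 100):
        super().__init__()
        self.entity_emb = nn.Embedding(num_entities, d_entity)  # the entity memory
        self.w_in = nn.Linear(d_model, d_entity, bias=False)    # W_in: project the query
        self.w_out = nn.Linear(d_entity, d_model, bias=False)   # W_out: project back to d_model
        self.top_k = top_k

    def forward(self, h_low_s: torch.Tensor, use_top_k: bool = False):
        # h_low_s: (batch, d_model) hidden states of [Es] tokens from the lower layers
        query = self.w_in(h_low_s)                               # (batch, d_entity)
        scores = query @ self.entity_emb.weight.T                # (batch, N): logits of Eq. (2)
        if use_top_k:
            # Inference shortcut from the paper: attend only to the top-100 entities.
            top_scores, top_idx = scores.topk(self.top_k, dim=-1)
            alpha = F.softmax(top_scores, dim=-1)                # (batch, k)
            selected = self.entity_emb(top_idx)                  # (batch, k, d_entity)
            pooled = torch.einsum("bk,bkd->bd", alpha, selected)
        else:
            alpha = F.softmax(scores, dim=-1)                    # Eq. (2)
            pooled = alpha @ self.entity_emb.weight              # sum_i alpha_i * e_i
        h_ent_s = self.w_out(pooled)                             # Eq. (1)
        return h_ent_s, scores  # scores are reused by the entity linking loss

# In EDMem, h_ent_s is summed and layer-normalized with h_low_s before the upper layers.
```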
3.3 Pre-Training
3.3.1 Pre-Training Corpus
We pre-train EDMem on the whole Wikipedia corpus. All documents are split into 128-token passages. In addition, we set a 10-token sliding window between passages to avoid an entity being split across two adjacent chunks. This setting yields a total of 39M passages, of which we hold out 0.5% as the validation set during pre-training. We leverage Wikipedia hyperlinks as gold annotations of 249M mentions and their linked entities. Since hyperlinks do not cover all mentions in the text, we heuristically label missing mentions to create more training signals for the entity memory. We use the alias table $\mathcal{T}$ to label all mentions in a Wikipedia page if they match either (1) a linked entity on the same page, or (2) the title entity of the page. This leads to a total of 468M mentions in the pre-training corpus. We collect the 1M most frequently linked entities to form the entity vocabulary $\mathcal{E}$. More details can be found in Appendix A.
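As an illustration of the corpus preparation, the following sketch splits a tokenized document into 128-token passages, assuming the 10-token sliding window means a 10-token overlap between adjacent passages (an assumption on our part; the paper does not spell out the exact chunking code).

```python
def split_into_passages(tokens, passage_len=128, overlap=10):
    """Split a tokenized document into fixed-length passages with a small overlap,
    so that a mention near a boundary appears intact in at least one passage.
    (Sketch; we assume the 10-token sliding window means a 10-token overlap.)"""
    passages = []
    step = passage_len - overlap
    for start in range(0, max(len(tokens) - overlap, 1), step):
        passages.append(tokens[start:start + passage_len])
    return passages

doc = [f"tok{i}" for i in range(300)]
chunks = split_into_passages(doc)
print(len(chunks), [len(c) for c in chunks])  # 3 passages of length 128, 128, 64
```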
3.3.2 Pre-Training Objective
Our pre-training objective is a combination of language modeling and entity linking. For the language modeling objective, we randomly corrupt parts of the input sequence and train EDMem to reconstruct the original sequence. We adopt two kinds of sequence corruption: random token masking and salient span masking. In random token masking, each token has a probability of $P_{rtm}$ of being replaced by a [MASK] token. Salient span masking is adapted from (Guu et al., 2020), where each mention has a probability of $P_{ssm}$ that all tokens within the mention are replaced by [MASK]. Such explicit masking of whole mention names encourages EDMem to rely on the entity memory in predicting mentions, which facilitates the learning of entity embeddings. The LM head performs token prediction through a linear-softmax layer, and the LM loss is the negative log-likelihood of the target sequence: $\mathcal{L}_{LM} = -\sum_{j=1}^{T} \log P(y_j \mid x, y_{1:j-1})$.
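A small sketch of the two corruption schemes described above; the probability values and the example sentence are illustrative, since the exact $P_{rtm}$ and $P_{ssm}$ values are not stated here.

```python
import random

MASK = "[MASK]"

def random_token_masking(tokens, p_rtm=0.15):
    """Replace each token independently with [MASK] with probability P_rtm (value illustrative)."""
    return [MASK if random.random() < p_rtm else t for t in tokens]

def salient_span_masking(tokens, mention_spans, p_ssm=0.5):
    """With probability P_ssm per mention, mask every token inside the mention span
    (start/end are inclusive indices; probability value is illustrative)."""
    out = list(tokens)
    for start, end in mention_spans:
        if random.random() < p_ssm:
            for i in range(start, end + 1):
                out[i] = MASK
    return out

tokens = "[Es] Maeve Binchy [Ee] wrote the novel Evening Class".split()
# The mention "Maeve Binchy" occupies token positions 1-2 (inside the [Es]/[Ee] markers).
print(salient_span_masking(tokens, mention_spans=[(1, 2)], p_ssm=1.0))
# ['[Es]', '[MASK]', '[MASK]', '[Ee]', 'wrote', 'the', 'novel', 'Evening', 'Class']
```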
Figure 2: Three decoding methods in downstream tasks: (a) Free-Form Generation, (b) Static Entity Linking, (c) Dynamic Entity Linking.
EDMem provides direct supervision signals to the entity memory for entity representation learning. The entity linking loss is applied each time the model queries the entity memory. Besides the accesses in the middle of the encoder and decoder, EDMem also queries the memory in the entity linking head, as shown in Figure 1. The entity linking head predicts the corresponding entity using the hidden states of each mention, in the same way as Equation (2). We use a cross-entropy loss to maximize the attention weights of the labelled entities: $\mathcal{L}_{EL} = -\sum_{m} \log \alpha_i$, where $m$ is a mention in the input or output sequence that is linked to the $i$-th entity in $\mathcal{E}$. The final loss function is $\mathcal{L}_{LM} + \lambda_{EL}\,\mathcal{L}_{EL}$, where the coefficient $\lambda_{EL}$ is a hyper-parameter.
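Combining the two objectives, a training step reduces to a weighted sum of two cross-entropy terms. The sketch below is our own paraphrase of the loss; the tensor shapes and the default $\lambda_{EL}$ value are illustrative.

```python
import torch.nn.functional as F

def edmem_loss(lm_logits, target_ids, el_scores, gold_entity_ids, lambda_el=1.0):
    """Sketch of L = L_LM + lambda_EL * L_EL (padding handling omitted).
    lm_logits:       (batch, T, vocab)  token predictions from the LM head
    target_ids:      (batch, T)         gold output tokens
    el_scores:       (num_mentions, N)  memory attention logits at each mention
    gold_entity_ids: (num_mentions,)    index of the linked entity for each mention
    lambda_el:       the coefficient lambda_EL (value here is illustrative)."""
    lm_loss = F.cross_entropy(lm_logits.flatten(0, 1), target_ids.flatten())
    el_loss = F.cross_entropy(el_scores, gold_entity_ids)  # equals -log alpha_i for the gold entity
    return lm_loss + lambda_el * el_loss
```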
3.4 Fine-Tuning
EDMem is fine-tuned on downstream tasks via an LM objective and an entity linking objective. The LM objective is to maximize the probability of the task-specific output. The entity linking objective links mentions to entities in the memory, the same as in pre-training. Mention boundaries are pre-labeled using a state-of-the-art entity linking model (Li et al., 2020). In entity-intensive downstream tasks, the entity memory assists sequence generation by not only providing entity knowledge but also generating entity names. Thus, we design three decoding settings that let the entity linking objective assist sequence generation. A sketch of the different settings is given in Figure 2.
Free-Form Generation
In this setting, the model generates the output sequence entirely based on the probability given by the LM head. This includes the special tokens [Es] and [Ee], which indicate an access to the memory. There is no constraint on what tokens are generated between [Es] and [Ee], i.e., the subsequence [Es], $y_i, \cdots, y_j$, [Ee] may not be a valid entity name in the entity vocabulary. One advantage is that the model processes the entity knowledge in a latent manner, which does not explicitly affect the probability distribution of the language model. However, this may hurt the model's performance on tasks where exact entity names are strictly required, e.g., open-domain QA tasks where exact match is used for evaluation.
Static Entity Linking
Static entity linking explicitly restricts the model to generating entity names for QA. Here, the decoding process is divided into two steps: entity linking and constrained generation. First, given a question, the model selects one or multiple entities as references. As shown in Figure 2(b), the question with an appended [Es] token as a placeholder is passed into the decoder, and the entity linking head is trained to predict the entity ID of the gold answer². We then have the selected top-$k$ entities for each test question, and we restrict the generation space to these top-$k$ entities whenever the model is trying to generate an entity name. To achieve this, inspired by (Cao et al., 2021), we build a prefix tree over the $k$ entities for each test example. The prefix tree tells the model which tokens it is allowed to generate given a prefix (i.e., the previously generated tokens). When the model generates an [Es] token, we restrict the following generated tokens to be one of the $k$ entity names (i.e., one of the paths in the prefix tree). In this way, the model can either generate an entity answer (by generating [Es] and traversing the pre-built prefix tree) or generate a non-entity answer (if no [Es] token is generated). Readers can refer to (Cao et al., 2021) for more implementation details.
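A minimal sketch of the prefix-tree constraint over the top-k entity names follows. The plain-dict trie, whitespace tokenization, and the [Ee] end marker used here are simplifications for illustration; the actual constrained decoding follows Cao et al. (2021) and operates on subword tokens.

```python
END = "[Ee]"

def build_prefix_tree(entity_names):
    """Build a trie over the tokenized names of the top-k selected entities."""
    root = {}
    for name in entity_names:
        node = root
        for tok in name.split() + [END]:
            node = node.setdefault(tok, {})
    return root

def allowed_next_tokens(prefix_tree, generated_tokens):
    """Return the tokens the decoder may generate next, given the tokens
    produced since the last [Es] (an empty list allows any path's first token)."""
    node = prefix_tree
    for tok in generated_tokens:
        if tok not in node:
            return []  # prefix is not in the tree: no valid continuation
        node = node[tok]
    return list(node.keys())

trie = build_prefix_tree(["United States", "United Kingdom", "Ireland"])
print(allowed_next_tokens(trie, []))                    # ['United', 'Ireland']
print(allowed_next_tokens(trie, ["United"]))            # ['States', 'Kingdom']
print(allowed_next_tokens(trie, ["United", "States"]))  # ['[Ee]'] -> close the entity span
```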
Dynamic Entity Linking
Static entity linking is applicable only when the downstream task can be converted into an entity linking objective. Another way to generate entities is to predict the entities on-the-fly. After each time the model generates
²Training examples with non-entity answers are discarded.