
Memory Augmented Models KNN-LM (Khandelwal et al., 2020), TRIME (Zhong et al., 2022), RAG (Lewis et al., 2020), and RETRO (Borgeaud et al., 2022) are memory augmented models which use both the parametric space of the model and the nonparametric space of an external memory. KNN-LM improves LM performance by generating the next token through interpolation between the nearest-neighbor distribution (based on distance in the contextualized embedding space) and the model vocab distribution, applied only at inference. TRIME extends this work by using the same objective during training as well. RAG and RETRO first retrieve relevant texts from the external memory with a retriever and then generate the output based on the retrieved texts. Moreover, concurrent work NPM (Min et al., 2022) proposes a nonparametric masked language model which operates over the nonparametric distribution of the external memory. Generative retrieval models with Nonparametric Decoding also utilize the external memory, but rather than treating it as an external source, they incorporate it into the model by using the external memory as the decoder vocab embeddings.
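As a concrete illustration of the kNN-LM interpolation described above, below is a minimal PyTorch sketch; the function name, the fixed interpolation weight `lam`, and the tensor shapes are illustrative assumptions, not the exact implementation of Khandelwal et al. (2020).

```python
import torch

def knnlm_next_token_dist(p_lm, knn_dists, knn_token_ids, vocab_size, lam=0.25):
    """Illustrative kNN-LM interpolation (names/shapes are assumptions).

    p_lm:          (vocab_size,) parametric next-token distribution.
    knn_dists:     (k,) distances of retrieved neighbors in the
                   contextualized embedding space.
    knn_token_ids: (k,) long tensor; the next token stored with each neighbor.
    lam:           interpolation weight, a tuned hyperparameter.
    """
    # Closer neighbors receive higher probability mass.
    neighbor_probs = torch.softmax(-knn_dists, dim=0)
    # Aggregate neighbor mass per vocabulary token to form p_knn.
    p_knn = torch.zeros(vocab_size)
    p_knn.scatter_add_(0, knn_token_ids, neighbor_probs)
    # Interpolate the nonparametric and parametric distributions.
    return lam * p_knn + (1.0 - lam) * p_lm
```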
3 Nonparametric Decoding
Generative retrieval is the task of retrieving the
most relevant retrieval target (e.g., title, passage,
document identifier) by generating the target token-
by-token when given an input query. The training
objective of the generative retrieval model is to
maximize
$$P((t_1, \cdots, t_n) \mid q) = \prod_{i=1}^{n} P(t_i \mid q, t_{<i}) \tag{1}$$
where $t_*$ denotes the tokens of the retrieval target and $q$ is the input query. Such an approach has
shown high performance while using a low storage footprint (Cao et al., 2021; Tay et al., 2022; Bevilacqua et al., 2022; Lee et al., 2022). However, it has a limitation in that the model depends solely on the information encoded in its own parameters. Thus, the performance is likely to be bounded by how much information can be stored in the model parameters (Tay et al., 2022; Roberts et al., 2020).
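To make Eq. (1) concrete, here is a minimal PyTorch sketch of the objective under teacher forcing; the function and tensor names are hypothetical, and any seq2seq decoder producing per-step logits would fit.

```python
import torch
import torch.nn.functional as F

def target_log_prob(logits, target_ids):
    """Log-likelihood of Eq. (1): sum_i log P(t_i | q, t_<i).

    logits:     (n, vocab_size) decoder logits, where step i is
                conditioned on the query q and the gold prefix t_<i.
    target_ids: (n,) long tensor of gold target tokens t_1..t_n.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    per_token = log_probs.gather(1, target_ids.unsqueeze(1)).squeeze(1)
    return per_token.sum()  # maximize this (or minimize its negative)
```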
To address this limitation, we propose a new decoding method called Nonparametric Decoding (Np Decoding) for generative retrieval. To incorporate Np Decoding into an existing generative retrieval model, the only amendment is to use frozen contextualized vocab embeddings (the external memory) rather than the vanilla vocab embeddings as the decoder vocab embeddings during each generation step (Figure 1). These embeddings are the output embeddings of an encoder given a target sequence as input. Note that existing generative retrieval models such as GENRE and DSI use the pre-trained language model architecture as-is, with the vanilla vocab embedding as the decoder vocab embedding.
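A minimal PyTorch sketch of this amendment, assuming a precomputed matrix of contextualized token embeddings and a mapping from each row back to its vocab token (both names are hypothetical); it illustrates the mechanism rather than the authors' exact implementation.

```python
import torch

class NpDecodingHead(torch.nn.Module):
    """Sketch: score decoder states against frozen contextualized
    vocab embeddings (the external memory) instead of the vanilla
    vocab embedding matrix.
    """

    def __init__(self, ce_matrix, ce_token_ids):
        super().__init__()
        # (num_ce, d_model) frozen contextualized token embeddings.
        self.register_buffer("ce_matrix", ce_matrix)
        # (num_ce,) vocab token id of each CE row, to map rows back to tokens.
        self.register_buffer("ce_token_ids", ce_token_ids)

    def forward(self, decoder_hidden):
        # decoder_hidden: (batch, d_model) -> logits over CE rows.
        return decoder_hidden @ self.ce_matrix.T

    def generate_step(self, decoder_hidden):
        # Pick the best CE row and emit the token it corresponds to.
        best_row = self.forward(decoder_hidden).argmax(dim=-1)
        return self.ce_token_ids[best_row]
```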
In Section 3.1, we show the key benefits of using Np Decoding over vanilla decoding. In Sections 3.2 to 3.4, we present the details of base Np Decoding (BASE) and its two variants (ASYNC, CONTRA). In Section 3.5, we describe how we reduce the number of contextualized token embeddings.
3.1 Key Benefits
Using Np Decoding has two key benefits over vanilla decoding. First, the generative retrieval model with Np Decoding can utilize not only the information encoded in its own parameters (parametric space) but also the surrounding information encoded in the contextualized vocab embeddings (nonparametric space) during each decoding step. Second, the generative retrieval model with Np Decoding has a more expressive and fine-grained decoder vocab embedding space than that of the model with vanilla decoding. As shown in Figure 1, Np Decoding allows a single token to have multiple contextualized token embeddings in the decoder vocab embeddings (e.g., the same token "Cape" has two different contextualized embeddings) depending on the surrounding information of the token, whereas vanilla decoding allows only a single token embedding per token. Note that we do not save all possible token embeddings; rather, we reduce the number of embeddings to save without performance degradation through practical tactics (Section 3.5).
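The following toy snippet (made-up values, hypothetical vocab ids) shows the contrast: under Np Decoding the same token may own several rows of the decoder vocab embedding matrix, one per context.

```python
import torch

d_model = 4
cape_id, town_id, cod_id = 10, 11, 12  # hypothetical vocab ids

# Rows: "Cape" (in "Cape Town"), "Town", "Cape" (in "Cape Cod"), "Cod".
# Vanilla decoding would allow only one embedding row for cape_id.
ce_matrix = torch.randn(4, d_model)
ce_token_ids = torch.tensor([cape_id, town_id, cape_id, cod_id])

hidden = torch.randn(d_model)            # decoder state at some step
row = (ce_matrix @ hidden).argmax()      # select a CE row...
emitted = ce_token_ids[row]              # ...and map it back to a token
```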
3.2 BASE Nonparametric Decoding
In this work, we propose three types of Np Decoding (BASE Nonparametric Decoding and two variants), which we name according to the characteristics of their Contextualized Embedding Encoders (CE Encoder). The CE Encoder is an encoder that outputs contextualized token embeddings when given a target sequence (e.g., title, document ID, passage) as input.
The contextualized token embeddings are added to CE²,

²Details of how we construct CE for different target sequences are in Section 4.3.
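For illustration, a CE Encoder can be sketched with a frozen off-the-shelf encoder as below; the choice of t5-base is an assumption made for the example, and the paper's actual CE construction per target type follows Section 4.3.

```python
import torch
from transformers import AutoTokenizer, T5EncoderModel

tokenizer = AutoTokenizer.from_pretrained("t5-base")
encoder = T5EncoderModel.from_pretrained("t5-base").eval()  # kept frozen

target = "Cape Town"  # e.g., a title used as the retrieval target
inputs = tokenizer(target, return_tensors="pt")

with torch.no_grad():  # CE Encoder outputs are not updated
    hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, d_model)

ce_matrix = hidden.squeeze(0)                  # one row per target token
ce_token_ids = inputs["input_ids"].squeeze(0)  # map each row to its token
```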