
Memory Augmented Models KNN-LM (Khandelwal et al., 2020), TRIME (Zhong et al., 2022), RAG (Lewis et al., 2020), and RETRO (Borgeaud et al., 2022) are memory augmented models which use both the parametric space of the model and the nonparametric space of an external memory. KNN-LM improves LM performance by generating the next token through interpolation between the nearest-neighbor distribution (based on distance in the contextualized embedding space) and the model vocab distribution, applied only at inference. TRIME extends this work by using the same objective during training as well. RAG and RETRO first retrieve relevant texts from the external memory with a retriever and then generate the output based on the retrieved texts. Moreover, concurrent work NPM (Min et al., 2022) proposes a nonparametric masked language model which operates over the nonparametric distribution of the external memory. Generative retrieval models with Nonparametric Decoding also utilize the external memory, but rather than treating it as an external source, they incorporate it into the model by using the external memory as the decoder vocab embeddings.
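As a concrete illustration of the kNN-LM interpolation described above, below is a minimal PyTorch sketch; the function name, the fixed interpolation weight `lam`, and the tensor shapes are illustrative assumptions, not the exact implementation of Khandelwal et al. (2020).

```python
import torch

def knnlm_next_token_dist(p_lm, knn_dists, knn_token_ids, vocab_size, lam=0.25):
    """Illustrative kNN-LM interpolation (names/shapes are assumptions).

    p_lm:          (vocab_size,) parametric next-token distribution.
    knn_dists:     (k,) distances of retrieved neighbors in the
                   contextualized embedding space.
    knn_token_ids: (k,) long tensor; the next token stored with each neighbor.
    lam:           interpolation weight, a tuned hyperparameter.
    """
    # Closer neighbors receive higher probability mass.
    neighbor_probs = torch.softmax(-knn_dists, dim=0)
    # Aggregate neighbor mass per vocabulary token to form p_knn.
    p_knn = torch.zeros(vocab_size)
    p_knn.scatter_add_(0, knn_token_ids, neighbor_probs)
    # Interpolate the nonparametric and parametric distributions.
    return lam * p_knn + (1.0 - lam) * p_lm
```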
3 Nonparametric Decoding
Generative retrieval is the task of retrieving the
most relevant retrieval target (e.g., title, passage,
document identifier) by generating the target token-
by-token when given an input query. The training
objective of the generative retrieval model is to
maximize
$$P((t_1, \cdots, t_n) \mid q) = \prod_{i=1}^{n} P(t_i \mid q, t_{<i}) \tag{1}$$
where $t_*$ denotes the tokens of the retrieval target and $q$ is the input query. Such an approach has
shown high performance while using a low storage footprint (Cao et al., 2021; Tay et al., 2022; Bevilacqua et al., 2022; Lee et al., 2022). However, it has a limitation in that the model depends solely on the information encoded in its own parameters. Thus, the performance is likely to be bounded by how much information can be stored in the model parameters (Tay et al., 2022; Roberts et al., 2020).
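To make Eq. (1) concrete, here is a minimal PyTorch sketch of the objective under teacher forcing; the function and tensor names are hypothetical, and any seq2seq decoder producing per-step logits would fit.

```python
import torch
import torch.nn.functional as F

def target_log_prob(logits, target_ids):
    """Log-likelihood of Eq. (1): sum_i log P(t_i | q, t_<i).

    logits:     (n, vocab_size) decoder logits, where step i is
                conditioned on the query q and the gold prefix t_<i.
    target_ids: (n,) long tensor of gold target tokens t_1..t_n.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    per_token = log_probs.gather(1, target_ids.unsqueeze(1)).squeeze(1)
    return per_token.sum()  # maximize this (or minimize its negative)
```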
To address this limitation, we propose a new decoding method called Nonparametric Decoding (Np Decoding) for generative retrieval. To incorporate Np Decoding into an existing generative retrieval model, the only amendment is to use frozen contextualized vocab embeddings (the external memory) rather than the vanilla vocab embeddings as the decoder vocab embeddings during each generation step (Figure 1). These embeddings are the output embeddings of an encoder given a target sequence as input. Note that existing generative retrieval models such as GENRE and DSI use the pre-trained language model architecture as-is, with the vanilla vocab embedding as the decoder vocab embedding.
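A minimal PyTorch sketch of this amendment, assuming a precomputed matrix of contextualized token embeddings and a mapping from each row back to its vocab token (both names are hypothetical); it illustrates the mechanism rather than the authors' exact implementation.

```python
import torch

class NpDecodingHead(torch.nn.Module):
    """Sketch: score decoder states against frozen contextualized
    vocab embeddings (the external memory) instead of the vanilla
    vocab embedding matrix.
    """

    def __init__(self, ce_matrix, ce_token_ids):
        super().__init__()
        # (num_ce, d_model) frozen contextualized token embeddings.
        self.register_buffer("ce_matrix", ce_matrix)
        # (num_ce,) vocab token id of each CE row, to map rows back to tokens.
        self.register_buffer("ce_token_ids", ce_token_ids)

    def forward(self, decoder_hidden):
        # decoder_hidden: (batch, d_model) -> logits over CE rows.
        return decoder_hidden @ self.ce_matrix.T

    def generate_step(self, decoder_hidden):
        # Pick the best CE row and emit the token it corresponds to.
        best_row = self.forward(decoder_hidden).argmax(dim=-1)
        return self.ce_token_ids[best_row]
```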
In Section 3.1, we show the key benefits of using Np Decoding over vanilla decoding. In Sections 3.2 to 3.4, we present the details of base Np Decoding (BASE) and its two variants (ASYNC, CONTRA). In Section 3.5, we describe how we reduce the number of contextualized token embeddings.
3.1 Key Benefits
Using Np Decoding has two key benefits over vanilla decoding. First, the generative retrieval model with Np Decoding can utilize not only the information encoded in its own parameters (parametric space) but also the surrounding information encoded in the contextualized vocab embeddings (nonparametric space) during each decoding step. Second, the generative retrieval model with Np Decoding has a more expressive and fine-grained decoder vocab embedding space than that of the model with vanilla decoding. As shown in Figure 1, Np Decoding allows a single token to have multiple contextualized token embeddings in the decoder vocab embeddings (e.g., the same token "Cape" has two different contextualized embeddings) depending on the surrounding information of the token, whereas vanilla decoding allows only a single token embedding per token. Note that we do not save all possible token embeddings; rather, we reduce the number of embeddings to save without performance degradation through practical tactics (Section 3.5).
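The following toy snippet (made-up values, hypothetical vocab ids) shows the contrast: under Np Decoding the same token may own several rows of the decoder vocab embedding matrix, one per context.

```python
import torch

d_model = 4
cape_id, town_id, cod_id = 10, 11, 12  # hypothetical vocab ids

# Rows: "Cape" (in "Cape Town"), "Town", "Cape" (in "Cape Cod"), "Cod".
# Vanilla decoding would allow only one embedding row for cape_id.
ce_matrix = torch.randn(4, d_model)
ce_token_ids = torch.tensor([cape_id, town_id, cape_id, cod_id])

hidden = torch.randn(d_model)            # decoder state at some step
row = (ce_matrix @ hidden).argmax()      # select a CE row...
emitted = ce_token_ids[row]              # ...and map it back to a token
```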
3.2 BASE Nonparametric Decoding
In this work, we propose three types of Np Decoding (BASE Nonparametric Decoding and two variants), which we name according to the characteristics of their Contextualized Embedding Encoders (CE Encoder). The CE Encoder is an encoder that outputs contextualized token embeddings when given a target sequence (e.g., title, document ID, passage) as input.
The contextualized token embeddings are added to CE²,

²Details of how we construct CE for different target sequences are in Section 4.3.
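For illustration, a CE Encoder can be sketched with a frozen off-the-shelf encoder as below; the choice of t5-base is an assumption made for the example, and the paper's actual CE construction per target type follows Section 4.3.

```python
import torch
from transformers import AutoTokenizer, T5EncoderModel

tokenizer = AutoTokenizer.from_pretrained("t5-base")
encoder = T5EncoderModel.from_pretrained("t5-base").eval()  # kept frozen

target = "Cape Town"  # e.g., a title used as the retrieval target
inputs = tokenizer(target, return_tensors="pt")

with torch.no_grad():  # CE Encoder outputs are not updated
    hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, d_model)

ce_matrix = hidden.squeeze(0)                  # one row per target token
ce_token_ids = inputs["input_ids"].squeeze(0)  # map each row to its token
```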