Decoding a Neural Retriever’s Latent Space for Query Suggestion

Leonard Adolphs†, Michelle Chen Huebscher‡, Christian Buck‡, Sertan Girgin‡, Olivier Bachem‡, Massimiliano Ciaramita‡, Thomas Hofmann†

† ETH Zürich, ladolphs@inf.ethz.ch
‡ Google Research
Abstract

Neural retrieval models have superseded classic bag-of-words methods such as BM25 as the retrieval framework of choice. However, neural systems lack the interpretability of bag-of-words models; it is not trivial to connect a query change to a change in the latent space that ultimately determines the retrieval results. To shed light on this embedding space, we learn a “query decoder” that, given a latent representation of a neural search engine, generates the corresponding query. We show that it is possible to decode a meaningful query from its latent representation and, when moving in the right direction in latent space, to decode a query that retrieves the relevant paragraph. In particular, the query decoder can be useful to understand “what should have been asked” to retrieve a particular paragraph from the collection. We employ the query decoder to generate a large synthetic dataset of query reformulations for MSMarco, leading to improved retrieval performance. On this data, we train a pseudo-relevance feedback (PRF) T5 model for the application of query suggestion that outperforms both query reformulation and PRF information retrieval baselines.
1 Introduction

Neural encoder models (Karpukhin et al., 2020; Ni et al., 2021; Izacard et al., 2021) have improved document retrieval in various settings. They have become an essential building block for applications in open-domain question answering (Karpukhin et al., 2020; Lewis et al., 2020b; Izacard and Grave, 2021), open-domain conversational agents (Shuster et al., 2021; Adolphs et al., 2021), and, recently, language modeling (Shuster et al., 2022). Neural encoders embed documents and queries in a shared (or joint) latent space, so that paragraphs can be ranked and retrieved based on their vector similarity with a given query. This constitutes a conceptually powerful approach to discovering semantic similarities between queries and documents that is often found to be more nuanced than the simple term-frequency statistics typical of classic sparse representations. However, such encoders may come with shortcomings in practice. First, they are prone to domain overfitting, failing to consistently outperform bag-of-words approaches on out-of-domain queries (Thakur et al., 2021). Second, they are notoriously hard to interpret, as similarity is no longer controlled by word overlap but rather by semantic similarities that lack explainability. Third, they may be non-robust, as small changes in the query can lead to inexplicably different retrieval results.
In bag-of-words models, it can be straightforward to modify a query to retrieve a given document: e.g., following insights from relevance feedback (Rocchio, 1971), by increasing the weight of terms contained in the target document (Adolphs et al., 2022; Huebscher et al., 2022). This approach is not trivially applicable to neural retrieval models as it is unclear how an added term might change the latent code of a query.
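For concreteness, the following is a minimal sketch of such a relevance-feedback update in the style of Rocchio (1971); the term-weight representation and the alpha/beta/gamma defaults are our illustrative assumptions, not taken from the cited works.

```python
# Illustrative sketch of a Rocchio-style relevance-feedback update for a
# bag-of-words retriever. Queries and documents are term-weight dicts; the
# alpha/beta/gamma values are conventional defaults, not from the paper.
from collections import Counter

def rocchio_update(query_terms, relevant_docs, irrelevant_docs,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Shift the query's term weights toward relevant documents."""
    new_query = Counter({t: alpha * w for t, w in query_terms.items()})
    for doc in relevant_docs:
        for term, weight in doc.items():
            new_query[term] += beta * weight / len(relevant_docs)
    for doc in irrelevant_docs:
        for term, weight in doc.items():
            new_query[term] -= gamma * weight / len(irrelevant_docs)
    # Keep only positively weighted terms.
    return {t: w for t, w in new_query.items() if w > 0}

q = {"chess": 1.0, "champion": 1.0}
gold = {"carlsen": 0.9, "chess": 0.8, "world": 0.7}
print(rocchio_update(q, [gold], []))  # up-weights terms from the target doc
```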
In this paper, we look into the missing link connecting latent codes back to actual queries. We thus propose to train a “query decoder”, which maps embeddings in the shared query-document space to query strings, inverting the fixed encoder of the neural retriever (cf. Figure 1a). As we will show, such a decoder lets us find queries that are optimized to retrieve a given target document. It deciphers what information is in the latent code of a document and how to phrase a query to retrieve it.
We use this model to explore the latent space of a state-of-the-art neural retrieval model, GTR (Ni et al., 2021). In particular, we leverage the structure of the latent space by traversing from the embedding of a specific query to its human-labeled gold paragraph and use our query decoder to generate reformulation examples from intermediate points along the path, as shown in Figure 1b. We find that, using this approach, we can generate a large dataset of query reformulations on MSMarco-train (Nguyen et al., 2016) that improve retrieval performance without needing additional human labeling. We use this dataset to train a pseudo-relevance feedback (PRF) query suggestion model. Here, we fine-tune a T5-large model (Raffel et al., 2020) that uses the original query, together with its top-5 GTR search results, as the input context to predict query suggestions, as depicted in Figure 1c. We show that our model provides fluent, diverse query suggestions with better retrieval performance than various baselines, including a T5 model trained on question editing (Chu et al., 2020) and a PRF query expansion model (Pal et al., 2013).

We make the resources to reproduce the results publicly available[1].

[1] https://github.com/leox1v/query_decoder

[Figure 1: (a) the query decoder inverts the fixed query encoder, mapping a query embedding back to a query; (b) in latent space, intermediate points on the path from the original query embedding to the gold paragraph embedding are decoded into Reformulations 1–3, each retrieving different documents (P0–P3); (c) example suggestions for “Who is the chess champion of the world?”: “Who was the chess champion of the 90s?”, “Who is the reigning world chess champion?”, “Who had the highest FIDE ranking ever?”]

Figure 1: We train a query decoder (QD) model that inverts the shared encoder of a neural retrieval model (a). Then, we leverage the structure of the latent space of a neural retrieval model by traversing from query to gold paragraph embeddings and using our query decoder to generate a dataset of successful query reformulations (b). Finally, we train a pseudo-relevance feedback query suggestion model on this dataset that predicts promising rewrites, given a query and its search results (c).
2 Related Work
Neural Retriever

Classic retrieval systems such as BM25 (Robertson and Zaragoza, 2009) use term-frequency statistics to determine the relevancy of a document for a given query. Recently, neural retrieval models have become more popular and started to outperform classic systems on multiple search tasks. Karpukhin et al. (2020) use a dual-encoder setup based on BERT-base (Devlin et al., 2019), called DPR, to encode queries and documents separately and use maximum inner product search (Shrivastava and Li, 2014) to find a match. They use this model to improve recall and answer quality for multiple open-domain question-answer datasets, including OpenQA-NQ (Lee et al., 2019). Ni et al. (2021) show that scaling up the dual-encoder architecture improves retrieval performance. They train a shared dual-encoder model, based on T5 (Raffel et al., 2020), in a multi-stage manner, including fine-tuning on MSMarco (Nguyen et al., 2016), and evaluate on the range of retrieval tasks of the BEIR benchmark (Thakur et al., 2021). Izacard et al. (2021) show that one can train an unsupervised dense retriever and be competitive against strong baselines on the BEIR benchmark.
Xiong et al. (2021) propose approximate nearest neighbor negative contrastive learning (ANCE) to learn a dense retrieval system. On top of this dense retriever, Li et al. (2022) consider a pseudo-relevance feedback method. Unlike our approach, this method does not provide the user with rephrased queries.
Applications of Neural Retrievers

Neural retrieval models have been at the core of recent improvements across a range of different NLP tasks. Lewis et al. (2020b) augment a language generation model, BART (Lewis et al., 2020a), with a DPR neural retriever and evaluate on multiple knowledge-intensive NLP tasks; most notably, they improve over previous models on multiple open-domain QA benchmarks using an abstractive method.
Izacard and Grave (2021) propose the Fusion-in-Decoder method to aggregate a large set of documents from the neural retriever and provide them to the model during answer generation. Their focus is on open-domain QA, where they significantly outperform previous models when considering a large set of documents during decoding.
Shuster et al. (2021) use neural retrieval models to improve conversational agents in knowledge-grounded dialogue. They show that the issue of hallucination – i.e., generating factually incorrect knowledge statements – can be significantly reduced when using a neural-retriever-in-the-loop architecture. Separating retrieval-augmented knowledge generation from conversational response generation can further reduce hallucination in knowledge-grounded dialogue and helps fuse modular QA and dialogue models (Adolphs et al., 2021). Recently, retrieval query generation approaches have been proposed to improve open-domain dialogue (Komeili et al., 2021) and language modeling (Shuster et al., 2022).
Query Generation

Query optimization is a long-standing problem in IR (Lau and Horvitz, 1999; Teevan et al., 2004). Recent work has investigated query refinement with reinforcement learning for Open-Domain and Conversational Question Answering (Nogueira and Cho, 2017; Buck et al., 2018; Wu et al., 2021).
The methods presented in this paper are a natural complement to the work of Adolphs et al. (2022), who propose a heuristic approach to generate multi-step query refinements, used to train sequential query generation models for the task of learning to search. Their method is also inspired by relevance feedback, but they seek to reach the gold document purely in language space, by brute-force exploration. For this purpose, they use specialized search operators to condition the retrieval results as desired. Huebscher et al. (2022) show that, when paired with a hybrid sparse/dense retrieval environment, the search agents trained on this kind of synthetic data combine effective corpus exploration, competitive performance, and interpretability.
WebGPT (Nakano et al., 2021) presents an end-to-end search modeling approach based on human demonstrations; in a similar spirit, our work could be seen as a way of involving humans in the loop by proposing better queries.
Fixed-vector decoders

Probabilistic decoders mapping from a fixed-size vector space to natural language have also been explored in autoencoder settings. A key challenge in this line of work lies in obtaining decoders that are robust, i.e., that generate natural text for a variety of input vectors. Bowman et al. (2016) proposed using an RNN-based language model in combination with variational autoencoders (VAE) (Kingma and Welling, 2013), which add Gaussian noise to the decoder input. Zhao et al. (2018) proposed the use of the Adversarial Autoencoder (AAE) (Makhzani et al., 2015), to which Shen et al. (2020) added data denoising by randomly dropping words in the input and then reconstructing the full output.
Recently, RNN-based decoders have been replaced by Transformer-based language models (Vaswani et al., 2017), for example by Montero et al. (2021), Park and Lee (2021), and Li et al. (2020).
3 Query Decoder

Training

We train a T5 (Raffel et al., 2020) decoder-only model to (re-)generate a query from its embedding obtained in a neural retrieval model. As training data, we use a subset of 3 million queries of the PAQ dataset (Lewis et al., 2021). We use the GTR-base (Ni et al., 2021) shared-encoder model to generate the embeddings and use the queries as the targets. The objective of query decoder learning is to invert the mapping of the fixed GTR encoder model, as visually depicted in Figure 1a. More training details of the query decoder are provided in Appendix A.2.1.
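For illustration, the sketch below shows one way such a setup could be trained, feeding the frozen GTR embedding to a T5 decoder as a length-one “encoder output” sequence. The model names and this bridging trick are our assumptions for a minimal sketch; the paper’s actual configuration may differ (see its Appendix A.2.1).

```python
# Minimal sketch of training a query decoder that inverts a frozen GTR
# encoder. Assumptions: GTR-base via sentence-transformers (768-dim output)
# and a t5-base decoder (d_model=768), bridged by treating the embedding as
# a one-token encoder output sequence.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer
from transformers.modeling_outputs import BaseModelOutput
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("sentence-transformers/gtr-t5-base")  # frozen
decoder = T5ForConditionalGeneration.from_pretrained("t5-base")     # trained
tokenizer = T5Tokenizer.from_pretrained("t5-base")
optimizer = torch.optim.AdamW(decoder.parameters(), lr=1e-4)

def train_step(queries):
    with torch.no_grad():  # the retriever's encoder stays fixed
        emb = torch.tensor(encoder.encode(queries))                # (B, 768)
    enc_out = BaseModelOutput(last_hidden_state=emb.unsqueeze(1))  # (B, 1, 768)
    labels = tokenizer(queries, return_tensors="pt", padding=True).input_ids
    labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in loss
    loss = decoder(encoder_outputs=enc_out, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```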
Query Reconstruction Evaluation

We consider round-trip consistency as a first step in evaluating the query decoder’s effectiveness. A query q is encoded via GTR and then decoded by our decoder to generate q′. We use queries from the MSMarco and NQ test sets of the BEIR benchmark (Thakur et al., 2021). As a first metric, we compute the F1 score between the original q and its reconstruction q′. Since word overlap is imperfect in measuring query drift, we further re-encode q′ and compare its latent code with the code for q via their cosine similarity. The results of these evaluations are reported in Table 1, where we also provide an illustrative example of the proposed approach. For both datasets, MSMarco and NQ, the F1 and cosine similarity metrics are generally high, indicating that the GTR code carries information that allows for close approximate query reconstruction.

[Figure: toy example of the round-trip metrics. “Who is the chess champion of the world?” is encoded by the GTR query encoder, decoded to “Who is the best chess player?”, re-encoded, and compared to the original via F1 (on the strings) and cosine similarity (on the embeddings).]

Data      F1     Cos Sim
MSMarco   0.750  0.960
NQ        0.886  0.980

Table 1: Decoding metrics of the Query Decoder (QD) based on the GTR-base neural retrieval model. The F1 score is the F1 word overlap between the original query, of MSMarco or NQ, and the output of the query decoder model when provided with the GTR encoding of the query. The cosine similarity is measured between the re-encoding of the generated query and the encoding of the original query. The figure above depicts the metrics visually with a toy example for clarity.
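The following sketch shows how these two round-trip metrics could be computed; `decode_query` is a hypothetical wrapper around our trained decoder, not a library call.

```python
# Sketch of the two round-trip metrics: word-overlap F1 between original and
# reconstructed query, and cosine similarity between their GTR embeddings.
from collections import Counter
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("sentence-transformers/gtr-t5-base")

def token_f1(pred, gold):
    """Word-overlap F1 between two query strings."""
    pred_toks, gold_toks = pred.lower().split(), gold.lower().split()
    common = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(pred_toks), common / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

def round_trip_metrics(query, decode_query):
    z = encoder.encode(query)                 # latent code of the original q
    reconstruction = decode_query(z)          # decoded query q' (hypothetical)
    z_prime = encoder.encode(reconstruction)  # latent code of q'
    return token_f1(reconstruction, query), float(util.cos_sim(z, z_prime))
```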
[Figure: paragraph-to-query evaluation. A paragraph is embedded with the GTR encoder, the query decoder produces a query from that embedding, and GTR retrieval is run on the decoded query to check whether it returns the original paragraph.]

Data      Top1   Top3   Top5
MSMarco   0.551  0.737  0.796
NQ        0.721  0.863  0.897

Table 2: Share of gold paragraphs for which we can decode a query that retrieves the given paragraph within its top-k GTR search results. The figure above depicts the metric evaluation visually for clarity.
Paragraph to Query Evaluation

Many interesting use cases rely on the ability to generate queries from passages of text (Du et al., 2017; Kumar et al., 2018). As GTR embeds document paragraphs and queries into the same space, the query decoder can also be used to invert the retrieval process. We thus evaluate the decoder quality by starting from a document paragraph, decoding a query from its embedding, and then running the GTR search engine on that query to check if this query retrieves the desired paragraph as a top-ranked result. We test this in an experiment with human-labeled gold paragraphs from MSMarco and NQ, using top-k as the success metric. The results reported in Table 2 are very encouraging in that the desired paragraph is indeed found very often among the topmost GTR search results. Two example paragraph decodings from MSMarco are shown in Table 3; for both decodings, the gold paragraph is retrieved at the top position.
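A sketch of this evaluation loop, under the same assumptions as above (`decode_query` is our hypothetical decoder wrapper; `search` stands in for top-k GTR retrieval over the corpus), could look as follows; Table 3 then shows two example decodings.

```python
# Sketch of the paragraph-to-query evaluation, reusing `encoder` and the
# hypothetical `decode_query` from the earlier sketch. `search(query, k)`
# is assumed to return the top-k retrieved paragraphs.
def gold_in_topk(gold_paragraphs, decode_query, search, k=5):
    """Share of gold paragraphs retrieved in the top k via a decoded query."""
    hits = 0
    for paragraph in gold_paragraphs:
        z = encoder.encode(paragraph)  # embed the paragraph in the shared space
        query = decode_query(z)        # invert the embedding into a query string
        hits += paragraph in search(query, k)  # did we get the paragraph back?
    return hits / len(gold_paragraphs)
```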
Original Query: nebl coin price [Rank: 2]
Decoding from Gold Paragraph: what is the current price of neblio today belo [Rank: 1]
Gold Paragraph: Neblio Price Chart US Dollar (NEBL/USD) Neblio price for today is $16.3125. It has a current circulating supply of 12.8 Million coins and a total volume exchanged of $9,701,465

Original Query: when is champaign il midterm elections [Rank: 3]
Decoding from Gold Paragraph: when is the general election in illinois 2018 [Rank: 1]
Gold Paragraph: Illinois elections, 2018. A general election will be held in the U.S. state of Illinois on November 6, 2018. All of Illinois’ executive officers will be up for election as well as all of Illinois’ eighteen seats in the United States House of Representatives.

Table 3: Examples of query decodings from the gold paragraph. The rank indicates the retrieval position of the gold paragraph using the corresponding query.

Latent Space Traversal Decoding

We have shown that query decoding can reconstruct queries and that it can find retrieval queries for target passages. We now turn to a more concrete practical application, namely automatically generating a dataset of query reformulations from which strategies for interactive retrieval can be learned. In this context, reformulated queries should remain semantically similar to the original query and not overfit to the target passage. They should be somewhat ‘in between’ the query and the gold passage, as any passage is likely to contain answers to multiple, different questions. This can be operationalized by decoding queries from points along the line connecting the embeddings of the query and its target passage, as depicted in Figure 1b.
To validate this idea, we apply it to the MSMarco and NQ retrieval datasets, where each query is paired with a human-labeled gold paragraph. In particular, we move in k equidistant increments from the original query embedding q to the gold paragraph embedding d, i.e.,

q_κ = q + (κ / k) (d − q),   κ = 0, …, k    (1)
and generate a reformulation at each step.[2] As a sanity check, Figure 3 shows the average retrieval performance of the decoded queries when moving from the original query embedding to the gold paragraph embedding for MSMarco and NQ. For both datasets, the normalized discounted cumulative gain (nDCG) (Järvelin and Kekäläinen, 2002)

[2] We underline that this procedure can be seen as a latent-space equivalent of the ‘Rocchio Session’ process for generating synthetic search sequences of Adolphs et al. (2022).
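As an illustration, the traversal of Eq. (1) and the decoding step could be implemented as follows, reusing the `encoder` and the hypothetical `decode_query` helper from the earlier sketches.

```python
# Sketch of the latent traversal in Eq. (1): decode a reformulation from each
# point on the line between the query and gold-paragraph embeddings.
def traverse_and_decode(query, gold_paragraph, decode_query, k=5):
    q = encoder.encode(query)           # original query embedding
    d = encoder.encode(gold_paragraph)  # gold paragraph embedding
    # kappa = 0 reproduces the query; kappa = k decodes from the paragraph code.
    return [decode_query(q + (kappa / k) * (d - q)) for kappa in range(k + 1)]
```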