Decoding a Neural Retriever’s Latent Space for Query Suggestion

Leonard Adolphs†, Michelle Chen Huebscher‡, Christian Buck‡, Sertan Girgin‡, Olivier Bachem‡, Massimiliano Ciaramita‡, Thomas Hofmann†

† ETH Zürich, ladolphs@inf.ethz.ch
‡ Google Research
Abstract

Neural retrieval models have superseded classic bag-of-words methods such as BM25 as the retrieval framework of choice. However, neural systems lack the interpretability of bag-of-words models; it is not trivial to connect a query change to a change in the latent space that ultimately determines the retrieval results. To shed light on this embedding space, we learn a “query decoder” that, given a latent representation of a neural search engine, generates the corresponding query. We show that it is possible to decode a meaningful query from its latent representation and, when moving in the right direction in latent space, to decode a query that retrieves the relevant paragraph. In particular, the query decoder can be useful to understand “what should have been asked” to retrieve a particular paragraph from the collection. We employ the query decoder to generate a large synthetic dataset of query reformulations for MSMarco, leading to improved retrieval performance. On this data, we train a pseudo-relevance feedback (PRF) T5 model for the application of query suggestion that outperforms both query reformulation and PRF information retrieval baselines.
1 Introduction

Neural encoder models (Karpukhin et al., 2020; Ni et al., 2021; Izacard et al., 2021) have improved document retrieval in various settings. They have become an essential building block for applications in open-domain question answering (Karpukhin et al., 2020; Lewis et al., 2020b; Izacard and Grave, 2021), open-domain conversational agents (Shuster et al., 2021; Adolphs et al., 2021), and, recently, language modeling (Shuster et al., 2022). Neural encoders embed documents and queries in a shared (or joint) latent space, so that paragraphs can be ranked and retrieved based on their vector similarity with a given query. This constitutes a conceptually powerful approach to discovering semantic similarities between queries and documents that is often found to be more nuanced than the simple term-frequency statistics typical of classic sparse representations. However, such encoders may come with shortcomings in practice. First, they are prone to domain overfitting, failing to consistently outperform bag-of-words approaches on out-of-domain queries (Thakur et al., 2021). Second, they are notoriously hard to interpret, as similarity is no longer controlled by word overlap but rather by semantic similarities that lack explainability. Third, they may be non-robust, as small changes in the query can lead to inexplicably different retrieval results.
In bag-of-words models, it can be straightforward to modify a query to retrieve a given document: e.g., following insights from relevance feedback (Rocchio, 1971), by increasing the weight of terms contained in the target document (Adolphs et al., 2022; Huebscher et al., 2022). This approach is not trivially applicable to neural retrieval models as it is unclear how an added term might change the latent code of a query.
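For concreteness, the following is a minimal sketch of such a relevance-feedback update in the style of Rocchio (1971); the term-weight representation and the alpha/beta/gamma defaults are our illustrative assumptions, not taken from the cited works.

```python
# Illustrative sketch of a Rocchio-style relevance-feedback update for a
# bag-of-words retriever. Queries and documents are term-weight dicts; the
# alpha/beta/gamma values are conventional defaults, not from the paper.
from collections import Counter

def rocchio_update(query_terms, relevant_docs, irrelevant_docs,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Shift the query's term weights toward relevant documents."""
    new_query = Counter({t: alpha * w for t, w in query_terms.items()})
    for doc in relevant_docs:
        for term, weight in doc.items():
            new_query[term] += beta * weight / len(relevant_docs)
    for doc in irrelevant_docs:
        for term, weight in doc.items():
            new_query[term] -= gamma * weight / len(irrelevant_docs)
    # Keep only positively weighted terms.
    return {t: w for t, w in new_query.items() if w > 0}

q = {"chess": 1.0, "champion": 1.0}
gold = {"carlsen": 0.9, "chess": 0.8, "world": 0.7}
print(rocchio_update(q, [gold], []))  # up-weights terms from the target doc
```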
In this paper, we look into the missing link connecting latent codes back to actual queries. We thus propose to train a “query decoder”, which maps embeddings in the shared query-document space to query strings, inverting the fixed encoder of the neural retriever (cf. Figure 1a). As we will show, such a decoder lets us find queries that are optimized to retrieve a given target document. It deciphers what information is in the latent code of a document and how to phrase a query to retrieve it.
We use this model to explore the latent space of a state-of-the-art neural retrieval model, GTR (Ni et al., 2021). In particular, we leverage the structure of the latent space by traversing from the embedding of a specific query to its human-labeled gold paragraph and use our query decoder to generate reformulation examples from intermediate points along the path, as shown in Figure 1b. We find that, using this approach, we can generate a large dataset of query reformulations on MSMarco-train (Nguyen et al., 2016) that improve retrieval performance without needing additional human labeling. We use this dataset to train a pseudo-relevance feedback (PRF) query suggestion model. Here, we fine-tune a T5-large model (Raffel et al., 2020) that uses the original query, together with its top-5 GTR search results, as the input context to predict query suggestions, as depicted in Figure 1c. We show that our model provides fluent, diverse query suggestions with better retrieval performance than various baselines, including a T5 model trained on question editing (Chu et al., 2020) and a PRF query expansion model (Pal et al., 2013).

We make the resources to reproduce the results publicly available[1].

[1] https://github.com/leox1v/query_decoder

[Figure 1: (a) the query decoder inverts the fixed query encoder, mapping a query embedding back to a query; (b) in latent space, intermediate points on the path from the original query embedding to the gold paragraph embedding are decoded into Reformulations 1–3, each retrieving different documents (P0–P3); (c) example suggestions for “Who is the chess champion of the world?”: “Who was the chess champion of the 90s?”, “Who is the reigning world chess champion?”, “Who had the highest FIDE ranking ever?”]

Figure 1: We train a query decoder (QD) model that inverts the shared encoder of a neural retrieval model (a). Then, we leverage the structure of the latent space of a neural retrieval model by traversing from query to gold paragraph embeddings and using our query decoder to generate a dataset of successful query reformulations (b). Finally, we train a pseudo-relevance feedback query suggestion model on this dataset that predicts promising rewrites, given a query and its search results (c).
2 Related Work
Neural Retriever

Classic retrieval systems such as BM25 (Robertson and Zaragoza, 2009) use term-frequency statistics to determine the relevancy of a document for a given query. Recently, neural retrieval models have become more popular and started to outperform classic systems on multiple search tasks. Karpukhin et al. (2020) use a dual-encoder setup based on BERT-base (Devlin et al., 2019), called DPR, to encode queries and documents separately and use maximum inner product search (Shrivastava and Li, 2014) to find a match. They use this model to improve recall and answer quality for multiple open-domain question-answer datasets, including OpenQA-NQ (Lee et al., 2019). Ni et al. (2021) show that scaling up the dual-encoder architecture improves retrieval performance. They train a shared dual-encoder model, based on T5 (Raffel et al., 2020), in a multi-stage manner, including fine-tuning on MSMarco (Nguyen et al., 2016), and evaluate on the range of retrieval tasks of the BEIR benchmark (Thakur et al., 2021). Izacard et al. (2021) show that one can train an unsupervised dense retriever and be competitive against strong baselines on the BEIR benchmark.
Xiong et al. (2021) propose approximate nearest neighbor negative contrastive learning (ANCE) to learn a dense retrieval system. On top of this dense retriever, Li et al. (2022) consider a pseudo-relevance feedback method. Unlike our approach, this method does not provide the user with rephrased queries.
Applications of Neural Retrievers

Neural retrieval models have been at the core of recent improvements across a range of different NLP tasks. Lewis et al. (2020b) augment a language generation model, BART (Lewis et al., 2020a), with a DPR neural retriever and evaluate on multiple knowledge-intensive NLP tasks; most notably, they improve over previous models on multiple open-domain QA benchmarks using an abstractive method.
Izacard and Grave (2021) propose the Fusion-in-Decoder method to aggregate a large set of documents from the neural retriever and provide them to the model during answer generation. Their focus is on open-domain QA, where they significantly outperform previous models when considering a large set of documents during decoding.
Shuster et al. (2021) use neural retrieval models to improve conversational agents in knowledge-grounded dialogue. They show that the issue of hallucination – i.e., generating factually incorrect knowledge statements – can be significantly reduced when using a neural-retriever-in-the-loop architecture. Separating retrieval-augmented knowledge generation from conversational response generation can further reduce hallucination in knowledge-grounded dialogue and helps fuse modular QA and dialogue models (Adolphs et al., 2021). Recently, retrieval query generation approaches have been proposed to improve open-domain dialogue (Komeili et al., 2021) and language modeling (Shuster et al., 2022).
Query Generation

Query optimization is a long-standing problem in IR (Lau and Horvitz, 1999; Teevan et al., 2004). Recent work has investigated query refinement with reinforcement learning for Open-Domain and Conversational Question Answering (Nogueira and Cho, 2017; Buck et al., 2018; Wu et al., 2021).
The methods presented in this paper are a natural complement to the work of Adolphs et al. (2022), who propose a heuristic approach to generate multi-step query refinements, used to train sequential query generation models for the task of learning to search. Their method is also inspired by relevance feedback, but they seek to reach the gold document purely in language space, by brute-force exploration. For this purpose, they use specialized search operators to condition the retrieval results as desired. Huebscher et al. (2022) show that, when paired with a hybrid sparse/dense retrieval environment, the search agents trained on this kind of synthetic data combine effective corpus exploration, competitive performance, and interpretability.
WebGPT (Nakano et al., 2021) presents an end-to-end search modeling approach based on human demonstrations; in a similar spirit, our work could be seen as a way of involving humans in the loop by proposing better queries.
Fixed-vector decoders

Probabilistic decoders mapping from a fixed-size vector space to natural language have also been explored in autoencoder settings. A key challenge in this line of work lies in obtaining decoders that are robust, i.e., that generate natural text for a variety of input vectors. Bowman et al. (2016) proposed using an RNN-based language model in combination with variational autoencoders (VAE) (Kingma and Welling, 2013), which add Gaussian noise to the decoder input. Zhao et al. (2018) proposed the use of the Adversarial Autoencoder (AAE) (Makhzani et al., 2015), to which Shen et al. (2020) added data denoising by randomly dropping words in the input and then reconstructing the full output.
Recently, RNN-based decoders have been replaced by Transformer-based language models (Vaswani et al., 2017), for example by Montero et al. (2021), Park and Lee (2021), and Li et al. (2020).
3 Query Decoder

Training

We train a T5 (Raffel et al., 2020) decoder-only model to (re-)generate a query from its embedding obtained in a neural retrieval model. As training data, we use a subset of 3 million queries of the PAQ dataset (Lewis et al., 2021). We use the GTR-base (Ni et al., 2021) shared-encoder model to generate the embeddings and use the queries as the targets. The objective of query decoder learning is to invert the mapping of the fixed GTR encoder model, as visually depicted in Figure 1a. More training details of the query decoder are provided in Appendix A.2.1.
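For illustration, the sketch below shows one way such a setup could be trained, feeding the frozen GTR embedding to a T5 decoder as a length-one “encoder output” sequence. The model names and this bridging trick are our assumptions for a minimal sketch; the paper’s actual configuration may differ (see its Appendix A.2.1).

```python
# Minimal sketch of training a query decoder that inverts a frozen GTR
# encoder. Assumptions: GTR-base via sentence-transformers (768-dim output)
# and a t5-base decoder (d_model=768), bridged by treating the embedding as
# a one-token encoder output sequence.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer
from transformers.modeling_outputs import BaseModelOutput
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("sentence-transformers/gtr-t5-base")  # frozen
decoder = T5ForConditionalGeneration.from_pretrained("t5-base")     # trained
tokenizer = T5Tokenizer.from_pretrained("t5-base")
optimizer = torch.optim.AdamW(decoder.parameters(), lr=1e-4)

def train_step(queries):
    with torch.no_grad():  # the retriever's encoder stays fixed
        emb = torch.tensor(encoder.encode(queries))                # (B, 768)
    enc_out = BaseModelOutput(last_hidden_state=emb.unsqueeze(1))  # (B, 1, 768)
    labels = tokenizer(queries, return_tensors="pt", padding=True).input_ids
    labels[labels == tokenizer.pad_token_id] = -100  # ignore padding in loss
    loss = decoder(encoder_outputs=enc_out, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```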
Query Reconstruction Evaluation

We consider round-trip consistency as a first step in evaluating the query decoder’s effectiveness. A query q is encoded via GTR and then decoded by our decoder to generate q′. We use queries from the MSMarco and NQ test sets of the BEIR benchmark (Thakur et al., 2021). As a first metric, we compute the F1 score between the original q and its reconstruction q′. Since word overlap is imperfect in measuring query drift, we further re-encode q′ and compare its latent code with the code for q via their cosine similarity. The results of these evaluations are reported in Table 1, where we also provide an illustrative example of the proposed approach. For both datasets, MSMarco and NQ, the F1 and cosine similarity metrics are generally high, indicating that the GTR code carries information that allows for close approximate query reconstruction.

[Figure: toy example of the round-trip metrics. “Who is the chess champion of the world?” is encoded by the GTR query encoder, decoded to “Who is the best chess player?”, re-encoded, and compared to the original via F1 (on the strings) and cosine similarity (on the embeddings).]

Data      F1     Cos Sim
MSMarco   0.750  0.960
NQ        0.886  0.980

Table 1: Decoding metrics of the Query Decoder (QD) based on the GTR-base neural retrieval model. The F1 score is the F1 word overlap between the original query, of MSMarco or NQ, and the output of the query decoder model when provided with the GTR encoding of the query. The cosine similarity is measured between the re-encoding of the generated query and the encoding of the original query. The figure above depicts the metrics visually with a toy example for clarity.
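The following sketch shows how these two round-trip metrics could be computed; `decode_query` is a hypothetical wrapper around our trained decoder, not a library call.

```python
# Sketch of the two round-trip metrics: word-overlap F1 between original and
# reconstructed query, and cosine similarity between their GTR embeddings.
from collections import Counter
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("sentence-transformers/gtr-t5-base")

def token_f1(pred, gold):
    """Word-overlap F1 between two query strings."""
    pred_toks, gold_toks = pred.lower().split(), gold.lower().split()
    common = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(pred_toks), common / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

def round_trip_metrics(query, decode_query):
    z = encoder.encode(query)                 # latent code of the original q
    reconstruction = decode_query(z)          # decoded query q' (hypothetical)
    z_prime = encoder.encode(reconstruction)  # latent code of q'
    return token_f1(reconstruction, query), float(util.cos_sim(z, z_prime))
```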
[Figure: paragraph-to-query evaluation. A paragraph is embedded with the GTR encoder, the query decoder produces a query from that embedding, and GTR retrieval is run on the decoded query to check whether it returns the original paragraph.]

Data      Top1   Top3   Top5
MSMarco   0.551  0.737  0.796
NQ        0.721  0.863  0.897

Table 2: Share of gold paragraphs for which we can decode a query that retrieves the given paragraph within its top-k GTR search results. The figure above depicts the metric evaluation visually for clarity.
Paragraph to Query Evaluation

Many interesting use cases rely on the ability to generate queries from passages of text (Du et al., 2017; Kumar et al., 2018). As GTR embeds document paragraphs and queries into the same space, the query decoder can also be used to invert the retrieval process. We thus evaluate the decoder quality by starting from a document paragraph, decoding a query from its embedding, and then running the GTR search engine on that query to check if this query retrieves the desired paragraph as a top-ranked result. We test this in an experiment with human-labeled gold paragraphs from MSMarco and NQ, using top-k as the success metric. The results reported in Table 2 are very encouraging in that the desired paragraph is indeed found very often among the topmost GTR search results. Two example paragraph decodings from MSMarco are shown in Table 3; for both decodings, the gold paragraph is retrieved at the top position.
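A sketch of this evaluation loop, under the same assumptions as above (`decode_query` is our hypothetical decoder wrapper; `search` stands in for top-k GTR retrieval over the corpus), could look as follows; Table 3 then shows two example decodings.

```python
# Sketch of the paragraph-to-query evaluation, reusing `encoder` and the
# hypothetical `decode_query` from the earlier sketch. `search(query, k)`
# is assumed to return the top-k retrieved paragraphs.
def gold_in_topk(gold_paragraphs, decode_query, search, k=5):
    """Share of gold paragraphs retrieved in the top k via a decoded query."""
    hits = 0
    for paragraph in gold_paragraphs:
        z = encoder.encode(paragraph)  # embed the paragraph in the shared space
        query = decode_query(z)        # invert the embedding into a query string
        hits += paragraph in search(query, k)  # did we get the paragraph back?
    return hits / len(gold_paragraphs)
```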
Original Query: nebl coin price [Rank: 2]
Decoding from Gold Paragraph: what is the current price of neblio today belo [Rank: 1]
Gold Paragraph: Neblio Price Chart US Dollar (NEBL/USD) Neblio price for today is $16.3125. It has a current circulating supply of 12.8 Million coins and a total volume exchanged of $9,701,465

Original Query: when is champaign il midterm elections [Rank: 3]
Decoding from Gold Paragraph: when is the general election in illinois 2018 [Rank: 1]
Gold Paragraph: Illinois elections, 2018. A general election will be held in the U.S. state of Illinois on November 6, 2018. All of Illinois’ executive officers will be up for election as well as all of Illinois’ eighteen seats in the United States House of Representatives.

Table 3: Examples of query decodings from the gold paragraph. The rank indicates the retrieval position of the gold paragraph using the corresponding query.

Latent Space Traversal Decoding

We have shown that query decoding can reconstruct queries and that it can find retrieval queries for target passages. We now turn to a more concrete practical application, namely automatically generating a dataset of query reformulations from which strategies for interactive retrieval can be learned. In this context, reformulated queries should remain semantically similar to the original query and not overfit to the target passage. They should be somewhat ‘in between’ the query and the gold passage, as any passage is likely to contain answers to multiple, different questions. This can be operationalized by decoding queries from points along the line connecting the embeddings of the query and its target passage, as depicted in Figure 1b.
To validate this idea, we apply it to the MSMarco and NQ retrieval datasets, where each query is paired with a human-labeled gold paragraph. In particular, we move in k equidistant increments from the original query embedding q to the gold paragraph embedding d, i.e.,

q_κ = q + (κ / k) (d − q),   κ = 0, …, k    (1)
and generate a reformulation at each step.[2] As a sanity check, Figure 3 shows the average retrieval performance of the decoded queries when moving from the original query embedding to the gold paragraph embedding for MSMarco and NQ. For both datasets, the normalized discounted cumulative gain (nDCG) (Järvelin and Kekäläinen, 2002)

[2] We underline that this procedure can be seen as a latent-space equivalent of the ‘Rocchio Session’ process for generating synthetic search sequences of Adolphs et al. (2022).
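As an illustration, the traversal of Eq. (1) and the decoding step could be implemented as follows, reusing the `encoder` and the hypothetical `decode_query` helper from the earlier sketches.

```python
# Sketch of the latent traversal in Eq. (1): decode a reformulation from each
# point on the line between the query and gold-paragraph embeddings.
def traverse_and_decode(query, gold_paragraph, decode_query, k=5):
    q = encoder.encode(query)           # original query embedding
    d = encoder.encode(gold_paragraph)  # gold paragraph embedding
    # kappa = 0 reproduces the query; kappa = k decodes from the paragraph code.
    return [decode_query(q + (kappa / k) * (d - q)) for kappa in range(k + 1)]
```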