Efficient Nearest Neighbor Search for Cross-Encoder Models
using Matrix Factorization
Nishant Yadav, Nicholas Monath, Rico Angell,
Manzil Zaheer, and Andrew McCallum
University of Massachusetts Amherst Google Research
{nishantyadav, rangell, mccallum}@cs.umass.edu
{nmonath,manzilzaheer}@google.com
Abstract

Efficient k-nearest neighbor search is a fundamental task, foundational for many problems in NLP. When the similarity is measured by dot-product between dual-encoder vectors or ℓ2-distance, there already exist many scalable and efficient search methods. But not so when similarity is measured by more accurate and expensive black-box neural similarity models, such as cross-encoders, which jointly encode the query and candidate neighbor. The cross-encoders' high computational cost typically limits their use to reranking candidates retrieved by a cheaper model, such as a dual-encoder or TF-IDF. However, the accuracy of such a two-stage approach is upper-bounded by the recall of the initial candidate set, and potentially requires additional training to align the auxiliary retrieval model with the cross-encoder model. In this paper, we present an approach that avoids the use of a dual-encoder for retrieval, relying solely on the cross-encoder. Retrieval is made efficient with CUR decomposition, a matrix decomposition approach that approximates all pairwise cross-encoder distances from a small subset of rows and columns of the distance matrix. Indexing items using our approach is computationally cheaper than training an auxiliary dual-encoder model through distillation. Empirically, for k > 10, our approach provides test-time recall-vs-computational cost trade-offs superior to the current widely-used methods that re-rank items retrieved using a dual-encoder or TF-IDF.
1 Introduction

Finding top-k scoring items for a given query is a fundamental sub-routine of recommendation and information retrieval systems (Kowalski, 2007; Das et al., 2017). For instance, in question answering systems, the query corresponds to a question and the item corresponds to a document or a passage. Neural networks are widely used to model the similarity between a query and an item in such applications (Zamani et al., 2018; Hofstätter et al., 2019; Karpukhin et al., 2020; Qu et al., 2021). In this work, we focus on efficient k-nearest neighbor search for one such similarity function – the cross-encoder model.

Cross-encoder models output a scalar similarity score by jointly encoding the query-item pair and often generalize better to new domains and unseen data (Chen et al., 2020; Wu et al., 2020; Thakur et al., 2021) as compared to dual-encoder¹ models, which independently embed the query and the item in a vector space and use simple functions such as dot-product to measure similarity. However, due to the black-box nature of the cross-encoder based similarity function, the computational cost for brute-force search with cross-encoders is prohibitively high. This often limits the use of cross-encoder models to re-ranking items retrieved using a separate retrieval model such as a dual-encoder or a TF-IDF-based model (Logeswaran et al., 2019; Zhang and Stratos, 2021; Qu et al., 2021). The accuracy of such a two-stage approach is upper bounded by the recall of relevant items by the initial retrieval model. Much of recent work either attempts to distill information from an expensive but more expressive cross-encoder model into a cheaper student model such as a dual-encoder (Wu et al., 2020; Hofstätter et al., 2020; Lu et al., 2020; Qu et al., 2021; Liu et al., 2022), or focuses on cheaper alternatives to the cross-encoder model while attempting to capture fine-grained interactions between the query and the item (Humeau et al., 2020; Khattab and Zaharia, 2020; Luan et al., 2021).

In this work, we tackle the fundamental task of efficient k-nearest neighbor search for a given query according to the cross-encoder. Our proposed approach, ANNCUR, uses CUR decomposition (Mahoney and Drineas, 2009), a matrix factorization approach, to approximate cross-encoder

¹ Also referred to as two-tower models, Siamese networks.

arXiv:2210.12579v1 [cs.CL] 23 Oct 2022
[Figure panels: (a) Model architecture for the Dual-Encoder, [CLS]-Cross-Encoder, and [EMB]-Cross-Encoder; (b) Query-item score distribution for DE, [emb]-CE, and [cls]-CE]
Figure 1: Model architecture and score distribution for three neural scoring functions. Dual-Encoder (DE) models score a query-item pair using independently computed query and item embeddings. The [CLS]-CE model computes the score by jointly encoding the query-item pair, then passing the joint query-item embedding through a linear layer. Our proposed [EMB]-CE model embeds special tokens amongst the query and item tokens, and computes the query-item score using contextualized query and item embeddings extracted at the special tokens after jointly encoding the query-item pair.
scores for all items, and retrieves k-nearest neighbor items while only making a small number of calls to the cross-encoder. Our proposed method selects a fixed set of anchor queries and anchor items, and uses scores between anchor queries and all items to generate latent embeddings for indexing the item set. At test time, we generate a latent embedding for the query using cross-encoder scores between the test query and the anchor items, and use it to approximate the scores of all items for the given query and/or retrieve the top-k items according to the approximate scores. In contrast to distillation-based approaches, our proposed approach does not involve any additional compute-intensive training of a student model such as a dual-encoder via distillation.
In general, the performance of a matrix-factorization-based method depends on the rank of the matrix being factorized. In our case, the entries of the matrix are cross-encoder scores for query-item pairs. To lower the rank of the score matrix, and in turn improve the performance of the proposed matrix-factorization-based approach, we propose [EMB]-CE, which uses a novel dot-product-based scoring mechanism for cross-encoder models (see Figure 1a). In contrast to the widely used [CLS]-CE approach of pooling the query-item representation into a single vector followed by scoring with a linear layer, [EMB]-CE produces a score matrix with a much lower rank while performing at par with [CLS]-CE on the downstream task.
We run extensive experiments with cross-encoder models trained for the downstream task of entity linking. The query and item in this case correspond to a mention of an entity in text and a document with an entity description, respectively. For the task of retrieving k-nearest neighbors according to the cross-encoder, our proposed approach presents superior recall-vs-computational cost trade-offs over using dual-encoders trained via distillation as well as over unsupervised TF-IDF-based methods (§3.2). We also evaluate the proposed method for various indexing and test-time cost budgets, and study the effect of various design choices in §3.3 and §3.4.
2 Matrix Factorization for Nearest Neighbor Search

2.1 Task Description and Background

Given a scoring function f_θ : Q × I → ℝ that maps a query-item pair to a scalar score, and a query q ∈ Q, the k-nearest neighbor task is to retrieve the top-k scoring items according to the given scoring function f_θ from a fixed item set I.

In NLP, queries and items are typically represented as sequences of tokens, and the scoring function is typically parameterized using deep neural models such as transformers (Vaswani et al., 2017). There are two popular choices for the scoring function – the cross-encoder (CE) model and the dual-encoder (DE) model. The CE model scores a given query-item pair by concatenating the query and the item using special tokens, passing them through a model (such as a transformer T) to obtain a representation for the input pair, followed by computing the score using linear weights w ∈ ℝ^d:

f_θ^(CE)(q, i) = w^⊤ pool(T(concat(q, i)))

While effective, computing a similarity between a query-item pair requires a full forward pass of the model, which is often quite computationally burdensome. As a result, previous work uses auxiliary retrieval models such as BM25 (Robertson et al.,
Figure 2: CUR decomposition of a matrix (M_comb = [M_anc; M_test]) using a subset of its columns (C_comb = [C_anc; C_test]) and rows (R_anc). The blue rows (R_anc) corresponding to the anchor queries and U = C†_anc are used for indexing the items to obtain item embeddings E_I = U × R_anc, and the green row corresponds to the test query. Note that the test row (M_test) can be approximated using a subset of its columns (C_test) and the latent representation of items (E_I).
1995) or a trained dual-encoder (DE) model to approximate the CE. The DE model independently embeds the query and the item in ℝ^d, for instance by using a transformer T followed by pooling the final-layer representations into a single vector (e.g. using the CLS token). The DE score for the query-item pair is computed using the dot-product of the query embedding and the item embedding:

f_θ^(DE)(q, i) = pool(T(q))^⊤ pool(T(i))

In this work, we propose a method based on CUR matrix factorization that allows efficient retrieval of top-k items by directly approximating the cross-encoder model rather than using an auxiliary (trained) retrieval model.
CUR Decomposition (Mahoney and Drineas, 2009). In CUR matrix factorization, a matrix M ∈ ℝ^{n×m} is approximated using a subset of its rows R = M[S_r, :] ∈ ℝ^{k₁×m}, a subset of its columns C = M[:, S_c] ∈ ℝ^{n×k₂}, and a joining matrix U ∈ ℝ^{k₂×k₁} as follows:

M̃ = CUR

where S_r and S_c are the indices corresponding to rows R and columns C respectively, and the joining matrix U optimizes the approximation error. In this work, we set U to be the Moore-Penrose pseudo-inverse of M[S_r, S_c], the intersection of matrices C and R, in which case M̃ is known as the skeleton approximation of M (Goreinov et al., 1997).
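As a sanity check, the skeleton approximation above can be reproduced in a few lines of NumPy on a synthetic low-rank matrix (a toy sketch; the matrix sizes and anchor counts are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a low-rank "score matrix": n queries x m items, rank d.
n, m, d = 200, 300, 8
M = rng.normal(size=(n, d)) @ rng.normal(size=(d, m))

# Pick anchor rows (queries) and anchor columns (items).
k1, k2 = 20, 20
Sr = rng.choice(n, size=k1, replace=False)
Sc = rng.choice(m, size=k2, replace=False)

R = M[Sr, :]                           # (k1, m) anchor rows
C = M[:, Sc]                           # (n, k2) anchor columns
U = np.linalg.pinv(M[np.ix_(Sr, Sc)])  # (k2, k1) pseudo-inverse of the intersection

M_approx = C @ U @ R                   # skeleton approximation

err = np.linalg.norm(M - M_approx) / np.linalg.norm(M)
print(f"relative error: {err:.2e}")    # near machine precision when the anchors cover rank(M)
```

When the intersection block has the same rank as M (here k1, k2 ≥ d with random anchors), the skeleton approximation recovers M essentially exactly.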
2.2 Proposed Method Overview

Our proposed method ANNCUR, which stands for Approximate Nearest Neighbor search using CUR decomposition, begins by selecting a fixed set of k_q anchor queries (Q_anc) and k_i anchor items (I_anc). It uses scores between queries q ∈ Q_anc and all items i ∈ I to index the item set by generating latent item embeddings. At test time, we compute exact scores between the test query q_test and the anchor items I_anc, and use them to approximate the scores of all items for the given query and/or retrieve the top-k items according to the approximate scores. We can optionally retrieve k_r > k items, re-rank them using exact f_θ scores, and return the top-k items.
Let M_comb = [M_anc; M_test] and C_comb = [C_anc; C_test], where M_anc = R_anc ∈ ℝ^{k_q×|I|} contains scores for the anchor queries and all items, M_test ∈ ℝ^{1×|I|} contains scores for a test query and all items, C_anc ∈ ℝ^{k_q×k_i} contains scores for the anchor queries and the anchor items, and C_test ∈ ℝ^{1×k_i} contains scores for the test query paired with the anchor items.

Using CUR decomposition, we can approximate M_comb using a subset of its columns (C_comb) corresponding to the anchor items and a subset of its rows (R_anc) corresponding to the anchor queries as

M̃_comb = C_comb U R_anc
[M̃_anc; M̃_test] = [C_anc; C_test] U R_anc
M̃_anc = C_anc U R_anc  and  M̃_test = C_test U R_anc

Figure 2 shows the CUR decomposition of matrix M_comb. At test time, M̃_test, containing approximate item scores for the test query, can be computed using C_test, U, and R_anc, where C_test contains exact f_θ scores between the test query and the anchor items. Matrices U and R_anc can be computed offline as these are independent of the test query.
2.3 Offline Indexing

The indexing process first computes R_anc, containing scores between the anchor queries and all items:

R_anc(q, i) = f_θ(q, i),  ∀(q, i) ∈ Q_anc × I

We embed all items in ℝ^{k_i} as

E_I = U × R_anc

where U = C†_anc is the pseudo-inverse of C_anc. Each column of E_I ∈ ℝ^{k_i×|I|} corresponds to a latent item embedding.
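The indexing step above can be sketched as follows, with a generic `score_fn` standing in for the cross-encoder f_θ (the function name and setup are assumptions for illustration; in practice `score_fn` would be a batched neural forward pass):

```python
import numpy as np

def index_items(score_fn, anchor_queries, anchor_items, items):
    """Offline indexing step of ANNCUR (sketch).

    `score_fn(q, i)` stands in for the cross-encoder f_theta; here it may be
    any callable returning a scalar. Returns E_I of shape (k_i, |I|), whose
    columns are latent item embeddings.
    """
    # R_anc: scores between every anchor query and every item (k_q x |I|).
    R_anc = np.array([[score_fn(q, i) for i in items] for q in anchor_queries])
    # C_anc: anchor-query-by-anchor-item block of the score matrix (k_q x k_i).
    C_anc = np.array([[score_fn(q, i) for i in anchor_items] for q in anchor_queries])
    # U = pseudo-inverse of C_anc; item embeddings are E_I = U x R_anc.
    return np.linalg.pinv(C_anc) @ R_anc
```

At test time, a query embedded via its scores against the anchor items can then be scored against every column of `E_I` with a single matrix product.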
2.4 Test-time Inference

At test time, we embed the test query q in ℝ^{k_i} using scores between q and the anchor items I_anc:

e_q = [f_θ(q, i)]_{i∈I_anc}

We approximate the score for a query-item pair (q, i) using the inner-product of e_q and E_I[:, i], where E_I[:, i] ∈ ℝ^{k_i} is the embedding of item i:

f̂_θ(q, i) = e_q^⊤ E_I[:, i]

We can use e_q ∈ ℝ^{k_i} along with an off-the-shelf nearest-neighbor search method for maximum inner-product search (Malkov and Yashunin, 2018; Johnson et al., 2019; Guo et al., 2020) and retrieve top-scoring items for the given query q according to the approximate query-item scores without explicitly approximating scores for all the items.
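Putting indexing and inference together, here is a self-contained toy sketch. It uses a hypothetical bilinear `f_theta` in place of the real cross-encoder so that the score matrix is exactly low-rank, and exhaustive scoring in place of an off-the-shelf MIPS index:

```python
import numpy as np

# Toy stand-in for the cross-encoder: a bilinear score over hidden item
# vectors (an assumption for illustration; the real f_theta is a neural net).
rng = np.random.default_rng(1)
d, n_items, k_q, k_i = 4, 50, 8, 8
item_vecs = rng.normal(size=(n_items, d))

def f_theta(query_vec, item_idx):
    return float(query_vec @ item_vecs[item_idx])

# --- offline indexing: score anchor queries against all items ---
anchor_qs = rng.normal(size=(k_q, d))
anchor_items = rng.choice(n_items, size=k_i, replace=False)
R_anc = np.array([[f_theta(q, j) for j in range(n_items)] for q in anchor_qs])
C_anc = R_anc[:, anchor_items]
E_I = np.linalg.pinv(C_anc) @ R_anc          # (k_i, n_items) item embeddings

# --- test time: only k_i cross-encoder calls, then inner-product search ---
q_test = rng.normal(size=d)
e_q = np.array([f_theta(q_test, j) for j in anchor_items])
approx_scores = e_q @ E_I                    # approximates f_theta(q_test, i) for all i
top_k = np.argsort(-approx_scores)[:5]

# Exact scores, for comparison only (brute force over all items).
exact_scores = np.array([f_theta(q_test, j) for j in range(n_items)])
exact_top_k = np.argsort(-exact_scores)[:5]
```

Because the toy score matrix has rank d and the anchors cover that rank, the approximate and exact rankings coincide here; with a real cross-encoder the approximation is inexact and the optional k_r-then-re-rank step matters.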
2.5 Time Complexity

During the indexing stage, we evaluate f_θ for k_q|I| query-item pairs, and compute the pseudo-inverse of a k_q × k_i matrix. The overall time complexity of the indexing stage is O(C_{f_θ} k_q|I| + C_inv^{k_q,k_i}), where C_inv^{k_q,k_i} is the cost of computing the pseudo-inverse of a k_q × k_i matrix, and C_{f_θ} is the cost of computing f_θ on a query-item pair. For the CE models used in this work, we observe that C_{f_θ} k_q|I| ≫ C_inv^{k_q,k_i}.

At test time, we need to compute f_θ for k_i query-item pairs, followed by optionally re-ranking k_r items retrieved by maximum inner-product search (MIPS). The overall time complexity for inference is O((k_i + k_r) C_{f_θ} + C_MIPS^{|I|,k_r}), where C_MIPS^{|I|,k_r} is the time complexity of MIPS over |I| items to retrieve the top-k_r items.
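As a back-of-the-envelope illustration of these costs in terms of cross-encoder forward passes (the helper and the concrete numbers below are illustrative choices, not figures from the paper):

```python
def anncur_ce_calls(n_items, k_q, k_i, k_r):
    """Count cross-encoder forward passes for ANNCUR (illustrative helper)."""
    indexing = k_q * n_items   # every anchor query scored against every item (offline)
    per_query = k_i + k_r      # anchor-item scores, plus optional re-ranking (online)
    return indexing, per_query

idx_calls, per_query_calls = anncur_ce_calls(n_items=100_000, k_q=50, k_i=50, k_r=100)
print(idx_calls, per_query_calls)  # 5_000_000 offline calls; 150 CE calls per test query
```

The dominant cost is the one-time offline pass; each test query touches the cross-encoder only k_i + k_r times, versus |I| times for brute-force search.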
2.6 Improving the score distribution of CE models for matrix factorization

The rank of the query-item score matrix, and in turn the approximation error of a matrix factorization method, depends on the scores in the matrix. Figure 1b shows a histogram of the query-item score distribution (adjusted to have zero mean) for a dual-encoder and a [CLS]-CE model. We use [CLS]-CE to refer to a cross-encoder model parameterized using transformers which uses the CLS token to compute a pooled representation of the input query-item pair. Both models are trained for zero-shot entity linking (see §3.1 for details). As shown in the figure, the query-item score distribution for the [CLS]-CE model is significantly skewed, with only a small fraction of items (entities) receiving high scores, while the score distribution for a dual-encoder model is less so, as it is generated explicitly using the dot-product of query and item embeddings. The skewed score distribution from [CLS]-CE leads to a high-rank query-item score matrix, which results in a large approximation error for matrix decomposition methods.

We propose a small but important change to the scoring mechanism of the cross-encoder so that it yields a less skewed score distribution, thus making it much easier to approximate the corresponding query-item score matrix without adversely affecting downstream task performance. Instead of using the CLS token representation to score a given query-item pair, we add special tokens amongst the query and item tokens, and extract contextualized query and item representations using the special tokens after jointly encoding the query-item pair with a model such as a transformer T:

e_q^CE, e_i^CE = pool(T(concat(q, i)))

The final score for the given query-item pair is computed using the dot-product of the contextualized query and item embeddings:

f_θ^([EMB]-CE)(q, i) = (e_q^CE)^⊤ e_i^CE

We refer to this model as [EMB]-CE. Figure 1a shows the high-level model architecture for the dual-encoder, [CLS]-CE, and [EMB]-CE models. As shown in Figure 1b, the query-item score distribution from an [EMB]-CE model resembles that from a DE model. Empirically, we observe that the rank of the query-item score matrix for the [EMB]-CE model is much lower than the rank of a similar matrix computed using [CLS]-CE, thus making it much easier to approximate using matrix-decomposition-based methods.
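The rank intuition can be checked numerically: a score matrix produced by dot-products of d-dimensional embeddings has rank at most d, while a skewed, nonlinearly produced score matrix is typically (numerically) full rank. A toy NumPy illustration with random embeddings (the exponential skew is an assumption for illustration, not the actual [CLS]-CE model):

```python
import numpy as np

rng = np.random.default_rng(2)
n_q, n_i, d = 100, 120, 16

E_q = rng.normal(size=(n_q, d))    # stand-in query embeddings
E_i = rng.normal(size=(n_i, d))    # stand-in item embeddings

# Dot-product scores (DE / [EMB]-CE style): rank is bounded by d.
M_dot = E_q @ E_i.T

# A skewed, nonlinearly produced score matrix (a loose stand-in for the
# [CLS]-CE behavior described above).
M_skew = np.exp(M_dot / np.sqrt(d))

print(np.linalg.matrix_rank(M_dot))   # at most d = 16
print(np.linalg.matrix_rank(M_skew))  # typically far larger
```

This is why the dot-product parameterization of [EMB]-CE is a better fit for CUR: it caps the rank of the matrix being factorized by the embedding dimension.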
3 Experiments

In our experiments, we use CE models trained for zero-shot entity linking on the ZESHEL dataset (§3.1). We evaluate the proposed method and various baselines on the task of finding k-nearest neighbors for cross-encoder models in §3.2, and evaluate the proposed method for various indexing and test-time cost budgets as well as study the effect of various design choices in §3.3 and §3.4. All resources for the paper, including code for all experiments and model checkpoints, are available at https://github.com/iesl/anncur
ZESHEL Dataset. The Zero-Shot Entity Linking (ZESHEL) dataset was constructed by Logeswaran et al. (2019) from Wikia. The task of zero-shot entity linking involves linking entity mentions in text to an entity from a list of entities with associated descriptions. The dataset consists of 16 different domains, with eight, four, and four domains in the training, dev, and test splits respectively. Each domain contains non-overlapping sets of entities; thus at test time, mentions need to be linked to unseen entities solely based on entity descriptions. Table 1 in the appendix shows dataset statistics. In this task, queries correspond to mentions of entities along with the surrounding context, and items correspond to entities with their associated descriptions.
3.1 Training DE and CE models on ZESHEL

Following the precedent set by recent papers (Wu et al., 2020; Zhang and Stratos, 2021), we first train a dual-encoder model on ZESHEL training data using hard negatives. We then train a cross-encoder model for the task of zero-shot entity linking on all eight training domains using cross-entropy loss with the ground-truth entity and negative entities mined using the dual-encoder. We refer the reader to Appendix A.1 for more details.
Results on the downstream task of entity linking. To evaluate the cross-encoder models, we retrieve 64 entities for each test mention using the dual-encoder model and re-rank them using a cross-encoder model. The top-64 entities retrieved by the DE contain the ground-truth entity for 87.95% of mentions in the test data and 92.04% of mentions in the dev data. The proposed [EMB]-CE model achieves an average accuracy of 65.49 and 66.86 on domains in the test and dev sets respectively, and performs at par with the widely used and state-of-the-art² [CLS]-CE architecture, which achieves an accuracy of 65.87 and 67.67 on the test and dev sets respectively. Since the [EMB]-CE model performs at par with [CLS]-CE on the downstream task of entity linking, and the rank of the score matrix from [EMB]-CE is much lower than that from [CLS]-CE, we use [EMB]-CE in subsequent experiments.

² We observe that our implementation of [CLS]-CE obtains slightly different results as compared to the state-of-the-art (see Table 2 in Zhang and Stratos (2021)), likely due to minor implementation/training differences.
3.2 Evaluating on k-NN search for CE

Experimental Setup. For all experiments in this section, we use the [EMB]-CE model trained on the original ZESHEL training data for the task of zero-shot entity linking, and evaluate the proposed method and baselines on the task of retrieving k-nearest neighbor entities (items) for a given mention (query) according to the cross-encoder model. We run experiments separately on five domains from ZESHEL containing 10K to 100K items. For each domain, we compute the query-item score matrix for a subset or all of the queries (mentions) and all items (entities) in the domain. We randomly split the query set into a training set (Q_train) and a test set (Q_test). We use the queries in the training data to train baseline DE models. For ANNCUR, we use the training queries as anchor queries and use CE scores between the anchor queries and all items for indexing, as described in §2.3. All approaches are then evaluated on the task of finding the top-k CE items for queries in the corresponding domain's test split. For a fair comparison, we do not train DE models on multiple domains at the same time.
3.2.1 Baseline Retrieval Methods

TF-IDF: All queries and items are embedded using a TF-IDF vectorizer trained on item descriptions, and the top-k items are retrieved using the dot-product of query and item embeddings.

DE models: We experiment with DE_BASE, the DE model trained on ZESHEL for the task of entity linking (see §3.1), and the following two DE models trained via distillation from the CE:

- DE_BERT+CE: DE initialized with BERT (Devlin et al., 2019) and trained only using the training signal from the cross-encoder model.
- DE_BASE+CE: the DE_BASE model further fine-tuned via distillation using the cross-encoder model.

We refer the reader to Appendix A.3 for hyperparameter and optimization details.