
Efficient Nearest Neighbor Search for Cross-Encoder Models using Matrix Factorization
Nishant Yadav†, Nicholas Monath♦, Rico Angell†,
Manzil Zaheer♦, and Andrew McCallum†
†University of Massachusetts Amherst ♦Google Research
{nishantyadav, rangell, mccallum}@cs.umass.edu
{nmonath,manzilzaheer}@google.com
Abstract
Efficient k-nearest neighbor search is a fundamental task, foundational for many problems in NLP. When the similarity is measured by dot-product between dual-encoder vectors or ℓ2-distance, there already exist many scalable and efficient search methods. But not so when similarity is measured by more accurate and expensive black-box neural similarity models, such as cross-encoders, which jointly encode the query and candidate neighbor. The cross-encoders' high computational cost typically limits their use to reranking candidates retrieved by a cheaper model, such as a dual-encoder or TF-IDF. However, the accuracy of such a two-stage approach is upper-bounded by the recall of the initial candidate set, and potentially requires additional training to align the auxiliary retrieval model with the cross-encoder model. In this paper, we present an approach that avoids the use of a dual-encoder for retrieval, relying solely on the cross-encoder. Retrieval is made efficient with CUR decomposition, a matrix decomposition approach that approximates all pairwise cross-encoder distances from a small subset of rows and columns of the distance matrix. Indexing items using our approach is computationally cheaper than training an auxiliary dual-encoder model through distillation. Empirically, for k > 10, our approach provides test-time recall-vs-computational-cost trade-offs superior to the current widely-used methods that re-rank items retrieved using a dual-encoder or TF-IDF.
1 Introduction
Finding top-k scoring items for a given query is a fundamental sub-routine of recommendation and information retrieval systems (Kowalski, 2007; Das et al., 2017). For instance, in question answering systems, the query corresponds to a question and the item corresponds to a document or a passage. Neural networks are widely used to model the similarity between a query and an item in such applications (Zamani et al., 2018; Hofstätter et al., 2019; Karpukhin et al., 2020; Qu et al., 2021). In this work, we focus on efficient k-nearest neighbor search for one such similarity function – the cross-encoder model.
Cross-encoder models output a scalar similarity score by jointly encoding the query-item pair, and often generalize better to new domains and unseen data (Chen et al., 2020; Wu et al., 2020; Thakur et al., 2021) as compared to dual-encoder¹ models, which independently embed the query and the item in a vector space and use simple functions such as dot-product to measure similarity. However, due to the black-box nature of the cross-encoder based similarity function, the computational cost of brute-force search with cross-encoders is prohibitively high. This often limits the use of cross-encoder models to re-ranking items retrieved using a separate retrieval model such as a dual-encoder or a TF-IDF-based model (Logeswaran et al., 2019; Zhang and Stratos, 2021; Qu et al., 2021). The accuracy of such a two-stage approach is upper bounded by the recall of relevant items by the initial retrieval model. Much recent work either attempts to distill information from an expensive but more expressive cross-encoder model into a cheaper student model such as a dual-encoder (Wu et al., 2020; Hofstätter et al., 2020; Lu et al., 2020; Qu et al., 2021; Liu et al., 2022), or focuses on cheaper alternatives to the cross-encoder model while attempting to capture fine-grained interactions between the query and the item (Humeau et al., 2020; Khattab and Zaharia, 2020; Luan et al., 2021).
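To make the contrast concrete, the following minimal sketch spells out the two-stage retrieve-and-rerank pipeline described above. The embed and cross_score functions are hypothetical placeholders for a trained dual-encoder tower and a cross-encoder forward pass; this is an illustration of the baseline in general, not any particular system's implementation.

import numpy as np

def two_stage_knn(query, items, embed, cross_score, k=10, n_candidates=100):
    # Stage 1: dual-encoder retrieval. Items are embedded independently
    # of the query (in practice precomputed offline), and candidates are
    # selected by dot-product similarity.
    q_vec = embed(query)
    item_vecs = np.stack([embed(x) for x in items])
    candidates = np.argsort(-(item_vecs @ q_vec))[:n_candidates]

    # Stage 2: cross-encoder re-ranking of the shortlist only. Note the
    # recall ceiling: a true top-k item missed in stage 1 can never be
    # recovered by stage 2.
    exact = np.array([cross_score(query, items[i]) for i in candidates])
    return [int(candidates[j]) for j in np.argsort(-exact)[:k]]

Only n_candidates cross-encoder calls are made per query, rather than one per item, which is what makes the baseline tractable and also what caps its recall.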
In this work, we tackle the fundamental task of efficient k-nearest neighbor search for a given query according to the cross-encoder. Our proposed approach, ANNCUR, uses CUR decomposition (Mahoney and Drineas, 2009), a matrix factorization approach, to approximate cross-encoder scores.
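As a toy illustration of this machinery, the sketch below (ours, not the paper's released code) shows how a CUR decomposition reconstructs a matrix from a small subset of its rows and columns. The synthetic low-rank matrix is a hypothetical stand-in for a matrix of cross-encoder scores between queries (rows) and items (columns).

import numpy as np

def cur_approximation(M, row_idx, col_idx):
    # CUR decomposition: approximate M from a subset of its columns
    # C = M[:, col_idx], a subset of its rows R = M[row_idx, :], and the
    # pseudo-inverse of their intersection W = M[row_idx][:, col_idx],
    # giving M_hat = C @ pinv(W) @ R.
    C = M[:, col_idx]
    R = M[row_idx, :]
    W = M[np.ix_(row_idx, col_idx)]
    return C @ np.linalg.pinv(W) @ R

# Toy check on a synthetic rank-5 "score matrix" of 50 queries x 200 items.
rng = np.random.default_rng(0)
M = rng.normal(size=(50, 5)) @ rng.normal(size=(5, 200))
M_hat = cur_approximation(M, row_idx=np.arange(10), col_idx=np.arange(10))
print(np.abs(M - M_hat).max())  # ~0: 10 rows/columns suffice for rank 5

The reconstruction is exact here because the sampled rows and columns span the matrix; with real cross-encoder scores the matrix is only approximately low-rank, and the choice of sampled rows and columns governs the approximation error.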
¹ Also referred to as two-tower models or Siamese networks.