Unsupervised Search Algorithm Configuration using Query Performance Prediction

Haggai Roitman*
eBay Research, Netanya, Israel
Abstract
Search engine configuration can be quite difficult for inexpert developers. Instead, an auto-configuration approach can be used to speed up development time. Yet, such an automatic process usually requires relevance labels to train a supervised model. In this work, we suggest a simple solution based on query performance prediction that requires no relevance labels but only a sample of queries in a given domain. Using two example use cases we demonstrate the merits of our solution.
Keywords
Search engine configuration, Query performance prediction, Evaluation
1. Introduction
Search engine solutions such as the (Apache) Lucene¹ library and Elasticsearch² have become a ubiquitous utility for developers of various discovery and data mining applications. Yet, configuring such solutions for a specific application in mind can be quite challenging and time-consuming. Commonly, inexpert developers may find it hard to determine which configuration would best fit their needs. Therefore, many such developers usually prefer to use the "out-of-the-box" default configuration (e.g., BM25 similarity in Elasticsearch).

Nowadays, contemporary search engine solutions offer a variety of alternative configurations that can be utilized to improve search effectiveness. As a motivating use case (which is also studied later on in our evaluation), the Divergence From Randomness [1] (DFR) similarity, one of the many similarities implemented within the Lucene search library, has more than 100 different configurations. Therefore, even for a single search algorithm option, it is highly difficult for developers to determine in advance which DFR configuration is best for their application.

Configuring the search algorithm of a search engine so as to optimize its search effectiveness for a given application usually requires some sort of domain expertise. In addition, choosing the "best" configuration usually requires enough relevance labels to train a supervised auto-configuration model [2], which are not always at developers' disposal.
In this work, we propose an unsupervised automatic search engine search algorithm configuration solution based on query performance prediction [3] (QPP). We formally define the
*Work was done while the author was at IBM Research.
hroitman@ebay.com (H. Roitman)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
¹ https://lucene.apache.org/
² https://www.elastic.co/elasticsearch/
arXiv:2210.00767v1 [cs.IR] 3 Oct 2022
search algorithm conguration task as a utility maximization task, where the goal is to maximize
the relevance of documents retrieved from the search engine for a given set of “representative
user-queries in the domain of interest. To estimate the expected relevance that may be obtained
by a given candidate (search algorithm) conguration, we propose a simple, yet highly eective,
extension to the probabilistic QPP framework [4].
Our solution only requires a sample of queries in the domain (e.g., query log) and a set
of candidate search algorithm congurations to choose from; while no other input such as
relevance labels is required. As a proof of concept, we evaluate our proposed solution on two
common search algorithm conguration usecases in Lucene, namely: Similarity model selection
and Similarity parameter auto-tuning.
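To give a feel for the two use cases, candidate configurations can be enumerated as a simple set; the following sketch is purely illustrative (the model names and parameter grids are our assumptions, not the exact candidate set evaluated in this work):

```python
# Illustrative enumeration of candidate search-algorithm configurations
# for the two use cases: (1) similarity model selection and
# (2) similarity parameter auto-tuning. The models and grids below are
# assumptions for illustration only.
from itertools import product

# Use case 1: pick one of several Lucene-style similarity model families.
similarity_models = ["BM25", "LMDirichlet", "LMJelinekMercer", "DFR"]

# Use case 2: tune parameters within one family, e.g., a BM25 (k1, b) grid.
bm25_grid = [
    {"model": "BM25", "k1": k1, "b": b}
    for k1, b in product([0.6, 0.9, 1.2, 1.5], [0.3, 0.5, 0.75, 0.9])
]

candidate_configs = [{"model": m} for m in similarity_models] + bm25_grid
print(len(candidate_configs))  # 4 model choices + 16 BM25 settings = 20
```

Even this small toy grid yields 20 candidates, which illustrates why exhaustive manual trial (let alone over DFR's 100+ configurations) is impractical for inexpert developers.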
2. Automatic Configuration Solution
We now describe our automatic-configuration solution. To this end, we assume a corpus of documents $C$ which is searchable through a given search engine interface (e.g., Lucene library or Elasticsearch service). To derive an "optimal" search algorithm configuration over $C$, all we require is a sample of queries $Q$ (hereinafter termed the "query benchmark") and a set of (search algorithm) configurations $\mathcal{S} = \{S_1, \ldots, S_m\}$ to be evaluated. Our goal is, therefore, to find a configuration $S \in \mathcal{S}$ that would "maximize" the search engine's effectiveness. For each query $q \in Q$, the retrieval quality is expected to be measured relative to the relevance of documents in $C$ retrieved (and ranked) according to $S$. Yet, in this work, we assume that relevance labels are unavailable a priori; instead, we aim to estimate such relevance using a QPP approach.
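The overall procedure can be sketched as an argmax loop over the candidate set. In the sketch below, `search` and `predict_quality` are hypothetical placeholders for the engine's retrieval interface and a QPP-based quality estimator; they are not part of the paper's notation:

```python
# Sketch of the selection procedure: pick the configuration S in the
# candidate set that maximizes the estimated benchmark effectiveness.
# `search` and `predict_quality` are hypothetical stand-ins for the real
# search engine interface and the QPP estimator.
def select_configuration(corpus, query_benchmark, candidate_configs,
                         search, predict_quality):
    best_config, best_score = None, float("-inf")
    for config in candidate_configs:
        # Average predicted per-query quality over the benchmark Q.
        score = sum(
            predict_quality(q, search(corpus, q, config))
            for q in query_benchmark
        ) / len(query_benchmark)
        if score > best_score:
            best_config, best_score = config, score
    return best_config

# Toy usage with stub functions standing in for a real engine/predictor:
corpus = ["d1", "d2", "d3"]
queries = ["q1", "q2"]
search = lambda c, q, cfg: [(doc, cfg) for doc in c]  # fake result list
predict = lambda q, results: {"A": 0.2, "B": 0.7}[results[0][1]]
print(select_configuration(corpus, queries, ["A", "B"], search, predict))
```

Note that the loop requires only the query sample and the candidate set, matching the unsupervised setting: no relevance labels appear anywhere in the procedure.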
Our solution builds on top of the probabilistic QPP framework [4], introducing a simple, yet highly effective, extension in which, instead of considering a single query $q \in Q$ whose effectiveness is to be predicted, we wish to derive a prediction for the whole benchmark $Q$. Formally, let $r$ denote the relevance event. For a given configuration $S$, our goal is to estimate³ $p(r|Q, S, C)$: the relevance likelihood given that we retrieve documents from $C$ for each query in $Q$ using $S$.
We now derive $p(r|Q, S, C)$ by first conditioning over the various queries $q$ in the benchmark:

$$p(r|Q, S, C) \stackrel{def}{=} \sum_{q \in Q} p(r|q, S, C)\, p(q|Q), \qquad (1)$$
where $p(r|q, S, C)$ denotes the relevance likelihood for a specific query $q \in Q$ and $p(q|Q)$ denotes the likelihood of observing query $q$. We next estimate these two likelihood terms.
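Once the per-query estimates are in hand, Eq. (1) is a direct weighted sum. The following minimal sketch assumes a uniform query likelihood $p(q|Q) = 1/|Q|$, which is one natural (illustrative) choice when all that is given is a raw query sample:

```python
# Direct computation of Eq. (1): p(r|Q,S,C) = sum_q p(r|q,S,C) * p(q|Q),
# assuming a uniform query likelihood p(q|Q) = 1/|Q| (illustrative choice).
def benchmark_relevance(per_query_relevance):
    # per_query_relevance: dict mapping each benchmark query q to its
    # estimated relevance likelihood p(r|q,S,C).
    n = len(per_query_relevance)
    return sum(per_query_relevance.values()) / n  # uniform p(q|Q) = 1/n

# Hypothetical per-query estimates for one candidate configuration S:
estimates = {"laptop bag": 0.8, "usb c cable": 0.6, "phone case": 0.4}
print(round(benchmark_relevance(estimates), 2))  # -> 0.6
```

A non-uniform $p(q|Q)$ (e.g., query-log frequencies) slots in by replacing the uniform weight with a per-query weight in the sum.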
2.1. Estimating $p(r|q, S, C)$
Estimating the performance of a specific configuration $S$ can be done by evaluating that configuration over $C$ using query $q$ and obtaining the response (result-list) $D_S^q \subseteq C$. Hence, in this work, we simply estimate $\hat{p}(r|q, S, C) \stackrel{def}{=} p(r|q, D_S^q, C)$. To estimate $p(r|q, D_S^q, C)$, we first note that: $p(r|q, D_S^q, C) \propto p(r|q, D_S^q)\, p(r|D_S^q, C)$. Here we further note that, while the left term $p(r|q, D_S^q)$ aims to capture the query-dependent quality effects of documents in $D_S^q$,
³We denote by $p(\cdot)$ a likelihood term and by $\hat{p}(\cdot)$ its corresponding estimate, which is not always normalized.