
LiLAS integrated STELLA into two academic search systems: LIVIVO (for the task of ranking
documents with respect to a head query) and GESIS Search (for the task of ranking datasets with
respect to a reference document). We evaluated nine experimental systems contributed by three
participating groups. Overall, we consider our lab a successful advancement over previous living
lab experiments. We were able to exemplify the benefits of fully dockerized systems delivering
results for arbitrary requests on-the-fly. Furthermore, we could confirm several previous
findings, for instance the power laws underlying the click distributions.
2. Introduction
Involving users in the early phases of software development – in the form of user experience
analysis and prototype testing – has become common whenever some degree of user interaction
is required. This allows developers to consider the users’ needs from the beginning, making it
easier for users to adopt a new system or adapt to a new version. However, once a system is
put in place, new opportunities to observe, evaluate and learn from users become possible as
more information becomes available. For instance, it becomes possible to observe and record
interaction patterns and answer questions such as (a) how long it takes a user to find a particular
button, (b) how the user reacts to different options, or (c) what the most common paths to a goal
are, whether that goal is buying clothes or finding an article or dataset relevant for research.
All this information can be tracked, stored, analyzed, and evaluated, making systems more
attractive and easier to use. Gathering information from users in order to continuously evaluate
their interactions and learn more about their needs is common practice for commercial software
providers, as knowing their users and their behavior allows them to predict their actions, offer
better products that users are likely to buy and, therefore, make more profit. Despite the
benefits of user-based evaluation, this approach is not yet fully exploited by Information
Retrieval systems in the academic world.
Information Retrieval (IR) systems are commonly used in academia as they aim at presenting
the most relevant resources from a corpus for an information need. Typical tasks include ranking
a series of documents with respect to a query or offering recommendations with respect to a
document already selected as relevant (systems taking care of the latter are called recommendation
systems). Traditionally, IR and recommendation systems are used in academia to retrieve
scholarly articles from specialized repositories designed for this purpose. Although IR is an active
research area, evaluation remains a challenge in the academic context. One reason for this is
that IR evaluation mainly relies on the Cranfield paradigm. In this paradigm, search systems
are compared by processing a set of queries or topics based on a standard corpus of documents
while trying to produce the best possible results. Results are then evaluated with the help of
relevance assessments produced by domain experts.
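To make this offline setup concrete, the following minimal sketch (in Python) computes a standard Cranfield-style measure, nDCG, from a run and a set of relevance assessments; the topics, documents, and relevance grades are purely hypothetical and are not part of the lab's data or infrastructure.

    from math import log2

    # Relevance assessments (qrels): topic id -> {document id: graded relevance}
    # (hypothetical toy data for illustration only)
    qrels = {
        "q1": {"d1": 2, "d3": 1},
        "q2": {"d2": 1},
    }

    # A system's ranked results (run): topic id -> ordered list of document ids
    run = {
        "q1": ["d3", "d1", "d5"],
        "q2": ["d4", "d2", "d1"],
    }

    def ndcg_at_k(ranking, rels, k=10):
        # nDCG@k for one topic: discounted gain of the run over the ideal ranking
        gains = [rels.get(doc, 0) for doc in ranking[:k]]
        dcg = sum(g / log2(i + 2) for i, g in enumerate(gains))
        ideal = sorted(rels.values(), reverse=True)[:k]
        idcg = sum(g / log2(i + 2) for i, g in enumerate(ideal))
        return dcg / idcg if idcg > 0 else 0.0

    # Offline campaigns typically report the mean over all topics
    scores = [ndcg_at_k(run[t], qrels[t]) for t in qrels]
    print(f"mean nDCG@10 = {sum(scores) / len(scores):.3f}")

In a shared task, the run would come from a participating system and the qrels from pooled expert assessments; the resulting score allows systems to be compared without involving real users.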
This research method has been established and proven for more than 25 years in international
evaluation campaigns such as the Text Retrieval Conference (TREC)¹ or the Conference and Labs
of the Evaluation Forum (CLEF)². However, this so-called offline evaluation or shared task
principle faces the criticism of drifting away
¹ https://trec.nist.gov/
² https://www.clef-initiative.eu/