Online Information Retrieval Evaluation using the
STELLA Framework
Timo Breuer1,Narges Tavakolpoursaleh2,Johann Schaible3,Daniel Hienert2,
Philipp Schaer1and Leyla Jael Castro4
1TH Köln – University of Applied Sciences, Cologne, Germany
2GESIS – Leibniz Institute for the Social Sciences, Cologne, Germany
3EU|FH – University of Applied Sciences, Bruehl, Germany
4ZB MED – Information Centre for Life Sciences, Cologne, Germany
1. Abstract
Introduction.
Involving users in early phases of software development has become a common
strategy as it enables developers to consider user needs from the beginning. Once a system is
in production, new opportunities to observe, evaluate and learn from users emerge as more
information becomes available. Gathering information from users to continuously evaluate their
behavior is a common practice for commercial software, while the Cranfield paradigm remains
the preferred option for Information Retrieval (IR) and recommendation systems in the academic
world. Here we introduce the Infrastructures for Living Labs STELLA project, which aims to
create an evaluation infrastructure allowing experimental systems to run alongside production
web-based academic search systems with real users. STELLA combines user interactions and
log file analyses to enable large-scale A/B experiments for academic search.
Methods.
The STELLA evaluation infrastructure provides an online reproducible environment
allowing developers and researchers to work together to produce and evaluate new
retrieval and recommendation approaches for existing IR systems. STELLA integrates itself into a
production system, allows experimental systems to run alongside the production one, and evaluates
the performance of those experimental systems using real-time information coming from the regular
users of the system. The production system acts as a baseline that experimental systems try to
outperform. Our experimental setup uses interleaving, i.e. it combines experimental results
with those from the corresponding baseline systems. STELLA gathers information on user
interactions and provides statistics useful to developers and researchers. The STELLA architecture
comprises three main elements: (i) micro-services corresponding to experimental systems, (ii) a
multi-container application (MCA) bundling together all the participant experimental systems,
and (iii) a central server to manage participant and production systems, and to provide feedback.
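The abstract leaves the concrete interleaving method open. As an illustration only, the sketch below shows team-draft interleaving, a widely used technique for this kind of online comparison; the function name, data layout, and click-crediting convention are assumptions made for this example and are not part of the STELLA API.

```python
import random

def team_draft_interleave(baseline, experimental, k=10):
    """Interleave two ranked lists of document IDs into one result list of
    length <= k, remembering which system contributed each slot."""
    interleaved, credit, seen = [], {}, set()
    i = j = 0
    while len(interleaved) < k and (i < len(baseline) or j < len(experimental)):
        # Each round, both systems contribute one document in random order.
        for team in random.sample(["baseline", "experimental"], 2):
            if len(interleaved) >= k:
                break
            ranking, idx = (baseline, i) if team == "baseline" else (experimental, j)
            # Skip documents that the other system already placed.
            while idx < len(ranking) and ranking[idx] in seen:
                idx += 1
            if idx < len(ranking):
                doc = ranking[idx]
                interleaved.append(doc)
                seen.add(doc)
                credit[doc] = team
            if team == "baseline":
                i = idx + 1
            else:
                j = idx + 1
    return interleaved, credit

# Example: clicks on documents credited to "experimental" count as evidence
# that the experimental system outperformed the baseline for this query.
mix, credit = team_draft_interleave(["d1", "d2", "d3"], ["d3", "d4", "d1"], k=4)
```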
Results.
STELLA was the technological and methodological foundation of the CLEF 2021
Living Labs for Academic Search (LiLAS) lab. LiLAS aimed to strengthen the concept of user-centric
living labs for academic search with two separate evaluation rounds of 4 weeks each.
LiLAS integrated STELLA into two academic search systems: LIVIVO (for the task of ranking
documents with respect to a head query) and GESIS Search (for the task of ranking datasets with
respect to a reference document). We evaluated nine experimental systems contributed by three
participating groups.
Overall, we consider our lab a successful advancement over previous living lab experiments. We
were able to exemplify the benefits of fully dockerized systems delivering results for arbitrary
requests on-the-fly. Furthermore, we could confirm several previous findings, for instance the
power laws underlying the click distributions.
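The power-law observation can be made concrete with a simple log-log fit of aggregated clicks against result position. The snippet below is a minimal sketch over made-up numbers; it is not the analysis code used in LiLAS, and the function name and data are purely illustrative.

```python
import numpy as np

def powerlaw_exponent(clicks_per_rank):
    """Estimate alpha for a power law clicks ~ rank^(-alpha)
    via a least-squares fit in log-log space."""
    ranks = np.arange(1, len(clicks_per_rank) + 1)
    clicks = np.asarray(clicks_per_rank, dtype=float)
    mask = clicks > 0                      # log is undefined for zero counts
    slope, intercept = np.polyfit(np.log(ranks[mask]), np.log(clicks[mask]), 1)
    return -slope                          # alpha > 0 for a decaying power law

# Hypothetical aggregated clicks for result positions 1..8
print(powerlaw_exponent([510, 230, 140, 90, 60, 45, 33, 25]))
```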
2. Introduction
Involving users in the early phases of software development – in the form of user experience
analysis and prototype testing – has become common whenever some degree of user interaction
is required. This allows developers to consider the users’ needs from the beginning, making it
easier for users to adopt a new system or adapt to a new version. However, once a system is
put in place, new opportunities to observe, evaluate and learn from users become possible as
more information becomes available. For instance, it becomes possible to observe and record
interaction patterns and answer questions like (a) how long does it take a user to find a particular
button, (b) how does the user react to different options, or (c) what are the most common paths
used to achieve a goal, whether this goal is buying clothes or finding an article or dataset
relevant for research. All this information can be tracked, stored, analyzed, and evaluated,
making systems more attractive and easier to use. Gathering information from users in order
to continuously evaluate their interaction and learn more about their needs is a common practice
for commercial software, as knowing the users and their behavior allows companies to predict
their actions, offer them better products that they are likely to buy and, therefore, make more
profit. Despite the benefits of user-based evaluation, this approach is not yet fully exploited by
Information Retrieval systems within the academic world.
Information Retrieval (IR) systems are commonly used in academia as they aim at presenting
the most relevant resources from a corpus for a given information need. Typical tasks include
ranking a series of documents with respect to a query or offering recommendations related to a
document already selected as relevant (systems taking care of the latter are called recommendation
systems). Traditionally, IR and recommendation systems are used in academia to retrieve
scholarly articles from specialized repositories designed for this purpose. Although IR is an active
research area, evaluation remains a challenge in the academic context. One reason for this is
that IR evaluation mainly relies on the Cranfield paradigm. In this paradigm, search systems
are compared by processing a set of queries or topics based on a standard corpus of documents
while trying to produce the best possible results. Results are then evaluated with the help of
relevance assessments produced by domain experts. This research method has been established
and proven for more than 25 years in international evaluation campaigns such as the Text
Retrieval Conference (TREC)¹ or the Conference and Labs of the Evaluation Forum (CLEF)². However,
this so-called offline evaluation or shared task principle faces the criticism of drifting away
¹ https://trec.nist.gov/
² https://www.clef-initiative.eu/
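To make the Cranfield-style procedure concrete, the following sketch computes mean Precision@k for a system run against expert relevance assessments (qrels). The data structures and toy values are illustrative assumptions and are not tied to any particular TREC or CLEF collection.

```python
def precision_at_k(run, qrels, k=10):
    """Mean Precision@k over all topics.

    run:   dict mapping topic id -> ranked list of document ids
    qrels: dict mapping topic id -> set of relevant document ids
    """
    scores = []
    for topic, ranking in run.items():
        relevant = qrels.get(topic, set())
        hits = sum(1 for doc in ranking[:k] if doc in relevant)
        scores.append(hits / k)
    return sum(scores) / len(scores) if scores else 0.0

# Toy example with two topics and expert assessments
run = {"q1": ["d3", "d7", "d1"], "q2": ["d2", "d9", "d4"]}
qrels = {"q1": {"d3", "d1"}, "q2": {"d5"}}
print(precision_at_k(run, qrels, k=3))  # (2/3 + 0/3) / 2 = 0.333...
```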