Towards Employing Recommender Systems for Supporting Data and Algorithm Sharing

Peter Müllner
Know-Center Gmbh
Graz, Austria
pmuellner@know-center.at
Stefan Schmerda
Know-Center Gmbh
Graz, Austria
sschmerda@know-center.at
Dieter Theiler
Know-Center Gmbh
Graz, Austria
dtheiler@know-center.at
Stefanie Lindstaedt
Know-Center Gmbh & TU Graz
Graz, Austria
slind@know-center.at
Dominik Kowald
Know-Center Gmbh & TU Graz
Graz, Austria
dkowald@know-center.at
ABSTRACT
Data and algorithm sharing is an imperative part of data- and AI-driven economies. The efficient sharing of data and algorithms relies on the active interplay between users, data providers, and algorithm providers. Although recommender systems are known to effectively interconnect users and items in e-commerce settings, there is a lack of research on the applicability of recommender systems for data and algorithm sharing. To fill this gap, we identify six recommendation scenarios for supporting data and algorithm sharing, where four of these scenarios substantially differ from the traditional recommendation scenarios in e-commerce applications. We evaluate these recommendation scenarios using a novel dataset based on interaction data of the OpenML data and algorithm sharing platform, which we also provide for the scientific community. Specifically, we investigate three types of recommendation approaches, namely popularity-, collaboration-, and content-based recommendations. We find that collaboration-based recommendations provide the most accurate recommendations in all scenarios. Plus, the recommendation accuracy strongly depends on the specific scenario, e.g., algorithm recommendations for users are a more difficult problem than algorithm recommendations for datasets. Finally, the content-based approach generates the least popularity-biased recommendations that cover the most datasets and algorithms.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
DataEconomy@CoNEXT’22, December 6–9, 2022, Rome, Italy
©2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-XXXX-X/18/06. . . $15.00
https://doi.org/XXXXXXX.XXXXXXX
[Figure: Users, Data and Algorithms, and the scenarios SC1-SC6; label: Data and Algorithm Sharing]
Figure 1: Recommendation scenarios SC1-SC6 that can support data and algorithm sharing. In addition to the traditional item-to-user scenarios SC1 and SC2, item-to-item scenarios SC3-SC6 can also occur.
KEYWORDS
recommender systems, data economy, AI-driven economy, data and algorithm sharing, popularity bias, collaborative filtering, content-based filtering
1 INTRODUCTION
Sharing data and algorithms is an important cornerstone of today’s data- and AI-driven economy. To enable data and algorithm sharing, interconnecting three key players is essential: data providers, algorithm providers, and users. Data Providers grant access to their data collections. Algorithm Providers allow applying their algorithms to a given piece of data. Users apply algorithms to data and, this way, connect data and algorithms. In general, data and algorithm providers may share their resources for various reasons, e.g., to monetize the data or the algorithm, or to make them available to the research community. The strength of data and algorithm sharing lies in the exploitation of shared resources, e.g., data shared by a data provider. For example, it might be advantageous for companies to gain access to the best-suited data to enhance their AI pipeline. However, selecting the best-suited dataset is hard because the number of available datasets, whether publicly available on the Web or stored in private databases, has increased rapidly over the last decade [7, 10, 18, 28].
arXiv:2210.11828v2 [cs.IR] 26 Oct 2022
Although deploying recommender systems is a natural way to address this kind of choice overload in e-commerce, e.g., at Amazon or Zalando, little research is available on the applicability of recommender systems for data and algorithm sharing (see Section 2). This is especially true for beyond-accuracy objectives of recommender systems, such as popularity bias, which is currently an important topic in the research community. Recommender systems exhibiting popularity bias tend to exclude many datasets and algorithms from their recommendations and recommend popular items substantially more often than non-popular items [6, 12, 13].
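The two symptoms of popularity bias named above (items excluded from recommendations, popular items recommended more often) can be quantified in a few lines. The following is a minimal sketch, not the evaluation protocol of this paper: the function names, the toy interactions, and the choice of catalog coverage and mean item popularity as measures are assumptions made for illustration.

```python
from collections import Counter

def catalog_coverage(recommendations, catalog):
    """Fraction of catalog items that appear in at least one recommendation list."""
    recommended = {item for recs in recommendations.values() for item in recs}
    return len(recommended & set(catalog)) / len(catalog)

def mean_recommended_popularity(recommendations, interactions):
    """Average interaction count of all recommended items (higher = more popular)."""
    popularity = Counter(item for _, item in interactions)
    counts = [popularity[item] for recs in recommendations.values() for item in recs]
    return sum(counts) / len(counts)

# Toy (user, dataset) interactions; all identifiers are hypothetical.
interactions = [("u1", "d1"), ("u2", "d1"), ("u3", "d1"), ("u1", "d2"), ("u2", "d3")]
catalog = ["d1", "d2", "d3", "d4"]
recommendations = {"u1": ["d1", "d3"], "u2": ["d1", "d2"], "u3": ["d1", "d2"]}

coverage = catalog_coverage(recommendations, catalog)
avg_pop = mean_recommended_popularity(recommendations, interactions)
print(f"coverage={coverage:.2f}, mean popularity={avg_pop:.2f}")
```

In this toy example, dataset d4 is never recommended, which lowers coverage, while the frequently used d1 appears in every list, which raises the mean popularity.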
To study to what extent recommender systems can support data and algorithm sharing, we identify six recommendation scenarios (see Figure 1). In these scenarios, we evaluate three recommendation methods, i.e., Most Popular, Collaborative Filtering, and Content-based Filtering, with respect to recommendation accuracy and popularity bias. The three main contributions of this paper are as follows:
(1) We discuss six recommendation scenarios and outline how recommender systems can be applied to support data and algorithm sharing (see Section 3).
(2) We create and publish a novel dataset based on the OpenML platform, which allows studying recommender systems for data and algorithm sharing (see Section 4).
(3) We show that Collaborative Filtering yields the most accurate recommendations and Content-based Filtering can generate recommendations that cover the most datasets and algorithms (see Section 5).
2 RELATED WORK
Recommender systems for data and algorithms are of growing interest to both academia and industry in the field of data and AI-driven economies [7, 10, 22].
For example, Patra et al. [22] utilize Content-based Filtering for dataset recommendations in the genetics domain. Also, Jess et al. [10] design a recommender system for artificial data to help human decision-making in the industrial domain. The task of algorithm recommendations has been partially approached by Automated Machine Learning, which aims to automatically select an appropriate machine learning pipeline (including algorithms) for a given dataset and problem [8]. For example, Zschech et al. [33] recommend a data mining pipeline for a given problem. Vainshtein et al. [29] and Song et al. [27] exploit metadata and structural properties of datasets to recommend classification algorithms.
Numerous works evaluate recommender systems for popularity bias, i.e., their inclination to recommend popular items [6, 19, 32]. For example, Mansoury et al. [19] show that recommender systems can seriously exacerbate existing biases, such as popularity bias. Also, Zhu et al. [32] simulate a recommender system to monitor the evolution of popularity bias. Within this dynamic setting, the authors study factors that drive popularity bias.
Data Market Austria (DMA)¹ is an example of a data- and AI-driven economy, in which a recommender system is employed to connect users, data, and algorithms [14]. However, the authors raise concerns about whether the dataset used in their study contains valid connections between users, datasets, and algorithms, and they do not consider content-based recommendations. In addition, our work includes a beyond-accuracy evaluation with respect to popularity bias.
3 RECOMMENDATION SCENARIOS
Recommender systems rely on (i) profile data for model training and (ii) ground truth data for model evaluation. In a traditional item-to-user recommendation scenario (SC1 and SC2), profile data refers to the user profile that represents a user’s item preferences. Ground truth data represents the user’s item preferences the recommender system aims to predict. Typically, the direct interactions between users and items (e.g., a user’s utilization of a certain dataset) are used as the users’ item preferences. However, for the remaining item-to-item recommendation scenarios (SC3-SC6) there is no direct item-to-item interaction data that can be used to generate recommendations, e.g., dataset to algorithm recommendations (see Table 1).
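The distinction between profile data and ground truth data can be illustrated for the item-to-user case. The sketch below is illustrative only: the hold-out rule (last interaction per user), the Most Popular baseline as the model, the precision measure, and all identifiers are assumptions and are not taken from the evaluation setup of this paper.

```python
from collections import Counter

def split_profile_ground_truth(user_items, n_holdout=1):
    """Hold out the last n_holdout items of each user as ground truth."""
    profiles, ground_truth = {}, {}
    for user, items in user_items.items():
        profiles[user] = items[:-n_holdout]
        ground_truth[user] = items[-n_holdout:]
    return profiles, ground_truth

def most_popular(profiles, user, k):
    """Recommend the k most frequently used items the user has not used yet."""
    counts = Counter(item for items in profiles.values() for item in items)
    seen = set(profiles[user])
    ranked = [item for item, _ in counts.most_common() if item not in seen]
    return ranked[:k]

def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommendations that occur in the ground truth."""
    return len(set(recommended[:k]) & set(relevant)) / k

# Hypothetical user -> dataset interaction histories.
user_items = {
    "u1": ["d1", "d2", "d3"],
    "u2": ["d1", "d3", "d2"],
    "u3": ["d2", "d1"],
}
profiles, ground_truth = split_profile_ground_truth(user_items)
recs_u3 = most_popular(profiles, "u3", k=1)
p_u3 = precision_at_k(recs_u3, ground_truth["u3"], k=1)
print(recs_u3, p_u3)
```

The model only ever sees `profiles`; `ground_truth` is used solely to score the resulting recommendation lists.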
Thus, in the following, we detail the six recommendation scenarios that can occur in data and algorithm sharing (see Figure 1) and give examples of how recommender systems can cope with the lack of direct interactions in item-to-item recommendation scenarios:
SC1: Datasets to Users.
In SC1, recommendations help users (e.g., researchers) to identify datasets that are deemed to be relevant. As Figure 1 illustrates, there exists a direct interaction between users and datasets (e.g., a user uses a dataset to train an algorithm). Thus, the recommender system can leverage these interactions to generate recommendations.
SC2: Algorithms to Users.
In SC2, recommendations help users (e.g., researchers) to identify algorithms that are deemed to be relevant. As in SC1, the recommender system can leverage the direct interactions between users and algorithms to generate recommendations.
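To make the use of direct interactions in SC1 and SC2 concrete, the following sketch shows one common form of Collaborative Filtering: a user-based neighborhood approach on binary interactions. It is an assumption-laden illustration, not necessarily the variant evaluated in this paper; the cosine similarity on item sets, the neighborhood size, and all identifiers are hypothetical choices.

```python
import math

def cosine(a, b):
    """Cosine similarity between two item sets (binary interactions)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def recommend_user_based(profiles, target, k=2, n_neighbors=2):
    """Score unseen items by the summed similarity of the neighbors who used them."""
    target_items = profiles[target]
    neighbors = sorted(
        ((cosine(target_items, items), user)
         for user, items in profiles.items() if user != target),
        reverse=True,
    )[:n_neighbors]
    scores = {}
    for sim, user in neighbors:
        for item in profiles[user] - target_items:
            scores[item] = scores.get(item, 0.0) + sim
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:k]

# Hypothetical user -> dataset interactions (SC1); for SC2, the sets
# would hold algorithm identifiers instead.
profiles = {
    "u1": {"d1", "d2"},
    "u2": {"d1", "d2", "d3"},
    "u3": {"d4"},
    "u4": {"d2", "d5"},
}
recs = recommend_user_based(profiles, "u1", k=2)
print(recs)
```

Because u2 shares both of u1's datasets, its additional dataset d3 is ranked above d5, which comes from the less similar user u4.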
¹ https://www.datamarket.at/