Towards Employing Recommender Systems for Supporting Data and Algorithm Sharing

Peter Müllner

Know-Center Gmbh

Graz, Austria

pmuellner@know-center.at

Stefan Schmerda

Know-Center Gmbh

Graz, Austria

sschmerda@know-center.at

Dieter Theiler

Know-Center Gmbh

Graz, Austria

dtheiler@know-center.at

Stefanie Lindstaedt

Know-Center Gmbh & TU Graz

Graz, Austria

slind@know-center.at

Dominik Kowald

Know-Center Gmbh & TU Graz

Graz, Austria

dkowald@know-center.at

ABSTRACT

Data and algorithm sharing is an imperative part of data- and AI-driven economies. The efficient sharing of data and algorithms relies on the active interplay between users, data providers, and algorithm providers. Although recommender systems are known to effectively interconnect users and items in e-commerce settings, there is a lack of research on the applicability of recommender systems for data and algorithm sharing. To fill this gap, we identify six recommendation scenarios for supporting data and algorithm sharing, where four of these scenarios substantially differ from the traditional recommendation scenarios in e-commerce applications. We evaluate these recommendation scenarios using a novel dataset based on interaction data of the OpenML data and algorithm sharing platform, which we also provide for the scientific community. Specifically, we investigate three types of recommendation approaches, namely popularity-, collaboration-, and content-based recommendations. We find that collaboration-based recommendations provide the most accurate recommendations in all scenarios. Plus, the recommendation accuracy strongly depends on the specific scenario, e.g., algorithm recommendations for users are a more difficult problem than algorithm recommendations for datasets. Finally, the content-based approach generates the least popularity-biased recommendations that cover the most datasets and algorithms.

DataEconomy@CoNEXT’22, December 6–9, 2022, Rome, Italy

[Figure 1 diagram: Users and Data and Algorithms, connected by scenarios SC1-SC6 within Data and Algorithm Sharing]

Figure 1: Recommendation scenarios SC1-SC6 that can support data and algorithm sharing. In addition to the traditional item-to-user scenarios SC1 and SC2, item-to-item scenarios SC3-SC6 can also occur.

KEYWORDS

recommender systems, data economy, AI-driven economy, data and algorithm sharing, popularity bias, collaborative filtering, content-based filtering

arXiv:2210.11828v2 [cs.IR] 26 Oct 2022

1 INTRODUCTION

Sharing data and algorithms is an important cornerstone of today’s data- and AI-driven economy. To enable data and algorithm sharing, interconnecting three key players is essential: data providers, algorithm providers, and users. Data providers grant access to their data collections. Algorithm providers allow their algorithms to be applied to a given piece of data. Users apply algorithms to data and, in this way, connect data and algorithms. In general, data and algorithm providers may share their resources for various reasons, e.g., to monetize the data or the algorithm, or to make them available to the research community. The strength of data and algorithm sharing lies in the exploitation of shared resources, e.g., data shared by a data provider. For example, it might be advantageous for companies to gain access to the best-suited data to enhance their AI pipeline. However, selecting the best-suited dataset is hard because the number of available datasets, publicly available over the Web or stored in private databases, has increased rapidly over the last decade [7, 10, 18, 28].

Although deploying recommender systems, as is common in e-commerce (e.g., Amazon or Zalando), is a natural way to address this choice overload, little research is available on the applicability of recommender systems for data and algorithm sharing (see Section 2). This is especially true for beyond-accuracy objectives of recommender systems, such as popularity bias, which is currently an important topic in the research community. Recommender systems exhibiting popularity bias tend to exclude many datasets and algorithms from their recommendations and recommend popular items substantially more often than non-popular items [ ].
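The two symptoms of popularity bias named above (few items covered, popular items over-represented) can be made concrete with a small sketch. This is not the paper's evaluation code; the function names, the 20% cutoff for "popular" items, and the toy data are all assumptions chosen for illustration.

```python
from collections import Counter

def catalog_coverage(recommendations, catalog):
    """Fraction of catalog items that appear in at least one recommendation list."""
    recommended = {item for items in recommendations.values() for item in items}
    return len(recommended & set(catalog)) / len(catalog)

def popular_share(recommendations, interactions, top_fraction=0.2):
    """Share of recommended slots filled by the most popular items.

    top_fraction is an assumed cutoff: the top 20% of items by
    interaction count are treated as "popular".
    """
    popularity = Counter(item for _, item in interactions)
    ranked = [item for item, _ in popularity.most_common()]
    n_popular = max(1, int(len(ranked) * top_fraction))
    popular = set(ranked[:n_popular])
    slots = [item for items in recommendations.values() for item in items]
    return sum(item in popular for item in slots) / len(slots)

# Hypothetical (user, dataset) interactions and recommendation lists.
interactions = [
    ("u1", "d1"), ("u2", "d1"), ("u3", "d1"),
    ("u1", "d2"), ("u2", "d3"), ("u3", "d4"), ("u4", "d5"),
]
catalog = ["d1", "d2", "d3", "d4", "d5"]
recommendations = {"u1": ["d1", "d3"], "u2": ["d1", "d2"], "u4": ["d1", "d2"]}

coverage = catalog_coverage(recommendations, catalog)
share = popular_share(recommendations, interactions)
```

In this toy example only three of the five datasets are ever recommended, and the single most popular dataset fills half of all recommendation slots. Low coverage combined with a high popular share is the pattern a popularity-biased recommender would show.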

To study to what extent recommender systems can support data and algorithm sharing, we identify six recommendation scenarios (see Figure 1). In these scenarios, we evaluate three recommendation methods, i.e., Most Popular, Collaborative Filtering, and Content-based Filtering, with respect to recommendation accuracy and popularity bias. The three main contributions of this paper are as follows:

(1) We discuss six recommendation scenarios and outline how recommender systems can be applied to support data and algorithm sharing (see Section 3).

(2) We create and publish a novel dataset based on the OpenML platform, which allows studying recommender systems for data and algorithm sharing (see Section 4).

(3) We show that Collaborative Filtering yields the most accurate recommendations and Content-based Filtering can generate recommendations that cover the most datasets and algorithms (see Section 5).

2 RELATED WORK

Recommender systems for data and algorithms are of growing interest to both academia and industry in the field of data- and AI-driven economies [7, 10, 22].

For example, Patra et al. [ ] utilize Content-based Filtering for dataset recommendations in the genetics domain. Also, Jess et al. [ ] design a recommender system for artificial data to help human decision-making in the industrial domain. The task of algorithm recommendations has been partially approached by Automated Machine Learning, which aims to automatically select an appropriate machine learning pipeline (including algorithms) for a given dataset and problem [8]. For example, Zschech et al. [33] recommend a data mining pipeline for a given problem. Vainshtein et al. [ ] and Song et al. [ ] exploit metadata and structural properties of datasets to recommend classification algorithms.

Numerous works evaluate recommender systems for popularity bias, i.e., their inclination to recommend popular items [ ]. For example, Mansoury et al. [ ] show that recommender systems can seriously exacerbate existing biases, such as popularity bias. Also, Zhu et al. [ ] simulate a recommender system to monitor the evolution of popularity bias. Within this dynamic setting, the authors study factors that drive popularity bias.

Data Market Austria (DMA) is an example of a data- and AI-driven economy, in which a recommender system is employed to connect users, data, and algorithms [ ]. However, the authors raise concerns about whether the dataset used in their study contains valid connections between users, datasets, and algorithms, and they do not consider content-based recommendations. In addition, our work includes a beyond-accuracy evaluation study with respect to popularity bias.

3 RECOMMENDATION SCENARIOS

Recommender systems rely on (i) profile data for model training and (ii) ground truth data for model evaluation. In a traditional item-to-user recommendation scenario (SC1 and SC2), profile data refers to the user profile that represents a user’s item preferences. Ground truth data represents the user’s item preferences the recommender system aims to predict. Typically, the direct interactions between users and items (e.g., a user’s utilization of a certain dataset) are used as the users’ item preferences. However, for the remaining item-to-item recommendation scenarios (SC3-SC6) there is no direct item-to-item interaction data that can be used to generate recommendations, e.g., dataset to algorithm recommendations (see Table 1).
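The distinction between profile data and ground truth data can be illustrated with a minimal sketch for the item-to-user case. The split ratio, the random hold-out strategy, and the precision@k metric below are assumptions for illustration; the evaluation protocol actually used in this paper may differ.

```python
import random

def split_profile_ground_truth(user_items, test_fraction=0.2, seed=42):
    """Split each user's items into a profile (training) and ground truth (evaluation).

    Assumption: a random hold-out of roughly test_fraction of each user's
    items, with at least one held-out item for users with two or more items.
    """
    rng = random.Random(seed)
    profile, ground_truth = {}, {}
    for user, items in user_items.items():
        shuffled = sorted(items)  # sort first so the shuffle is reproducible
        rng.shuffle(shuffled)
        n_test = max(1, int(len(shuffled) * test_fraction)) if len(shuffled) > 1 else 0
        ground_truth[user] = set(shuffled[:n_test])
        profile[user] = set(shuffled[n_test:])
    return profile, ground_truth

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are in the ground truth."""
    if k <= 0:
        return 0.0
    return sum(item in relevant for item in recommended[:k]) / k

# Hypothetical user-to-dataset interactions.
user_items = {
    "u1": {"d1", "d2", "d3", "d4", "d5"},
    "u2": {"d1", "d6"},
}
profile, ground_truth = split_profile_ground_truth(user_items)
```

A recommender would be trained on `profile` only, and its top-k lists would then be scored against `ground_truth`, for example with `precision_at_k`.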

Thus, in the following, we detail our six recommendation scenarios that can occur in data and algorithm sharing (see Figure 1) and give examples of how recommender systems can cope with the lack of direct interactions for item-to-item recommendation scenarios:

SC1: Datasets to Users. In SC1, recommendations help users (e.g., researchers) to identify datasets that are deemed to be relevant. As Figure 1 illustrates, there exists a direct interaction between users and datasets (e.g., a user uses a dataset to train an algorithm). Thus, the recommender system can leverage these interactions to generate recommendations.

SC2: Algorithms to Users. In SC2, recommendations help users (e.g., researchers) to identify algorithms that are deemed to be relevant. As in SC1, the recommender system can leverage the direct interactions between users and algorithms to generate recommendations.

1 Data Market Austria: https://www.datamarket.at/
