
DataEconomy@CoNEXT’22, December 6–9, 2022, Rome, Italy Müllner, Schmerda, Theiler, Lindstaedt, and Kowald
algorithm sharing lies in the exploitation of shared resources,
e.g., data shared by a data provider. For example, it might be
advantageous for companies to gain access to the best-suited
data to enhance their AI pipeline. However, selecting the
best-suited dataset is hard, which stems from the fact that
the number of available datasets, publicly available over the
Web or stored in private databases, has increased rapidly
over the last decade [7, 10, 18, 28].
Although the deployment of recommender systems for
e-commerce, e.g., Amazon or Zalando, is a natural decision to
address this choice overload, not much research is available
on the applicability of recommender systems for data and
algorithm sharing (see Section 2). This is especially true for
beyond-accuracy objectives of recommender systems, such
as popularity bias, which is currently an important topic in
the research community. Recommender systems exhibiting
popularity bias tend to exclude many datasets and algorithms
from their recommendations and recommend popular items
substantially more often than non-popular items [6, 12, 13].
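The two symptoms of popularity bias described above (items that are never recommended, and popular items that are recommended disproportionately often) can be illustrated with a minimal sketch. The function names, the toy interaction data, and the 20% "head" threshold are assumptions made for illustration only; they are not necessarily the measures used in the evaluation of this paper.

```python
from collections import Counter


def item_coverage(recommendations, catalog):
    """Fraction of catalog items that appear in at least one recommendation list."""
    recommended = {item for recs in recommendations.values() for item in recs}
    return len(recommended & set(catalog)) / len(catalog)


def popular_share(recommendations, interactions, head_fraction=0.2):
    """Share of all recommended slots that are filled by the most popular items.

    Popularity is the number of interactions per item; the top `head_fraction`
    of items (hypothetical threshold) is treated as "popular".
    """
    popularity = Counter(item for _, item in interactions)
    ranked = [item for item, _ in popularity.most_common()]
    head_size = max(1, int(len(ranked) * head_fraction))
    head = set(ranked[:head_size])
    slots = [item for recs in recommendations.values() for item in recs]
    return sum(item in head for item in slots) / len(slots)


# Toy (user, dataset) interactions; "d1" is by far the most popular dataset.
interactions = [
    ("u1", "d1"), ("u2", "d1"), ("u3", "d1"),
    ("u1", "d2"), ("u2", "d2"),
    ("u3", "d3"), ("u1", "d4"), ("u2", "d5"),
]
catalog = ["d1", "d2", "d3", "d4", "d5"]

# Hypothetical top-2 recommendation lists produced by some recommender.
recommendations = {
    "u1": ["d1", "d2"],
    "u2": ["d1", "d3"],
    "u3": ["d1", "d2"],
}

print("coverage:", item_coverage(recommendations, catalog))
print("popular share:", popular_share(recommendations, interactions))
```

In this toy setting, a low coverage together with a high popular share would indicate that the recommender concentrates on a few popular datasets and ignores the rest of the catalog.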
To study to what extent recommender systems can support
data and algorithm sharing, we identify six recommendation
scenarios (see Figure 1). In these scenarios, we evaluate
three recommendation methods, i.e., Most Popular, Collaborative
Filtering, and Content-based Filtering, with respect
to recommendation accuracy and popularity bias. The three
main contributions of this paper are as follows:
(1) We discuss six recommendation scenarios and outline
how recommender systems can be applied to support
data and algorithm sharing (see Section 3).
(2) We create and publish a novel dataset based on the
OpenML platform, which allows studying recommender
systems for data and algorithm sharing (see Section 4).
(3) We show that Collaborative Filtering yields the most
accurate recommendations and that Content-based Filtering
can generate recommendations that cover the most
datasets and algorithms (see Section 5).
2 RELATED WORK
Recommender systems for data and algorithms are of growing
interest to both academia and industry in the field of
data- and AI-driven economies [7, 10, 22].
For example, Patra et al. [22] utilize Content-based Filtering
for dataset recommendations in the genetics domain.
Also, Jess et al. [10] design a recommender system for artificial
data to help human decision-making in the industrial domain.
The task of algorithm recommendations has been partially
approached by Automated Machine Learning, which
aims to automatically select an appropriate machine learning
pipeline (including algorithms) for a given dataset and problem [8].
For example, Zschech et al. [33] recommend a data
mining pipeline for a given problem. Vainshtein et al. [29]
and Song et al. [27] exploit metadata and structural properties
of datasets to recommend classification algorithms.
Numerous works evaluate recommender systems
for popularity bias, i.e., their inclination to recommend
popular items [6, 19, 32]. For example, Mansoury et al. [19]
show that recommender systems can seriously exacerbate
existing biases, such as popularity bias. Also, Zhu et al. [32]
simulate a recommender system to monitor the evolution
of popularity bias. Within this dynamic setting, the authors
study factors that drive popularity bias.
Data Market Austria (DMA)¹ is an example of a data- and
AI-driven economy, in which a recommender system is employed
to connect users, data, and algorithms [14]. However,
the authors raise concerns regarding the dataset used
in their study with respect to valid connections between
users, datasets, and algorithms, and do not consider content-based
recommendations. Moreover, our work includes a beyond-accuracy
evaluation study with respect to popularity bias.
3 RECOMMENDATION SCENARIOS
Recommender systems rely on (i) profile data for model
training and (ii) ground truth data for model evaluation. In a
traditional item-to-user recommendation scenario (SC1 and
SC2), profile data refers to the user profile that represents
a user’s item preferences. Ground truth data represents the
user’s item preferences the recommender system aims to
predict. Typically, the direct interactions between users and
items (e.g., a user’s utilization of a certain dataset) are used
as the users’ item preferences. However, for the remaining
item-to-item recommendation scenarios (SC3-SC6) there is
no direct item-to-item interaction data that can be used to
generate recommendations, e.g., dataset to algorithm recommendations
(see Table 1).
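To make the distinction between profile data and ground truth data concrete, the following minimal sketch splits toy user-item interactions into both parts, generates recommendations from the profile data with a simple Most Popular baseline, and scores them against the ground truth. The random per-user holdout, the 20% holdout fraction, the precision measure, and all names are assumptions for illustration; the evaluation protocol of this paper may differ.

```python
import random
from collections import Counter, defaultdict


def split_profile_ground_truth(interactions, holdout_fraction=0.2, seed=0):
    """Split each user's items into profile data (training) and ground truth (evaluation)."""
    rng = random.Random(seed)
    by_user = defaultdict(set)
    for user, item in interactions:
        by_user[user].add(item)

    profile, ground_truth = {}, {}
    for user, items in by_user.items():
        items = sorted(items)  # fixed order so the shuffle is reproducible
        rng.shuffle(items)
        # Hold out at least one item, unless the user has only a single item.
        n_holdout = max(1, int(len(items) * holdout_fraction)) if len(items) > 1 else 0
        ground_truth[user] = set(items[:n_holdout])
        profile[user] = set(items[n_holdout:])
    return profile, ground_truth


def most_popular(profile, user, k=2):
    """Recommend the k most frequently used items that are not in the user's profile."""
    popularity = Counter(item for items in profile.values() for item in items)
    ranked = [item for item, _ in popularity.most_common() if item not in profile[user]]
    return ranked[:k]


def precision_at_k(recommended, relevant):
    """Fraction of recommended items that occur in the ground truth."""
    if not recommended:
        return 0.0
    return sum(item in relevant for item in recommended) / len(recommended)


# Toy (user, dataset) interactions, e.g., a user utilizing a certain dataset.
interactions = [
    ("u1", "d1"), ("u1", "d2"), ("u1", "d3"),
    ("u2", "d1"), ("u2", "d2"), ("u2", "d4"),
    ("u3", "d1"), ("u3", "d3"), ("u3", "d5"),
]

profile, ground_truth = split_profile_ground_truth(interactions)
for user in sorted(profile):
    recs = most_popular(profile, user, k=2)
    print(user, recs, precision_at_k(recs, ground_truth[user]))
```

The same split only works directly for SC1 and SC2, where user-item interactions exist; the item-to-item scenarios require interaction data to be derived indirectly.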
Thus, in the following, we detail our six recommendation
scenarios that can occur in data and algorithm sharing (see
Figure 1) and give examples of how recommender systems can
cope with the lack of direct interactions in item-to-item
recommendation scenarios:
SC1: Datasets to Users. In SC1, recommendations help
users (e.g., researchers) to identify datasets that are deemed
to be relevant. As Figure 1 illustrates, there exists a direct
interaction between users and datasets (e.g., a user uses a
dataset to train an algorithm). Thus, the recommender system
can leverage these interactions to generate recommendations.
SC2: Algorithms to Users. In SC2, recommendations
help users (e.g., researchers) to identify algorithms that are
deemed to be relevant. As in SC1, the recommender system
can leverage the direct interactions between
users and algorithms to generate recommendations.
¹ https://www.datamarket.at/