Management of Machine Learning Lifecycle Artifacts:
A Survey
Marius Schlegel
TU Ilmenau
Ilmenau, Germany
marius.schlegel@tu-ilmenau.de
Kai-Uwe Sattler
TU Ilmenau
Ilmenau, Germany
kus@tu-ilmenau.de
ABSTRACT
The explorative and iterative nature of developing and operating
machine learning (ML) applications leads to a variety of artifacts,
such as datasets, features, models, hyperparameters, metrics, software, configurations, and logs. In order to enable comparability,
reproducibility, and traceability of these artifacts across the ML life-
cycle steps and iterations, systems and tools have been developed
to support their collection, storage, and management. It is often
not obvious what precise functional scope such systems offer, so that the comparison and the estimation of synergy effects between
candidates are quite challenging. In this paper, we aim to give an
overview of systems and platforms which support the management
of ML lifecycle artifacts. Based on a systematic literature review,
we derive assessment criteria and apply them to a representative
selection of more than 60 systems and platforms.
KEYWORDS
Machine Learning, Workow, Model Lifecycle, Artifact, Asset, Man-
agement, Systems, Classication, Taxonomy, Assessment
1 INTRODUCTION
Machine learning (ML) approaches are well established in a wide
range of application domains. In contrast to engineering traditional
software, the development of ML systems is different: data and feature preparation, model development, and model operation tasks are integrated into a unified lifecycle which is often iterated several times [9, 26, 53]. Although there are systems and tools that provide
support for a broad range of tasks within the ML lifecycle, such as
data cleaning and labeling, feature engineering, model design and
training, experiment management, hyperparameter optimization,
and orchestration, achieving comparability, traceability, and repro-
ducibility of model and data artifacts across all lifecycle steps and
iterations is still challenging.
To meet these requirements, it is necessary to capture the in-
put and output artifacts of each lifecycle step and iteration. That
includes model artifacts and data-related artifacts, such as data-
sets, labels, and features. Reproducibility also requires capturing
software-related artifacts, such as code, configurations, and envi-
ronmental dependencies. By additionally considering metadata,
such as model parameters, hyperparameters, quality metrics, and
execution statistics, comparability of artifacts is enabled.
Since the manual management of ML artifacts is simply not efficient, systems and platforms provide support for the systematic collection, storage, and management of ML lifecycle artifacts, which we collectively refer to as ML artifact management systems (ML AMSs)¹. Since ML AMSs are often integrated into general ML development platforms or frameworks for a subset of the ML lifecycle tasks, it is typically not obvious what the precise functional and non-functional scope of an AMS is, how an AMS compares to others, and to what extent possible synergy effects can be exploited through tool-chaining.
The objective of this paper is to provide a comprehensive overview
of AMSs from academia and industry. We address the following
research questions: (RQ1) What are criteria to describe, assess, and
compare AMSs? (RQ2) Which AMSs exist in academia and indus-
try, and what are their functional and non-functional properties
according to the assessment criteria? To answer these questions,
we conducted a systematic literature review.
The paper is organized as follows: § 2 gives an overview of re-
lated work. § 3 describes the ML lifecycle and concretizes the tasks
of ML lifecycle management. Based on the conducted systematic
literature review, § 4 discusses criteria for assessing AMSs w. r. t.
their functional and non-functional scope of features. § 5 applies
the criteria to the 64 identified AMSs and discusses the results.
2 RELATED WORK
In recent years, both academia and industry have produced a variety
of systems that provide artifact collection and management support
for individual steps of ML lifecycles [3, 18, 25, 28, 48, 50, 63, 101, 130].
Authors often compare their system with related works only within the scope of that particular system, which, however, does not enable comparability across a broad range of systems and criteria.
This problem has been addressed by a few surveys [67, 69, 169]. In the context of reproducibility of empirical results, Isdahl et al. [69]
have investigated what support is provided by existing experiment
management systems. However, these systems cover only a subset
of the ML lifecycle. Weißgerber et al. [169] develop an open-science-
centered process model for ML research as a common ground and
investigate 40 ML platforms and tools. However, the authors analyze
only 11 platforms w. r. t. ML workflow support capabilities and their
properties.
In contrast to the aforementioned studies and surveys, Idowu et
al. [67] adopt a more fine-grained understanding of artifacts and system capabilities, which is most closely related to our work. Based on
a selection of 17 experiment management systems and tools, the
authors develop a feature model for assessing their capabilities.
Although this survey shows parallels to our work, the authors consider only a limited selection of systems which is, again, focused solely on experiment tracking and management.
¹Whenever we use just “AMS”, we refer to an ML AMS.
3 MACHINE LEARNING LIFECYCLE
ARTIFACT MANAGEMENT
In this section, we discuss the steps of ML lifecycles based on typical ML workflows (§ 3.1), derive the tasks of ML artifact management, and outline the support ML AMSs should provide (§ 3.2).
3.1 ML Lifecycle
In contrast to traditional software engineering, the development of
ML-powered applications is more iterative and explorative. Thus,
developers have adapted their processes and practices for ML: Fol-
lowing methodologies in the context of data science, data analyt-
ics and data mining, such as TDSP [102], KDD [45], CRISP-DM [137, 170], or ASUM-DM [64], workflows specialized for ML have been established [9, 26, 53, 128]. Despite minor differences, ML workflows contain both data-centric and model-centric steps and often multiple feedback loops among the different stages, which leads to a lifecycle. Fig. 1 depicts a common view on the ML lifecycle.
The ML lifecycle consists of four stages: Requirements Stage,
Data-oriented Stage, Model-oriented Stage, and Operations Stage.
Starting with the Requirements Stage, the requirements for the
model to be developed are derived based on the application requirements [164]. This stage is dedicated to three major decisions: (1.) to
decide which functionality and interfaces to realize, (2.) to decide
which types of models are best suited for the given problem, and
(3.) to decide which types of data to work on.
Figure 1: Typical ML lifecycle. The Requirements Stage (Model Requirements Analysis), the Data-oriented Stage (Data Collection and Selection, Data Cleaning, Data Labeling, Feature Engineering and Selection), the Model-oriented Stage (Model Design, Model Training, Model Evaluation, Model Optimization), and the Operations Stage (Model Deployment, Model Monitoring) are connected by multiple feedback loops.
The Data-oriented Stage starts with the Data Collection and
Selection step. Datasets, either internal or publicly available, are
searched, or individual ones are collected and the data most suitable
for the subsequent steps is selected (e. g. dependent on data quality,
bias, etc.). By using available generic datasets, models may be (pre-)
trained (e. g. ImageNet for object recognition), and later, by using
transfer learning [24, 111] along with more specific data, trained to a more specific model. Then, in the Data Cleaning step, datasets
are prepared, removing inaccurate or noisy records. As most supervised learning techniques require labeled data to induce a model, data labeling is used to assign a ground truth label to each
dataset record. Subsequently, feature engineering and selection is
performed to extract and select features for ML models. For some
models, such as convolutional neural networks, this step is directly
intertwined with model training.
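For illustration, the following minimal sketch walks through the data-oriented steps on a tiny stand-in dataset; all column names, values, and the labeling rule are hypothetical and only exemplify the kind of processing performed in this stage.

import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

# Data Collection and Selection: a tiny stand-in dataset
# (in practice loaded from internal or public sources).
raw = pd.DataFrame({
    "temperature": [21.5, None, 300.0, 19.8, 22.1, 20.3],
    "vibration":   [0.02, 0.05, 0.90, 0.01, 0.70, 0.03],
    "pressure":    [1.01, 0.99, 1.20, 1.02, 1.30, 1.00],
    "error_code":  [0, 0, 3, 0, 2, 1],
})

# Data Cleaning: remove incomplete records and implausible measurements.
clean = raw.dropna()
clean = clean[clean["temperature"].between(-40, 125)].copy()

# Data Labeling: assign a ground-truth label to each record
# (here via a simple rule; in practice often done manually or semi-automatically).
clean["label"] = (clean["error_code"] != 0).astype(int)

# Feature Engineering and Selection: keep the most informative features.
features = clean.drop(columns=["label", "error_code"])
selector = SelectKBest(score_func=f_classif, k=2)
selected = selector.fit_transform(features, clean["label"])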
The Model-oriented Stage starts with the Model Design step.
Often, existing model designs and neural network architectures
are used and tailored towards specic requirements. During model
training, the selected models are trained on the collected and pre-
processed datasets using the selected features and their respective
labels. Subsequently, in the Model Evaluation step, developers eval-
uate a trained model on test datasets using predened metrics, such
as accuracy or F1-score. In critical application domains, this step
also includes extensive human evaluation. The subsequent Model
Optimization step is used to fine-tune the model, especially its
hyperparameters. In the context of the model development steps,
we refer to an experiment as a sequence of model development
activities that result in a trained model but do not include cycles to
previous steps.
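As a concrete example of such an experiment, the sketch below covers model design, training, evaluation with predefined metrics, and hyperparameter optimization; it uses synthetic data and scikit-learn as an arbitrary, illustrative choice rather than a prescribed toolset.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Stand-in for the outputs of the Data-oriented Stage (synthetic data).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Model Design, Model Training, and Model Optimization: a grid search over
# hyperparameters, scored by F1 on cross-validation folds.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [5, 10]},
    scoring="f1",
)
search.fit(X_train, y_train)

# Model Evaluation on a held-out test set using predefined metrics.
predictions = search.best_estimator_.predict(X_test)
print("accuracy:", accuracy_score(y_test, predictions))
print("f1:", f1_score(y_test, predictions))
print("best hyperparameters:", search.best_params_)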
Finally, in the Operations Stage, the model is distributed to the
target systems and devices, either as an on-demand (online) service
or in batch (oine) mode (Model Deployment), as well as continu-
ously monitored for metrics and errors during execution and use
(Model Monitoring). In particular, CI/CD practices from software
engineering are adapted.
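A minimal sketch of the on-demand (online) serving variant is given below, assuming FastAPI as the web framework; the stand-in model fitted at start-up merely keeps the example self-contained, whereas in practice the trained model would be loaded from the artifact store.

from fastapi import FastAPI
from sklearn.linear_model import LogisticRegression

app = FastAPI()

# Model Deployment: in a real setting the trained model would be retrieved
# from the artifact store; a tiny stand-in model keeps the sketch runnable.
model = LogisticRegression().fit([[0.0, 0.0], [1.0, 1.0]], [0, 1])

@app.post("/predict")
def predict(features: list[float]):
    # Model Monitoring hooks (logging inputs, latency, and errors) would go here.
    prediction = model.predict([features])[0]
    return {"prediction": int(prediction)}

Such a service would typically be started with an ASGI server (e. g. uvicorn) and containerized for deployment.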
As illustrated by Fig. 1, multiple feedback loops from steps of the
Model-oriented Stage or the Operations Stage to any step before
may be triggered by insucient accuracy or new data. Moreover,
Sculley et al. [
131
] point out, that the model development often takes
only a fraction of the time required to complete ML projects. Usually,
a large amount of tooling and infrastructure is required to support
data extract, transform, and load (ETL) pipelines, efficient training
and inference, reproducible experiments, versioning of datasets and
models, model analysis, and model monitoring at scale. The creation
and management of services and tools can ultimately account for
a large portion of the workload of ML engineers, researchers, and
data scientists.
3.2 Management of ML Lifecycle Artifacts
Within the steps of the ML lifecycle, a variety of artifacts is created,
used, and modied: Datasets, labels and annotations, and feature
sets are inputs and outputs of steps in the Data-oriented Stage.
Moreover, data processing source code, logs, and environmental
dependencies are created and/or used. In the Model-oriented Stage,
results from the Data-oriented Stage are used to develop and train
models. In addition, metadata such as parameters, hyperparameters,
and captured metrics as well as model processing source code, logs,
and environment dependencies are artifacts that are created and/or
used in this stage. The Operations Stage requires trained models
and corresponding dependencies such as libraries and runtimes
(e. g. via Docker container), uses model deployment and monitoring
source code which is typically wrapped into a web service with a
REST API for on-demand (online) service or scheduled for batch
(oine) execution, and captures execution logs and statistics. To
achieve comparability, traceability, and reproducibility of produced
data and model artifacts across multiple lifecycle iterations, it is es-
sential to also capture metadata artifacts that can be easily inspected
afterwards (e. g. model parameters, hyperparameters, lineage traces,
performance metrics) as well as software artifacts.
Manual management of artifacts is simply not efficient due to the
complexity and the required time. To meet the above requirements,
it is necessary to systematically capture any input and output ar-
tifacts and to provide them via appropriate interfaces. ML artifact
management includes any methods and tools for managing ML arti-
facts that are created and used in the development, deployment, and
operation of ML-based systems. Systems supporting ML artifact
management, collectively referred to as ML artifact management
systems (ML AMSs), provide the functionality and interfaces to
adequately record, store, and manage ML lifecycle artifacts.
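To illustrate how such interfaces look in practice, the following sketch records a run with the MLflow tracking API, one widely used example; the experiment name, parameter values, and the placeholder model file are purely illustrative, and other AMSs structure equivalent calls differently.

import pickle
import mlflow

# Group related runs into an experiment (cf. the experiment abstraction above).
mlflow.set_experiment("churn-prediction")

with mlflow.start_run():
    # Metadata artifacts: hyperparameters and quality metrics of this run.
    mlflow.log_params({"n_estimators": 100, "max_depth": 10})
    mlflow.log_metric("f1", 0.87)

    # Model artifact: serialize and attach the trained model
    # (a placeholder object here; normally the fitted estimator).
    with open("model.pkl", "wb") as f:
        pickle.dump({"placeholder": True}, f)
    mlflow.log_artifact("model.pkl")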
4 ASSESSMENT CRITERIA
The goal of this section is to dene criteria for the description and
assessment of AMSs. Based on a priori assumptions, we first list
functional and non-functional requirements. We then conduct a
systematic literature review according to Kitchenham et al. [81]: Using well-defined keywords, we search ACM DL, DBLP, IEEE Xplore,
and SpringerLink for academic publications as well as Google and
Google Scholar for web pages, articles, white papers, technical re-
ports, reference lists, source code repositories, and documentations.
Next, we perform the publication selection based on the relevance
for answering our research questions. To avoid overlooking relevant
literature, we perform one iteration of backward snowballing [171].
Finally, we iteratively extract assessment criteria and subcriteria,
criteria categories, as well as the functional and non-functional
properties of concrete systems and platforms based on concept
matrices. The results are shown in Tab. 1, which outlines categories, criteria (italicized), and subcriteria (in square brackets).
Lifecycle Integration. This category describes for which parts of
the ML lifecycle a system provides artifact collection and manage-
ment capabilities. The four stages form the criteria, with the steps
assigned to each stage forming the subcriteria (cf. § 3.1).
Artifact Support. Orthogonal to the previous category, this cate-
gory indicates which types of artifacts are supported and managed
by an AMS. Based on the discussion in § 3.2, we distinguish between
the criteria Data-related, Model, Metadata, and Software Artifacts.
The criteria Data-related Artifacts and Model Artifacts represent
core resources that are either input, output, or both for a lifecycle
step. Data-related Artifacts are datasets (used for training, validation,
and testing), annotations and labels, and features (cf. correspond-
ing subcriteria). Model Artifacts are represented by trained models
(subcriterion Model).
The criteria Metadata Artifacts and Software Artifacts represent
the corresponding artifact types, that enable the reproducibility
and traceability of individual ML lifecycle steps and their results.
The criterion Metadata Artifacts covers different types of metadata:
(i) identication metadata (e. g. identier, name, type of dataset
or model, association with groups, experiments, pipelines, etc.);
(ii) data-related metadata; (iii) model-related metadata, such as
inspectable model parameters (e. g. weights and biases), model hy-
perparameters (e. g. number of hidden layers, learning rate, batch
size, or dropout), and model quality & performance metrics (e. g.
accuracy, F1-score, or AUC score); (iv) experiments and projects,
which are abstractions to capture data processing or model training
runs and to group related artifacts in a reproducible and compa-
rable way; (v) pipelines, which are abstractions to execute entire
ML workows in an automated fashion and relates the input and
output artifacts required per step as well as the glue code required
for processing; (vi) execution-related logs & statistics.
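For illustration, a per-run metadata record covering the types (i)–(vi) might look like the following sketch; the schema and all field names are hypothetical and merely exemplify what an AMS could capture.

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class RunMetadata:
    # (i) identification metadata
    run_id: str
    model_name: str
    experiment: str                 # (iv) experiment/project grouping
    pipeline: Optional[str] = None  # (v) pipeline the run belongs to
    # (ii) data-related and (iii) model-related metadata
    dataset_version: str = ""
    hyperparameters: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)
    # (vi) execution-related logs & statistics
    log_file: str = ""
    runtime_seconds: float = 0.0

record = RunMetadata(
    run_id="2022-10-21-001",
    model_name="defect-detector",
    experiment="surface-defects",
    hyperparameters={"learning_rate": 1e-3, "batch_size": 32},
    metrics={"accuracy": 0.94, "f1": 0.91},
)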
The criterion Software Artifacts comprises source code and note-
books, e. g. for data processing, experimentation and model training,
and serving, as well as configurations and execution-related envi-
ronment dependencies and containers, e. g. Conda environments,
Docker containers, or virtual machines.
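One simple way to capture execution-related environment dependencies is to snapshot the installed packages, as sketched below; surveyed systems may instead record Conda environment files or Docker images.

import subprocess
import sys

# Snapshot the Python environment into a requirements file (a software artifact).
frozen = subprocess.run(
    [sys.executable, "-m", "pip", "freeze"],
    capture_output=True, text=True, check=True,
).stdout
with open("requirements.txt", "w") as f:
    f.write(frozen)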
Operations. This category indicates the operations provided by
an AMS for handling and managing ML artifacts. It comprises
the criteria Logging & Versioning, Exploration, Management, and
Collaboration.
The criterion Logging & Versioning represents any operations
that enable logging or capturing single artifacts (subcriterion Log/
Capture), creating checkpoints of a project or an experiment com-
prising several artifacts (subcriterion Commit), and reverting or
rolling back to an earlier committed or snapshot version (subcrite-
rion Revert/Rollback).
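The following self-contained sketch illustrates the three subcriteria with a hypothetical in-memory artifact store; the class and method names are invented for this example and do not correspond to any particular surveyed system.

import copy

class ArtifactStore:
    """Hypothetical, minimal in-memory store for the Logging & Versioning operations."""

    def __init__(self, project: str):
        self.project = project
        self.current: dict = {}   # artifacts and metadata of the working state
        self.versions: list = []  # committed checkpoints

    def log(self, name: str, value):
        # Log/Capture: record a single artifact or metadata entry.
        self.current[name] = value

    def commit(self, message: str) -> int:
        # Commit: checkpoint all artifacts of the experiment as one version.
        self.versions.append((message, copy.deepcopy(self.current)))
        return len(self.versions) - 1

    def rollback(self, version: int):
        # Revert/Rollback: restore an earlier committed version.
        self.current = copy.deepcopy(self.versions[version][1])

store = ArtifactStore(project="churn-prediction")
store.log("dataset", "data/train.parquet")
store.log("f1", 0.87)
baseline = store.commit("baseline random forest")
store.log("f1", 0.91)          # a later iteration overwrites the metric
store.rollback(baseline)       # revert to the baseline checkpoint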
The criterion Exploration includes any operations that help to
gain concrete insights into the results of data processing pipelines,
experiments, model training results, or monitoring statistics. These
operations are dierentiated by the subcriteria Query,Compare,
Lineage,Provenance, and Visualize.Query operations may be repre-
sented by simple searching and listing functionality, more advanced
filtering functionality (e. g. based on model performance metrics), or
a comprehensive query language. Compare indicates the presence
of operations for the comparison between two or more versions of
artifacts. In terms of model artifacts, this operation may be used to
select the most promising model from a set of candidates (model se-
lection), either in model training and development [122] or in model serving (e. g. best performing predictor) [33]. Lineage represents
any operations for tracing the lineage of artifacts, i. e. which input
artifacts led to which output artifacts, and thus provide information
about the history of a model, dataset, or project. Provenance repre-
sents any operations which, in addition, provide information about
which concrete transformations and processes converted inputs
into an output. Visualize indicates the presence of functionality for
graphical representation of model architectures, pipelines, model
metrics, or experimentation results.
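A small sketch of the Query, Compare, and Lineage operations over a hypothetical collection of logged runs is given below; run identifiers, metrics, and the parent field are invented for illustration.

# Hypothetical logged runs: identification metadata, metrics, and a parent link.
runs = [
    {"run_id": "r1", "model": "rf",  "parent": None, "metrics": {"f1": 0.87}},
    {"run_id": "r2", "model": "rf",  "parent": "r1", "metrics": {"f1": 0.91}},
    {"run_id": "r3", "model": "svm", "parent": None, "metrics": {"f1": 0.83}},
]

# Query: filter runs by a model performance metric.
good_runs = [r for r in runs if r["metrics"]["f1"] >= 0.85]

# Compare: rank candidate models to support model selection.
best = max(runs, key=lambda r: r["metrics"]["f1"])

# Lineage: trace from which earlier runs an artifact was derived.
def lineage(run_id: str) -> list:
    chain, current = [], next(r for r in runs if r["run_id"] == run_id)
    while current is not None:
        chain.append(current["run_id"])
        current = next((r for r in runs if r["run_id"] == current["parent"]), None)
    return chain

print(best["run_id"], lineage("r2"))   # -> r2 ['r2', 'r1']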
The criterion Management characterizes operations for handling
and using stored artifacts. The subcriteria Modify and Delete indi-
cate operations for modifying or deleting logged and already stored