and captured metrics as well as model processing source code, logs,
and environment dependencies are artifacts that are created and/or
used in this stage. The Operations Stage requires trained models
and corresponding dependencies such as libraries and runtimes
(e. g. via Docker containers), uses model deployment and monitoring
source code, which is typically either wrapped into a web service with a
REST API for on-demand (online) serving or scheduled for batch
(offline) execution, and captures execution logs and statistics. To
achieve comparability, traceability, and reproducibility of produced
data and model artifacts across multiple lifecycle iterations, it is es-
sential to also capture metadata artifacts that can be easily inspected
afterwards (e. g. model parameters, hyperparameters, lineage traces,
performance metrics) as well as software artifacts.
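For illustration, the following minimal sketch shows how a trained model might be wrapped into a web service with a REST API for on-demand (online) serving, as described above. FastAPI is used merely as one possible framework; the stand-in model, endpoint name, and request schema are illustrative assumptions rather than part of any specific AMS.

```python
# Minimal sketch of wrapping a trained model into a web service with a REST
# API for on-demand (online) serving. FastAPI is one possible framework; the
# stand-in model and endpoint are illustrative only.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PredictRequest(BaseModel):
    features: list[float]

def model_predict(features: list[float]) -> float:
    # Placeholder for a trained model loaded from the artifact/model store.
    return sum(features) / max(len(features), 1)

@app.post("/predict")
def predict(request: PredictRequest) -> dict:
    # On-demand (online) inference; execution logs and statistics would be
    # captured here for the monitoring component.
    return {"prediction": model_predict(request.features)}
```

In practice, such a service would be run with an ASGI server (e. g. uvicorn) and packaged, together with its environment dependencies, into a Docker container, while its execution logs and statistics are captured for monitoring.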
Manual management of artifacts is simply not efficient due to the
complexity and the required time. To meet the above requirements,
it is necessary to systematically capture any input and output ar-
tifacts and to provide them via appropriate interfaces. ML artifact
management includes any methods and tools for managing ML arti-
facts that are created and used in the development, deployment, and
operation of ML-based systems. Systems supporting ML artifact
management, collectively referred to as ML artifact management
systems (ML AMSs), provide the functionality and interfaces to
adequately record, store, and manage ML lifecycle artifacts.
4 ASSESSMENT CRITERIA
The goal of this section is to define criteria for the description and
assessment of AMSs. Based on a priori assumptions, we first list
functional and non-functional requirements. We then conduct a
systematic literature review according to Kitchenham et al. [81]: Using
well-defined keywords, we search ACM DL, DBLP, IEEE Xplore,
and SpringerLink for academic publications as well as Google and
Google Scholar for web pages, articles, white papers, technical re-
ports, reference lists, source code repositories, and documentations.
Next, we perform the publication selection based on the relevance
for answering our research questions. To avoid overlooking relevant
literature, we perform one iteration of backward snowballing [171].
Finally, we iteratively extract assessment criteria and subcriteria,
criteria categories, as well as the functional and non-functional
properties of concrete systems and platforms based on concept
matrices. The results are shown in Tab. 1, which outlines categories,
criteria (italicized), and subcriteria (in square brackets).
Lifecycle Integration. This category describes for which parts of
the ML lifecycle a system provides artifact collection and manage-
ment capabilities. The four stages form the criteria, with the steps
assigned to each stage forming the subcriteria (cf. § 3.1).
Artifact Support. Orthogonal to the previous category, this cate-
gory indicates which types of artifacts are supported and managed
by an AMS. Based on the discussion in § 3.2, we distinguish between
the criteria Data-related, Model, Metadata, and Software Artifacts.
The criteria Data-related Artifacts and Model Artifacts represent
core resources that are either input, output, or both for a lifecycle
step. Data-related Artifacts are datasets (used for training, validation,
and testing), annotations and labels, and features (cf. correspond-
ing subcriteria). Model Artifacts are represented by trained models
(subcriterion Model).
The criteria Metadata Artifacts and Software Artifacts represent
the corresponding artifact types that enable the reproducibility
and traceability of individual ML lifecycle steps and their results.
The criterion Metadata Artifacts covers different types of metadata:
(i) identification metadata (e. g. identifier, name, type of dataset
or model, association with groups, experiments, pipelines, etc.);
(ii) data-related metadata; (iii) model-related metadata, such as
inspectable model parameters (e. g. weights and biases), model hy-
perparameters (e. g. number of hidden layers, learning rate, batch
size, or dropout), and model quality & performance metrics (e. g.
accuracy, F1-score, or AUC score); (iv) experiments and projects,
which are abstractions to capture data processing or model training
runs and to group related artifacts in a reproducible and compa-
rable way; (v) pipelines, which are abstractions to execute entire
ML workflows in an automated fashion and relate the input and
output artifacts required per step as well as the glue code required
for processing; (vi) execution-related logs & statistics.
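To make these metadata types more tangible, the following sketch captures identification metadata, hyperparameters, performance metrics, and an execution log within an experiment run, using MLflow's tracking API as one representative system; experiment name, parameters, and values are purely illustrative.

```python
# Illustrative sketch: capturing identification metadata, hyperparameters,
# performance metrics, and an execution log with MLflow's tracking API.
# Experiment name, parameters, and values are placeholders.
import mlflow

mlflow.set_experiment("churn-prediction")           # (iv) experiment grouping

with mlflow.start_run(run_name="baseline-rf") as run:
    mlflow.set_tag("dataset", "customers-v3")       # (i) identification metadata
    mlflow.log_param("n_estimators", 200)           # (iii) model hyperparameters
    mlflow.log_param("max_depth", 8)
    mlflow.log_metric("accuracy", 0.87)             # (iii) performance metrics
    mlflow.log_metric("f1_score", 0.81)

    with open("train.log", "w") as f:               # (vi) execution logs
        f.write("epoch 1: loss=0.42\n")
    mlflow.log_artifact("train.log")

    print("run id:", run.info.run_id)
```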
The criterion Software Artifacts comprises source code and note-
books, e. g. for data processing, experimentation and model training,
and serving, as well as configurations and execution-related envi-
ronment dependencies and containers, e. g. Conda environments,
Docker containers, or virtual machines.
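A minimal way to capture the Python-level part of such environment dependencies as an artifact is sketched below using importlib.metadata; real systems typically rely on richer descriptions such as Conda environment exports, Dockerfiles, or full container images. The file name is an illustrative assumption.

```python
# Sketch: recording the installed Python packages as a requirements-style
# environment artifact; real AMSs typically capture richer descriptions
# (Conda environment files, Dockerfiles, container images).
from importlib.metadata import distributions

def snapshot_environment(path: str = "environment-snapshot.txt") -> None:
    # One line per installed distribution, pinned to the exact version.
    lines = sorted(f"{d.metadata['Name']}=={d.version}" for d in distributions())
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

snapshot_environment()
```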
Operations. This category indicates the operations provided by
an AMS for handling and managing ML artifacts. It comprises
the criteria Logging & Versioning, Exploration, Management, and
Collaboration.
The criterion Logging & Versioning represents any operations
that enable logging or capturing single artifacts (subcriterion Log/
Capture), creating checkpoints of a project or an experiment com-
prising several artifacts (subcriterion Commit), and reverting or
rolling back to an earlier committed or snapshot version (subcrite-
rion Revert/Rollback).
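As an illustration of these three subcriteria, the sketch below drives DVC together with Git, one common tool combination for artifact versioning, from Python; the dataset path, commit message, and commit reference are placeholders.

```python
# Sketch of Log/Capture, Commit, and Revert/Rollback with DVC + Git as one
# common tool combination; dataset path, commit message, and the commit
# reference <earlier-commit> are placeholders.
import subprocess

def run(*cmd: str) -> None:
    subprocess.run(cmd, check=True)

# Log/Capture: put a dataset under DVC control (creates data/train.csv.dvc).
run("dvc", "add", "data/train.csv")

# Commit: checkpoint the captured artifact pointer together with the code.
run("git", "add", "data/train.csv.dvc", "data/.gitignore")
run("git", "commit", "-m", "Capture training data v2")

# Revert/Rollback: restore an earlier committed version of the artifact.
run("git", "checkout", "<earlier-commit>", "--", "data/train.csv.dvc")
run("dvc", "checkout", "data/train.csv.dvc")
```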
The criterion Exploration includes any operations that help to
gain concrete insights into the results of data processing pipelines,
experiments, model training results, or monitoring statistics. These
operations are differentiated by the subcriteria Query, Compare,
Lineage, Provenance, and Visualize. Query operations may be repre-
sented by simple searching and listing functionality, more advanced
filtering functionality (e. g. based on model performance metrics), or
a comprehensive query language. Compare indicates the presence
of operations for the comparison between two or more versions of
artifacts. In terms of model artifacts, this operation may be used to
select the most promising model from a set of candidates (model se-
lection), either in model training and development [122] or in model
serving (e. g. best performing predictor) [33]. Lineage represents
any operations for tracing the lineage of artifacts, i. e. which input
artifacts led to which output artifacts, and thus provide information
about the history of a model, dataset, or project. Provenance repre-
sents any operations that additionally provide information about
which concrete transformations and processes converted inputs
into an output. Visualize indicates the presence of functionality for
graphical representation of model architectures, pipelines, model
metrics, or experimentation results.
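For instance, Query and Compare operations for model selection could look roughly as follows when realized with MLflow's search interface; the experiment name, metric, and threshold are illustrative assumptions.

```python
# Illustrative Query/Compare sketch: filter logged runs by a performance
# metric and rank them to support model selection, using MLflow's search
# interface. Experiment name, metric, and threshold are placeholders.
import mlflow

runs = mlflow.search_runs(
    experiment_names=["churn-prediction"],
    filter_string="metrics.accuracy > 0.85",
    order_by=["metrics.accuracy DESC"],
)

# search_runs returns a pandas DataFrame; the top row is the best candidate.
if not runs.empty:
    best = runs.iloc[0]
    print("best run:", best["run_id"], "accuracy:", best["metrics.accuracy"])
```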
The criterion Management characterizes operations for handling
and using stored artifacts. The subcriteria Modify and Delete indi-
cate operations for modifying or deleting logged and already stored