
DATALOADER PARAMETER TUNER: AN AUTOMATED
DATALOADER PARAMETER TUNER FOR DEEP LEARNING
MODELS
JooYoung Park, DoangJoo Synn
Korea University
Seoul
Republic of Korea
{nehalem, alansynn}@korea.ac.kr
XinYu Piao
Korea University
Seoul
Republic of Korea
xypiao97@korea.ac.kr
Jong-Kook Kim
Korea University
Seoul
Republic of Korea
jongkook@korea.ac.kr
ABSTRACT
Deep learning has recently become one of the most compute- and data-intensive methods and is widely
used in many research areas and businesses. One of the critical challenges of deep learning is that it
has many parameters that can be adjusted, and the optimal values may need to be determined for faster
operation and higher accuracy. The focus of this paper is the adjustable parameters of the dataloader.
The dataloader in a system mainly groups the data appropriately and loads it into main memory for
the deep learning model to use. We introduce an automated framework called Dataloader Parameter
Tuner (DPT) that determines the optimal values for the parameters required by the dataloader. This
framework discovers the optimal values for the number of the dataloader’s subprocesses (i.e., workers)
and the prefetch factor through grid search to accelerate data transfer for machine learning systems.
Keywords performance, machine learning systems, dataloader
1 Introduction
Recently, the data collected by many enterprises has been increasing in volume, resolution, and variety, driven by
the growing use of mobile devices and the Internet. To process this so-called ‘Big Data’, the need for
Deeper Neural Networks (DNNs) is also increasing. In turn, larger datasets are needed to successfully train
more complex DNNs [Alwosheel et al.(2018)], both leading to the need for more computing power and more mem-
ory. To fulfill the demand for more computing power, modern computing architectures use multiple CPUs and
GPUs [Masek et al.(2016), Pal et al.(2019)], which exhibit the characteristics of a distributed computing system
in a single node or system. Therefore, various learning techniques such as data parallelism [Huo et al.(2018),
Jia et al.(2019), Ho et al.(2013)], model parallelism [Chilimbi et al.(2014), Dean et al.(2012)], and pipeline paral-
lelism [Huang et al.(2019), Narayanan et al.(2019), Synn et al.(2021)] have been introduced for efficient use of such systems.
Modern deep learning frameworks try to utilize the dataloader as much as possible by using multi-threading (e.g., Tensor-
Flow) [TensorFlow(2021)] or multi-processing (e.g., PyTorch) [PyTorch(2019)]. By default, the main dataloader of a
system mostly loads, preprocesses, and shuffles data and passes it to the target DNN. To take advantage of parallelism,
the main dataloader process spawns multiple subprocesses, also known as workers or threads, and the number of these
subprocesses can be adjusted through arguments, as illustrated in the sketch below.
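As a point of reference, the following minimal sketch shows where these two arguments appear in PyTorch's torch.utils.data.DataLoader; the synthetic dataset and the concrete values are placeholders for illustration, not the framework defaults or the optima found by DPT.

import torch
from torch.utils.data import DataLoader, TensorDataset

# Small synthetic dataset used only for illustration.
dataset = TensorDataset(torch.randn(1000, 3, 32, 32),
                        torch.randint(0, 10, (1000,)))

# The two dataloader parameters tuned by DPT: the number of worker
# subprocesses and the prefetch factor (batches fetched in advance by
# each worker). The values here are placeholders, not tuned optima.
loader = DataLoader(dataset,
                    batch_size=64,
                    shuffle=True,
                    num_workers=4,
                    prefetch_factor=2)

for inputs, labels in loader:
    pass  # a training step would consume each batch here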
Recent studies on dataloaders are mostly application-specific implementations, such as dataloaders for language-related
applications [Cai et al.(2020)] or graph-related applications [Hu et al.(2020)]. Because of diverse computing environ-
ments and various DNN models, tuning common parameters such as the number of workers/threads and the prefetch factor may be the
key to an overall performance boost. Deep learning frameworks such as PyTorch have default parameter values for
the dataloader, which are half the number of CPU cores for the number of workers and 2 for the prefetch factor. Due to varying
computing environments, these parameter values are often not optimal. The number of CPU cores, CPU performance,