DATALOADER PARAMETER TUNER: AN AUTOMATED DATALOADER PARAMETER TUNER FOR DEEP LEARNING MODELS
JooYoung Park, DoangJoo Synn
Korea University
Seoul
Republic of Korea
{nehalem, alansynn}@korea.ac.kr
XinYu Piao
Korea University
Seoul
Republic of Korea
xypiao97@korea.ac.kr
Jong-Kook Kim
Korea University
Seoul
Republic of Korea
jongkook@korea.ac.kr
ABSTRACT
Deep learning has recently become one of the most compute- and data-intensive methods and is widely used in many research areas and businesses. One of the critical challenges of deep learning is that it has many adjustable parameters, and their optimal values may need to be determined for faster operation and high accuracy. The focus of this paper is the adjustable parameters of the dataloader. The dataloader in a system mainly groups the data appropriately and loads it into main memory for the deep learning model to use. We introduce an automated framework called Dataloader Parameter Tuner (DPT) that determines the optimal values for the parameters required by the dataloader. This framework discovers the optimal values for the number of the dataloader's subprocesses (i.e., workers) and the prefetch factor through grid search to accelerate data transfer for machine learning systems.
Keywords: performance, machine learning systems, dataloader
1 Introduction
Recently, data collected by many enterprises is increasing in capacity, resolution, and variety, owing to the growing use of mobile devices and the Internet. To process this so-called 'Big Data', the need for deeper neural networks (DNNs) is also increasing. In turn, bigger datasets are needed to successfully train more complex DNNs [Alwosheel et al.(2018)], both leading to the need for more computing power and more memory. To fulfill the demand for more computing power, modern computing architectures use multiple CPUs and GPUs [Masek et al.(2016), Pal et al.(2019)], which exhibit the characteristics of a distributed computing system in a single node or system. Therefore, various learning techniques such as data parallelism [Huo et al.(2018), Jia et al.(2019), Ho et al.(2013)], model parallelism [Chilimbi et al.(2014), Dean et al.(2012)], and pipeline parallelism [Huang et al.(2019), Narayanan et al.(2019), Synn et al.(2021)] have been introduced for efficient use of such systems.
Modern deep learning frameworks try to utilize the dataloader as much as possible through multi-threading (e.g., TensorFlow) [TensorFlow(2021)] or multi-processing (e.g., PyTorch) [PyTorch(2019)]. By default, the main dataloader of a system loads, preprocesses, and shuffles data and passes it to the target DNN. To take advantage of parallelism, the main dataloader process spawns multiple subprocesses, also known as workers or threads, and the number of these subprocesses can be adjusted through arguments.
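As a concrete illustration, the minimal sketch below uses PyTorch's DataLoader, whose constructor exposes both of these knobs; the random tensor dataset is a stand-in for a real workload:

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # Stand-in dataset: 10,000 fake "images" with integer labels.
    dataset = TensorDataset(torch.randn(10_000, 3, 224, 224),
                            torch.randint(0, 10, (10_000,)))

    # num_workers sets how many subprocesses load data in parallel;
    # prefetch_factor sets how many batches each worker prepares ahead of
    # time (only valid when num_workers > 0). On platforms that spawn
    # rather than fork, wrap the loop in an `if __name__ == "__main__":` guard.
    loader = DataLoader(dataset, batch_size=64, shuffle=True,
                        num_workers=4, prefetch_factor=2)

    for images, labels in loader:
        pass  # the DNN's forward/backward pass would go here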
Recent studies on dataloaders are mostly application-specific implementations, such as dataloaders for language-related applications [Cai et al.(2020)] or graph-related applications [Hu et al.(2020)]. Because of diverse computing environments and various DNN models, tuning common parameters such as the number of workers/threads and the prefetch factor may be the solution to an overall performance boost. Deep learning frameworks such as PyTorch have default parameter values for the dataloader: half of the CPU cores for the number of workers and 2 for the prefetch factor. Due to varying computing environments, these parameter values are often not optimal. The number of CPU cores, CPU performance, system memory, the number of GPUs and their performance, and even GPU memory size and system I/O performance make it difficult to determine the optimal parameter values.
This paper proposes the Dataloader Parameter Tuner (DPT), which determines the optimal number of workers and prefetch factor for a particular system, thus maximizing the effectiveness of the dataloader.
This paper is organized as follows. The concept of the dataloader and the parameters to tune are described in Section 2. Section 3 introduces the DPT method and the terminology used throughout this paper. Experimental results are presented in Section 4. Finally, Section 5 summarizes the research.
2 Dataloader
2.1 Overview
A dataloader process usually consists of four steps. The first two steps are data loading and transformation: the dataloader first calls a dataset instance, which reads the data and labels from storage and then transforms them (e.g., padding, tensorizing, applying the collate function) into a form suitable for later processing. The third step is shuffling and batching according to the arguments that the dataloader received. The fourth step is prefetching, which loads n samples in advance so that communication latency can be hidden.
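These four steps map directly onto PyTorch's Dataset/DataLoader pair, as the sketch below shows; the DiskDataset class, its file layout, and the normalization transform are illustrative assumptions rather than details from this paper:

    import os
    import torch
    from torch.utils.data import Dataset, DataLoader

    class DiskDataset(Dataset):
        # Steps 1-2: read a sample from storage and transform it.
        def __init__(self, root):
            self.paths = sorted(os.path.join(root, f) for f in os.listdir(root))

        def __len__(self):
            return len(self.paths)

        def __getitem__(self, idx):
            raw = torch.load(self.paths[idx])        # step 1: disk I/O
            return (raw - raw.mean()) / raw.std()    # step 2: transform

    # Steps 3-4: the DataLoader shuffles and batches according to its
    # arguments, and each worker prefetches prefetch_factor batches
    # ahead of consumption so that storage latency is hidden.
    loader = DataLoader(DiskDataset("./data"), batch_size=32,
                        shuffle=True,       # step 3: shuffling and batching
                        num_workers=2,
                        prefetch_factor=4)  # step 4: prefetching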
2.2 Worker
In PyTorch, the dataloader has a parameter called num_workers, which indicates the number of workers to be spawned. The dataloader spawns that number of worker processes, where each worker receives a series of arguments including the location of the dataset, the collate function, and values to initialize the worker. This means data is accessed through disk I/O, and the transformation of the dataset is executed within the worker process. These workers prepare the data in parallel.
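This hand-off can be observed through PyTorch's worker_init_fn hook, which runs inside each spawned worker process; in the sketch below, the per-worker seeding policy is just one common choice, not something prescribed by the paper:

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    def worker_init(worker_id):
        # Runs inside the spawned worker. get_worker_info() exposes the
        # arguments the worker received: its id, the total number of
        # workers, and its copy of the dataset.
        info = torch.utils.data.get_worker_info()
        print(f"worker {info.id}/{info.num_workers} initialized")
        torch.manual_seed(info.seed % (2**31))  # per-worker random transforms

    dataset = TensorDataset(torch.randn(1_000, 8))
    loader = DataLoader(dataset, batch_size=50,
                        num_workers=4, worker_init_fn=worker_init)

    next(iter(loader))  # workers are spawned when iteration begins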
3 Dataloader Parameter Tuner
3.1 Overview
Dataloader Parameter Tuner (DPT) is a framework that finds the optimal parameters for the dataloader using grid search (Figure 1). A grid search is a methodology that finds an optimal value by evaluating all combinations of the candidate hyperparameters. Using grid search allows DPT to take into account variables that are difficult to parameterize, such as dataset properties and hardware dependencies. Parameters found by DPT may be reused on the same machine when loading datasets with similar characteristics.
Figure 1: Overview of DPT. The framework performs a grid search over the two dataloader parameters, the number of workers (nWorker) and the prefetch factor (nPrefetch): for each candidate pair it spawns a dataloader whose workers retrieve data from storage and send it to GPU memory, monitors the total transfer time, and feeds the measurement back to update the search parameters.
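A minimal sketch of the kind of grid search DPT performs is shown below; the candidate grids, the synthetic dataset, and the use of wall-clock time for one full pass as the total transfer time are illustrative assumptions:

    import itertools
    import time
    import torch
    from torch.utils.data import DataLoader, TensorDataset

    dataset = TensorDataset(torch.randn(20_000, 3, 64, 64))
    device = "cuda" if torch.cuda.is_available() else "cpu"

    worker_grid = [1, 2, 4, 8]    # candidate nWorker values
    prefetch_grid = [2, 4, 8]     # candidate nPrefetch values

    best = None
    for n_worker, n_prefetch in itertools.product(worker_grid, prefetch_grid):
        loader = DataLoader(dataset, batch_size=128,
                            num_workers=n_worker, prefetch_factor=n_prefetch)
        start = time.perf_counter()
        for (batch,) in loader:                    # one pass: storage -> GPU
            batch = batch.to(device, non_blocking=True)
        elapsed = time.perf_counter() - start      # total transfer time
        if best is None or elapsed < best[0]:      # feedback: keep the best pair
            best = (elapsed, n_worker, n_prefetch)

    print(f"optimal: num_workers={best[1]}, prefetch_factor={best[2]}")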
3.2 Procedure
In Algorithm 1, DPT tries to determine optimal values for two dataloader parameters. The first is nWorker, the number of workers spawned by the dataloader; the other is nPrefetch, the number of batches to be processed ahead of time. At the beginning of DPT, values are initialized for three variables: N, which is the number