
DATALOADER PARAMETER TUNER: AN AUTOMATED
DATALOADER PARAMETER TUNER FOR DEEP LEARNING
MODELS
JooYoung Park, DoangJoo Synn
Korea University
Seoul
Republic of Korea
{nehalem, alansynn}@korea.ac.kr
XinYu Piao
Korea University
Seoul
Republic of Korea
xypiao97@korea.ac.kr
Jong-Kook Kim
Korea University
Seoul
Republic of Korea
jongkook@korea.ac.kr
ABSTRACT
Deep learning has recently become one of the most compute- and data-intensive methods and is widely
used in many research areas and businesses. One of the critical challenges of deep learning is that it
has many parameters that can be adjusted, and the optimal values may need to be determined for faster
operation and higher accuracy. The focus of this paper is the adjustable parameters of the dataloader.
The dataloader in a system mainly groups the data appropriately and loads it into main memory for
the deep learning model to use. We introduce an automated framework called Dataloader Parameter
Tuner (DPT) that determines the optimal values for the parameters required by the dataloader. This
framework discovers the optimal values for the number of the dataloader’s subprocesses (i.e., workers)
and the prefetch factor through grid search to accelerate data transfer for machine learning systems.
Keywords performance, machine learning systems, dataloader
1 Introduction
Recently, the data collected by many enterprises has been increasing in volume, resolution, and variety, driven by
the growing use of mobile devices and the Internet. To process this so-called ‘Big Data’, the need for
Deeper Neural Networks (DNNs) is also increasing. In turn, larger datasets are needed to successfully train
more complex DNNs [Alwosheel et al.(2018)], both leading to the need for more computing power and more mem-
ory. To fulfill the demand for more computing power, modern computing architectures use multiple CPUs and
GPUs [Masek et al.(2016), Pal et al.(2019)], which exhibit the characteristics of a distributed computing system
in a single node or system. Therefore, various learning techniques such as data parallelism [Huo et al.(2018),
Jia et al.(2019), Ho et al.(2013)], model parallelism [Chilimbi et al.(2014), Dean et al.(2012)], and pipeline paral-
lelism [Huang et al.(2019), Narayanan et al.(2019), Synn et al.(2021)] have been introduced for efficient use of such systems.
Modern deep learning frameworks try to utilize the dataloader as much as possible by using multi-threading (e.g., Tensor-
Flow) [TensorFlow(2021)] or multi-processing (e.g., PyTorch) [PyTorch(2019)]. By default, the main dataloader of a
system mostly loads, preprocesses, and shuffles data and passes it to the target DNN. To take advantage of parallelism,
the main dataloader process spawns multiple subprocesses, also known as workers or threads, and the number of these
subprocesses can be adjusted through arguments, as illustrated in the sketch below.
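As a point of reference, the following minimal sketch shows where these two arguments appear in PyTorch's torch.utils.data.DataLoader; the synthetic dataset and the concrete values are placeholders for illustration, not the framework defaults or the optima found by DPT.

import torch
from torch.utils.data import DataLoader, TensorDataset

# Small synthetic dataset used only for illustration.
dataset = TensorDataset(torch.randn(1000, 3, 32, 32),
                        torch.randint(0, 10, (1000,)))

# The two dataloader parameters tuned by DPT: the number of worker
# subprocesses and the prefetch factor (batches fetched in advance by
# each worker). The values here are placeholders, not tuned optima.
loader = DataLoader(dataset,
                    batch_size=64,
                    shuffle=True,
                    num_workers=4,
                    prefetch_factor=2)

for inputs, labels in loader:
    pass  # a training step would consume each batch here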
Recent studies on dataloaders are mostly application-specific implementations, such as dataloaders for language-related
applications [Cai et al.(2020)] or graph-related applications [Hu et al.(2020)]. Because of diverse computing environ-
ments and various DNN models, tuning common parameters such as the number of workers/threads and the prefetch factor may be the
key to an overall performance boost. Deep learning frameworks such as PyTorch have default parameter values for
the dataloader, which are half the number of CPU cores for the number of workers and 2 for the prefetch factor. Due to varying
computing environments, these parameter values are often not optimal. The number of CPU cores, CPU performance,