PyHopper - Hyperparameter optimization
Mathias Lechner1,2,†, Ramin Hasani1,2, Philipp Neubauer2,3, Sophie Neubauer2,3, Daniela Rus1
1Massachusetts Institute of Technology (MIT)
2Simple AI
3DatenVorspung GmbH
†Correspondence E-mail: mlechner@mit.edu
Hyperparameter tuning is a fundamental aspect of machine learning research. Setting up the infrastructure for systematic optimization of hyperparameters can take a significant amount of time. Here, we present PyHopper, a black-box optimization platform designed to streamline the hyperparameter tuning workflow of machine learning researchers. PyHopper's goal is to integrate with existing code with minimal effort and run the optimization process with minimal necessary manual oversight. With simplicity as the primary theme, PyHopper is powered by a single robust Markov-chain Monte-Carlo optimization algorithm that scales to millions of dimensions. Compared to existing tuning packages, focusing on a single algorithm frees the user from having to decide between several algorithms and makes PyHopper easily customizable. PyHopper is publicly available under the Apache-2.0 license at https://github.com/PyHopper/PyHopper.
import pyhopper

def objective(hparams):
    model = build_model(hparams["size"], ...)
    opt = Adam(hparams["lr"])
    train_loader, val_loader = ...
    # ... train model
    val_accuracy = model.evaluate(val_loader)
    return val_accuracy

if __name__ == "__main__":
    search = pyhopper.Search(
        epochs=100,
        size=pyhopper.int(100, 500),
        gain=pyhopper.float(0, 10, shape=(10, 2)),
        opt=pyhopper.choice("adam", "rmsprop"),
        lr=pyhopper.float(1e-5, 1e-1, "0.1g"),
        ...
    )
    best_params = search.run(
        objective, "max",
        runtime="1h 30min",
        n_jobs="per-gpu",
    )
pip3 install pyhopper
• Hyperparameters are dict objects
• Use training code without changes
• Pythonic search space definition
• Multidimensional array parameters
• Limit search space via format strings (e.g., "0.1g" for 1 significant digit and log-uniform)
• User-friendly way to set runtime
• Run evaluations in parallel on each available GPU
Fig. 1: Visual abstract showing a typical use of PyHopper and some of its key features.
Table of Contents
1 Introduction
2 Related Works
  2.1 Hyperparameter tuning algorithms
  2.2 Hyperparameter tuning packages
3 PyHopper HPO Algorithm via Use Cases
  3.1 Use Case 1 - HPO with maximum resource usage
  3.2 Use Case 2 - Fair comparison of multiple methods
  3.3 Use Case 3 - Black-box (gradient-free) optimization
4 Optimization algorithm
5 Parallelization
6 API design
  6.1 Separation of concerns
  6.2 Helper functions
  6.3 Customization
  6.4 Pruning algorithms
7 Examples
  7.1 Available data types
  7.2 Log-uniform and quantized distributions via format string
  7.3 Command line arguments
  7.4 Noisy objective and pruning
  7.5 Fault tolerance and preemptive compute instances
8 Experiments
9 Limitations
10 Conclusion
1. Introduction
Modern machine learning (ML) research involves a considerable amount of hyperparameter tuning. A hyperparameter is a value that must be set before the training of a machine learning model begins. For instance, the learning rate, i.e., the step size with which an optimization algorithm performs the next iteration towards minimizing a loss function, is a typical hyperparameter. Other examples of hyperparameters include the choice of the optimization algorithm, weight regularization factors, or simply the width and depth of a neural network. Hyperparameters are typically set before the optimization process and are not learned together with the model parameters. This makes finding the right set of hyperparameters challenging.
Changes in the hyperparameters drastically affect the performance of a trained ML model. For instance, a learning rate set too high or too low can make the difference between failing or solving a task. Moreover, the relation between the outcome of the optimization process and the hyperparameters is non-convex and non-differentiable. Consequently, the problem of finding the optimal hyperparameters can be formulated as a black-box optimization problem.
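Stated compactly, with $f$ denoting the validation metric of a model trained with hyperparameters $\theta$ from a configuration space $\Theta$ (generic notation used here for exposition), the task is

$$\theta^{*} = \operatorname*{arg\,max}_{\theta \in \Theta} f(\theta),$$

where $f$ can only be queried through full training-and-evaluation runs and exposes neither gradients nor any convexity structure.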
The dependency of ML models' performance on hyperparameters is a fundamental issue in machine learning research. When comparing models with each other, ensuring a fair comparison between a new method and previous baselines depends heavily on the choice of hyperparameters. For example, suppose we want to compare the test performance of two different machine learning systems A and B. Shall we compare them with default hyperparameter settings, or shall we tune the hyperparameters of each? How much shall we tune them? How can we ensure we find the optimal settings for each model?
Moreover, consider the case where method A is already a well-established model whose optimal
hyperparameter range is known. How much effort should we put into tuning the hyperparameters
of B for a fair comparison?
As a result of the challenges described above, researchers can spend a tremendous amount of time tuning the hyperparameters of ML models. The pun grad-student descent was coined to describe how a typical graduate student working in machine learning spends most of their time manually tuning hyperparameters.
Many algorithms and software packages for automatically tuning hyperparameters have been proposed. Throughout, we will refer to these algorithms as Hyperparameter Optimization (HPO) methods. However, the shape and size of hyperparameter optimization problems are very task-specific, and no one-fits-all solution or free lunch exists.
For instance, an HPO algorithm that performs well in low-dimensional problems may struggle to
outperform simple baselines in higher-dimensional setups. Similarly, the smoothness and curvature
of the objective surface can change drastically between two problem instances, making a one-fits-
all solution impossible. Moreover, the design of hyperparameter tuning packages often encounters
contradictory specifications. For instance, an ideal package should be extensible, customizable, and
rich in features, which, however, may steepen the learning curve and contradict our requirement that
the package should be simple and easy to use.
In this work, we introduce PyHopper, a hyperparameter tuning platform tailored to the optimization workflows we encounter in machine learning research (e.g., training neural networks). In particular, our HPO platform allows us to streamline hyperparameter tuning procedures and scale to hundreds of tuning tasks with minimal effort. The key strengths of PyHopper are:
• An intuitive interface that integrates with existing machine learning code, requiring minimal changes
• A highly customizable and robust optimization algorithm based on sequential Markov-chain Monte-Carlo sampling that scales to millions of hyperparameters
• Numerous built-in utility methods to streamline common use cases, such as multi-GPU setups, checkpointing, and runtime scheduling.
2. Related Works
Numerous hyperparameter optimization algorithms have been proposed in the literature, each specialized for specific use cases and applications. Moreover, many publicly available HPO packages implement these algorithms. In this section, we first discuss the most important HPO algorithms
and how they compare with each other. In the second part, we describe common HPO packages for
Python and highlight their differences from PyHopper.
2.1. Hyperparameter tuning algorithms
Grid Search is arguably the most basic HPO algorithm. As its name suggests, grid search spans a grid over the parameter space and evaluates every intersection point of the grid. The best configuration among the grid points is then returned as the best parameter. More advanced variations, such as iterative grid search, refine the grid resolution locally around the best-found parameter to explore the configuration space in more detail. The main advantage of grid search is that it explores all parts of the configuration space and thus does not easily get trapped in local optima. However, its major bottleneck is that its complexity scales exponentially with the dimension of the configuration space, e.g., for eight hyperparameters, a grid with five ticks results in almost 400,000 intersection points that need to be evaluated. Consequently, grid search is only suitable for low-dimensional configuration spaces, i.e., typically 2 or 3 hyperparameters.
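For reference, exhaustive grid search can be sketched in a few lines of Python; train_and_eval is a hypothetical objective returning a validation score, and grid maps each hyperparameter name to its list of tick values:

from itertools import product

def grid_search(train_and_eval, grid):
    # Evaluate every point of the Cartesian grid (5 ticks x 8 params = 5^8 evaluations).
    names = list(grid.keys())
    best_score, best_params = float("-inf"), None
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = train_and_eval(params)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score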
Sequential Model-Based Optimization (SMBO) [7] is a powerful black-box optimization paradigm. The key idea of SMBO is to fit a surrogate model to the already evaluated points to interpolate between unexplored parts of the configuration space. The surrogate model has a special structure that allows finding global optima easily, e.g., analytically. These global optima of the fitted surrogate model are then evaluated using the true objective function, and the observed objective values are used to update the surrogate model.
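The loop shared by SMBO variants can be outlined as follows; fit_surrogate and propose_optimum are placeholders for the model-specific steps rather than the API of any particular package:

def smbo(objective, sample_random, fit_surrogate, propose_optimum, n_init=10, n_iter=50):
    # Start from a few randomly evaluated configurations.
    history = [(p, objective(p)) for p in (sample_random() for _ in range(n_init))]
    for _ in range(n_iter):
        surrogate = fit_surrogate(history)       # cheap model of the expensive objective
        candidate = propose_optimum(surrogate)   # optimum of the surrogate, e.g., found analytically
        history.append((candidate, objective(candidate)))
    return max(history, key=lambda pair: pair[1])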
Bayesian Optimization (BO) extends SMBO by fitting distributions instead of deterministic functions to the evaluated points of the configuration space. The key benefit of BO over SMBO is that it allows modeling the uncertainty about the interpolated parts of the configuration space, i.e., the uncertainty increases the further away a point is from an already evaluated candidate configuration. Consequently, we can sample points in the configuration space that maximize the gained information about the optimum of the black-box objective function.
The main advantage of SMBO and BO is that they become more and more accurate at finding optimal parameters the more information about the objective landscape becomes available. Moreover,
BO allows tailoring the algorithm for specific applications by selecting the type of surrogate model
used, i.e., adding prior knowledge about the optimization landscape. However, this comes with the
downside that SMBO and BO are less effective when little information (i.e., evaluated parameters) is
available. Nonetheless, Bayesian optimization is often used in competition-winning toolkits due to
its ability to add prior knowledge about the optimization landscape and adjust the algorithm to the
particular type of the competition’s optimization problem. Gaussian processes (GPs) are a form of
Bayesian optimization where multivariate normal distributions realize the surrogate model [2].
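As an illustration of a GP surrogate in such a loop, the following toy one-dimensional sketch uses scikit-learn's GaussianProcessRegressor with a simple upper-confidence-bound acquisition; it is an illustrative example, not the algorithm used by PyHopper or any specific toolkit:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def gp_bayes_opt(objective, bounds, n_init=5, n_iter=30, kappa=2.0, seed=0):
    # Toy 1-D Bayesian optimization: GP surrogate + upper-confidence-bound acquisition.
    rng = np.random.default_rng(seed)
    low, high = bounds
    X = rng.uniform(low, high, size=(n_init, 1))
    y = np.array([objective(x[0]) for x in X])
    for _ in range(n_iter):
        gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
        grid = np.linspace(low, high, 1000).reshape(-1, 1)
        mean, std = gp.predict(grid, return_std=True)
        x_next = grid[np.argmax(mean + kappa * std)]  # favor high mean or high uncertainty
        X = np.vstack([X, [x_next]])
        y = np.append(y, objective(x_next[0]))
    return X[np.argmax(y)], y.max()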
Tree-structured Parzen Estimator (TPE) [2] is a sequential model-based optimization algorithm that can handle conditional configuration spaces efficiently. An example of such a conditional configuration would be the number of layers and the corresponding number of hidden units in each layer of a neural network. In particular, the number of hidden units in the fifth layer is only needed if the number of layers exceeds 4.
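Such a conditional space can be expressed by sampling child parameters only when the parent value requires them; the helper below is purely illustrative (it is neither TPE itself nor PyHopper's API):

import random

def sample_conditional_config(max_layers=6, rng=random):
    # The hidden-unit parameter of layer i only exists if n_layers > i.
    n_layers = rng.randint(1, max_layers)
    return {
        "n_layers": n_layers,
        "hidden_units": [rng.choice([64, 128, 256, 512]) for _ in range(n_layers)],
    }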
Random Search (RS) is another straightforward black-box optimization baseline. RS samples candidate solutions from a uniform distribution over the entire configuration space. Despite its simplicity, RS can be competitive and outperform alternative algorithms in high-dimensional configuration spaces [3]. PyHopper's HPO algorithm starts with a random-search phase to gain information about the objective surface and to decide which area of the configuration space to focus on in the second phase.
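Random search itself reduces to a few lines; in this sketch, sample_uniform is a hypothetical function that draws one configuration uniformly from the search space, mirroring the warm-up phase described above:

def random_search(objective, sample_uniform, n_trials=100):
    # Evaluate independently drawn configurations and keep the best one.
    trials = [(cfg, objective(cfg)) for cfg in (sample_uniform() for _ in range(n_trials))]
    return max(trials, key=lambda pair: pair[1])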