
PyHopper - Hyperparameter optimization
and how they compare with each other. In the second part, we describe common HPO packages for
Python and highlight their differences from PyHopper.
2.1. Hyperparameter tuning algorithms
Grid Search is arguably the most basic HPO algorithm. As its name suggests, grid search spans a grid over the
parameter space and evaluates every intersection point of the grid. The best-performing grid point
is then returned as the optimal parameter configuration. More advanced variants, such as iterative grid search,
refine the grid resolution locally around the best-found parameters to explore the configuration space
in more detail. The main advantage of grid search is that it explores all parts of the configuration
space and therefore does not easily get trapped in local optima. However, the major bottleneck of grid
search is that its complexity scales exponentially with the dimension of the configuration space, e.g.,
for eight hyperparameters, a grid with five ticks per axis results in almost 400,000 intersection points that
need to be evaluated. Consequently, grid search is only suitable for low-dimensional configuration
spaces, i.e., typically two or three hyperparameters.
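As a minimal illustration of full grid enumeration (with a hypothetical toy objective and grid values chosen for this sketch, not PyHopper's API):

```python
import itertools

def objective(lr, dropout):
    # Hypothetical stand-in for a real training run; lower is better.
    return (lr - 0.01) ** 2 + (dropout - 0.2) ** 2

# Five ticks per axis; with d hyperparameters the grid grows as 5**d.
lr_grid = [0.001, 0.005, 0.01, 0.05, 0.1]
dropout_grid = [0.0, 0.1, 0.2, 0.3, 0.4]

best_params, best_score = None, float("inf")
for lr, dropout in itertools.product(lr_grid, dropout_grid):
    score = objective(lr, dropout)
    if score < best_score:
        best_params, best_score = (lr, dropout), score
```

Here the two-dimensional grid costs only 25 evaluations, but adding a third five-tick axis already multiplies that by five.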
Sequential Model-Based Optimization (SMBO) [7] is a powerful black-box optimization paradigm.
The key idea of SMBO is to fit a surrogate model to the already evaluated points in order to interpolate
over the unexplored parts of the configuration space. The surrogate model has a special structure that
makes its global optima easy to find, e.g., analytically. These global optima of the fitted surrogate
model are then evaluated with the true objective function, and the observed objective values
are used to update the surrogate model.
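A toy one-dimensional sketch of this loop (illustrative only; real SMBO implementations use far richer surrogates) fits a parabola through the three best evaluated points, evaluates its analytic minimum with the true objective, and repeats:

```python
import random

def objective(x):
    # Hypothetical black-box objective with minimum at x = 3.
    return (x - 3.0) ** 2 + 1.0

def parabola_vertex(p0, p1, p2):
    # Analytic minimum of the quadratic surrogate through three points.
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    denom = (x0 - x1) * (x0 - x2) * (x1 - x2)
    if denom == 0:
        return None
    a = (x2 * (y1 - y0) + x1 * (y0 - y2) + x0 * (y2 - y1)) / denom
    b = (x2**2 * (y0 - y1) + x1**2 * (y2 - y0) + x0**2 * (y1 - y2)) / denom
    if a <= 0:  # surrogate is not convex: no interior minimum
        return None
    return -b / (2 * a)

rng = random.Random(0)
lo, hi = 0.0, 10.0
points = [(x, objective(x)) for x in (rng.uniform(lo, hi) for _ in range(3))]

for _ in range(10):
    unique = dict(points)  # collapse duplicate x-values before fitting
    best3 = sorted(unique.items(), key=lambda p: p[1])[:3]
    x_new = parabola_vertex(*best3)
    if x_new is None or not lo <= x_new <= hi:
        x_new = rng.uniform(lo, hi)  # fall back to exploration
    points.append((x_new, objective(x_new)))

best_x, best_y = min(points, key=lambda p: p[1])
```

Because the toy objective is itself quadratic, the surrogate recovers the true minimum after a single fit; on realistic landscapes many fit-evaluate-update rounds are needed.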
Bayesian Optimization (BO) extends SMBO by fitting distributions instead of deterministic func-
tions to the evaluated points of the configuration space. The key benefit of BO over SMBO is that
it models the uncertainty about the interpolated parts of the configuration space, i.e., the
uncertainty increases the further away a point is from an already evaluated candidate configura-
tion. Consequently, we can sample the points in the configuration space that maximize the gained
information about the optimum of the black-box objective function.
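The uncertainty-driven sampling idea can be caricatured without a full Gaussian process: the sketch below uses the distance to the nearest evaluated point as a crude uncertainty proxy and queries where that proxy is largest. This is a pure-exploration acquisition of our own devising; real BO acquisition functions balance uncertainty against the surrogate's predicted value.

```python
import random

def uncertainty(x, evaluated):
    # Crude stand-in for predictive uncertainty: distance to the
    # nearest already-evaluated configuration.
    return min(abs(x - xe) for xe in evaluated)

rng = random.Random(1)
evaluated = [2.0, 7.5]  # configurations tried so far
candidates = [rng.uniform(0.0, 10.0) for _ in range(200)]

# Acquisition step: query the candidate we know the least about.
x_next = max(candidates, key=lambda x: uncertainty(x, evaluated))
```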
The main advantage of SMBO and BO is that they become increasingly accurate at finding op-
timal parameters as more information about the objective landscape becomes available. Moreover,
BO allows tailoring the algorithm to specific applications through the choice of surrogate model,
i.e., by adding prior knowledge about the optimization landscape. However, this comes with the
downside that SMBO and BO are less effective when little information (i.e., few evaluated parameters) is
available. Nonetheless, Bayesian optimization is often used in competition-winning toolkits due to
its ability to incorporate prior knowledge about the optimization landscape and adjust the algorithm to the
particular type of the competition’s optimization problem. Gaussian processes (GPs) are a form of
Bayesian optimization in which the surrogate model is realized by multivariate normal distributions [2].
Tree-structured Parzen Estimator (TPE) [2] is a sequential model-based optimization algorithm that
can handle conditional configuration spaces efficiently. An example of such a conditional config-
uration space is the number of layers and the corresponding number of hidden units in each layer of a neural
network: the number of hidden units in the fifth layer is only needed if the number of layers
exceeds four.
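Such a conditional space can be sampled hierarchically; in the sketch below (hypothetical parameter names and value ranges, not any particular library's API), hidden-unit counts are only drawn for layers that actually exist:

```python
import random

def sample_config(rng):
    # Conditional space: the units of layer i exist only if num_layers > i.
    num_layers = rng.randint(1, 5)
    units = [rng.choice([64, 128, 256, 512]) for _ in range(num_layers)]
    return {"num_layers": num_layers, "units": units}

rng = random.Random(42)
config = sample_config(rng)
# The fifth entry of "units" is sampled only when num_layers == 5.
```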
Random Search (RS) is another straightforward black-box optimization baseline. RS samples candi-
date solutions from a uniform distribution over the entire configuration space. Despite its simplic-
ity, RS can be competitive and even outperform alternative algorithms in high-dimensional configuration
spaces [3]. PyHopper’s HPO algorithm starts with an RS phase to gain information about the objective
surface and to decide which area of the configuration space the second phase should focus on.
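The two-phase idea can be sketched as follows. This is a simplification under our own assumptions (a 1D space, a Gaussian perturbation for the local phase), not PyHopper's actual implementation:

```python
import random

def objective(x):
    # Hypothetical black-box objective to maximize; optimum at x = 4.
    return -(x - 4.0) ** 2

rng = random.Random(0)
lo, hi = 0.0, 10.0

# Phase 1: uniform random search over the whole configuration space.
best_x = max((rng.uniform(lo, hi) for _ in range(30)), key=objective)

# Phase 2: focus on the promising region by perturbing the incumbent.
for _ in range(70):
    cand = min(max(best_x + rng.gauss(0.0, 0.5), lo), hi)
    if objective(cand) > objective(best_x):
        best_x = cand
```

The first phase gives broad coverage like plain RS; the second spends the remaining budget refining the best region it found.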