
PyHopper - Hyperparameter optimization
and how they compare with each other. In the second part, we describe common HPO packages for
Python and highlight their differences from PyHopper.
2.1. Hyperparameter tuning algorithms
Grid Search is arguably the most basic HPO algorithm. As its name suggests, grid search spans a grid over the
parameter space and evaluates every intersection point of the grid. The best-performing grid point
is then returned as the optimal parameter configuration. More advanced variants, such as iterative grid search,
refine the grid resolution locally around the best-found parameters to explore the configuration space
in more detail. The main advantage of grid search is that it explores all parts of the configuration
space and therefore does not easily get trapped in local optima. However, the major bottleneck of grid
search is that its complexity scales exponentially with the dimension of the configuration space, e.g.,
for eight hyperparameters, a grid with five ticks per axis results in almost 400,000 intersection points that
need to be evaluated. Consequently, grid search is only suitable for low-dimensional configuration
spaces, i.e., typically two or three hyperparameters.
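As a minimal illustration of full grid enumeration (with a hypothetical toy objective and grid values chosen for this sketch, not PyHopper's API):

```python
import itertools

def objective(lr, dropout):
    # Hypothetical stand-in for a real training run; lower is better.
    return (lr - 0.01) ** 2 + (dropout - 0.2) ** 2

# Five ticks per axis; with d hyperparameters the grid grows as 5**d.
lr_grid = [0.001, 0.005, 0.01, 0.05, 0.1]
dropout_grid = [0.0, 0.1, 0.2, 0.3, 0.4]

best_params, best_score = None, float("inf")
for lr, dropout in itertools.product(lr_grid, dropout_grid):
    score = objective(lr, dropout)
    if score < best_score:
        best_params, best_score = (lr, dropout), score
```

Here the two-dimensional grid costs only 25 evaluations, but adding a third five-tick axis already multiplies that by five.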
Sequential Model-Based Optimization (SMBO) [7] is a powerful black-box optimization paradigm.
The key idea of SMBO is to fit a surrogate model to the already evaluated points in order to interpolate
over the unexplored parts of the configuration space. The surrogate model has a special structure that
makes its global optima easy to find, e.g., analytically. These global optima of the fitted surrogate
model are then evaluated with the true objective function, and the observed objective values
are used to update the surrogate model.
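A toy one-dimensional sketch of this loop (illustrative only; real SMBO implementations use far richer surrogates) fits a parabola through the three best evaluated points, evaluates its analytic minimum with the true objective, and repeats:

```python
import random

def objective(x):
    # Hypothetical black-box objective with minimum at x = 3.
    return (x - 3.0) ** 2 + 1.0

def parabola_vertex(p0, p1, p2):
    # Analytic minimum of the quadratic surrogate through three points.
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    denom = (x0 - x1) * (x0 - x2) * (x1 - x2)
    if denom == 0:
        return None
    a = (x2 * (y1 - y0) + x1 * (y0 - y2) + x0 * (y2 - y1)) / denom
    b = (x2**2 * (y0 - y1) + x1**2 * (y2 - y0) + x0**2 * (y1 - y2)) / denom
    if a <= 0:  # surrogate is not convex: no interior minimum
        return None
    return -b / (2 * a)

rng = random.Random(0)
lo, hi = 0.0, 10.0
points = [(x, objective(x)) for x in (rng.uniform(lo, hi) for _ in range(3))]

for _ in range(10):
    unique = dict(points)  # collapse duplicate x-values before fitting
    best3 = sorted(unique.items(), key=lambda p: p[1])[:3]
    x_new = parabola_vertex(*best3)
    if x_new is None or not lo <= x_new <= hi:
        x_new = rng.uniform(lo, hi)  # fall back to exploration
    points.append((x_new, objective(x_new)))

best_x, best_y = min(points, key=lambda p: p[1])
```

Because the toy objective is itself quadratic, the surrogate recovers the true minimum after a single fit; on realistic landscapes many fit-evaluate-update rounds are needed.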
Bayesian Optimization (BO) extends SMBO by fitting distributions instead of deterministic func-
tions to the evaluated points of the configuration space. The key benefit of BO over SMBO is that
it models the uncertainty about the interpolated parts of the configuration space, i.e., the
uncertainty increases the further away a point is from an already evaluated candidate configura-
tion. Consequently, we can sample the points in the configuration space that maximize the gained
information about the optimum of the black-box objective function.
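The uncertainty-driven sampling idea can be caricatured without a full Gaussian process: the sketch below uses the distance to the nearest evaluated point as a crude uncertainty proxy and queries where that proxy is largest. This is a pure-exploration acquisition of our own devising; real BO acquisition functions balance uncertainty against the surrogate's predicted value.

```python
import random

def uncertainty(x, evaluated):
    # Crude stand-in for predictive uncertainty: distance to the
    # nearest already-evaluated configuration.
    return min(abs(x - xe) for xe in evaluated)

rng = random.Random(1)
evaluated = [2.0, 7.5]  # configurations tried so far
candidates = [rng.uniform(0.0, 10.0) for _ in range(200)]

# Acquisition step: query the candidate we know the least about.
x_next = max(candidates, key=lambda x: uncertainty(x, evaluated))
```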
The main advantage of SMBO and BO is that they become increasingly accurate at finding op-
timal parameters as more information about the objective landscape becomes available. Moreover,
BO allows tailoring the algorithm to specific applications through the choice of surrogate model,
i.e., by adding prior knowledge about the optimization landscape. However, this comes with the
downside that SMBO and BO are less effective when little information (i.e., few evaluated parameters) is
available. Nonetheless, Bayesian optimization is often used in competition-winning toolkits due to
its ability to incorporate prior knowledge about the optimization landscape and adjust the algorithm to the
particular type of the competition’s optimization problem. Gaussian processes (GPs) are a form of
Bayesian optimization in which the surrogate model is realized by multivariate normal distributions [2].
Tree-structured Parzen Estimator (TPE) [2] is a sequential model-based optimization algorithm that
can handle conditional configuration spaces efficiently. An example of such a conditional config-
uration space is the number of layers and the corresponding number of hidden units in each layer of a neural
network: the number of hidden units in the fifth layer is only needed if the number of layers
exceeds four.
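Such a conditional space can be sampled hierarchically; in the sketch below (hypothetical parameter names and value ranges, not any particular library's API), hidden-unit counts are only drawn for layers that actually exist:

```python
import random

def sample_config(rng):
    # Conditional space: the units of layer i exist only if num_layers > i.
    num_layers = rng.randint(1, 5)
    units = [rng.choice([64, 128, 256, 512]) for _ in range(num_layers)]
    return {"num_layers": num_layers, "units": units}

rng = random.Random(42)
config = sample_config(rng)
# The fifth entry of "units" is sampled only when num_layers == 5.
```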
Random Search (RS) is another straightforward black-box optimization baseline. RS samples candi-
date solutions from a uniform distribution over the entire configuration space. Despite its simplic-
ity, RS can be competitive and even outperform alternative algorithms in high-dimensional configuration
spaces [3]. PyHopper’s HPO algorithm starts with an RS phase to gain information about the objective
surface and to decide which area of the configuration space the second phase should focus on.
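The two-phase idea can be sketched as follows. This is a simplification under our own assumptions (a 1D space, a Gaussian perturbation for the local phase), not PyHopper's actual implementation:

```python
import random

def objective(x):
    # Hypothetical black-box objective to maximize; optimum at x = 4.
    return -(x - 4.0) ** 2

rng = random.Random(0)
lo, hi = 0.0, 10.0

# Phase 1: uniform random search over the whole configuration space.
best_x = max((rng.uniform(lo, hi) for _ in range(30)), key=objective)

# Phase 2: focus on the promising region by perturbing the incumbent.
for _ in range(70):
    cand = min(max(best_x + rng.gauss(0.0, 0.5), lo), hi)
    if objective(cand) > objective(best_x):
        best_x = cand
```

The first phase gives broad coverage like plain RS; the second spends the remaining budget refining the best region it found.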