A Bayesian Optimization Framework for Finding Local Optima in Expensive Multimodal Functions
Yongsheng Mei^{a,*}, Tian Lan^{a}, Mahdi Imani^{b} and Suresh Subramaniam^{a}
^{a} The George Washington University
^{b} Northeastern University
^{*} Corresponding author. Email: ysmei@gwu.edu.
Abstract. Bayesian optimization (BO) is a popular global optimization scheme for sample-efficient optimization in domains with expensive function evaluations. The existing BO techniques are capable of finding a single global optimum solution. However, finding a set of global and local optimum solutions is crucial in a wide range of real-world problems, as implementing some of the optimal solutions might not be feasible due to various practical restrictions (e.g., resource limitation, physical constraints, etc.). In such domains, if multiple solutions are known, the implementation can be quickly switched to another solution, and the best possible system performance can still be obtained. This paper develops a multimodal BO framework to effectively find a set of local/global solutions for expensive-to-evaluate multimodal objective functions. We consider the standard BO setting with Gaussian process regression representing the objective function. We analytically derive the joint distribution of the objective function and its first-order derivatives. This joint distribution is used in the body of the BO acquisition functions to search for local optima during the optimization process. We introduce variants of the well-known BO acquisition functions for the multimodal setting and demonstrate the performance of the proposed framework in locating a set of local optimum solutions using multiple optimization problems.
1 Introduction
Bayesian optimization (BO) is a popular global optimization scheme for sample-efficient optimization in domains with expensive function evaluations [34, 40]. BO iteratively builds a statistical model of the objective function according to all past evaluations and sequentially selects the next evaluation by maximizing an acquisition function. BO has shown tremendous success in a wide range of domains with no analytical formulation of objective functions, including simulation optimization [1], device tuning/calibration [7, 36], material/drug design [43, 14], and many more.
Despite the several variants of BO proposed in recent years [3, 11, 38], the focus of all these methods has been on finding a single global optimum solution. However, many real-world problems can be considered multimodal optimization [8], where it is desired to find all or most global and local optimum solutions of multimodal functions. The rationale is that implementing some of the optimal solutions might not be feasible due to various practical restrictions (e.g., resource limitation, physical constraints, etc.); in such a scenario, if multiple solutions are known, the implementation can be quickly switched to another solution and the best possible system performance can still be obtained.
Besides BO techniques, several deterministic and stochastic techniques have been developed for multimodal optimization. These include the gradient descent method, the quasi-Newton method [5, 22], and the Nelder-Mead simplex method [4], which require the analytical form of the multimodal function and tend to get trapped in a local optimum. Evolutionary optimization is a class of techniques applicable to domains with no available analytical representations. Examples include variants of genetic algorithms [23, 24], clonal selection algorithms [9], and artificial immune networks [10]. However, evolutionary techniques' reliance on heuristics and excessive function evaluations prevents their reliable application in domains with expensive-to-evaluate functions.
This paper develops a multimodal BO framework to effectively find a set of local/global solutions for multimodal objective functions. We consider the standard Gaussian process (GP) regression as the surrogate model for representing the objective function. Since characterizing local/global optima requires access to both the values and first-order conditions of the objective function, we analytically derive the joint distribution of the objective function and its first-order gradients using the kernel function properties and derivatives, as illustrated in Fig. 1. This joint distribution is used in the body of the BO acquisition functions to search for local optima during the optimization process. We introduce variants of the well-known BO acquisition functions for the multimodal setting, including joint expected improvement and joint probability of improvement. The performance of the proposed framework in locating a set of local optimum solutions is demonstrated using multiple optimization problems, including evaluations on a synthetic function, well-known multimodal benchmarks such as the Griewank and Shubert functions, as well as hyperparameter tuning problems for image classification on the CIFAR-10 dataset [18]. Our proposed solution can effectively capture multiple local/global optima in these evaluations.
The main contributions of our work are as follows:
• We develop a new BO framework for finding local optima by analytically deriving the joint distribution of the objective function and its first-order gradients according to the kernel function of Gaussian process regression.
• We introduce new acquisition functions, such as joint expected improvement and joint probability of improvement, to search for local optima during the optimization process of our framework.
• Experimental results on multimodal functions and a real-world image classification problem demonstrate the effectiveness of our framework in capturing multiple local and global optima.
2 Related Work
2.1 Bayesian Optimization and Acquisition Functions
Among many optimization frameworks [37, 32, 41], Bayesian optimization has emerged as a popular method for the sample-efficient optimization of expensive objective functions [35, 13]. It is capable of optimizing objective functions with no available closed-form expression, where only point-based (and possibly noisy) evaluation of the function is possible. The sample efficiency of BO schemes has made them suitable for domains with non-convex and costly function evaluations. Fundamentally, BO is a sequential model-based approach to the problem of finding a global optimum of an unknown objective function $f(\cdot)$: $x^* = \arg\max_{x \in \mathcal{X}} f(x)$, where $\mathcal{X} \subset \mathbb{R}^d$ is a compact set. It performs a sequential search and, at each iteration $k$, selects a new location $x_{k+1}$ at which to evaluate $f$ and observe its value, until making a final recommendation of the best estimate of the optimum $x^*$.
The sequential selection is achieved through the acquisition function $a: \mathcal{X} \to \mathbb{R}$, defined over the posterior of the GP model, where BO selects the sample in the search space with the highest acquisition value. Upon evaluating the objective function at the selected input, the surrogate GP model is updated. Various BO policies have been developed depending on the choice of acquisition function, including expected improvement (EI) [27], knowledge gradient (KG) [12], probability of improvement (PI) [19], upper confidence bound (UCB) [20], and entropy search (ES) [15]. Recent studies have also considered accounting for future improvements in solution quality [44, 17, 39] and BO methods for constrained problems [2, 30]. However, existing work often focuses on finding the global optimum rather than a set of local/global optima, which are also necessary for multimodal objective functions and will be explored in this paper.
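As a concrete illustration of how an acquisition value is computed from the GP posterior, below is a minimal sketch of the standard expected improvement (EI) criterion for maximization; the function name, the exploration offset `xi`, and the SciPy dependency are our own illustrative assumptions rather than the paper's implementation.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best, xi=0.01):
    """Standard EI for maximization: E[max(f(x) - f_best - xi, 0)] under the
    GP posterior N(mu, sigma^2) at each candidate point."""
    sigma = np.maximum(sigma, 1e-12)          # guard against zero posterior std
    z = (mu - f_best - xi) / sigma
    return (mu - f_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

# BO then evaluates the objective at the candidate maximizing the acquisition:
# x_next = candidates[np.argmax(expected_improvement(mu, sigma, f_best))]
```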
2.2 Local Maxima in Gaussian Random Fields
A separate line of statistical applications concentrates on the tail distribution of the heights of local maxima in non-stationary Gaussian random fields, such as in peak detection problems. This tail distribution is defined as the probability that the height of a local maximum exceeds a given threshold at the point $x$, conditioned on the event that the point is a local maximum of the Gaussian process model, given by $\Pr\big(f(x) > \xi \mid x \text{ is a local maximum of } f\big)$, where $\xi$ is the threshold. Existing work [6] has probed this problem in the Gaussian process model. General formulae are derived in [6] for non-stationary Gaussian fields over a subset of Euclidean space or a Riemannian manifold of arbitrary dimension [21]; these formulae depend on local properties of the Gaussian process model rather than on the global supremum of the field. Although the goal of that work is to characterize certain local properties rather than to find a set of local/global optima, its contributions provide key intuitions for our new BO framework design.
2.3 First-Order Derivative in Bayesian Optimization
First-order derivative information has been exploited in existing BO work, such as [25, 45]. In addition, [26] makes gradient information available for hyperparameter tuning. Adjoint methods also provide gradients cheaply in the optimization of engineering systems [16, 31]. The gradients provide useful information about the objective function and can help BO during the optimization
process. For instance, [40] develop a derivative-enabled knowledge-gradient algorithm by incorporating derivative information into the GP for BO, given by $(f(x_{k+1}), \nabla f(x_{k+1}))^\top \mid f(x_{1:k}), \nabla f(x_{1:k}) \sim \mathcal{N}\big((f(x_{1:k}), \nabla f(x_{1:k}))^\top, \operatorname{diag}(\sigma^2)\big)$, where $\sigma^2$ is the variance.
These methods assume that the gradients of the objective function can be queried along with the objective function during the optimization process. GIBO [28] alternates between minimizing the variance of the estimate of the gradient and moving in the direction of the expected gradient. The later-proposed MPD [29] extends and refines this approach by finding the maximum look-ahead gradient descent direction. In contrast, we propose a BO framework that finds local optima by computing and updating the joint distribution of the GP prediction and its first-order derivative, derived from the kernel functions. Moreover, existing gradient-based methods use gradients as additional information to improve the traditional BO model targeting the global optimum, whereas our algorithm aims to find as many local optima as possible. Despite the similarity, we do not require direct access to the objective's gradient.
3 Background
Gaussian model: The Gaussian process (GP) model is the most commonly used surrogate for standard BO and is also adopted by our framework, providing the posterior distribution of the objective function according to all past evaluations. In this part, we introduce several preliminaries of the GP model. Considering the objective function $f(\cdot)$ and a GP model with $k+1$ input samples $x$ of dimension $n$, the prior of the model is:
$f(x_{1:k+1}) \sim \mathcal{N}(\mu_{x_{1:k+1}}, \Sigma_{x_{1:k+1}, x_{1:k+1}}),$
where we use $\mu_{x_{1:k+1}}$ to denote the mean of the prior and $\Sigma_{x_{1:k+1}, x_{1:k+1}}$ to represent the initial covariance of the $k+1$ input samples. If the values of the first $k$ samples are known as observations $f(x_{1:k})$, then, based on the prior, the posterior of the GP model representing the objective function at the next sampling point $x_{k+1}$ can be obtained as:
$p_f \stackrel{\text{def}}{=} f(x_{k+1}) \mid f(x_{1:k}) \sim \mathcal{N}(\mu_{x_{k+1}}, \Sigma_{x_{k+1}, x_{k+1}}), \quad (1)$
where $\mu_{x_{k+1}}$ and $\Sigma_{x_{k+1}, x_{k+1}}$ are the mean and variance, respectively, at this step. For simplicity, we use $p_f$ as shorthand for the posterior $f(x_{k+1}) \mid f(x_{1:k})$.
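As a concrete illustration of Eq. (1), the following is a minimal sketch of computing the GP posterior mean and covariance at new points, assuming a zero-mean prior and a squared-exponential kernel; the kernel choice, the noise/jitter term, and the function names are our own illustrative assumptions rather than the paper's implementation.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel k(a, b) = variance * exp(-||a - b||^2 / (2 l^2))."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(X_obs, y_obs, X_new, noise=1e-6):
    """Posterior mean and covariance of f(X_new) | f(X_obs) = y_obs (zero prior mean)."""
    K = rbf_kernel(X_obs, X_obs) + noise * np.eye(len(X_obs))   # covariance of observations
    K_s = rbf_kernel(X_obs, X_new)                               # cross-covariance
    K_ss = rbf_kernel(X_new, X_new)                              # prior covariance at new points
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mu = K_s.T @ alpha                                           # posterior mean
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v                                         # posterior covariance
    return mu, cov
```

For example, `mu, cov = gp_posterior(X_obs, y_obs, X_new)` yields the $\mu_{x_{k+1}}$ and $\Sigma_{x_{k+1}, x_{k+1}}$ of Eq. (1) when `X_new` contains the single candidate point $x_{k+1}$.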
4 Finding Local Optima via Bayesian Optimization
We aim to develop a multimodal BO framework capable of efficiently computing optimal solutions. Our method performs optimization with respect to a joint distribution containing the Gaussian posterior and its first-order derivative: the Gaussian posterior reflects the surrogate model as we observe the objective at new locations, and the first-order derivative of the posterior informs us about the latent locations of local/global optima. To this end, we compute the first-order derivative of the objective function to obtain the gradient and apply it in our new BO framework. Since the derivative operation is linear, we can combine the Gaussian posterior of the objective and its first-order derivative into a joint distribution and then compute the joint mean/variance for the acquisition functions (a minimal illustrative sketch of the kernel-derivative covariances involved is given at the end of this section). Within this framework, we design new acquisition functions to fulfill the needs of finding local optima. Inspired by existing work [6], we first introduce a particular
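As a rough sketch of the joint construction just described, the snippet below assembles the covariances between the objective value and its first-order derivative under a squared-exponential kernel, using the standard kernel-derivative identities; the kernel choice and function names are our own illustrative assumptions, and the paper's full derivation (truncated in this copy) should be consulted for the exact joint posterior.

```python
import numpy as np

def rbf(a, b, ls=1.0, var=1.0):
    """Squared-exponential kernel k(a, b) = var * exp(-||a - b||^2 / (2 ls^2))."""
    r = a - b
    return var * np.exp(-0.5 * np.dot(r, r) / ls**2)

def cov_value_grad(a, b, ls=1.0, var=1.0):
    """cov[f(a), grad f(b)] = d k(a, b) / d b for the RBF kernel."""
    return rbf(a, b, ls, var) * (a - b) / ls**2

def cov_grad_grad(a, b, ls=1.0, var=1.0):
    """cov[grad f(a), grad f(b)] = d^2 k(a, b) / (d a d b) for the RBF kernel."""
    r = a - b
    return rbf(a, b, ls, var) * (np.eye(len(a)) / ls**2 - np.outer(r, r) / ls**4)
```

Stacking these blocks with the ordinary kernel matrix gives a joint Gaussian over $(f, \nabla f)$ at the observed and candidate points; conditioning this joint Gaussian on the observed values $f(x_{1:k})$, exactly as in Eq. (1), yields the joint posterior mean and covariance of the prediction and its gradient on which the multimodal acquisition functions operate.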