age classification problem demonstrates the effectiveness of our
framework in capturing multiple local and global optima.
2 Related Work
2.1 Bayesian Optimization and Acquisition Functions
Among many optimization frameworks [37, 32, 41], Bayesian opti-
mization has emerged as a popular method for the sample-efficient
optimization of expensive objective functions [35, 13]. It is capable
of optimizing objective functions with no available closed-form ex-
pression, where only point-based (and possibly noisy) evaluation of
the function is possible. The sample efficiency of BO schemes has
made them suitable for domains with non-convex and costly function
evaluations. Fundamentally, BO is a sequential model-based approach
to finding a global optimum of an unknown objective function $f(\cdot)$:
$x^* = \arg\max_{x \in \mathcal{X}} f(x)$, where $\mathcal{X} \subset \mathbb{R}^d$
is a compact set. It performs a sequential search: at each iteration $k$,
it selects a new location $x_{k+1}$ at which to evaluate $f$ and observe its
value, until making a final recommendation of the best estimate of
the optimum $x^*$.
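The sequential loop described above can be sketched as follows. This is a minimal illustration, not the paper's method: the RBF kernel, the fixed candidate grid for the inner maximization, and the UCB acquisition rule are all illustrative assumptions.

```python
import numpy as np

def rbf(a, b, ls=0.2):
    # Squared-exponential (RBF) kernel on 1-D inputs (illustrative choice).
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

def gp_posterior(x_obs, y_obs, x_new, jitter=1e-6):
    # Standard GP conditional: posterior mean and std at x_new given (x_obs, y_obs).
    K = rbf(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    ks = rbf(x_obs, x_new)
    mu = ks.T @ np.linalg.solve(K, y_obs)
    cov = rbf(x_new, x_new) - ks.T @ np.linalg.solve(K, ks)
    return mu, np.sqrt(np.clip(np.diag(cov), 0.0, None))

def bo_maximize(f, grid, n_init=3, n_iters=15, beta=2.0, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.choice(grid, size=n_init, replace=False)  # initial design
    y = f(x)
    for _ in range(n_iters):
        mu, sd = gp_posterior(x, y, grid)
        x_next = grid[np.argmax(mu + beta * sd)]      # maximize acquisition a(x)
        x = np.append(x, x_next)                      # evaluate f, update surrogate
        y = np.append(y, f(np.array([x_next]))[0])
    return x[np.argmax(y)]                            # recommend best estimate of x*
```

On a simple 1-D objective such as $f(t) = -(t-0.3)^2$ over $[0,1]$, this loop concentrates its evaluations near the optimum within a handful of iterations.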
The sequential selection is achieved through the acquisition function
$a: \mathcal{X} \to \mathbb{R}$, defined over the posterior of the GP model;
BO selects the sample in the search space with the highest acquisition
value. Upon evaluating the objective function at the selected input,
the surrogate GP model is updated. Various BO policies have been
developed, differing in their choice of acquisition function,
including expected improvement (EI) [27], knowledge
gradient (KG) [12], probability of improvement (PI) [19], upper-
confidence bounds (UCB) [20], and entropy search (ES) [15]. Recent
studies have also considered accounting for future improvements in
solution quality [44, 17, 39] and BO methods for constrained prob-
lems [2, 30]. However, existing work typically focuses on finding a
single global optimum rather than a set of local/global optima, which
is also important for multimodal objective functions and is the focus
of this paper.
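For concreteness, the acquisition functions named above have simple closed forms when evaluated from a GP posterior mean $\mu$ and standard deviation $\sigma$, with incumbent value $y_{\text{best}}$. The sketch below is a generic illustration of EI, PI, and UCB for maximization, not a particular implementation from the cited works.

```python
import math

def _pdf(z):
    # Standard normal density.
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def _cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sd, y_best):
    # EI [27]: expected gain over the incumbent y_best.
    z = (mu - y_best) / sd
    return (mu - y_best) * _cdf(z) + sd * _pdf(z)

def probability_of_improvement(mu, sd, y_best):
    # PI [19]: probability that the posterior exceeds the incumbent.
    return _cdf((mu - y_best) / sd)

def upper_confidence_bound(mu, sd, beta=2.0):
    # UCB [20]: optimism in the face of uncertainty.
    return mu + beta * sd
```

For example, at a point with posterior mean 0 and standard deviation 1 and incumbent 0, PI evaluates to 0.5 and EI to $\varphi(0) \approx 0.399$.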
2.2 Local Maxima in Gaussian Random Fields
A separate line of statistical applications concentrates on the tail
distribution of the heights of local maxima in non-stationary Gaus-
sian random fields, such as in peak detection problems. Such a tail
distribution is defined as the probability that the height of the local
maximum surpasses a given threshold at a point $x$, conditioned on
$x$ being a local maximum of the Gaussian process model, given by
$\Pr\big(f(x) > \xi \mid x \text{ is a local maximum}\big)$, where $\xi$
is the threshold. Existing work [6] has investigated this problem for
the Gaussian process model.
General formulae are derived in [6] for non-stationary Gaussian
fields over subsets of Euclidean space or Riemannian manifolds of
arbitrary dimension [21]; these formulae depend on local properties
of the Gaussian process model rather than on the global supremum of the field.
Although the goal of these works is to characterize certain local
properties rather than to find a set of local/global optima, their
contributions provide key intuition for the design of our new BO
framework.
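The conditional tail probability above can be illustrated empirically (this Monte Carlo sketch is our own illustration, not the analytic approach of [6]): draw sample paths from a zero-mean stationary GP prior on a 1-D grid, collect the heights of interior local maxima, and estimate the tail frequency.

```python
import numpy as np

def local_max_tail(xi, n_paths=1000, n_grid=80, ls=0.1, seed=0):
    """Monte Carlo estimate of Pr(f(x) > xi | x is a local maximum)
    for a zero-mean GP with an RBF kernel on a 1-D grid (illustrative)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_grid)
    K = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / ls**2)
    L = np.linalg.cholesky(K + 1e-6 * np.eye(n_grid))  # jitter for stability
    heights = []
    for _ in range(n_paths):
        f = L @ rng.standard_normal(n_grid)            # one GP sample path
        interior = f[1:-1]
        # Interior local maxima: strictly above both grid neighbours.
        is_max = (interior > f[:-2]) & (interior > f[2:])
        heights.extend(interior[is_max])
    return float(np.mean(np.asarray(heights) > xi))
```

As expected, the estimated tail probability decreases monotonically in the threshold $\xi$, and local-maximum heights are biased upward relative to the marginal distribution of the field.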
2.3 First-Order Derivative in Bayesian Optimization
First-order derivative information has been exploited in existing BO
work, such as [25, 45]. In addition, [26] makes gradient information
available for hyperparameter tuning, and adjoint methods provide
gradients cheaply in the optimization of engineering systems [16, 31].
Gradients provide useful information about the objective function and
can help BO during the optimization process. For instance, [40] develop
a derivative-enabled knowledge-gradient algorithm by incorporating
derivative information into the GP for BO, given by
$\big(f(x_{k+1}), \nabla f(x_{k+1})\big)^\top \mid f(x_{1:k}), \nabla f(x_{1:k}) \sim \mathcal{N}\big(\big(f(x_{1:k}), \nabla f(x_{1:k})\big)^\top, \operatorname{diag}(\sigma^2)\big)$,
where $\sigma^2$ is the variance.
These methods assume the gradients of the objective function can
be queried along with the objective function during the optimization
process. GIBO [28] alternates between minimizing the variance of the
gradient estimate and moving in the direction of the expected gradient.
The subsequently proposed MPD [29] extends and refines this approach
by finding the maximum look-ahead gradient descent direction. In
contrast, we propose a BO framework that finds local optima by
computing and updating the joint distribution of the prediction and its
first-order derivative with respect to the kernel functions. Moreover,
existing gradient-based methods use gradients as additional information
to improve traditional BO models that target the global optimum,
whereas our algorithm aims to find as many local optima as possible.
Despite this similarity, we do not directly access the objective's gradient.
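The key fact underlying derivative-aware GP models is that differentiation is a linear operator, so the derivative of a GP is again a GP, and the cross-covariance between $f$ and $f'$ is obtained by differentiating the kernel. The sketch below illustrates this for a 1-D RBF kernel (an illustrative choice) and checks the analytic posterior gradient against finite differences; it is a generic illustration, not the model of [40].

```python
import numpy as np

LS = 0.3  # kernel length-scale (illustrative)

def k(a, b):
    # RBF kernel k(a, b) = exp(-(a - b)^2 / (2 * LS^2)).
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / LS**2)

def dk_db(a, b):
    # d k(a, b) / d b: cross-covariance between f(a) and f'(b).
    return k(a, b) * (a[:, None] - b[None, :]) / LS**2

def posterior_mean_and_grad(x_obs, y_obs, x_new, jitter=1e-8):
    # Joint use of the GP posterior: mean of f and mean of f' at x_new,
    # both driven by the same weights alpha = K^{-1} y.
    K = k(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    alpha = np.linalg.solve(K, y_obs)
    mu = k(x_obs, x_new).T @ alpha       # posterior mean of f
    dmu = dk_db(x_obs, x_new).T @ alpha  # posterior mean of f'
    return mu, dmu
```

Because both means share the same $\alpha = K^{-1}y$, the derivative information comes for free once the kernel derivatives are available; no gradient queries of the objective itself are needed.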
3 Background
Gaussian model: The Gaussian process (GP) model is the most
commonly used model for standard BO and is also adopted by our
framework; it provides the posterior distribution of the objective
function given all past evaluations. In this part, we introduce several
preliminaries of the GP model. Considering the objective function
$f(\cdot)$ and a GP model with $k+1$ input samples $x$ of dimension
$n$, the prior of the model is:
$$f(x_{1:k+1}) \sim \mathcal{N}\big(\mu_{x_{1:k+1}}, \Sigma_{x_{1:k+1}, x_{1:k+1}}\big),$$
where $\mu_{x_{1:k+1}}$ denotes the mean of the prior and
$\Sigma_{x_{1:k+1}, x_{1:k+1}}$ the initial covariance of the $k+1$ input
samples. If the values of the first $k$ samples are known as observations
$f(x_{1:k})$, then based on the prior, the posterior of the GP model
representing the objective function at the next sampling point $x_{k+1}$
can be obtained as:
$$p_f \overset{\text{def}}{=} f(x_{k+1}) \mid f(x_{1:k}) \sim \mathcal{N}\big(\mu_{x_{k+1}}, \Sigma_{x_{k+1}, x_{k+1}}\big), \quad (1)$$
where $\mu_{x_{k+1}}$ and $\Sigma_{x_{k+1}, x_{k+1}}$ are the mean and
variance, respectively, at this step. For simplicity, we use $p_f$ as
shorthand for the posterior $f(x_{k+1}) \mid f(x_{1:k})$.
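Numerically, the posterior in Eq. (1) is the standard conditional Gaussian. The sketch below assumes a zero prior mean and an RBF kernel (both illustrative choices, not mandated by the framework).

```python
import numpy as np

def gp_mean_var(x_obs, f_obs, x_new, ls=0.5, jitter=1e-8):
    # Posterior mean mu_{x_{k+1}} and variance Sigma_{x_{k+1}, x_{k+1}}
    # of Eq. (1), for 1-D inputs under a zero-mean RBF-kernel prior.
    def k(a, b):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)
    K = k(x_obs, x_obs) + jitter * np.eye(len(x_obs))   # prior covariance + jitter
    ks = k(x_obs, x_new)
    mu = ks.T @ np.linalg.solve(K, f_obs)               # posterior mean
    cov = k(x_new, x_new) - ks.T @ np.linalg.solve(K, ks)  # posterior covariance
    return mu, np.diag(cov)
```

A quick sanity check: at an already-observed point, the (noise-free) posterior mean reproduces the observation and the posterior variance collapses to zero.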
4 Finding Local Optima via Bayesian Optimization
We aim to develop a multimodal BO framework capable of efficiently
computing optimal solutions. Our method will achieve optimization
regarding a joint distribution containing the Gaussian posterior and
its first-order derivative: the Gaussian posterior reflects the surrogate
model as we observe the objectives in new places, and the first-order
derivative of the posterior informs us about the latent location of lo-
cal/global optima. To this end, we compute the first-order derivative
of the objective function to obtain the gradient and apply it
in our new BO framework. Since the derivative operation is linear,
we can combine the Gaussian posterior of the objective and its first-
order derivative into a joint distribution and then compute the joint
mean/variance for the acquisition functions. In this framework, we
design new acquisition functions to fulfill the needs of finding local
optima. Inspired by existing work [6], we first introduce a particular