Although discrete, the optimization problem (1.1) is challenging when the objective function $f$ is expensive to evaluate due to computational, monetary, or opportunity costs. For instance, querying $f$ may involve numerically solving a system of partial differential equations, placing a sensor at a new location, or time-consuming human labor. In such cases, solving (1.1) by exhaustive search over $\mathcal{M}_N$ is infeasible for large $N$, and it is important to design optimization algorithms that provably require fewer evaluations of the objective than the size $N$ of the point cloud. Solving (1.1) is also challenging in applications where the objective function satisfies no structural assumptions (e.g. concavity or linearity) beyond a sufficient degree of smoothness, and in applications where $f$ is a black box, in that one only has access to noisy output from $f$ rather than to an analytic expression of the function. We refer to [22] for a survey of problems where these conditions arise.
Motivated by these geometric and computational challenges, we introduce an approach to solve (1.1) that works directly on the point cloud $\mathcal{M}_N$ and requires few evaluations of the objective. In particular, we show that in the large $N$ limit and under suitable smoothness assumptions, our method provably requires far fewer evaluations of the objective than the size $N$ of the point cloud. Our algorithm falls within the general framework of Bayesian optimization and is specifically designed to achieve such a convergence guarantee. The main focus will be on the mathematical analysis of the proposed approach, but we also present simulation studies to illustrate and complement our theory.
1.1 Overview of our Approach
Two features of the problem, the unavailability of gradients and the cost of evaluating the objective, naturally lead us to adopt a Bayesian optimization (BO) approach to solve (1.1). BO is an iterative procedure that relies on solving a sequence of surrogate optimization problems to sidestep the need for gradient information on $f$. At each iteration, the surrogate problem is to optimize an acquisition function defined using a probabilistic model of the objective function conditioned on previous iterates. The acquisition function should be inexpensive to evaluate and optimize, while providing useful information about where the optimizer of $f$ is most likely to lie. The probabilistic model should be sufficiently rich to adequately represent the objective function. Many choices of acquisition function have been proposed in the literature, including expected improvement, entropy search, and knowledge gradient (see [22] for a review). Popular probabilistic models for $f$ include Gaussian processes [54, 30] and Bayesian additive regression trees [13]. Adequately choosing the acquisition function and the probabilistic model is essential to the success of BO algorithms.
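To make this generic template concrete, the following Python sketch implements a BO loop over a finite point cloud, using a GP surrogate with expected improvement as the acquisition function. This is a minimal illustration under assumptions of our own (the function names, the Cholesky-based posterior update, and the choice of expected improvement are ours), not the specific algorithm proposed and analyzed in this paper.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best):
    """Expected improvement over the current best value (maximization)."""
    with np.errstate(divide="ignore", invalid="ignore"):
        z = (mu - best) / sigma
        ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)
    return np.where(sigma > 0, ei, 0.0)

def bayesian_optimization(points, f, prior_cov, n_iter, noise=1e-6):
    """Generic BO loop over a finite point cloud (N x d array `points`).

    `prior_cov` maps the cloud to its N x N prior covariance matrix.
    """
    N = len(points)
    K = prior_cov(points)
    idx = [np.random.randint(N)]          # random initial query
    y = [f(points[idx[0]])]
    for _ in range(n_iter - 1):
        obs = np.array(idx)
        L = np.linalg.cholesky(K[np.ix_(obs, obs)] + noise * np.eye(len(obs)))
        K_xo = K[:, obs]
        alpha = np.linalg.solve(L.T, np.linalg.solve(L, np.array(y)))
        mu = K_xo @ alpha                 # posterior mean at all N points
        v = np.linalg.solve(L, K_xo.T)
        var = np.clip(np.diag(K) - np.sum(v**2, axis=0), 0.0, None)
        ei = expected_improvement(mu, np.sqrt(var), max(y))
        ei[obs] = -np.inf                 # never re-query an observed point
        idx.append(int(np.argmax(ei)))    # surrogate problem on the cloud
        y.append(f(points[idx[-1]]))
    best = int(np.argmax(y))
    return points[idx[best]], y[best]
```

Note that the surrogate problem, maximizing the acquisition function over the $N$ candidate points, only requires linear algebra with the conditioned GP; the expensive objective $f$ is queried once per iteration.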
The BO method that we propose and analyze has the distinctive feature that both the probabilistic model and the acquisition function are carefully chosen to ensure convergence of the returned solution to a global maximizer of $f$ under suitable smoothness assumptions. A natural way to characterize the smoothness of $f$ is to assume that it is a sample path of a Gaussian process (GP) defined on $\mathcal{M}$. Under this smoothness assumption, we adopt a graph GP model [44, 8] for $f|_{\mathcal{M}_N}$, the restriction of $f$ to the point cloud. The graph GP is designed to be a discretely indexed GP that approximates a Matérn or squared exponential GP on the hidden manifold $\mathcal{M}$ as the size of the point cloud grows to infinity. Applications of graph GPs in Bayesian inverse problems, spatial statistics, and semi-supervised learning are discussed in [44, 28, 31, 32]. In this paper, we extend the convergence analysis for Matérn graph GP models in [44, 45, 27, 24] to also cover squared exponential kernels; see Proposition 2.3.
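As an illustration of the graph GP construction, the sketch below assembles a graph Laplacian from the point cloud and defines Matérn-type and squared exponential covariances through its eigendecomposition. The epsilon-graph with Gaussian weights, the truncation radius, and the parameters `eps`, `kappa`, `s`, and `tau` are illustrative assumptions on our part; the precise constructions and scalings are those analyzed in the references cited above.

```python
import numpy as np
from scipy.spatial.distance import cdist

def graph_laplacian(points, eps):
    """Unnormalized graph Laplacian from a point cloud, using an
    epsilon-graph with Gaussian edge weights (one common construction;
    the cited papers analyze several variants)."""
    D2 = cdist(points, points, "sqeuclidean")
    W = np.exp(-D2 / (2 * eps**2))
    W[D2 > (3 * eps) ** 2] = 0.0              # truncate to a local graph
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W

def graph_matern_covariance(points, eps, kappa, s):
    """Matern-type graph covariance (kappa^2 I + L)^(-s), computed via the
    Laplacian eigendecomposition, normalized to unit average variance."""
    lam, U = np.linalg.eigh(graph_laplacian(points, eps))
    C = (U * (kappa**2 + lam) ** (-s)) @ U.T  # U diag(spec) U^T
    return C / np.mean(np.diag(C))

def graph_se_covariance(points, eps, tau):
    """Squared exponential analogue via the heat kernel exp(-tau * L)."""
    lam, U = np.linalg.eigh(graph_laplacian(points, eps))
    C = (U * np.exp(-tau * lam)) @ U.T
    return C / np.mean(np.diag(C))
```

A covariance of this form can serve as the `prior_cov` argument of the BO sketch above, e.g. `prior_cov=lambda X: graph_matern_covariance(X, eps=0.3, kappa=1.0, s=2.0)`, so that the surrogate model is defined directly on the point cloud, with no access to the hidden manifold $\mathcal{M}$ required.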
Such error analysis is important since it allows us to quantify the misspecification error incurred when modeling $f|_{\mathcal{M}_N}$ with a graph GP. In particular, the model that we use for computation does not