Optimization on Manifolds via Graph Gaussian Processes
Hwanwoo Kim1, Daniel Sanz-Alonso1, and Ruiyi Yang2
1University of Chicago, 2Princeton University
Abstract
This paper integrates manifold learning techniques within a Gaussian process upper confidence
bound algorithm to optimize an objective function on a manifold. Our approach is motivated by
applications where a full representation of the manifold is not available and querying the objective
is expensive. We rely on a point cloud of manifold samples to define a graph Gaussian process
surrogate model for the objective. Query points are sequentially chosen using the posterior
distribution of the surrogate model given all previous queries. We establish regret bounds in
terms of the number of queries and the size of the point cloud. Several numerical examples
complement the theory and illustrate the performance of our method.
1 Introduction
Optimization problems on manifolds are ubiquitous in science and engineering. For instance, low-
rank matrix completion and rotational alignment of 3D bodies can be formulated as optimization
problems over spaces of matrices that are naturally endowed with manifold structures. These matrix
manifolds belong to agreeable families [56] for which Riemannian gradients, geodesics, and other
geometric quantities have closed-form expressions that facilitate the use of Riemannian optimization
algorithms [19, 1, 9]. In contrast, this paper is motivated by optimization problems where the search
space is a manifold that the practitioner can only access through a discrete point cloud representation,
preventing direct use of Riemannian optimization algorithms. Moreover, the hidden manifold may
not belong to an agreeable family, further hindering the use of classical methods. Illustrative
examples where manifolds are represented by point cloud data include computer vision, robotics,
and shape analysis of geometric morphometrics [33, 23, 25]. Additionally, across many applications
in data science, high-dimensional point cloud data contains low-dimensional structure that can be
modeled as a manifold for algorithmic design and theoretical analysis [14, 3, 27]. Motivated by these
problems, this paper introduces a Bayesian optimization method with convergence guarantees to
optimize an expensive-to-evaluate function on a point cloud of manifold samples.
To formalize our setting, consider the optimization problem
    maximize f(x),   x ∈ ℳ_N,    (1.1)
where ℳ_N = {x_i}_{i=1}^N is a collection of samples from a compact manifold ℳ ⊂ ℝ^d. We assume that
the manifold ℳ is unknown to the practitioner, but that they have access to the samples ℳ_N.
The objective function f in (1.1) is defined on the hidden manifold ℳ; however, since ℳ is unknown,
we restrict the search domain to the given point cloud ℳ_N. Motivating examples include locating
the portion of busiest traffic along a highway (idealized as a one-dimensional manifold), or finding
the point of highest temperature on an artificial surface for material design. In these and other
applications, the search domains are manifolds for which only a discrete representation may be
available. As a result, Riemannian optimization methods [19, 1, 9, 34, 56] that require Riemannian
gradients or geodesics are not directly applicable.
Although the optimization problem (1.1) is discrete, it is challenging when the objective function
f is expensive to evaluate due to computational, monetary, or opportunity costs. For instance,
querying f may involve numerically solving a system of partial differential equations, placing a
sensor at a new location, or time-consuming human labor. In such cases, solving (1.1) by exhaustive
search over ℳ_N is infeasible for large N, and it is important to design optimization algorithms that
provably require fewer evaluations of the objective than the size N of the point cloud. Solving (1.1) is
also challenging in applications where the objective function does not satisfy structural assumptions
(e.g., concavity or linearity) other than a sufficient degree of smoothness, and in applications where
f is a black box, in that one only has access to noisy output from f rather than to an analytic
expression of the function. We refer to [22] for a survey of problems where these conditions arise.
Motivated by these geometric and computational challenges, we introduce an approach to solve
(1.1) that works directly on the point cloud ℳ_N and necessitates few evaluations of the objective.
In particular, we show that in the large N limit and under suitable smoothness assumptions, our
method provably requires far fewer evaluations of the objective than the size N of the point cloud.
Our algorithm falls in the general framework of Bayesian optimization and is specifically designed to
achieve such a convergence guarantee. The main focus will be on the mathematical analysis of the
proposed approach, but we also present simulation studies to illustrate and complement our theory.
1.1 Overview of our Approach
The facts that gradients are not available and that evaluating the objective is expensive naturally
lead us to adopt a Bayesian optimization (BO) approach to solve (1.1). BO is an iterative
procedure that relies on solving a sequence of surrogate optimization problems to sidestep the need
for gradient information on f. At each iteration, the surrogate problem is to optimize an acquisition
function defined using a probabilistic model of the objective function conditioned on previous iterates.
The acquisition function should be inexpensive to evaluate and optimize, and at the same time
provide useful information about where the optimizer of f is most likely to lie. The probabilistic
model should be sufficiently rich to adequately represent the objective function. Many choices of
acquisition function have been proposed in the literature, including expected improvement, entropy
search, and knowledge gradient (see [22] for a review). Popular probabilistic models for f include
Gaussian processes [54, 30] and Bayesian additive regression trees [13]. Adequately choosing the
acquisition function and the probabilistic model is essential to the success of BO algorithms.
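As a concrete point of reference, the following is a minimal sketch of the generic BO loop just described, specialized to a finite candidate set. The helper names (query, posterior_mean_std, acquisition) are illustrative placeholders for the noisy objective evaluation, the surrogate-model update, and the acquisition rule made precise in Section 2; they are not part of the paper's notation.

import numpy as np

def bo_loop(candidates, query, posterior_mean_std, acquisition, num_queries, seed=0):
    # Generic Bayesian optimization over a finite candidate set (illustrative sketch).
    # candidates: (N, d) array of admissible query points; query(z) returns a noisy evaluation.
    rng = np.random.default_rng(seed)
    first = candidates[rng.integers(len(candidates))]        # initialize with a randomly chosen query
    queried, observations = [first], [query(first)]
    for _ in range(num_queries - 1):
        # Update the surrogate given all previous queries, then maximize the acquisition function.
        mean, std = posterior_mean_std(np.array(queried), np.array(observations), candidates)
        z_next = candidates[int(np.argmax(acquisition(mean, std)))]
        queried.append(z_next)
        observations.append(query(z_next))
    # Report the queried point with the largest (noisy) observed value.
    return queried[int(np.argmax(observations))]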
The BO method that we propose and analyze has the distinctive feature that both the probabilistic
model and the acquisition function are carefully chosen to ensure convergence of the returned solution
to a global maximizer of f under suitable smoothness assumptions. A natural way to characterize
the smoothness of f is to assume it is a sample path from a Gaussian process (GP) defined on ℳ.
Under this smoothness assumption, we adopt a graph GP model [44, 8] for f|_{ℳ_N}, the restriction of
f to the point cloud. The graph GP is designed to be a discretely indexed GP that approximates
a Matérn or squared exponential GP on the hidden manifold ℳ as the size of the point cloud
grows to infinity. Applications of graph GPs in Bayesian inverse problems, spatial statistics, and
semi-supervised learning are discussed in [44, 28, 31, 32]. In this paper, we extend the convergence
analysis for Matérn graph GP models in [44, 45, 27, 24] to also cover squared exponential kernels,
see Proposition 2.3.
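For readers unfamiliar with graph GPs, the following is a minimal sketch of one common construction of such a prior: a covariance matrix of the form (τ I + Δ_N)^{−s}, where Δ_N is a graph Laplacian assembled from a proximity graph on the point cloud. The weighting scheme, normalization, and parameter values below are illustrative assumptions only; the construction and scaling actually used by our method are specified in Subsection 2.3.

import numpy as np
from scipy.spatial.distance import cdist

def graph_matern_covariance(points, eps, tau=1.0, s=2.0):
    # Matern-type graph GP covariance on a point cloud (illustrative sketch, not the paper's exact scaling).
    # points: (N, d) array of manifold samples; eps: connectivity radius; tau, s: assumed hyperparameters.
    dists = cdist(points, points)                        # pairwise Euclidean distances
    W = np.exp(-(dists / eps) ** 2) * (dists < eps)      # localized edge weights on an eps-neighborhood graph
    np.fill_diagonal(W, 0.0)
    laplacian = np.diag(W.sum(axis=1)) - W               # unnormalized graph Laplacian Delta_N
    evals, evecs = np.linalg.eigh(laplacian)             # spectral decomposition of Delta_N
    return (evecs * (tau + evals) ** (-s)) @ evecs.T     # C_N = (tau I + Delta_N)^(-s)

# A prior draw u_N ~ N(0, C_N) is then a discretely indexed GP over the point cloud:
# C_N = graph_matern_covariance(points, eps=0.5)
# u_N = np.random.default_rng(0).multivariate_normal(np.zeros(len(points)), C_N)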
The error analysis in Proposition 2.3 is important since it allows us to quantify the misspecification
error when modeling f|_{ℳ_N} with a graph GP. In particular, the model that we use for computation does not
necessarily match the true distribution of f|_{ℳ_N} due to the missing information about ℳ; to obtain
convergence guarantees, this geometric misspecification needs to be corrected by suitably choosing
the acquisition function. We accomplish this goal by applying the framework developed in [5]. In so
doing, we adapt their formulation to cover our problem setting, where f is a sample path from a
GP instead of an element of a reproducing kernel Hilbert space.
1.2 Contributions and Related Work
Our careful choice of probabilistic model and acquisition function allows us to establish a bound on
the simple regret (see (2.3) for its definition) that converges to zero as the number L of evaluations
of the objective and the size N of the point cloud converge to infinity while keeping the relation
L ≪ N (see Theorem 2.5, Remark 2.6, and Corollary 2.7). In other words, our algorithm can
provably find a global maximizer of f as we acquire more samples from the compact manifold ℳ,
while still keeping the number of evaluations of the objective much smaller than the size of
the point cloud. We are not aware of an existing algorithm to solve (1.1) that enjoys a similar
convergence guarantee. Synthetic computed examples will complement the theory, illustrate the
applicability of our method, and showcase the importance of incorporating geometric information in
the probabilistic model.
As noted in [22], BO algorithms have been most popular in continuous Euclidean domains.
Methods that are tailored to manifold settings [35, 36] and discrete spaces [2, 42, 49, 16] have
received less attention. On the one hand, the search domain in our setting (1.1) is a discrete subset
of a manifold, but naive application of discrete BO (e.g., using a standard Euclidean GP on the
ambient space ℝ^d) would fail to adequately exploit the geometric information contained in the point
cloud; in particular, it would fail to suitably encode smoothness of the probabilistic model for f
along the hidden manifold ℳ. The empirical advantage of our graph GPs over Euclidean kernels will
be illustrated in our numerical experiments (see Subsection 4.2). On the other hand, the manifold ℳ
in our setting is only available as a point cloud, which precludes the use of manifold BO approaches
[35, 36] that require access to geodesic distances and eigenpairs of the Laplace-Beltrami operator on
ℳ for modeling f, and to Riemannian gradients for optimizing the acquisition function. Therefore,
our algorithm solves a practical problem for which limited tools with theoretical guarantees are
available. In the context of Riemannian optimization, our algorithm is still applicable when the
differential geometric quantities necessary for gradient-based methods are not readily available. A
closely related work in this direction is [47], which also assumes a point cloud representation of the
manifold but instead reconstructs from it tangent spaces, gradients, and retractions, followed by
an approximate Riemannian gradient descent. Our paper differs from [47] in that our algorithm is
based on Bayesian optimization and no gradient approximation is carried out, as a result of which
we do not need to assume the point cloud to be quasi-uniform. Going beyond manifold constraints,
optimization of functions with low effective dimensionality has been addressed in [52, 38, 11, 12]
employing subspace methods (see also the references therein).
1.3 Outline
Section 2 introduces the graph Gaussian process upper confidence bound (GGP-UCB) algorithm
and describes the choice of surrogate model and acquisition function. Our main result, Theorem
2.5, establishes convergence rates.
Section 3 discusses important practical considerations such as estimating the parameters of
the surrogate model and tuning the acquisition function.
Section 4 contains numerical examples that illustrate and complement the theory.
Section 5 closes with a summary of our paper and directions for further research.
The proofs of our main results can be found in the appendices.
1.4 Notation
For a, b two real numbers, we denote a ∧ b = min{a, b} and a ∨ b = max{a, b}. The symbol ≲ will
denote less than or equal to up to a universal constant. For two real sequences {a_i} and {b_i}, we
denote (i) a_i ≪ b_i if lim_i (a_i/b_i) = 0; (ii) a_i = O(b_i) if lim sup_i (a_i/b_i) ≤ C for some positive constant
C; and (iii) a_i ≍ b_i if c_1 ≤ lim inf_i (a_i/b_i) ≤ lim sup_i (a_i/b_i) ≤ c_2 for some positive constants c_1, c_2.
2 The GGP-UCB Algorithm
In this section we introduce our algorithm and establish convergence guarantees. We start in
Subsection 2.1 by formalizing the problem setting. Subsection 2.2 describes the main GGP-UCB
algorithm. The choices of surrogate model and acquisition function are discussed in Subsections 2.3
and 2.4, respectively. Finally, Subsection 2.5 presents our main theoretical result, Theorem 2.5.
2.1 Problem Formulation
Let f be a function defined over a compact Riemannian submanifold ℳ ⊂ ℝ^d of dimension m.
Suppose that a full representation of ℳ is not available and we are only given the dimension m
and a point cloud of manifold samples {x_i}_{i=1}^N =: ℳ_N ⊂ ℳ. We are interested in solving the
optimization problem

    max_{x ∈ ℳ_N} f(x)    (2.1)

in applications where the objective f is expensive to evaluate and we may only collect L ≪ N noisy
measurements y_ℓ of the form

    y_ℓ = f(z_ℓ) + η_ℓ,   η_ℓ ∼ 𝒩(0, σ²) i.i.d.,   1 ≤ ℓ ≤ L,    (2.2)

where {z_ℓ}_{ℓ=1}^L are query points and σ is a given noise level. The goal is then to solve (2.1) with
L ≪ N queries of f.
Let 𝒵_L := {z_ℓ}_{ℓ=1}^L ⊂ ℳ_N denote the query points sequentially found by our algorithm,
introduced in Subsection 2.2 below. We shall quantify the performance of our approach using the
simple regret, defined as

    r_{N,L} := f(z^*_N) − f(z^*_L),   z^*_N = arg max_{z ∈ ℳ_N} f(z),   z^*_L = arg max_{z ∈ 𝒵_L} f(z).    (2.3)

Note that the simple regret depends both on the number L of queries and on the size N of the
point cloud, since z^*_N and z^*_L both depend implicitly on N. One should interpret N as a large
fixed number and L as the running index. The dependence on N of the query points z_ℓ's will be
omitted for notational simplicity.
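In synthetic benchmarks, where f can afterwards be evaluated over the whole point cloud, the simple regret (2.3) is just a difference of two maxima; a minimal sketch (the array names are illustrative):

import numpy as np

def simple_regret(f_values_cloud, queried_indices):
    # Simple regret r_{N,L} of Eq. (2.3): best value of f over the point cloud minus best value among queries.
    # f_values_cloud: f(x_i) over all of M_N (available only for benchmarking); queried_indices: indices of Z_L.
    f_values_cloud = np.asarray(f_values_cloud)
    return np.max(f_values_cloud) - np.max(f_values_cloud[np.asarray(queried_indices)])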
Remark 2.1. The optimizer z^*_N over the point cloud ℳ_N is not necessarily the global optimizer of
f over ℳ. Since we only have access to ℳ_N, finding the maximizer over ℳ_N is the best we can
hope for without reconstructing or estimating the hidden manifold ℳ. Nevertheless, we will show
in Corollary 2.7 that the continuum regret, defined as

    r^cont_{N,L} := f(z^*) − f(z^*_L),   z^* = arg max_{z ∈ ℳ} f(z),   z^*_L = arg max_{z ∈ 𝒵_L} f(z),    (2.4)

also converges to zero as both N and L approach infinity while keeping L ≪ N if the x_i's satisfy
Assumption 2.2. In other words, the maximizer z^*_L returned by our algorithm is an approximate
global maximizer of f over ℳ despite the fact that z^*_L ∈ ℳ_N.
2.2 Main Algorithm
The Bayesian approach to optimization starts by constructing a GP model for the function to be
optimized. We recall that a GP with mean μ(·) and covariance c(·, ·) is a stochastic process where the
joint distribution over any finite set of indices s_1, . . . , s_n is a multivariate Gaussian with mean vector
[μ(s_i)]_{i=1}^n and covariance matrix [c(s_i, s_j)]_{i,j=1}^n [54]. The mean and covariance functions together
encode information about the values of the function, their correlation, and their uncertainty.
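For instance, given a mean function μ and covariance function c, drawing the process at finitely many indices amounts to sampling the corresponding multivariate Gaussian; a minimal sketch with an illustrative squared exponential covariance on the real line:

import numpy as np

def sample_gp(mu, c, indices, seed=0):
    # One realization of a GP with mean mu(.) and covariance c(.,.) at a finite set of indices.
    mean_vec = np.array([mu(s) for s in indices])                          # [mu(s_i)]_{i=1}^n
    cov_mat = np.array([[c(s, t) for t in indices] for s in indices])      # [c(s_i, s_j)]_{i,j=1}^n
    return np.random.default_rng(seed).multivariate_normal(mean_vec, cov_mat)

# Example (illustrative choices of mean and covariance):
# sample = sample_gp(lambda s: 0.0, lambda s, t: np.exp(-0.5 * (s - t) ** 2), np.linspace(0, 1, 50))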
In our setting, we need to construct a GP surrogate prior model π_N for f|_{ℳ_N}, where π_N would
simply be an N-dimensional multivariate Gaussian. A natural requirement is that, for u_N ∼ π_N,
u_N(x_i) and u_N(x_j) should be highly correlated iff x_i and x_j are close along the manifold, that is,
if the geodesic distance d_ℳ(x_i, x_j) is small. We shall discuss in Subsection 2.3 prior models π_N
that fulfill this requirement. Defining the covariance matrix of π_N by using a standard covariance
function in the Euclidean space ℝ^d would in general fail to meet this requirement, since two points
may be close in Euclidean space but far apart in terms of the geodesic distance d_ℳ in ℳ.
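This failure mode is easy to reproduce numerically: on a point cloud sampled from a curved manifold, pairs of points can be nearby in ℝ^d yet far apart along the manifold, so a Euclidean kernel assigns them a spuriously large correlation. The sketch below contrasts Euclidean distances with shortest-path distances on an ε-neighborhood graph, a standard discrete proxy for d_ℳ; the spiral example and all parameter values are purely illustrative.

import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import shortest_path

# Point cloud on a planar spiral: the two endpoints are moderately close in R^2
# but very far apart along the curve.
t = np.linspace(2 * np.pi, 6 * np.pi, 400)
points = np.column_stack([t * np.cos(t), t * np.sin(t)])

euclidean = cdist(points, points)
eps = 2.0                                                   # connectivity radius (illustrative)
adjacency = np.where(euclidean < eps, euclidean, 0.0)       # weighted eps-neighborhood graph
graph_geodesic = shortest_path(adjacency, directed=False)   # graph proxy for the geodesic distance

i, j = 0, len(points) - 1
print(f"Euclidean distance:      {euclidean[i, j]:.1f}")        # about 12.6
print(f"Graph geodesic distance: {graph_geodesic[i, j]:.1f}")   # about 158, an order of magnitude larger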
Once a choice of surrogate prior model is made, the next step is to sequentially find query points
by maximizing an acquisition function [48]. Suppose we have picked query points z_1, . . . , z_{ℓ−1} in the
first ℓ−1 iterations and obtained noisy measurements

    y_k = f(z_k) + η_k,   η_k ∼ 𝒩(0, σ²) i.i.d.,   1 ≤ k ≤ ℓ−1.    (2.5)

At the ℓ-th iteration, we will pick the next query point z_ℓ by maximizing an upper confidence bound
acquisition function [48, 5] of the form

    A_{N,ℓ}(z) = μ_{N,ℓ−1}(z) + B_{N,ℓ} σ_{N,ℓ−1}(z),   z ∈ ℳ_N,    (2.6)

where B_{N,ℓ} is a user-chosen parameter, and μ_{N,ℓ−1}, σ_{N,ℓ−1} are the mean and standard deviation
of the posterior distribution π_N(· | y_1, . . . , y_{ℓ−1}). Denoting by c_N(·, ·) the covariance function of the
surrogate prior π_N, i.e., c_N(x_i, x_j) is the covariance between u_N(x_i) and u_N(x_j) for u_N ∼ π_N, we
have the expressions

    μ_{N,ℓ−1}(z) = c_{N,ℓ−1}(z)^⊤ (C_{N,ℓ−1} + σ² I)^{−1} Y_{ℓ−1},
    σ²_{N,ℓ−1}(z) = c_N(z, z) − c_{N,ℓ−1}(z)^⊤ (C_{N,ℓ−1} + σ² I)^{−1} c_{N,ℓ−1}(z),   z ∈ ℳ_N,    (2.7)

where Y_{ℓ−1} = (y_1, . . . , y_{ℓ−1})^⊤ ∈ ℝ^{ℓ−1}, c_{N,ℓ−1}(z) ∈ ℝ^{ℓ−1} is a vector with entries [c_{N,ℓ−1}(z)]_i =
c_N(z, z_i), and C_{N,ℓ−1} ∈ ℝ^{(ℓ−1)×(ℓ−1)} is a matrix with entries (C_{N,ℓ−1})_{ij} = c_N(z_i, z_j).
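The expressions in (2.6) and (2.7) translate directly into code once the surrogate prior covariance over the point cloud is stored as an N × N matrix C_N (for instance, a graph GP covariance). The sketch below is a transcription of these formulas only, with indexing into the point cloud replacing function evaluation since the surrogate is discretely indexed; it does not prescribe the specific choice of B_{N,ℓ} analyzed later.

import numpy as np

def posterior_mean_std(C_N, queried_idx, y, sigma):
    # Posterior mean and standard deviation of Eq. (2.7) for a zero-mean prior N(0, C_N),
    # given noisy observations y at the point-cloud indices queried_idx (at least one observation).
    C_zz = C_N[np.ix_(queried_idx, queried_idx)]          # C_{N,l-1}, shape (l-1, l-1)
    C_xz = C_N[:, queried_idx]                            # rows are c_{N,l-1}(z)^T, shape (N, l-1)
    A = C_zz + sigma ** 2 * np.eye(len(queried_idx))
    mean = C_xz @ np.linalg.solve(A, y)                                            # first line of (2.7)
    var = np.diag(C_N) - np.einsum('ij,ji->i', C_xz, np.linalg.solve(A, C_xz.T))   # second line of (2.7)
    return mean, np.sqrt(np.maximum(var, 0.0))

def ucb_acquisition(mean, std, B):
    # Upper confidence bound acquisition of Eq. (2.6), evaluated at every point of the cloud.
    return mean + B * std

# One UCB step over the point cloud (sketch): pick the index maximizing the acquisition.
# mean, std = posterior_mean_std(C_N, queried_idx, y, sigma)
# next_idx = int(np.argmax(ucb_acquisition(mean, std, B=2.0)))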