New Paradigms for Exploiting Parallel Experiments in
Bayesian Optimization
Leonardo D. González and Victor M. Zavala*
Department of Chemical and Biological Engineering
University of Wisconsin-Madison, 1415 Engineering Dr, Madison, WI 53706, USA
*Corresponding Author: victor.zavala@wisc.edu
Abstract
Bayesian optimization (BO) is one of the most effective methods for closed-loop experimen-
tal design and black-box optimization. However, a key limitation of BO is that it is an inher-
ently sequential algorithm (one experiment is proposed per round) and thus cannot directly
exploit high-throughput (parallel) experiments. Diverse modifications to the BO framework
have been proposed in the literature to enable the exploitation of parallel experiments, but such approaches are limited in the degree of parallelization that they can achieve and can lead to redundant experiments (thus wasting resources and potentially compromising performance).
In this work, we present new parallel BO paradigms that exploit the structure of the system
to partition the design space. Specifically, we propose an approach that partitions the design
space by following the level sets of the performance function and an approach that exploits partially separable structures of the performance function. We conduct extensive numerical
experiments using a reactor case study to benchmark the effectiveness of these approaches
against a variety of state-of-the-art parallel algorithms reported in the literature. Our compu-
tational results show that our approaches significantly reduce the required search time and
increase the probability of finding a global (rather than local) solution.
Keywords: Bayesian optimization, high-throughput experiments, parallelization.
1 Introduction
The use of high-throughput experimental (HTE) platforms is accelerating scientific discovery in
diverse fields such as catalysis [18], pharmaceuticals [16], synthetic biology [25], and chemical
engineering [21]. Such platforms permit large numbers of experiments to be executed in paral-
lel, sometimes automatically; this enables the exploration of wider design spaces, reduces time to
discovery, and can potentially decrease the use of resources. However, due to the large number
of design variables involved, determining optimal conditions manually is often infeasible. As a
result, HTE platforms rely on the use of design of experiments (DoE) algorithms, which aim to
systematically explore the design space.
Screening is a simple DoE approach in which experiments are performed at points on a dis-
cretized grid of the design space [23]; this approach is intuitive but does not scale well with the
number of design variables and can ultimately lead to a significant waste of resources (conducting experiments that do not provide significant information). The central aim of advanced DoE approaches is to maximize the value provided by each experiment and ultimately reduce the number
of experiments and resources used (e.g., experiment time). The value of an experiment is usually
measured either by its information content (e.g., whether it reduces model uncertainty) or by whether it results in a desirable outcome (e.g., whether it improves an economic objective) [2]. A widely used DoE approach that aims
to tackle this problem is response surface methodology or RSM [3]; this approach is generally
sample-efficient (requires few experiments) but uses second-degree polynomial surrogate models
that can fail to accurately capture system trends. In addition, parameters used in the RSM surro-
gate model are subject to uncertainty and this uncertainty is not resolved via further experiments
[12] (i.e., RSM is an open-loop DoE technique).
Another powerful approach to DoE that aims to maximize value of experiments is Bayesian
experimental design [5]. Recently, the machine learning (ML) community has been using variants
of this paradigm to conduct closed-loop experimental design [7]. One of the most effective varia-
tions of this paradigm is the Bayesian optimization (BO) algorithm [1]; BO has been shown to be
sample-efficient and scalable (requires minimal experiments and can explore large design spaces)
[28]. BO is widely used in applications such as experimental design, hyper-parameter tuning, and
reinforcement learning. Of particular interest is the flexibility of the BO paradigm as it is capable
of accommodating both continuous and discrete (e.g., categorical) design variables as well as con-
straints (which help encode domain knowledge and restrict the design space) [4]. Additionally,
BO uses probabilistic surrogate models (e.g., Gaussian process models), which greatly facilitate the quantification of uncertainty and information in different regions of the design space [9]; this
feature is particularly useful in guiding experiments where information gain can be as important
as performance. BO can also be tuned to emphasize exploration (by sampling regions with high
uncertainty) over exploitation (by sampling from regions with high economic performance) [17];
this trade-off is achieved by tuning the so-called acquisition function (AF), which is a composite
function that captures uncertainty and performance.
A fundamental caveat of BO is that it is inherently a sequential algorithm (samples a single
point in the design space at each iteration), limiting its ability to exploit HTE platforms. Modi-
fications to the BO algorithm have been proposed in the literature to overcome these limitations
[10, 6, 15]. Relevant variants include Hyperspace partitioning [33], batch Bayesian optimization
[31], NxMCMC [26], and AF optimization over a set of exploratory designs [11]. These parallel
BO approaches have been shown to perform better than sequential BO in terms of search time
[34]; however, these approaches are limited in the degree of parallelization that can be achieved
and can lead to redundant experiments, thus wasting resources and potentially getting trapped in
local solutions.
In this work, we propose a set of new parallel BO paradigms that exploit the structure of
the system in order to guide partitioning of the design space (see Figure 1). Our first approach,
which we call level-set partitioning, decomposes the design space by following the level sets of the
performance function. Because the performance function cannot be evaluated (it is unknown), a
key feature of this approach is that it leverages a reference function (which can be a low-fidelity
model or a physics model) to approximate the level sets and guide the partitioning. Our second
approach, called variable partitioning, partitions the design space by exploiting partially separable
structures that typically result when a system is composed of multiple subsystems (e.g., a chemical
process is composed of multiple units). We benchmark the performance of our approaches over
sequential BO and state-of-the-art parallel BO variants from the literature using a reactor system.
Our results show that the proposed approaches can achieve significant reductions in search time;
in addition, we observe improvements in performance values found and in search robustness
(sensitivity to initial guess).
Figure 1: Schematic of proposed BO parallelization paradigms using level set partitioning (left)
and variable partitioning (right).
2 Sequential Bayesian Optimization
2.1 Standard BO (S-BO)
The aim of closed-loop DoE is to identify experimental inputs that achieve optimal performance;
we cast this as the optimization problem:
$$\begin{aligned} \min_{x} \quad & f(x) & \qquad \text{(1a)} \\ \text{s.t.} \quad & x \in X & \qquad \text{(1b)} \end{aligned}$$

where $f: X \to \mathbb{R}$ is a scalar performance function, $x \in X \subseteq \mathbb{R}^d$ is a given experiment (trial) point, and $X$ is the experimental design space (of dimension $d$). Generally, an explicit relationship between the design variables $x$ and the performance function $f$ is not known a priori; this motivates the need to evaluate the performance at a selected set of trial points to build a surrogate model of the performance function that can be used to optimize the system.
The open-loop DoE approach most commonly used in HTE platforms is screening; this is a
grid-search method that discretizes the design space $X$ into a set of experiments $x_k \in X$, $k \in \mathcal{K} := \{1, ..., K\}$ (we denote this set compactly as $x_{\mathcal{K}}$); the performance of the system is then evaluated (potentially in parallel) at these points to obtain $f_{\mathcal{K}}$, and the experiments that achieve the best performance are selected. This screening approach provides good exploratory capabilities, but it is not scalable, in the sense that the number of trial experiments needed to cover the design space grows exponentially with the number of design variables $d$ and with the width of the space $X$.
Moreover, this approach cannot be guaranteed to find an optimal solution.
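For concreteness, the following is a minimal Python sketch of the screening approach on a hypothetical two-dimensional design space; the performance function f and the grid resolution are illustrative stand-ins for a real experimental evaluation, not part of the original text.

```python
import itertools
import numpy as np

def screen(f, bounds, points_per_dim=10):
    """Grid-search screening: evaluate f at every point of a regular grid over X.

    The number of experiments grows as points_per_dim**d, which is why
    screening does not scale with the dimension d of the design space.
    """
    grids = [np.linspace(lo, hi, points_per_dim) for (lo, hi) in bounds]
    candidates = [np.array(x) for x in itertools.product(*grids)]  # grid points x_k
    performances = [f(x) for x in candidates]  # f_K; these runs can execute in parallel
    best = int(np.argmin(performances))        # select the best experiment (minimization)
    return candidates[best], performances[best]

# Hypothetical performance function, used only for illustration
f = lambda x: (x[0] - 0.3) ** 2 + (x[1] + 0.5) ** 2
x_best, f_best = screen(f, bounds=[(-1.0, 1.0), (-1.0, 1.0)])
```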
To derive our closed-loop DoE approach based on S-BO, we assume that we have a set of experimental data $\mathcal{D}^\ell = \{x^\ell_{\mathcal{K}}, f^\ell_{\mathcal{K}}\}$ at the initial iteration $\ell = 1$. We use this data to develop a probabilistic surrogate model $\hat{f}^\ell: X \to \mathbb{R}$ (a Gaussian process, GP) that approximates the performance function $f$ over the design space $X$. The GP generates this approximation by first constructing a
prior over $f(x_{1:n})$ of the form $f(x_{1:n}) \sim \mathcal{N}(m(x), K(x, x'))$. Here, $m(x) \in \mathbb{R}^n$ is the prior mean function and is commonly set equal to 0. The prior covariance matrix, $K(x, x') \in \mathbb{R}^{n \times n}$, is constructed using a kernel function, $h(x, x')$, such that $K_{i,j} = h(x_i, x_j)$. There exists a large selection of kernel functions (e.g., rational quadratic, squared exponential, Matérn), and determining which to use largely depends on identifying a choice for $h(x, x')$ that will fit the generated data well. The GP then uses these elements to construct a predictive posterior distribution at a new experimental point $x$ that generates an estimate $\hat{f}^\ell(x)$ of the value of the performance function at $x$. This estimate can be shown to be a Gaussian random variable $\hat{f}^\ell(x) \sim \mathcal{N}(\mu^\ell_f(x), \sigma^\ell_f(x))$, with mean and variance:
$$\mu^\ell_f(x) := K(x, x^\ell_{\mathcal{K}})\, K(x^\ell_{\mathcal{K}}, x^\ell_{\mathcal{K}})^{-1} f^\ell_{\mathcal{K}}, \quad x \in X \qquad \text{(2a)}$$

$$\sigma^\ell_f(x) := K(x, x) - K(x, x^\ell_{\mathcal{K}})^T K(x^\ell_{\mathcal{K}}, x^\ell_{\mathcal{K}})^{-1} K(x^\ell_{\mathcal{K}}, x), \quad x \in X. \qquad \text{(2b)}$$
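For illustration, the following is a minimal NumPy sketch of the posterior mean (2a) and variance (2b); the squared-exponential kernel, its lengthscale, and the small jitter term added for numerical stability are assumptions of this sketch rather than choices prescribed by the text.

```python
import numpy as np

def sq_exp_kernel(A, B, lengthscale=0.2):
    """Squared-exponential kernel h(x, x') evaluated pairwise between rows of A and B."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(x, X_K, f_K, jitter=1e-8):
    """Posterior mean (2a) and variance (2b) at a single query point x.

    X_K: (K, d) array of past experiments x_K; f_K: (K,) array of observations f_K.
    """
    x = np.atleast_2d(x)
    K_xX = sq_exp_kernel(x, X_K)                   # K(x, x_K)
    K_XX = sq_exp_kernel(X_K, X_K)                 # K(x_K, x_K)
    K_inv = np.linalg.inv(K_XX + jitter * np.eye(len(X_K)))
    mu = (K_xX @ K_inv @ f_K).item()               # equation (2a)
    var = (sq_exp_kernel(x, x) - K_xX @ K_inv @ K_xX.T).item()  # equation (2b)
    return mu, max(var, 0.0)
```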
4
http://zavalab.engr.wisc.edu
The mean and variance of the prediction capture the expected performance and its uncertainty/information;
these measures are used to construct an acquisition function (AF), which is used to guide the se-
lection of the next experiment. Here, we use an AF of the form:
$$AF^\ell_f(x; \kappa) = \mu^\ell_f(x) - \kappa \cdot \sigma^\ell_f(x), \quad x \in X \qquad \text{(3)}$$
where $\kappa \in \mathbb{R}_+$ is a hyperparameter that prioritizes exploration (information) over exploitation (performance). The next experiment, $x^{\ell+1} \in X$, is obtained by solving the AF optimization problem:

$$\begin{aligned} x^{\ell+1} \in \operatorname*{argmin}_{x} \quad & AF^\ell_f(x; \kappa) & \qquad \text{(4a)} \\ \text{s.t.} \quad & x \in X & \qquad \text{(4b)} \end{aligned}$$
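The AF minimization in (4) can be sketched as follows, reusing the hypothetical gp_posterior helper from the previous sketch. For simplicity, the solver is approximated by random search over X (in practice, a multi-start gradient-based solver would typically be used), and the AF uses the posterior standard deviation, a common convention adopted here as an assumption.

```python
def minimize_af(X_K, f_K, bounds, kappa=2.0, n_candidates=5000, seed=0):
    """Approximately solve problem (4): minimize the AF (3) over the design space X."""
    rng = np.random.default_rng(seed)
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    candidates = rng.uniform(lo, hi, size=(n_candidates, len(bounds)))
    best_x, best_af = None, np.inf
    for x in candidates:
        mu, var = gp_posterior(x, X_K, f_K)
        af = mu - kappa * np.sqrt(var)  # AF (3); sqrt converts the variance in (2b)
        if af < best_af:                # to a standard deviation (sketch convention)
            best_x, best_af = x, af
    return best_x
```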
Once the experiment $x^{\ell+1}$ has been computed, the system performance is evaluated at this point (by running the experiment) to obtain the new data point $\{x^{\ell+1}, f(x^{\ell+1})\}$; this point is added to the data collection $\mathcal{D}^{\ell+1} \leftarrow \mathcal{D}^\ell \cup \{x^{\ell+1}, f(x^{\ell+1})\}$. The GP model is re-trained using this new data collection to obtain $\hat{f}^{\ell+1}(x) \sim \mathcal{N}(\mu^{\ell+1}_f(x), \sigma^{\ell+1}_f(x))$, and the acquisition function is updated to $AF^{\ell+1}_f$ and minimized to obtain a new design $x^{\ell+2}$. This process is repeated over multiple cycles/iterations $\ell = 1, 2, ..., L$ until a satisfactory performance is obtained (or no additional improvement in performance is observed). We highlight that the standard BO algorithm (which we refer to as S-BO) does not have a natural stopping criterion and can get stuck in a local solution (as opposed to finding a global solution). The pseudocode for S-BO can be found in Algorithm 1.
Algorithm 1: Standard Bayesian Optimization (S-BO)
Given $\kappa$, $L$, and $\mathcal{D}^\ell$;
Train GP $\hat{f}^\ell$ using initial dataset $\mathcal{D}^\ell$ and obtain $AF^\ell_f$;
for $\ell = 1, 2, ..., L$ do
  Compute experiment $x^{\ell+1} \in \operatorname{argmin}_{x} AF^\ell_f(x; \kappa)$ s.t. $x \in X$;
  Evaluate performance at $x^{\ell+1}$ to obtain $f^{\ell+1}$;
  Update dataset $\mathcal{D}^{\ell+1} \leftarrow \mathcal{D}^\ell \cup \{x^{\ell+1}, f^{\ell+1}\}$;
  Train GP using $\mathcal{D}^{\ell+1}$ to obtain $\hat{f}^{\ell+1}$ and $AF^{\ell+1}_f$;
end
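Putting the pieces together, the following is a minimal end-to-end sketch of Algorithm 1 that reuses the hypothetical gp_posterior and minimize_af helpers and the illustrative performance function f from the earlier sketches; the budget L, the value of κ, and the initial designs are placeholders. Because the GP hyperparameters are fixed in this sketch, "re-training" reduces to recomputing the posterior in (2) with the augmented dataset.

```python
def s_bo(f, bounds, X_init, L=20, kappa=2.0):
    """Standard BO (Algorithm 1): sequentially propose, run, and assimilate experiments."""
    X_K = np.array(X_init, dtype=float)        # initial designs x_K
    f_K = np.array([f(x) for x in X_K])        # initial performances f_K
    for ell in range(L):
        x_next = minimize_af(X_K, f_K, bounds, kappa=kappa, seed=ell)  # solve (4)
        f_next = f(x_next)                     # run the experiment at x^{l+1}
        X_K = np.vstack([X_K, x_next])         # D^{l+1} <- D^l U {x^{l+1}, f^{l+1}}
        f_K = np.append(f_K, f_next)
    best = int(np.argmin(f_K))
    return X_K[best], f_K[best]

# Hypothetical usage with the illustrative function from the screening sketch
x_star, f_star = s_bo(f, bounds=[(-1.0, 1.0), (-1.0, 1.0)],
                      X_init=[[-0.8, 0.8], [0.5, -0.2], [0.0, 0.0]])
```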
2.2 Reference-Based BO (Ref-BO)
Recent work has shown that allowing the BO algorithm to exploit available preexisting informa-
tion can improve its performance. One approach found in [19], for example, exploits the fact that
the performance function is usually a known composite function that can be expressed as $f(y(x))$, where a set of intermediate variables, $y(x)$, are the unknowns. As a result, the performance function can be optimized using derivative-based methods, allowing one to set targets for the various