New Paradigms for Exploiting Parallel Experiments in
Bayesian Optimization
Leonardo D. González and Victor M. Zavala*
Department of Chemical and Biological Engineering
University of Wisconsin-Madison, 1415 Engineering Dr, Madison, WI 53706, USA
*Corresponding Author: victor.zavala@wisc.edu
Abstract
Bayesian optimization (BO) is one of the most effective methods for closed-loop experimen-
tal design and black-box optimization. However, a key limitation of BO is that it is an inher-
ently sequential algorithm (one experiment is proposed per round) and thus cannot directly
exploit high-throughput (parallel) experiments. Diverse modifications to the BO framework
have been proposed in the literature to enable the exploitation of parallel experiments, but such approaches are limited in the degree of parallelization that they can achieve and can lead to redundant experiments (thus wasting resources and potentially compromising performance).
In this work, we present new parallel BO paradigms that exploit the structure of the system
to partition the design space. Specifically, we propose an approach that partitions the design
space by following the level sets of the performance function and an approach that exploits partially separable structures of the performance function. We conduct extensive numerical
experiments using a reactor case study to benchmark the effectiveness of these approaches
against a variety of state-of-the-art parallel algorithms reported in the literature. Our compu-
tational results show that our approaches significantly reduce the required search time and
increase the probability of finding a global (rather than local) solution.
Keywords: Bayesian optimization, high-throughput experiments, parallelization.
1 Introduction
The use of high-throughput experimental (HTE) platforms is accelerating scientific discovery in
diverse fields such as catalysis [18], pharmaceuticals [16], synthetic biology [25], and chemical
engineering [21]. Such platforms permit large numbers of experiments to be executed in paral-
lel, sometimes automatically; this enables the exploration of wider design spaces, reduces time to
discovery, and can potentially decrease the use of resources. However, due to the large number
of design variables involved, determining optimal conditions manually is often infeasible. As a
result, HTE platforms rely on the use of design of experiments (DoE) algorithms, which aim to
systematically explore the design space.
Screening is a simple DoE approach in which experiments are performed at points on a dis-
cretized grid of the design space [23]; this approach is intuitive but does not scale well with the
number of design variables and can ultimately lead to a significant waste of resources (conducting experiments that do not provide significant information). The central aim of advanced DoE approaches is to maximize the value provided by each experiment and ultimately reduce the number
of experiments and resources used (e.g., experiment time). The value of an experiment is usually
measured either by its information content (e.g., whether it reduces model uncertainty) or by whether it results in a desirable outcome (e.g., whether it improves an economic objective) [2]. A widely used DoE approach that aims
to tackle this problem is response surface methodology or RSM [3]; this approach is generally
sample-efficient (requires few experiments) but uses second-degree polynomial surrogate models
that can fail to accurately capture system trends. In addition, parameters used in the RSM surro-
gate model are subject to uncertainty and this uncertainty is not resolved via further experiments
[12] (i.e., RSM is an open-loop DoE technique).
Another powerful approach to DoE that aims to maximize value of experiments is Bayesian
experimental design [5]. Recently, the machine learning (ML) community has been using variants
of this paradigm to conduct closed-loop experimental design [7]. One of the most effective varia-
tions of this paradigm is the Bayesian optimization (BO) algorithm [1]; BO has been shown to be
sample-efficient and scalable (requires minimal experiments and can explore large design spaces)
[28]. BO is widely used in applications such as experimental design, hyper-parameter tuning, and
reinforcement learning. Of particular interest is the flexibility of the BO paradigm as it is capable
of accommodating both continuous and discrete (e.g., categorical) design variables as well as con-
straints (which help encode domain knowledge and restrict the design space) [4]. Additionally,
BO uses probabilistic surrogate models (e.g., Gaussian process models), which greatly facilitate the quantification of uncertainty and information in different regions of the design space [9]; this
feature is particularly useful in guiding experiments where information gain can be as important
as performance. BO can also be tuned to emphasize exploration (by sampling regions with high
uncertainty) over exploitation (by sampling from regions with high economic performance) [17];
this trade-off is achieved by tuning the so-called acquisition function (AF), which is a composite
function that captures uncertainty and performance.
A fundamental caveat of BO is that it is inherently a sequential algorithm (samples a single
point in the design space at each iteration), limiting its ability to exploit HTE platforms. Modi-
fications to the BO algorithm have been proposed in the literature to overcome these limitations
[10, 6, 15]. Relevant variants include Hyperspace partitioning [33], batch Bayesian optimization
[31], NxMCMC [26], and AF optimization over a set of exploratory designs [11]. These parallel
BO approaches have been shown to perform better than sequential BO in terms of search time
[34]; however, these approaches are limited in the degree of parallelization that can be achieved
and can lead to redundant experiments, thus wasting resources and potentially getting trapped in
local solutions.
In this work, we propose a set of new parallel BO paradigms that exploit the structure of
the system in order to guide partitioning of the design space (see Figure 1). Our first approach,
which we call level-set partitioning, decomposes the design space by following the level sets of the
performance function. Because the performance function cannot be evaluated (it is unknown), a
key feature of this approach is that it leverages a reference function (which can be a low-fidelity
model or a physics model) to approximate the level sets and guide the partitioning. Our second
approach, called variable partitioning, partitions the design space by exploiting partially separable
structures that typically result when a system is composed of multiple subsystems (e.g., a chemical
process is composed of multiple units). We benchmark the performance of our approaches over
sequential BO and state-of-the-art parallel BO variants from the literature using a reactor system.
Our results show that the proposed approaches can achieve significant reductions in search time;
in addition, we observe improvements in performance values found and in search robustness
(sensitivity to initial guess).
Figure 1: Schematic of proposed BO parallelization paradigms using level set partitioning (left)
and variable partitioning (right).
2 Sequential Bayesian Optimization
2.1 Standard BO (S-BO)
The aim of closed-loop DoE is to identify experimental inputs that achieve optimal performance;
we cast this as the optimization problem:
$$\begin{aligned} \min_{x} \quad & f(x) & \qquad \text{(1a)} \\ \text{s.t.} \quad & x \in X & \qquad \text{(1b)} \end{aligned}$$

where $f: X \to \mathbb{R}$ is a scalar performance function, $x \in X \subseteq \mathbb{R}^d$ is a given experiment (trial) point, and $X$ is the experimental design space (of dimension $d$). Generally, an explicit relationship between the design variables $x$ and the performance function $f$ is not known a priori; this motivates the need to evaluate the performance at a selected set of trial points to build a surrogate model of the performance function that can be used to optimize the system.
The open-loop DoE approach most commonly used in HTE platforms is screening; this is a
grid-search method that discretizes the design space $X$ into a set of experiments $x_k \in X$, $k \in \mathcal{K} := \{1, ..., K\}$ (we denote this set compactly as $x_{\mathcal{K}}$); the performance of the system is then evaluated (potentially in parallel) at these points to obtain $f_{\mathcal{K}}$, and the experiments that achieve the best performance are selected. This screening approach provides good exploratory capabilities, but it is not scalable, in the sense that the number of trial experiments needed to cover the design space grows exponentially with the number of design variables $d$ and with the width of the space $X$.
Moreover, this approach cannot be guaranteed to find an optimal solution.
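For concreteness, the following is a minimal Python sketch of the screening approach on a hypothetical two-dimensional design space; the performance function f and the grid resolution are illustrative stand-ins for a real experimental evaluation, not part of the original text.

```python
import itertools
import numpy as np

def screen(f, bounds, points_per_dim=10):
    """Grid-search screening: evaluate f at every point of a regular grid over X.

    The number of experiments grows as points_per_dim**d, which is why
    screening does not scale with the dimension d of the design space.
    """
    grids = [np.linspace(lo, hi, points_per_dim) for (lo, hi) in bounds]
    candidates = [np.array(x) for x in itertools.product(*grids)]  # grid points x_k
    performances = [f(x) for x in candidates]  # f_K; these runs can execute in parallel
    best = int(np.argmin(performances))        # select the best experiment (minimization)
    return candidates[best], performances[best]

# Hypothetical performance function, used only for illustration
f = lambda x: (x[0] - 0.3) ** 2 + (x[1] + 0.5) ** 2
x_best, f_best = screen(f, bounds=[(-1.0, 1.0), (-1.0, 1.0)])
```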
To derive our closed-loop DoE approach based on S-BO, we assume that we have a set of experimental data $\mathcal{D}^\ell = \{x^\ell_{\mathcal{K}}, f^\ell_{\mathcal{K}}\}$ at the initial iteration $\ell = 1$. We use this data to develop a probabilistic surrogate model $\hat{f}^\ell: X \to \mathbb{R}$ (a Gaussian process, GP) that approximates the performance function $f$ over the design space $X$. The GP generates this approximation by first constructing a
prior over $f(x_{1:n})$ of the form $f(x_{1:n}) \sim \mathcal{N}(m(x), K(x, x'))$. Here, $m(x) \in \mathbb{R}^n$ is the prior mean function and is commonly set equal to 0. The prior covariance matrix, $K(x, x') \in \mathbb{R}^{n \times n}$, is constructed using a kernel function, $h(x, x')$, such that $K_{i,j} = h(x_i, x_j)$. There exists a large selection of kernel functions (e.g., rational quadratic, squared exponential, Matérn), and determining which to use largely depends on identifying a choice for $h(x, x')$ that will fit the generated data well. The GP then uses these elements to construct a predictive posterior distribution at a new experimental point $x$ that generates an estimate $\hat{f}^\ell(x)$ of the value of the performance function at $x$. This estimate can be shown to be a Gaussian random variable $\hat{f}^\ell(x) \sim \mathcal{N}(\mu^\ell_f(x), \sigma^\ell_f(x))$, with mean and variance:
$$\mu^\ell_f(x) := K(x, x^\ell_{\mathcal{K}})\, K(x^\ell_{\mathcal{K}}, x^\ell_{\mathcal{K}})^{-1} f^\ell_{\mathcal{K}}, \quad x \in X \qquad \text{(2a)}$$

$$\sigma^\ell_f(x) := K(x, x) - K(x, x^\ell_{\mathcal{K}})^T K(x^\ell_{\mathcal{K}}, x^\ell_{\mathcal{K}})^{-1} K(x^\ell_{\mathcal{K}}, x), \quad x \in X. \qquad \text{(2b)}$$
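For illustration, the following is a minimal NumPy sketch of the posterior mean (2a) and variance (2b); the squared-exponential kernel, its lengthscale, and the small jitter term added for numerical stability are assumptions of this sketch rather than choices prescribed by the text.

```python
import numpy as np

def sq_exp_kernel(A, B, lengthscale=0.2):
    """Squared-exponential kernel h(x, x') evaluated pairwise between rows of A and B."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(x, X_K, f_K, jitter=1e-8):
    """Posterior mean (2a) and variance (2b) at a single query point x.

    X_K: (K, d) array of past experiments x_K; f_K: (K,) array of observations f_K.
    """
    x = np.atleast_2d(x)
    K_xX = sq_exp_kernel(x, X_K)                   # K(x, x_K)
    K_XX = sq_exp_kernel(X_K, X_K)                 # K(x_K, x_K)
    K_inv = np.linalg.inv(K_XX + jitter * np.eye(len(X_K)))
    mu = (K_xX @ K_inv @ f_K).item()               # equation (2a)
    var = (sq_exp_kernel(x, x) - K_xX @ K_inv @ K_xX.T).item()  # equation (2b)
    return mu, max(var, 0.0)
```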
4
http://zavalab.engr.wisc.edu
The mean and variance of the prediction capture the expected performance and its uncertainty/information;
these measures are used to construct an acquisition function (AF), which is used to guide the se-
lection of the next experiment. Here, we use an AF of the form:
$$AF^\ell_f(x; \kappa) = \mu^\ell_f(x) - \kappa \cdot \sigma^\ell_f(x), \quad x \in X \qquad \text{(3)}$$
where $\kappa \in \mathbb{R}_+$ is a hyperparameter that prioritizes exploration (information) over exploitation (performance). The next experiment, $x^{\ell+1} \in X$, is obtained by solving the AF optimization problem:

$$\begin{aligned} x^{\ell+1} \in \operatorname*{argmin}_{x} \quad & AF^\ell_f(x; \kappa) & \qquad \text{(4a)} \\ \text{s.t.} \quad & x \in X & \qquad \text{(4b)} \end{aligned}$$
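The AF minimization in (4) can be sketched as follows, reusing the hypothetical gp_posterior helper from the previous sketch. For simplicity, the solver is approximated by random search over X (in practice, a multi-start gradient-based solver would typically be used), and the AF uses the posterior standard deviation, a common convention adopted here as an assumption.

```python
def minimize_af(X_K, f_K, bounds, kappa=2.0, n_candidates=5000, seed=0):
    """Approximately solve problem (4): minimize the AF (3) over the design space X."""
    rng = np.random.default_rng(seed)
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    candidates = rng.uniform(lo, hi, size=(n_candidates, len(bounds)))
    best_x, best_af = None, np.inf
    for x in candidates:
        mu, var = gp_posterior(x, X_K, f_K)
        af = mu - kappa * np.sqrt(var)  # AF (3); sqrt converts the variance in (2b)
        if af < best_af:                # to a standard deviation (sketch convention)
            best_x, best_af = x, af
    return best_x
```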
Once the experiment $x^{\ell+1}$ has been computed, the system performance is evaluated at this point (by running the experiment) to obtain the new data point $\{x^{\ell+1}, f(x^{\ell+1})\}$; this point is added to the data collection $\mathcal{D}^{\ell+1} \leftarrow \mathcal{D}^\ell \cup \{x^{\ell+1}, f(x^{\ell+1})\}$. The GP model is re-trained using this new data collection to obtain $\hat{f}^{\ell+1}(x) \sim \mathcal{N}(\mu^{\ell+1}_f(x), \sigma^{\ell+1}_f(x))$, and the acquisition function is updated to $AF^{\ell+1}_f$ and minimized to obtain a new design $x^{\ell+2}$. This process is repeated over multiple cycles/iterations $\ell = 1, 2, ..., L$ until a satisfactory performance is obtained (or no additional improvement in performance is observed). We highlight that the standard BO algorithm (which we refer to as S-BO) does not have a natural stopping criterion and can get stuck in a local solution (as opposed to finding a global solution). The pseudocode for S-BO can be found in Algorithm 1.
Algorithm 1: Standard Bayesian Optimization (S-BO)
Given $\kappa$, $L$, and $\mathcal{D}^\ell$;
Train GP $\hat{f}^\ell$ using initial dataset $\mathcal{D}^\ell$ and obtain $AF^\ell_f$;
for $\ell = 1, 2, ..., L$ do
  Compute experiment $x^{\ell+1} \in \operatorname{argmin}_{x} AF^\ell_f(x; \kappa)$ s.t. $x \in X$;
  Evaluate performance at $x^{\ell+1}$ to obtain $f^{\ell+1}$;
  Update dataset $\mathcal{D}^{\ell+1} \leftarrow \mathcal{D}^\ell \cup \{x^{\ell+1}, f^{\ell+1}\}$;
  Train GP using $\mathcal{D}^{\ell+1}$ to obtain $\hat{f}^{\ell+1}$ and $AF^{\ell+1}_f$;
end
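Putting the pieces together, the following is a minimal end-to-end sketch of Algorithm 1 that reuses the hypothetical gp_posterior and minimize_af helpers and the illustrative performance function f from the earlier sketches; the budget L, the value of κ, and the initial designs are placeholders. Because the GP hyperparameters are fixed in this sketch, "re-training" reduces to recomputing the posterior in (2) with the augmented dataset.

```python
def s_bo(f, bounds, X_init, L=20, kappa=2.0):
    """Standard BO (Algorithm 1): sequentially propose, run, and assimilate experiments."""
    X_K = np.array(X_init, dtype=float)        # initial designs x_K
    f_K = np.array([f(x) for x in X_K])        # initial performances f_K
    for ell in range(L):
        x_next = minimize_af(X_K, f_K, bounds, kappa=kappa, seed=ell)  # solve (4)
        f_next = f(x_next)                     # run the experiment at x^{l+1}
        X_K = np.vstack([X_K, x_next])         # D^{l+1} <- D^l U {x^{l+1}, f^{l+1}}
        f_K = np.append(f_K, f_next)
    best = int(np.argmin(f_K))
    return X_K[best], f_K[best]

# Hypothetical usage with the illustrative function from the screening sketch
x_star, f_star = s_bo(f, bounds=[(-1.0, 1.0), (-1.0, 1.0)],
                      X_init=[[-0.8, 0.8], [0.5, -0.2], [0.0, 0.0]])
```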
2.2 Reference-Based BO (Ref-BO)
Recent work has shown that allowing the BO algorithm to exploit available preexisting informa-
tion can improve its performance. One approach found in [19], for example, exploits the fact that
the performance function is usually a known composite function that can be expressed as $f(y(x))$, where a set of intermediate variables, $y(x)$, are the unknowns. As a result, the performance function can be optimized using derivative-based methods, allowing one to set targets for the various