Design Amortization for Bayesian Optimal Experimental Design
Noble Kennamer,1 Steven Walton,2 Alexander Ihler1
1Department of Computer Science, University of California, Irvine
2Department of Computer Science, University of Oregon
nkenname@uci.edu
arXiv:2210.03283v2 [cs.LG] 20 Oct 2022
Abstract
Bayesian optimal experimental design is a sub-field of statistics focused on developing methods to make efficient use of experimental resources. Any potential design is evaluated in terms of a utility function, such as the (theoretically well-justified) expected information gain (EIG); unfortunately, however, under most circumstances the EIG is intractable to evaluate. In this work we build off of successful variational approaches, which optimize a parameterized variational model with respect to bounds on the EIG. Past work focused on learning a new variational model from scratch for each new design considered. Here we present a novel neural architecture that allows experimenters to optimize a single variational model that can estimate the EIG for potentially infinitely many designs. To further improve computational efficiency, we also propose to train the variational model on a significantly cheaper-to-evaluate lower bound, and show empirically that the resulting model provides an excellent guide for more accurate, but expensive-to-evaluate, bounds on the EIG. We demonstrate the effectiveness of our technique on generalized linear models, a class of statistical models that is widely used in the analysis of controlled experiments. Experiments show that our method is able to greatly improve accuracy over existing approximation strategies, and achieve these results with far better sample efficiency.
1 Introduction
Conducting experiments is often a resource-intensive endeavour, motivating experimenters to design their experiments to be maximally informative given the resources available. Optimal experimental design (OED) aims to address this challenge by developing approaches to define a utility, U(d), of a possible design, d, and algorithms for evaluating and optimizing this utility over all feasible designs D. OED has been used widely across science and engineering, including systems biology (Liepe et al. 2013), geostatistics (Diggle and Lophaven 2006), manufacturing (Antony 2001) and more (Goos and Jones 2011).
In this work we focus on evaluating the expected information gain (EIG), a commonly used utility function in Bayesian optimal experiment design (BOED) (Chaloner and Verdinelli 1995; Ryan et al. 2016). We specify our model, composed of a likelihood and prior p(y|θ, d)p(θ) for design d, possible experimental outcomes y and latent variables θ. The EIG is then defined to be:
$$\mathrm{EIG}(d) = \mathbb{E}_{p(y|d)}\big[\, H[p(\theta)] - H[p(\theta \mid y, d)] \,\big] \qquad (1)$$
where H[·] is the entropy function. The experimenter then seeks the solution to $\arg\max_{d \in \mathcal{D}} \mathrm{EIG}(d)$, where D is the set of all feasible designs. The EIG has sound theoretical justifications, proven to be optimal in certain settings (Sebastiani and Wynn 2000; Bernardo and Smith 2009).
While powerful, this framework is limited by the difficulty of evaluating the EIG due to the intractability of the posterior distribution p(θ|y, d). Foster et al. (2019) proposed four variational bounds for efficiently approximating the EIG. The method involves defining a variational distribution, qφ(·), that approximates either the posterior distribution p(θ|y, d) or the marginal likelihood p(y|d). The parameters φ of this variational distribution are optimized according to the proposed bounds. In principle their method allows for estimating the EIG for arbitrarily complex models using flexible variational models; however, their implementations focused on simpler variational forms that required fitting a new variational model for every possible design. In this work we focus on design amortization by proposing a novel deep learning architecture based on conditional normalizing flows (NF) (Papamakarios et al. 2021; Kobyzev, Prince, and Brubaker 2020) and set invariant models (Zaheer et al. 2017) to define a flexible variational distribution qφ(·|d) that only needs to be trained once, but can then accurately estimate the EIG for potentially infinitely many designs. Our experiments will show how design amortization can dramatically improve computational efficiency and how our more flexible variational form can make much more accurate approximations to the EIG than competing methods. We provide our code here.¹

¹Redacted for anonymity during review.
2 Background
Scientists consistently face the challenge of having to conduct experiments under limited resources, and must design their experiments to use these resources as efficiently as possible. BOED provides a conceptually clear framework for doing so. We assume we are given a model with design variables d, experimental outcomes y and latent parameters θ about which we wish to learn. We have prior information
on the latent variables encoded in a prior distribution, p(θ), and a likelihood that predicts experimental outcomes from a design and latent variables, p(y|θ, d). Via Bayes' rule, these two functions combine to give us the posterior distribution p(θ|y, d) ∝ p(y|θ, d)p(θ), representing our state of knowledge about the latent variables after conducting an experiment with design d and observing outcomes y. For example, the design variables d could represent the environmental conditions and chemical concentrations of a medium used to culture a strain of bacteria which produces an important chemical compound. This design problem becomes more complex with increasing dimension of d, for example, if we have S petri dishes to work on (often called the experimental units or subjects). The experimental outcomes y would represent the amount of the chemical compound yielded from growing the culture in each of the conditions of d, and the latent variables θ represent parameters that define how the design variables d mediate the yield of the chemical compound y. After conducting the experiment and observing y, we can quantify our information gain (IG) as:
$$\mathrm{IG}(y, d) = H[p(\theta)] - H[p(\theta \mid y, d)] \qquad (2)$$
However, this gain cannot be evaluated before conducting the experiment, as it requires knowing the outcomes y. Instead, taking the expectation of the information gain with respect to the outcomes, p(y|d), gives the EIG:
$$\mathrm{EIG}(d) = \mathbb{E}_{p(\theta, y|d)}\left[\log \frac{p(\theta \mid y, d)}{p(\theta)}\right] = \mathbb{E}_{p(\theta, y|d)}\left[\log \frac{p(y \mid \theta, d)}{p(y \mid d)}\right] \qquad (3)$$
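The two expressions in (3) coincide by Bayes' rule; for completeness, the one-line derivation is:
\begin{align*}
\mathbb{E}_{p(\theta,y|d)}\!\left[\log \frac{p(\theta \mid y, d)}{p(\theta)}\right]
&= \mathbb{E}_{p(\theta,y|d)}\!\left[\log \frac{p(y \mid \theta, d)\, p(\theta)}{p(y \mid d)\, p(\theta)}\right]
= \mathbb{E}_{p(\theta,y|d)}\!\left[\log \frac{p(y \mid \theta, d)}{p(y \mid d)}\right].
\end{align*}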
Nested Monte Carlo: Typically p(θ|y, d) and p(y|d) are intractable, making the EIG challenging to compute. One common approach to approximating the EIG is to use a nested Monte Carlo (NMC) estimator (Myung, Cavagnaro, and Pitt 2013; Vincent and Rainforth 2017; Rainforth et al. 2018):
$$\hat{\mu}_{\mathrm{NMC}} = \frac{1}{N}\sum_{n=1}^{N} \log \frac{p(y_n \mid \theta_{n,0}, d)}{\frac{1}{M}\sum_{m=1}^{M} p(y_n \mid \theta_{n,m}, d)}, \quad \text{where } \theta_{n,m} \sim p(\theta) \text{ and } y_n \sim p(y \mid \theta_{n,0}, d) \qquad (4)$$
Rainforth et al. (2018) showed that NMC is a consistent estimator, converging as N, M → ∞. They also showed that it is asymptotically optimal to set M ∝ √N, resulting in an overall convergence rate of O(T^{-1/3}), where T is the total number of samples drawn (i.e., T = NM for NMC). However, this is much slower than the O(T^{-1/2}) rate of standard Monte Carlo estimators (Robert and Casella 1999), in which the total number of samples is simply T = N.
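As a concrete illustration (not taken from the paper's code), the following Python sketch implements the estimator in Eq. (4) for a generic simulator; `sample_prior`, `sample_outcome`, and `log_likelihood` are hypothetical interfaces that the experimenter would supply for their own model.

import numpy as np

def nmc_eig(design, sample_prior, sample_outcome, log_likelihood, N=1000, M=1000):
    """Nested Monte Carlo estimate of EIG(design), Eq. (4).

    sample_prior(k)             -> k draws theta ~ p(theta)
    sample_outcome(theta, d)    -> one draw y ~ p(y | theta, d)
    log_likelihood(y, theta, d) -> scalar log p(y | theta, d)
    """
    total = 0.0
    for _ in range(N):
        theta0 = sample_prior(1)[0]                   # outer draw theta_{n,0}
        y = sample_outcome(theta0, design)            # simulated outcome y_n
        log_num = log_likelihood(y, theta0, design)   # numerator: log p(y_n | theta_{n,0}, d)
        # inner loop: Monte Carlo estimate of the marginal likelihood p(y_n | d)
        inner = np.array([log_likelihood(y, t, design) for t in sample_prior(M)])
        log_denom = np.logaddexp.reduce(inner) - np.log(M)
        total += log_num - log_denom
    return total / N

The nesting is explicit here: each of the N outer samples needs its own M-sample estimate of p(y_n|d), which is the source of the slow O(T^{-1/3}) rate discussed above.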
The slow convergence of the NMC estimator can be limiting in practical applications of BOED. The inefficiencies can be traced to requiring an independent estimate of the marginal likelihood, p(y_n|d), for each y_n (the denominator of Eq. (4)). Inspired by this, Foster et al. (2019) proposed employing techniques from variational inference by defining a functional approximation to either p(θ|y, d) or p(y|d), and allowing these estimators to amortize across the samples of y_n for more efficient estimation of the EIG. In this work we focus on two of the four estimators they proposed: the posterior estimator and variational nested Monte Carlo.
Posterior Estimator: The posterior estimator is an application of the Barber-Agakov bound to BOED, which was originally proposed for estimating the mutual information in noisy communication channels (Barber and Agakov 2003). It requires defining a variational approximation qφ(θ|y, d) to the posterior distribution, giving a lower bound on the EIG:
$$\mathrm{EIG}(d) \;\geq\; \mathcal{L}_{\mathrm{post}}(d) \;\triangleq\; \mathbb{E}_{p(\theta, y|d)}\left[\log \frac{q_\phi(\theta \mid y, d)}{p(\theta)}\right] \;\approx\; \frac{1}{N}\sum_{n=1}^{N} \log \frac{q_\phi(\theta_n \mid y_n, d)}{p(\theta_n)}, \quad \text{where } y_n, \theta_n \sim p(y, \theta \mid d) \qquad (5)$$
By maximizing this bound with respect to the variational parameters φ, we can learn a variational form that can efficiently estimate the EIG. A Monte Carlo estimate of this bound converges at rate O(T^{-1/2}), and if the true posterior distribution is within the class of functions defined by the variational form qφ, the bound can be made tight (dependent on the optimization) (Foster et al. 2019).
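To make the training procedure concrete, here is a minimal PyTorch-style sketch of maximizing the bound in Eq. (5) for a fixed design; `sample_joint`, `log_prior`, and the conditional density model `q` (with a `log_prob(theta, y, d)` method) are hypothetical interfaces, not the paper's released code.

import torch

def train_posterior_bound(q, log_prior, sample_joint, design, steps=5000, batch=64, lr=1e-3):
    """Maximize the Barber-Agakov lower bound L_post(design) of Eq. (5) w.r.t. phi."""
    opt = torch.optim.Adam(q.parameters(), lr=lr)
    for _ in range(steps):
        theta, y = sample_joint(design, batch)   # theta_n, y_n ~ p(theta, y | d)
        # Monte Carlo estimate of E[ log q_phi(theta | y, d) - log p(theta) ]
        bound = (q.log_prob(theta, y, design) - log_prior(theta)).mean()
        (-bound).backward()                      # ascend the bound
        opt.step()
        opt.zero_grad()
    return q

Since the log p(θ_n) term does not depend on φ, maximizing this bound is equivalent to maximum-likelihood training of qφ on simulated (θ, y) pairs.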
Variational Nested Monte Carlo: The second bound we discuss is variational nested Monte Carlo (VNMC). It is closely related to NMC, but differs by applying a variational approximation qφ(θ|y, d) as an importance sampler to estimate the marginal likelihood term in NMC:
$$\mathrm{EIG}(d) \;\leq\; \mathcal{U}_{\mathrm{VNMC}}(d, M) \;\triangleq\; \mathbb{E}\left[\log \frac{p(y \mid \theta_0, d)}{\frac{1}{M}\sum_{m=1}^{M} \frac{p(y, \theta_m \mid d)}{q_\phi(\theta_m \mid y, d)}}\right] \qquad (6)$$
where the expectation is taken with respect to $y, \theta_{0:M} \sim p(y, \theta_0 \mid d)\prod_{m=1}^{M} q_\phi(\theta_m \mid y, d)$.
By minimizing this upper bound with respect to the variational parameters φ, we can learn an importance distribution that allows for much more efficient computation of the EIG. Note that if qφ(θ|y, d) exactly equals the posterior distribution, the bound is tight and requires only a single nested sample (M = 1). Even if the variational form does not equal the posterior, the bound remains consistent as M → ∞. Finally, it is worth noting that by taking qφ(θ|y, d) = p(θ), the estimator simply reduces to NMC.
It was further shown by Foster et al. (2020) that VNMC can easily be made into a lower bound by including θ_0 (the sample from the prior) when estimating the marginal likelihood, a method we denote as contrastive VNMC (CVNMC):
$$\mathrm{EIG}(d) \;\geq\; \mathcal{L}_{\mathrm{CVNMC}}(d, M) \;\triangleq\; \mathbb{E}\left[\log \frac{p(y \mid \theta_0, d)}{\frac{1}{M+1}\sum_{m=0}^{M} \frac{p(y, \theta_m \mid d)}{q_\phi(\theta_m \mid y, d)}}\right] \qquad (7)$$
where the expectation is taken with respect to $y, \theta_{0:M} \sim p(y, \theta_0 \mid d)\prod_{m=1}^{M} q_\phi(\theta_m \mid y, d)$. We can also employ this same technique with regular NMC to estimate both lower and upper bounds.
Note that the upper bound (6) and lower bound (7) are particularly useful when evaluating the performance of our method in settings where ground truth is not available. In these cases we can examine the bound pairs produced by NMC and by VNMC to assess which set more tightly constrains the true value.
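The sketch below shows how the upper bound (6) and lower bound (7) can be estimated once qφ has been trained; it reuses the hypothetical interfaces from the earlier sketches, together with a per-sample `log_lik(y, theta, d)` and a `q.sample(M, y, d)` method, none of which are claims about the paper's actual code.

import math
import torch

def vnmc_bounds(q, log_lik, log_prior, sample_joint, design, N=512, M=64):
    """Monte Carlo estimates of the VNMC upper bound (6) and CVNMC lower bound (7)."""
    uppers, lowers = [], []
    with torch.no_grad():
        for _ in range(N):
            theta0, y = sample_joint(design, 1)          # y, theta_0 ~ p(y, theta_0 | d)
            log_num = log_lik(y, theta0, design)         # log p(y | theta_0, d)
            thetas = q.sample(M, y, design)              # theta_1..M ~ q_phi(theta | y, d)
            # importance weights log[ p(y, theta_m | d) / q_phi(theta_m | y, d) ]
            log_w = torch.stack([log_lik(y, t, design) + log_prior(t)
                                 - q.log_prob(t, y, design) for t in thetas])
            w0 = log_num + log_prior(theta0) - q.log_prob(theta0, y, design)
            log_marg_upper = torch.logsumexp(log_w, 0) - math.log(M)                              # Eq. (6)
            log_marg_lower = torch.logsumexp(torch.cat([log_w, w0.reshape(1)]), 0) - math.log(M + 1)  # Eq. (7)
            uppers.append(log_num - log_marg_upper)
            lowers.append(log_num - log_marg_lower)
    return torch.stack(uppers).mean(), torch.stack(lowers).mean()

When qφ is close to the true posterior, the upper and lower estimates pinch together, which is exactly the diagnostic described above for settings without ground truth.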
Practical considerations. In this work, we apply the same flexible mathematical framework proposed in Foster et al. (2019). However, Foster et al. (2019) adopted a "classical" variational setting, in which the variational form qφ is selected to take a standard, parametric form. They found this approach effective, but tested it only on very simple design problems, with only one experimental unit at a time. Their variational models only incorporate the design implicitly, requiring a separate optimization for every design to be considered.² Unfortunately, as we show in the experiments, this approach is not effective on more complex design problems. Instead, we propose a far more flexible, deep-learning-based distributional form that incorporates the design explicitly, allowing us to amortize training across, and apply our trained model to the evaluation of, all (potentially continuously many) designs in our feasible set.

²Although subsequent work (Foster et al. 2020) considered evolving both the design and the distribution q simultaneously, even that work remains focused on a single (if evolving) design.
3 Method
We are interested in learning a parameterized function, qφ(θ|y, d), for approximating the posterior distribution. We now describe our proposed deep learning architecture for amortizing over designs, allowing practitioners to train a single model that is capable of evaluating the EIG for potentially infinitely many designs. We also discuss how we can efficiently train this model using the (simpler and cheaper) equation (5), then use the resulting approximation in the more accurate bounds provided by VNMC, (6)–(7). This advances the work in Foster et al. (2019) by providing a highly flexible variational form that can be used in a wide variety of contexts and an inexpensive procedure to train it.
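As a self-contained toy illustration of this train-cheap, evaluate-everywhere idea (using a simple conditional Gaussian in place of the conditional normalizing flow described below, and a linear-Gaussian model whose EIG is known in closed form; none of this is the paper's code or experimental setup):

import math
import torch
import torch.nn as nn

# Toy model: theta ~ N(0, 1), y = d * theta + noise with noise ~ N(0, sigma^2),
# for which EIG(d) = 0.5 * log(1 + d^2 / sigma^2) exactly.
sigma = 0.5

class AmortizedPosterior(nn.Module):
    """A single Gaussian q_phi(theta | y, d) whose mean and std are predicted from (y, d)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 2))

    def log_prob(self, theta, y, d):
        mu, log_std = self.net(torch.stack([y, d], dim=-1)).unbind(-1)
        return torch.distributions.Normal(mu, log_std.exp()).log_prob(theta)

q = AmortizedPosterior()
opt = torch.optim.Adam(q.parameters(), lr=1e-3)
prior = torch.distributions.Normal(0.0, 1.0)

for step in range(3000):                          # train once, drawing a fresh design each batch
    d = torch.rand(256)                           # designs sampled from the feasible set [0, 1]
    theta = prior.sample((256,))
    y = d * theta + sigma * torch.randn(256)
    loss = -q.log_prob(theta, y, d).mean()        # Eq. (5); the prior term is phi-independent
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():                             # the same trained model scores any design
    for d_val in (0.1, 0.5, 1.0):
        d = torch.full((4096,), d_val)
        theta = prior.sample((4096,))
        y = d * theta + sigma * torch.randn(4096)
        lower = (q.log_prob(theta, y, d) - prior.log_prob(theta)).mean().item()
        exact = 0.5 * math.log(1.0 + d_val ** 2 / sigma ** 2)
        print(f"d = {d_val:.1f}   L_post = {lower:.3f}   exact EIG = {exact:.3f}")

Because the design enters the variational model explicitly, a single training run suffices to estimate the EIG at any design in the feasible set, which is the behaviour the architecture described next provides for much richer models.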
Neural Architecture
Figure 1 shows a high-level representation of our architecture. Broadly, it consists of two major components. The first is a learnable function that takes in the design variables d and simulated experimental outcomes from the model, y, and produces a design context vector, c_{y,d}, that will be used to define a conditional distribution. We focus on the common case where the experimental units lack any meaningful order, and our learnable function must therefore be permutation invariant. We can incorporate this inductive bias into our model by making our function follow the general form of set functions proposed in Zaheer et al. (2017). In the sequel, we denote this component as our set invariant model.
[Figure 1 here: a schematic showing a set invariant model (set encoder, aggregator, set emitter) mapping the (y_i, d_i) pairs to a design context, followed by a conditional flow (conditional base distribution and invertible transforms 1..M) producing posterior samples.]

Figure 1: A high-level schematic of our architecture for amortizing over designs. The first component (left) takes in the design variables and simulated observations and produces a design context, c_{y,d}. In many experiments the individual units being experimented on are exchangeable, thus we use a set invariant architecture. The second (right) is a conditional normalizing flow, conditioned on the design context produced by the first component. Together, they define our variational posteriors qφ(θ|y, d), amortized over designs.
The second major component is a learnable distribution conditioned on the design context produced by the set invariant model. In this work we use conditional normalizing flows, which consist of a base distribution and a sequence of invertible transformations with tractable Jacobian determinants, so that the probability density is properly accounted for through the transformations. Both the base distribution and the transformations are learnable and conditioned on the design context.
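As a deliberately minimal example of this second component (not the architecture used in the paper), the sketch below implements one conditional affine transform over a conditional Gaussian base in PyTorch, to illustrate the conditioning mechanism and the change-of-variables bookkeeping.

import torch
import torch.nn as nn

class ConditionalAffineFlow(nn.Module):
    """q_phi(theta | context): conditional Gaussian base + one conditional affine transform.

    theta = mu2 + exp(s2) * z with z ~ N(mu1, exp(s1)^2); all of (mu1, s1, mu2, s2)
    are emitted from the design context c_{y,d}. A sketch only, not the paper's model.
    """
    def __init__(self, context_dim, theta_dim, hidden=64):
        super().__init__()
        self.hyper = nn.Sequential(
            nn.Linear(context_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 4 * theta_dim),   # -> mu1, s1, mu2, s2
        )

    def log_prob(self, theta, context):
        mu1, s1, mu2, s2 = self.hyper(context).chunk(4, dim=-1)
        z = (theta - mu2) * torch.exp(-s2)                 # invert the affine transform
        base = torch.distributions.Normal(mu1, s1.exp())
        # change of variables: log q(theta) = log N(z; mu1, exp(s1)) - sum(s2)
        return base.log_prob(z).sum(-1) - s2.sum(-1)

    def sample(self, context):
        mu1, s1, mu2, s2 = self.hyper(context).chunk(4, dim=-1)
        z = torch.distributions.Normal(mu1, s1.exp()).rsample()
        return mu2 + s2.exp() * z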
Set Invariant Model. It is often the case that the individual units being experimented on do not possess an inherent ordering – for example, subjects in a randomized controlled clinical trial, or the petri dishes in our previous example. Suppose we would like to find the optimally informative design for an experiment with S experimental units, where d_i and y_i denote the design variables and simulated outcomes of unit i, respectively. In this setting we want our design context to be invariant to permutations of its inputs, e.g., reordering the individuals in the trial should not change our results. Learning permutation invariant functions is an active area of research (e.g., Bloem-Reddy and Teh 2020). In this work we follow the general form proposed by Zaheer et al. (2017), where our set invariant model is defined as
$$c_{y,d} = \mathrm{EMIT}_{\phi_{EMIT}}\left[\sum_{i=0}^{S} \mathrm{ENC}_{\phi_{ENC}}(y_i, d_i)\right]. \qquad (8)$$
In particular, we define two learnable functions. The set encoder ENC_{φ_ENC}(y_i, d_i) takes as input the design variables and simulated outcomes for each individual experimental unit. Its output is an intermediary representation for each experimental unit; these are aggregated together by summation, and the permutation invariance of the sum ensures invariance of the overall function. The aggregated representation is then passed through the set emitter function to produce the design context c_{y,d}.
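A minimal PyTorch sketch of Eq. (8) follows; the layer sizes and two-layer MLPs are illustrative choices, not the configuration used in the paper.

import torch
import torch.nn as nn

class SetInvariantModel(nn.Module):
    """Design context c_{y,d} = EMIT( sum_i ENC(y_i, d_i) ), as in Eq. (8)."""
    def __init__(self, unit_dim, context_dim, hidden=64):
        super().__init__()
        # ENC: per-unit encoder applied to each (y_i, d_i) pair
        self.enc = nn.Sequential(nn.Linear(unit_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        # EMIT: emitter applied to the summed per-unit representations
        self.emit = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                  nn.Linear(hidden, context_dim))

    def forward(self, y, d):
        # y, d: tensors of shape (batch, S, features); one row per experimental unit
        per_unit = self.enc(torch.cat([y, d], dim=-1))   # (batch, S, hidden)
        pooled = per_unit.sum(dim=1)                     # sum over units -> permutation invariant
        return self.emit(pooled)                         # (batch, context_dim)

# Example: S = 10 exchangeable units, scalar outcome and 3 design features per unit.
model = SetInvariantModel(unit_dim=1 + 3, context_dim=32)
y = torch.randn(8, 10, 1)
d = torch.randn(8, 10, 3)
context = model(y, d)                                    # (8, 32)
perm = torch.randperm(10)
assert torch.allclose(context, model(y[:, perm], d[:, perm]), atol=1e-5)  # order does not matter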