
on the latent variables encoded in a prior distribution, p(θ), and a likelihood that predicts experimental outcomes from a design and latent variables, p(y|θ, d). Via Bayes' rule, these two functions combine to give the posterior distribution p(θ|y, d) ∝ p(y|θ, d) p(θ), representing our state of knowledge about the latent variables after conducting an experiment with design d and observing outcomes y. For example, the design variables d could represent the environmental conditions and chemical concentrations of a medium used to culture a strain of bacteria that produces an important chemical compound. This design problem becomes more complex as the dimension of d increases, for example if we have S petri dishes to work on (often called the experimental units or subjects). The experimental outcomes y would represent the amount of the chemical compound yielded by growing the culture under each of the conditions in d, and the latent variables θ represent parameters that define how the design variables d mediate the yield y. After conducting the experiment and observing y, we can quantify our information gain (IG) as:
$$\mathrm{IG}(y, d) = H[p(\theta)] - H[p(\theta | y, d)] \tag{2}$$
However, this gain cannot be evaluated before conducting the experiment, as it requires knowing the outcomes y. Taking the expectation of the information gain with respect to the outcome distribution, p(y|d), instead gives the EIG:
$$\mathrm{EIG}(d) = \mathbb{E}_{p(\theta, y | d)}\left[\log \frac{p(\theta | y, d)}{p(\theta)}\right] = \mathbb{E}_{p(\theta, y | d)}\left[\log \frac{p(y | \theta, d)}{p(y | d)}\right] \tag{3}$$
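For intuition, consider a simple linear-Gaussian model (an illustrative choice, not tied to the bacterial setting above): θ ∼ N(0, σ_θ²) and y | θ, d ∼ N(dθ, σ²). Then p(y|d) = N(0, d²σ_θ² + σ²), and Eq. (3) reduces to a difference of Gaussian entropies,

$$\mathrm{EIG}(d) = \tfrac{1}{2}\log\left(1 + \frac{d^{2}\sigma_{\theta}^{2}}{\sigma^{2}}\right),$$

so the EIG grows as the design makes the signal large relative to the observation noise. In general no such closed form is available, motivating the estimators below.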
Nested Monte Carlo: Typically p(θ|y, d) and p(y|d) are intractable, making the EIG challenging to compute. One common approach to approximating the EIG is to use a nested Monte Carlo (NMC) estimator (Myung, Cavagnaro, and Pitt 2013; Vincent and Rainforth 2017; Rainforth et al. 2018):
$$\hat{\mu}_{\mathrm{NMC}} = \frac{1}{N} \sum_{n=1}^{N} \log \frac{p(y_n | \theta_{n,0}, d)}{\frac{1}{M} \sum_{m=1}^{M} p(y_n | \theta_{n,m}, d)}, \quad \text{where } \theta_{n,m} \sim p(\theta) \text{ and } y_n \sim p(y | \theta_{n,0}, d) \tag{4}$$
Rainforth et al. (2018) showed that NMC is a consistent estimator, converging as N, M → ∞. They also showed that it is asymptotically optimal to set M ∝ √N, resulting in an overall convergence rate of O(T^{-1/3}), where T is the total number of samples drawn (i.e. T = NM for NMC). However, this is much slower than the O(T^{-1/2}) rate of standard Monte Carlo estimators (Robert and Casella 1999), in which the total number of samples is simply T = N.
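As a minimal sketch of the NMC estimator in Eq. (4), assuming a toy linear-Gaussian model θ ∼ N(0, σ_θ²), y | θ, d ∼ N(dθ, σ²) (the model, sample sizes, and noise scales below are illustrative choices, not from the setting above):

```python
import numpy as np

def nmc_eig(d, N=2000, M=50, sigma_theta=1.0, sigma=0.5, rng=None):
    """Nested Monte Carlo estimate of EIG(d) (Eq. 4) for a toy linear-Gaussian
    model: theta ~ N(0, sigma_theta^2), y | theta, d ~ N(d*theta, sigma^2).
    Illustrative sketch only."""
    rng = np.random.default_rng(rng)
    # Outer samples theta_{n,0} ~ p(theta) and y_n ~ p(y | theta_{n,0}, d)
    theta0 = rng.normal(0.0, sigma_theta, size=N)
    y = rng.normal(d * theta0, sigma)
    # Numerator of Eq. (4): log p(y_n | theta_{n,0}, d)
    log_num = -0.5 * ((y - d * theta0) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
    # Inner samples theta_{n,m} ~ p(theta), m = 1..M, drawn fresh for each n
    theta_inner = rng.normal(0.0, sigma_theta, size=(N, M))
    log_lik_inner = (-0.5 * ((y[:, None] - d * theta_inner) / sigma) ** 2
                     - np.log(sigma * np.sqrt(2 * np.pi)))
    # Denominator: log of the inner average (1/M) sum_m p(y_n | theta_{n,m}, d)
    log_denom = np.logaddexp.reduce(log_lik_inner, axis=1) - np.log(M)
    return np.mean(log_num - log_denom)

# The toy model has a closed form, EIG(d) = 0.5*log(1 + d^2 sigma_theta^2 / sigma^2),
# which the NMC estimate approaches as N and M grow.
print(nmc_eig(d=2.0, N=5000, M=100))
```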
The slow convergence of the NMC estimator can be limiting in practical applications of BOED. The inefficiencies can be traced to requiring an independent estimate of the marginal likelihood, p(y_n|d), for each y_n (the denominator of Eq. (4)). Inspired by this, Foster et al. (2019) proposed employing techniques from variational inference by defining a functional approximation to either p(θ|y, d) or p(y|d), allowing these estimators to amortize across the samples of y_n for more efficient estimation of the EIG. In this work we focus on two of the four estimators they proposed: the posterior estimator and variational nested Monte Carlo.
Posterior Estimator: The posterior estimator is an application of the Barber-Agakov bound to BOED; the bound was originally proposed for estimating the mutual information in noisy communication channels (Barber and Agakov 2003). It requires defining a variational approximation q_φ(θ|y, d) to the posterior distribution, giving a lower bound on the EIG:
$$\mathrm{EIG}(d) \geq \mathcal{L}_{\mathrm{post}}(d) \triangleq \mathbb{E}_{p(\theta, y | d)}\left[\log \frac{q_\phi(\theta | y, d)}{p(\theta)}\right] \approx \frac{1}{N} \sum_{n=1}^{N} \log \frac{q_\phi(\theta_n | y_n, d)}{p(\theta_n)}, \quad \text{where } y_n, \theta_n \sim p(y, \theta | d) \tag{5}$$
By maximizing this bound with respect to the variational parameters φ, we can learn a variational form that efficiently estimates the EIG. A Monte Carlo estimate of this bound converges at rate O(T^{-1/2}), and if the true posterior distribution is within the class of functions defined by the variational form q_φ, the bound can be made tight (up to the quality of the optimization) (Foster et al. 2019).
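A minimal sketch of the posterior estimator of Eq. (5), assuming the same style of toy linear-Gaussian model (θ ∼ N(0, σ_θ²), y | θ, d ∼ N(dθ, σ²)) and a Gaussian variational posterior whose mean is linear in y; the model, parameterization, and optimizer settings are all illustrative choices:

```python
import torch

def posterior_lower_bound(d, n_steps=2000, batch=256, sigma_theta=1.0, sigma=0.5):
    """Barber-Agakov lower bound on EIG(d) (Eq. 5) for a toy linear-Gaussian model,
    with q_phi(theta | y, d) = N(a*y + b, exp(log_s)^2). Illustrative sketch only."""
    phi = torch.zeros(3, requires_grad=True)          # phi = [a, b, log_s]
    opt = torch.optim.Adam([phi], lr=1e-2)
    prior = torch.distributions.Normal(0.0, sigma_theta)
    for _ in range(n_steps):
        # Sample (theta_n, y_n) ~ p(theta, y | d)
        theta = prior.sample((batch,))
        y = torch.distributions.Normal(d * theta, sigma).sample()
        a, b, log_s = phi
        q = torch.distributions.Normal(a * y + b, log_s.exp())
        # L_post = E[ log q_phi(theta | y, d) - log p(theta) ]; maximize over phi
        bound = (q.log_prob(theta) - prior.log_prob(theta)).mean()
        opt.zero_grad()
        (-bound).backward()
        opt.step()
    # Return the bound estimate from the final batch
    return bound.item()

print(posterior_lower_bound(d=2.0))
```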
Variational Nested Monte Carlo: The second bound we discuss is variational nested Monte Carlo (VNMC). It is closely related to NMC, but differs by applying a variational approximation q_φ(θ|y, d) as an importance sampler to estimate the marginal likelihood term in NMC:
$$\mathrm{EIG}(d) \leq \mathcal{U}_{\mathrm{VNMC}}(d, M) \triangleq \mathbb{E}\left[\log \frac{p(y | \theta_0, d)}{\frac{1}{M} \sum_{m=1}^{M} \frac{p(y, \theta_m | d)}{q_\phi(\theta_m | y, d)}}\right] \tag{6}$$
where the expectation is taken with respect to y, θ_{0:M} ∼ p(y, θ_0 | d) ∏_{m=1}^{M} q_φ(θ_m | y, d).
By minimizing this upper bound with respect to the variational parameters φ, we can learn an importance distribution that allows much more efficient computation of the EIG. Note that if q_φ(θ|y, d) exactly equals the posterior distribution, the bound is tight and requires only a single nested sample (M = 1). Even if the variational form does not equal the posterior, the estimator remains consistent as M → ∞. Finally, it is worth noting that taking q_φ(θ|y, d) = p(θ) reduces the estimator to NMC.
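A minimal sketch of the VNMC upper bound in Eq. (6), again assuming the toy linear-Gaussian model θ ∼ N(0, σ_θ²), y | θ, d ∼ N(dθ, σ²) and a Gaussian proposal q_φ(θ|y, d) = N(a·y + b, exp(log_s)²) held fixed here; in practice φ would be minimized by stochastic gradients as described above (all names and settings are illustrative):

```python
import torch

def vnmc_upper_bound(d, N=2000, M=20, sigma_theta=1.0, sigma=0.5, phi=(0.5, 0.0, 0.0)):
    """VNMC upper bound on EIG(d) (Eq. 6) for a toy linear-Gaussian model,
    with a fixed Gaussian proposal q_phi. Illustrative sketch only."""
    a, b, log_s = phi
    prior = torch.distributions.Normal(0.0, sigma_theta)
    # y, theta_0 ~ p(y, theta_0 | d)
    theta0 = prior.sample((N,))
    y = torch.distributions.Normal(d * theta0, sigma).sample()
    log_num = torch.distributions.Normal(d * theta0, sigma).log_prob(y)
    # theta_m ~ q_phi(theta | y, d), m = 1..M
    q = torch.distributions.Normal(a * y + b, torch.tensor(log_s).exp())
    theta_m = q.sample((M,))                               # shape (M, N)
    # Importance weights p(y, theta_m | d) / q_phi(theta_m | y, d)
    log_w = (prior.log_prob(theta_m)
             + torch.distributions.Normal(d * theta_m, sigma).log_prob(y)
             - q.log_prob(theta_m))
    # Denominator: log of (1/M) sum_m w_m, computed stably
    log_denom = torch.logsumexp(log_w, dim=0) - torch.log(torch.tensor(float(M)))
    return (log_num - log_denom).mean().item()

print(vnmc_upper_bound(d=2.0))
```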
It was further shown by Foster et al. (2020) that VNMC can easily be made into a lower bound by including θ_0 (the sample from the prior) when estimating the marginal likelihood, a method we denote contrastive VNMC (CVNMC):
$$\mathrm{EIG}(d) \geq \mathcal{L}_{\mathrm{CVNMC}}(d, M) \triangleq \mathbb{E}\left[\log \frac{p(y | \theta_0, d)}{\frac{1}{M+1} \sum_{m=0}^{M} \frac{p(y, \theta_m | d)}{q_\phi(\theta_m | y, d)}}\right] \tag{7}$$
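Relative to the VNMC sketch above, the only change is that θ_0 joins the contrastive samples, so the inner average runs over M + 1 terms (same illustrative toy model and fixed proposal as before):

```python
import torch

def cvnmc_lower_bound(d, N=2000, M=20, sigma_theta=1.0, sigma=0.5, phi=(0.5, 0.0, 0.0)):
    """Contrastive VNMC lower bound on EIG(d) (Eq. 7) for the toy linear-Gaussian
    model; only the denominator differs from VNMC. Illustrative sketch only."""
    a, b, log_s = phi
    prior = torch.distributions.Normal(0.0, sigma_theta)
    theta0 = prior.sample((N,))
    y = torch.distributions.Normal(d * theta0, sigma).sample()
    log_num = torch.distributions.Normal(d * theta0, sigma).log_prob(y)
    q = torch.distributions.Normal(a * y + b, torch.tensor(log_s).exp())
    theta_m = q.sample((M,))
    # Include theta_0 alongside the M proposal samples (the contrastive term)
    theta_all = torch.cat([theta0.unsqueeze(0), theta_m], dim=0)   # shape (M+1, N)
    log_w = (prior.log_prob(theta_all)
             + torch.distributions.Normal(d * theta_all, sigma).log_prob(y)
             - q.log_prob(theta_all))
    # Average over M+1 weights, as in Eq. (7)
    log_denom = torch.logsumexp(log_w, dim=0) - torch.log(torch.tensor(float(M + 1)))
    return (log_num - log_denom).mean().item()

print(cvnmc_lower_bound(d=2.0))
```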