
on the latent variables encoded in a prior distribution, p(θ), and a likelihood that predicts experimental outcomes from a design and latent variables, p(y|θ, d). Via Bayes' rule, these two functions combine to give the posterior distribution p(θ|y, d) ∝ p(y|θ, d) p(θ), representing our state of knowledge about the latent variables after conducting an experiment with design d and observing outcomes y. For example, the design variables d could represent the environmental conditions and chemical concentrations of a medium used to culture a strain of bacteria that produces an important chemical compound. This design problem becomes more complex as the dimension of d increases, for example if we have S petri dishes to work on (often called the experimental units or subjects). The experimental outcomes y would represent the amount of the chemical compound yielded by growing the culture under each of the conditions in d, and the latent variables θ represent parameters that define how the design variables d mediate the yield y. After conducting the experiment and observing y, we can quantify our information gain (IG) as:
$$\mathrm{IG}(y, d) = H[p(\theta)] - H[p(\theta | y, d)] \tag{2}$$
However, this gain cannot be evaluated before conducting the experiment, as it requires knowing the outcomes y. Taking the expectation of the information gain with respect to the outcome distribution, p(y|d), instead gives the EIG:
$$\mathrm{EIG}(d) = \mathbb{E}_{p(\theta, y | d)}\left[\log \frac{p(\theta | y, d)}{p(\theta)}\right] = \mathbb{E}_{p(\theta, y | d)}\left[\log \frac{p(y | \theta, d)}{p(y | d)}\right] \tag{3}$$
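For intuition, consider a simple linear-Gaussian model (an illustrative choice, not tied to the bacterial setting above): θ ∼ N(0, σ_θ²) and y | θ, d ∼ N(dθ, σ²). Then p(y|d) = N(0, d²σ_θ² + σ²), and Eq. (3) reduces to a difference of Gaussian entropies,

$$\mathrm{EIG}(d) = \tfrac{1}{2}\log\left(1 + \frac{d^{2}\sigma_{\theta}^{2}}{\sigma^{2}}\right),$$

so the EIG grows as the design makes the signal large relative to the observation noise. In general no such closed form is available, motivating the estimators below.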
Nested Monte Carlo: Typically p(θ|y, d) and p(y|d) are intractable, making the EIG challenging to compute. One common approach to approximating the EIG is to use a nested Monte Carlo (NMC) estimator (Myung, Cavagnaro, and Pitt 2013; Vincent and Rainforth 2017; Rainforth et al. 2018):
$$\hat{\mu}_{\mathrm{NMC}} = \frac{1}{N} \sum_{n=1}^{N} \log \frac{p(y_n | \theta_{n,0}, d)}{\frac{1}{M} \sum_{m=1}^{M} p(y_n | \theta_{n,m}, d)}, \quad \text{where } \theta_{n,m} \sim p(\theta) \text{ and } y_n \sim p(y | \theta_{n,0}, d) \tag{4}$$
Rainforth et al. (2018) showed that NMC is a consistent estimator, converging as N, M → ∞. They also showed that it is asymptotically optimal to set M ∝ √N, resulting in an overall convergence rate of O(T^{-1/3}), where T is the total number of samples drawn (i.e. T = NM for NMC). However, this is much slower than the O(T^{-1/2}) rate of standard Monte Carlo estimators (Robert and Casella 1999), in which the total number of samples is simply T = N.
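As a minimal sketch of the NMC estimator in Eq. (4), assuming a toy linear-Gaussian model θ ∼ N(0, σ_θ²), y | θ, d ∼ N(dθ, σ²) (the model, sample sizes, and noise scales below are illustrative choices, not from the setting above):

```python
import numpy as np

def nmc_eig(d, N=2000, M=50, sigma_theta=1.0, sigma=0.5, rng=None):
    """Nested Monte Carlo estimate of EIG(d) (Eq. 4) for a toy linear-Gaussian
    model: theta ~ N(0, sigma_theta^2), y | theta, d ~ N(d*theta, sigma^2).
    Illustrative sketch only."""
    rng = np.random.default_rng(rng)
    # Outer samples theta_{n,0} ~ p(theta) and y_n ~ p(y | theta_{n,0}, d)
    theta0 = rng.normal(0.0, sigma_theta, size=N)
    y = rng.normal(d * theta0, sigma)
    # Numerator of Eq. (4): log p(y_n | theta_{n,0}, d)
    log_num = -0.5 * ((y - d * theta0) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
    # Inner samples theta_{n,m} ~ p(theta), m = 1..M, drawn fresh for each n
    theta_inner = rng.normal(0.0, sigma_theta, size=(N, M))
    log_lik_inner = (-0.5 * ((y[:, None] - d * theta_inner) / sigma) ** 2
                     - np.log(sigma * np.sqrt(2 * np.pi)))
    # Denominator: log of the inner average (1/M) sum_m p(y_n | theta_{n,m}, d)
    log_denom = np.logaddexp.reduce(log_lik_inner, axis=1) - np.log(M)
    return np.mean(log_num - log_denom)

# The toy model has a closed form, EIG(d) = 0.5*log(1 + d^2 sigma_theta^2 / sigma^2),
# which the NMC estimate approaches as N and M grow.
print(nmc_eig(d=2.0, N=5000, M=100))
```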
The slow convergence of the NMC estimator can be limiting in practical applications of BOED. The inefficiencies can be traced to requiring an independent estimate of the marginal likelihood, p(y_n|d), for each y_n (the denominator of Eq. (4)). Inspired by this, Foster et al. (2019) proposed employing techniques from variational inference by defining a functional approximation to either p(θ|y, d) or p(y|d), allowing these estimators to amortize across the samples of y_n for more efficient estimation of the EIG. In this work we focus on two of the four estimators they proposed: the posterior estimator and variational nested Monte Carlo.
Posterior Estimator: The posterior estimator is an application of the Barber-Agakov bound to BOED; the bound was originally proposed for estimating the mutual information in noisy communication channels (Barber and Agakov 2003). It requires defining a variational approximation q_φ(θ|y, d) to the posterior distribution, giving a lower bound on the EIG:
$$\mathrm{EIG}(d) \geq \mathcal{L}_{\mathrm{post}}(d) \triangleq \mathbb{E}_{p(\theta, y | d)}\left[\log \frac{q_\phi(\theta | y, d)}{p(\theta)}\right] \approx \frac{1}{N} \sum_{n=1}^{N} \log \frac{q_\phi(\theta_n | y_n, d)}{p(\theta_n)}, \quad \text{where } y_n, \theta_n \sim p(y, \theta | d) \tag{5}$$
By maximizing this bound with respect to the variational parameters φ, we can learn a variational form that efficiently estimates the EIG. A Monte Carlo estimate of this bound converges at rate O(T^{-1/2}), and if the true posterior distribution is within the class of functions defined by the variational form q_φ, the bound can be made tight (up to the quality of the optimization) (Foster et al. 2019).
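A minimal sketch of the posterior estimator of Eq. (5), assuming the same style of toy linear-Gaussian model (θ ∼ N(0, σ_θ²), y | θ, d ∼ N(dθ, σ²)) and a Gaussian variational posterior whose mean is linear in y; the model, parameterization, and optimizer settings are all illustrative choices:

```python
import torch

def posterior_lower_bound(d, n_steps=2000, batch=256, sigma_theta=1.0, sigma=0.5):
    """Barber-Agakov lower bound on EIG(d) (Eq. 5) for a toy linear-Gaussian model,
    with q_phi(theta | y, d) = N(a*y + b, exp(log_s)^2). Illustrative sketch only."""
    phi = torch.zeros(3, requires_grad=True)          # phi = [a, b, log_s]
    opt = torch.optim.Adam([phi], lr=1e-2)
    prior = torch.distributions.Normal(0.0, sigma_theta)
    for _ in range(n_steps):
        # Sample (theta_n, y_n) ~ p(theta, y | d)
        theta = prior.sample((batch,))
        y = torch.distributions.Normal(d * theta, sigma).sample()
        a, b, log_s = phi
        q = torch.distributions.Normal(a * y + b, log_s.exp())
        # L_post = E[ log q_phi(theta | y, d) - log p(theta) ]; maximize over phi
        bound = (q.log_prob(theta) - prior.log_prob(theta)).mean()
        opt.zero_grad()
        (-bound).backward()
        opt.step()
    # Return the bound estimate from the final batch
    return bound.item()

print(posterior_lower_bound(d=2.0))
```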
Variational Nested Monte Carlo: The second bound we discuss is variational nested Monte Carlo (VNMC). It is closely related to NMC, but differs by applying a variational approximation q_φ(θ|y, d) as an importance sampler to estimate the marginal likelihood term in NMC:
$$\mathrm{EIG}(d) \leq \mathcal{U}_{\mathrm{VNMC}}(d, M) \triangleq \mathbb{E}\left[\log \frac{p(y | \theta_0, d)}{\frac{1}{M} \sum_{m=1}^{M} \frac{p(y, \theta_m | d)}{q_\phi(\theta_m | y, d)}}\right] \tag{6}$$
where the expectation is taken with respect to y, θ_{0:M} ∼ p(y, θ_0 | d) ∏_{m=1}^{M} q_φ(θ_m | y, d).
By minimizing this upper bound with respect to the variational parameters φ, we can learn an importance distribution that allows much more efficient computation of the EIG. Note that if q_φ(θ|y, d) exactly equals the posterior distribution, the bound is tight and requires only a single nested sample (M = 1). Even if the variational form does not equal the posterior, the estimator remains consistent as M → ∞. Finally, it is worth noting that taking q_φ(θ|y, d) = p(θ) reduces the estimator to NMC.
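A minimal sketch of the VNMC upper bound in Eq. (6), again assuming the toy linear-Gaussian model θ ∼ N(0, σ_θ²), y | θ, d ∼ N(dθ, σ²) and a Gaussian proposal q_φ(θ|y, d) = N(a·y + b, exp(log_s)²) held fixed here; in practice φ would be minimized by stochastic gradients as described above (all names and settings are illustrative):

```python
import torch

def vnmc_upper_bound(d, N=2000, M=20, sigma_theta=1.0, sigma=0.5, phi=(0.5, 0.0, 0.0)):
    """VNMC upper bound on EIG(d) (Eq. 6) for a toy linear-Gaussian model,
    with a fixed Gaussian proposal q_phi. Illustrative sketch only."""
    a, b, log_s = phi
    prior = torch.distributions.Normal(0.0, sigma_theta)
    # y, theta_0 ~ p(y, theta_0 | d)
    theta0 = prior.sample((N,))
    y = torch.distributions.Normal(d * theta0, sigma).sample()
    log_num = torch.distributions.Normal(d * theta0, sigma).log_prob(y)
    # theta_m ~ q_phi(theta | y, d), m = 1..M
    q = torch.distributions.Normal(a * y + b, torch.tensor(log_s).exp())
    theta_m = q.sample((M,))                               # shape (M, N)
    # Importance weights p(y, theta_m | d) / q_phi(theta_m | y, d)
    log_w = (prior.log_prob(theta_m)
             + torch.distributions.Normal(d * theta_m, sigma).log_prob(y)
             - q.log_prob(theta_m))
    # Denominator: log of (1/M) sum_m w_m, computed stably
    log_denom = torch.logsumexp(log_w, dim=0) - torch.log(torch.tensor(float(M)))
    return (log_num - log_denom).mean().item()

print(vnmc_upper_bound(d=2.0))
```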
It was further shown by Foster et al. (2020) that VNMC can easily be made into a lower bound by including θ_0 (the sample from the prior) when estimating the marginal likelihood, a method we denote contrastive VNMC (CVNMC):
$$\mathrm{EIG}(d) \geq \mathcal{L}_{\mathrm{CVNMC}}(d, M) \triangleq \mathbb{E}\left[\log \frac{p(y | \theta_0, d)}{\frac{1}{M+1} \sum_{m=0}^{M} \frac{p(y, \theta_m | d)}{q_\phi(\theta_m | y, d)}}\right] \tag{7}$$
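Relative to the VNMC sketch above, the only change is that θ_0 joins the contrastive samples, so the inner average runs over M + 1 terms (same illustrative toy model and fixed proposal as before):

```python
import torch

def cvnmc_lower_bound(d, N=2000, M=20, sigma_theta=1.0, sigma=0.5, phi=(0.5, 0.0, 0.0)):
    """Contrastive VNMC lower bound on EIG(d) (Eq. 7) for the toy linear-Gaussian
    model; only the denominator differs from VNMC. Illustrative sketch only."""
    a, b, log_s = phi
    prior = torch.distributions.Normal(0.0, sigma_theta)
    theta0 = prior.sample((N,))
    y = torch.distributions.Normal(d * theta0, sigma).sample()
    log_num = torch.distributions.Normal(d * theta0, sigma).log_prob(y)
    q = torch.distributions.Normal(a * y + b, torch.tensor(log_s).exp())
    theta_m = q.sample((M,))
    # Include theta_0 alongside the M proposal samples (the contrastive term)
    theta_all = torch.cat([theta0.unsqueeze(0), theta_m], dim=0)   # shape (M+1, N)
    log_w = (prior.log_prob(theta_all)
             + torch.distributions.Normal(d * theta_all, sigma).log_prob(y)
             - q.log_prob(theta_all))
    # Average over M+1 weights, as in Eq. (7)
    log_denom = torch.logsumexp(log_w, dim=0) - torch.log(torch.tensor(float(M + 1)))
    return (log_num - log_denom).mean().item()

print(cvnmc_lower_bound(d=2.0))
```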