ADAPTIVE SYNAPTIC FAILURE 2
and Staras, 2009, Huang and Stevens, 1997], leading to speculation that synaptic failure
constitutes a mechanism of variational inference in the brain [Llera-Montero et al., 2019,
Maass and Zador, 1999, Aitchison and Latham, 2015, Aitchison et al., 2021]. In turn, some have
demonstrated the plausibility of Bayesian neural computation by implementing generative
neural architectures that emulate synaptic failure among other biological constraints [Guo
et al., 2019, Neftci et al., 2016, Mostafa and Cauwenberghs, 2018]. Whereas such models
have previously focused on using dropout to sample over uncertainty in model parameters,
we demonstrate that synaptic failure can also sample over posterior predictive distributions,
of which parameter uncertainty is only one component.
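This style of stochastic sampling can be made concrete with a short sketch: repeatedly running a forward pass in which each weight (synapse) is independently zeroed with some probability yields a distribution of outputs rather than a point prediction. The toy single linear layer, its weights, and the failure probability below are illustrative assumptions, not the model studied here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "trained" weights for a single linear layer (illustrative values only).
W = rng.normal(0.0, 1.0, size=(1, 8))

def forward(u, drop_p=0.5):
    """One stochastic forward pass: each weight independently fails
    (is zeroed) with probability drop_p, emulating synaptic failure."""
    mask = rng.random(W.shape) > drop_p
    # Rescale by 1/(1 - drop_p) so the expected output is unchanged.
    return ((W * mask) @ u / (1.0 - drop_p)).item()

u = rng.normal(size=8)

# Repeated stochastic passes yield samples from an approximate
# predictive distribution instead of a single deterministic output.
samples = np.array([forward(u) for _ in range(5000)])
print(samples.mean(), samples.std())
```

The spread of `samples` reflects the uncertainty injected by the random failures, while the sample mean tracks the deterministic output of the intact layer.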
To understand the basic structure of a posterior predictive distribution, consider a model relating two observed variables $u_t, v_t \in \mathbb{R}$ at time $t$ that are jointly distributed $P(u_t, v_t)$. The model takes a new observed input $u_{t+1}$ and uses parameters $\theta_t$, trained on all previous data up to time $t$, to generate a corresponding prediction $\hat{v}_{t+1}$ according to $P(\hat{v}_{t+1} | \theta_t, u_{t+1})$. In Bayesian models, $\theta$ is randomly distributed conditional on finite vectors of previously observed inputs and outputs, $P(\theta_t | u_0 \ldots u_t, v_0 \ldots v_t)$, i.e., model training is synonymous with inference of $\theta$ from past observations. As such, $\theta$ is known imprecisely, with precision depending on its role in the model and the number of relevant observations up to the present, $t$. As we are only interested in the distribution of the final prediction given some novel input, $P(\hat{v}_{t+1} | u_{t+1})$, any model parameters $\theta$ are known as nuisance variables. To obtain $P(\hat{v}_{t+1} | u_{t+1})$, we marginalize out $\theta$, meaning we integrate over all of its possible values, each weighted by its respective likelihood. If observations are independent and identically distributed, i.e., $P(u_t, u_{t-h}) = P(u_t)P(u_{t-h})$ and $P(v_t, v_{t-h}) = P(v_t)P(v_{t-h})$, $\forall h \neq t$, then the posterior predictive distribution is given as

$$P(\hat{v}_{t+1} | u_{t+1}) = \int P(\theta_t, \hat{v}_{t+1} | u_{t+1}, u_0 \ldots u_t, v_0 \ldots v_t)\, d\theta, \tag{1a}$$

$$= \int P(\hat{v}_{t+1} | \theta_t, u_{t+1})\, P(\theta_t | u_0 \ldots u_t, v_0 \ldots v_t)\, d\theta. \tag{1b}$$
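In practice this integral is typically approximated by Monte Carlo sampling: draw values of $\theta$ from the parameter posterior, then draw a prediction conditional on each draw, and treat the resulting collection as samples from the posterior predictive. A minimal sketch, assuming a scalar parameter with a Gaussian posterior and Gaussian residual noise (all numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy posterior over a scalar parameter theta after training:
# P(theta | data) = N(m, s^2)  (values illustrative).
m, s = 2.0, 0.3
sigma = 0.5    # residual (observation) noise std
u_new = 1.5    # novel input u_{t+1}

# Monte Carlo marginalization: sample theta from its posterior,
# then sample a prediction v_hat from P(v_hat | theta, u_new).
theta = rng.normal(m, s, size=200_000)
v_hat = rng.normal(theta * u_new, sigma)

# The sample mean and variance approximate the posterior predictive,
# here N(m * u_new, u_new^2 * s^2 + sigma^2).
print(v_hat.mean(), v_hat.var())
```

Each prediction sample thus carries both parameter uncertainty (via the draw of `theta`) and residual uncertainty (via the conditional draw of `v_hat`), which is exactly the decomposition in Eq. (1b).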
The total imprecision of a predicted output $\hat{v}_{t+1}$, defined by $P(\hat{v}_{t+1} | u_{t+1})$, thus includes internal sources of uncertainty, i.e., imprecision in $\theta$ defined by the parameter distribution, $P(\theta_t | u_0 \ldots u_t, v_0 \ldots v_t)$, and external sources of uncertainty, or the residual distribution, $P(\hat{v}_{t+1} | \theta_t, u_{t+1})$, so named to denote random variation that remains after the outcome is conditioned on all available inputs. For instance, if $P(\theta_t | u_0 \ldots u_t, v_0 \ldots v_t)$ and $P(\hat{v}_{t+1} | \theta_t, u_{t+1})$ are both Gaussian, then the variance of the prediction is

$$\mathrm{var}(\hat{v}_{t+1} | u_{t+1}) = \mathrm{var}(\hat{v}_{t+1} | \theta_t, u_{t+1}) + \mathrm{var}(\theta_t | u_0 \ldots u_t, v_0 \ldots v_t).$$
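This additive relation can be checked numerically in the simplest case where the prediction depends on $\theta$ additively, $\hat{v}_{t+1} = \theta_t + u_{t+1} + \varepsilon$; the standard deviations below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Gaussian parameter posterior and Gaussian residual noise
# (illustrative values; prediction depends additively on theta,
#  so the two variances simply add).
s_theta = 0.4    # std of P(theta | past data)
s_resid = 0.3    # std of P(v_hat | theta, u)
u_new = 1.0

theta = rng.normal(2.0, s_theta, size=500_000)
v_hat = rng.normal(theta + u_new, s_resid)

total_var = v_hat.var()
print(total_var, s_theta**2 + s_resid**2)  # the two should agree closely
```

The empirical predictive variance matches the sum of the parameter and residual variances, as the equation above states for the Gaussian case.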
Approximate integration over posterior predictive distributions is likely to be particu-
larly important to the field of reinforcement learning and models of human decision-making.
For instance, the method of Monte Carlo Tree Search (MCTS) involves estimating the
value of a possible action by averaging over the expected returns from many simulated
trajectories [Sutton and Barto, 2018]. The expected returns are weighted by the likelihood
of their respective states. To assign accurate values, the agent must be able to average over
not only the range of possible state predictions, but any uncertainty in the model parameters
used to make those predictions. The addition of parameter uncertainty would result in
modulation of the breadth of the search based on prior knowledge, allowing a wider range