Adaptive Synaptic Failure Enables Sampling from Posterior
Predictive Distributions in the Brain
Kevin McKee
University of California, Davis
Ian Crandell
Virginia Tech
Rishidev Chaudhuri
University of California, Davis
Randall O’Reilly
University of California, Davis
Abstract
Bayesian interpretations of neural processing require that biological mech-
anisms represent and operate upon probability distributions in accordance
with Bayes’ theorem. Many have speculated that synaptic failure constitutes
a mechanism of variational, i.e., approximate, Bayesian inference in the brain.
Whereas models have previously used synaptic failure to sample over uncer-
tainty in model parameters, we demonstrate that by adapting transmission
probabilities to learned network weights, synaptic failure can sample not only
over model uncertainty, but complete posterior predictive distributions as
well. Our results potentially explain the brain’s ability to perform proba-
bilistic searches and to approximate complex integrals. These operations are
involved in numerous calculations, including likelihood evaluation and state
value estimation for complex planning.
Introduction
Bayesian interpretations of neural processing require that biological mechanisms
represent and operate upon probability distributions in accordance with Bayes’ theorem. In
this paper, we demonstrate how the random failure of synapses to transmit information may
allow the brain to accurately represent multiple, compounding sources of uncertainty and
perform accurate Bayesian inference.
It has been shown that artificial neural networks can perform variational (i.e., approximate) Bayesian inference by randomly masking network weights, a form of dropout sampling [Srivastava et al., 2014, Wan et al., 2013, Gal and Ghahramani, 2016, Labach et al., 2019, Gal et al., 2017]. Analogously, it is well established that synaptic vesicles randomly fail at a high rate to release neurotransmitters [Allen and Stevens, 1994, Borst, 2010, Branco
and Staras, 2009, Huang and Stevens, 1997], leading to speculation that synaptic failure constitutes a mechanism of variational inference in the brain [Llera-Montero et al., 2019, Maass and Zador, 1999, Aitchison and Latham, 2015, Aitchison et al., 2021]. In turn, some have
demonstrated the plausibility of Bayesian neural computation by implementing generative
neural architectures that emulate synaptic failure among other biological constraints [Guo
et al., 2019, Neftci et al., 2016, Mostafa and Cauwenberghs, 2018]. Whereas such models
have previously focused on using dropout to sample over uncertainty in model parameters,
we demonstrate that synaptic failure can also sample over posterior predictive distributions,
of which parameter uncertainty is only one component.
To understand the basic structure of a posterior predictive distribution, consider a model relating two observed variables $u_t, v_t \in \mathbb{R}$ at time $t$ that are jointly distributed $P(u_t, v_t)$. The model takes a new observed input $u_{t+1}$ and uses parameters $\theta_t$, trained on all previous data up to time $t$, to generate a corresponding prediction $\hat{v}_{t+1}$ according to $P(\hat{v}_{t+1} \mid \theta_t, u_{t+1})$. In Bayesian models, $\theta$ is randomly distributed conditional on finite vectors of previously observed inputs and outputs, $P(\theta_t \mid u_0 \ldots u_t, v_0 \ldots v_t)$, i.e., model training is synonymous with inference of $\theta$ from past observations. As such, $\theta$ is known imprecisely, with precision depending on its role in the model and the number of relevant observations up to the present, $t$. As we are only interested in the distribution of the final prediction given some novel input, $P(\hat{v}_{t+1} \mid u_{t+1})$, any model parameters $\theta$ are known as nuisance variables. To obtain $P(\hat{v}_{t+1} \mid u_{t+1})$, we marginalize out $\theta$, meaning we integrate over all of its possible values, each weighted by its respective likelihood. If observations are independent and identically distributed, i.e., $P(u_t, u_{t-h}) = P(u_t)P(u_{t-h})$ and $P(v_t, v_{t-h}) = P(v_t)P(v_{t-h})$, $h \neq t$, then the posterior predictive distribution is given as

$$P(\hat{v}_{t+1} \mid u_{t+1}) = \int P(\theta_t, \hat{v}_{t+1} \mid u_{t+1}, u_0 \ldots u_t, v_0 \ldots v_t) \, d\theta, \tag{1a}$$
$$= \int P(\hat{v}_{t+1} \mid \theta_t, u_{t+1}) \, P(\theta_t \mid u_0 \ldots u_t, v_0 \ldots v_t) \, d\theta. \tag{1b}$$
The total imprecision of a predicted output $\hat{v}_{t+1}$, defined by $P(\hat{v}_{t+1} \mid u_{t+1})$, thus includes internal sources of uncertainty, i.e., imprecision in $\theta$ defined by the parameter distribution, $P(\theta_t \mid u_0 \ldots u_t, v_0 \ldots v_t)$, and external sources of uncertainty, or the residual distribution, $P(\hat{v}_{t+1} \mid \theta_t, u_{t+1})$, so named to denote random variation that remains after the outcome is conditioned on all available inputs. For instance, if $P(\theta_t \mid u_0 \ldots u_t, v_0 \ldots v_t)$ and $P(\hat{v}_{t+1} \mid \theta_t, u_{t+1})$ are both Gaussian, then the variance of the prediction is $\mathrm{var}(\hat{v}_{t+1} \mid u_{t+1}) = \mathrm{var}(\hat{v}_{t+1} \mid \theta_t, u_{t+1}) + \mathrm{var}(\theta_t \mid u_0 \ldots u_t, v_0 \ldots v_t)$.
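To make the decomposition concrete, the following Python sketch (an illustration of ours, not part of the paper's network) draws Monte Carlo samples from a toy posterior predictive distribution in which both the parameter posterior and the residual distribution are Gaussian; the hypothetical model $\hat{v}_{t+1} = \theta_t + u_{t+1} + \varepsilon$ is chosen only so that the two variances add exactly as in the expression above.

```python
import numpy as np

rng = np.random.default_rng(0)

theta_mean, theta_var = 1.5, 0.2   # Gaussian posterior over theta_t (internal uncertainty)
resid_var = 0.5                    # var(v_hat | theta, u): residual (external) uncertainty
u_next = 2.0                       # novel input u_{t+1}
n_samples = 100_000

# Marginalize theta by sampling: draw theta from its posterior, then draw a
# prediction from the residual distribution conditioned on that theta.
theta_samples = rng.normal(theta_mean, np.sqrt(theta_var), n_samples)
v_samples = rng.normal(theta_samples + u_next, np.sqrt(resid_var))

# Internal and external variances combine additively in this toy case.
print(v_samples.var())             # approximately theta_var + resid_var = 0.7
```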
Approximate integration over posterior predictive distributions is likely to be particu-
larly important to the field of reinforcement learning and models of human decision-making.
For instance, the method of Monte Carlo Tree Search (MCTS) involves estimating the
value of a possible action by averaging over the expected returns from many simulated
trajectories [Sutton and Barto, 2018]. The expected returns are weighted by the likelihood
of their respective states. To assign accurate values, the agent must be able to average over
not only the range of possible state predictions, but any uncertainty in the model parameters
used to make those predictions. The addition of parameter uncertainty would result in
modulation of the breadth of the search based on prior knowledge, allowing a wider range
of trajectories to be simulated where less is known in advance. As a consequence of this
work, agents designed to reflect neurobiology, such as spiking neural networks, may be able
to plan and act according to both the complexity of the decision and the capacity of past
experiences to inform it.
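As a rough illustration of this point (a toy of ours, not a model from the paper), the sketch below estimates an action value by averaging returns over simulated trajectories. Each rollout first draws model parameters from a hypothetical Gaussian posterior and then propagates residual transition noise, so a wider parameter posterior automatically broadens the set of trajectories considered.

```python
import numpy as np

rng = np.random.default_rng(1)

def rollout(theta, state, horizon=5):
    """Simulate one trajectory under sampled parameters theta; return the total reward."""
    total = 0.0
    for _ in range(horizon):
        state = theta * state + rng.normal(0.0, 0.3)  # stochastic transition (residual noise)
        total += -abs(state)                          # toy reward: stay near zero
    return total

def action_value(state, theta_mean, theta_sd, n_rollouts=1000):
    # Each rollout draws theta from the parameter posterior (internal uncertainty)
    # and propagates residual noise (external uncertainty); returns are then averaged.
    thetas = rng.normal(theta_mean, theta_sd, n_rollouts)
    return np.mean([rollout(t, state) for t in thetas])

print(action_value(state=1.0, theta_mean=0.9, theta_sd=0.05))  # confident model: narrow search
print(action_value(state=1.0, theta_mean=0.9, theta_sd=0.50))  # uncertain model: broader search
```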
In this study, we aim to define a neural network constrained for biological plausibility that uses synaptic failure to draw random samples from $P(\hat{v}_{t+1} \mid u_{t+1})$. We approach this by deriving separate dropout probability functions to sample from $P(\theta_t \mid u_0 \ldots u_t, v_0 \ldots v_t)$ and $P(\hat{v}_{t+1} \mid \theta_t, u_{t+1})$, then show that combining the two functions results in approximate samples from $P(\hat{v}_{t+1} \mid u_{t+1})$. So far, dropout sampling of $P(\theta_t \mid u_0 \ldots u_t, v_0 \ldots v_t)$ has only been formulated in biologically implausible contexts, such as with signed, Gaussian-distributed weights. We are tasked with deriving it for weights constrained between 0 and 1, representing generic bounds on synaptic efficacy. It has not previously been shown in any context how synaptic failure may result in samples from $P(\hat{v}_{t+1} \mid \theta_t, u_{t+1})$, and hence, from $P(\hat{v}_{t+1} \mid u_{t+1})$ as a whole.
In the first section, we formulate an artificial neural network based on biological principles and subject to probabilistic interpretation that will be critical for our primary result. Second, we find an analytic mapping from synaptic weights to transmission probabilities that allows representative samples to be drawn from $P(\hat{v}_{t+1} \mid u_{t+1})$, consistent with recent evidence that the rate of synaptic failures appears to be under adaptive control [Branco and Staras, 2009, Borst, 2010]. Finally, we use simulations to demonstrate sampling from internal (parameter) uncertainty, from external (residual) uncertainty, and from complete posterior predictive distributions in an abstracted network using only random synaptic failure.
Probabilistic neural network
We present our model in five parts. First, a biological, soft winner-take-all network
model is outlined. Second, we define a method of decoding neural activity in the network to
obtain real posterior predictive samples $\hat{v}_s$ so that the network may be evaluated. Third, we
show how learning in the network is approximately inference of its weights from observed
data. Fourth, we use the learning principles to derive transmission probabilities to sample
from the parameter distributions. And finally, we derive a mapping from network weight
values to transmission probabilities to sample from residual distributions. By combining
the fourth and fifth steps, we will obtain synaptic transmission probabilities that accurately
sample the network’s posterior predictive distribution.
Let us specify a neural network that senses the states of two real stimuli $u_t, v_t \in \mathbb{R}$ at time $t \in 1 \ldots T$. For each new input, $u_{t+1}$, the network generates a prediction, $\hat{v}_{t+1}$. When presented with $u_{t+1}$, the network can represent uncertainty in $\hat{v}_{t+1}$ by drawing $S$ random samples from $P(\hat{v} \mid u)$.

The network consists of an equal number of input and output neurons $N$, respectively indexed $i, j \in 1 \ldots N$. To represent $u, v$, input and output neurons fire at rates determined by their Gaussian receptive fields centered at locations $\mu$ with equal widths $\sigma$. Action potentials for each neuron are represented by Bernoulli random variables $x_i, y_j \in \{0, 1\}$:

$$P(x_i = 1 \mid u) := \exp\left[-\frac{(u - \mu_i)^2}{\sigma^2}\right], \qquad P(y_j = 1 \mid v) := \exp\left[-\frac{(v - \mu_j)^2}{\sigma^2}\right]. \tag{2}$$
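A minimal numerical sketch of Equation 2, assuming evenly spaced tuning curve centroids over the stimulus domain shown in Figure 1 (variable names and values are illustrative, not taken from the paper):

```python
import numpy as np

N = 20
mu = np.linspace(-6.0, 6.0, N)   # tuning curve centroids
sigma = 1.0                      # shared receptive field width

def encode(stimulus):
    """P(x_i = 1 | u): Bernoulli firing probability for each input neuron."""
    return np.exp(-(stimulus - mu) ** 2 / sigma ** 2)

u = 2.0
p_x = encode(u)                                              # expected firing rates, E[x_i | u]
x = (np.random.default_rng(0).random(N) < p_x).astype(int)   # one draw of action potentials
```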
[Figure 1 panels: left, a real-valued stimulus on the stimulus domain with its tuning curve activation (input value); right, the population-coded stimulus as neuron firing rates over the tuning curve centroids.]
Figure 1. Dual representation of a stimulus as a real-valued number, the position of the vertical line on the left, and as neural firing rates, the heights of the lines on the right. Gray background curves represent the tuning curves of the sensory receptors. Asterisks show how each representation is understood in the space of the other.
The internal model is a linear, soft winner-take-all network connecting $x$ to $y$ by positive, bounded weights $w_{i,j} \in [0, 1]$. A posterior sample $s$ is represented by randomly masking weights with a Bernoulli random variable $m_{i,j,s} \in \{0, 1\}$. The degree of lateral inhibition active is given by parameter $\gamma$. When $\gamma = 1$, no lateral inhibition is active. As $\gamma \to \infty$, $y$ approaches a one-hot vector, or maximal sparsity in the output activations. The model is defined as

$$P(y_{j,s} = 1 \mid \mathbf{W}, \mathbf{x}, \mathbf{M}_s) := \frac{\left(\sum_i m_{i,j,s} w_{i,j} x_i\right)^{\gamma}}{\sum_j \left(\sum_i m_{i,j,s} w_{i,j} x_i\right)^{\gamma}}. \tag{3}$$
The neural network represents each stimulus pair $u, v$, and each sample prediction $\hat{v}_s$, by a series of action potentials. The exact number of action potentials generated per stimulus value determines a third source of random variability in the output that is extraneous to this analysis. Instead, we will use the expected values of $x_i$ and $y_j$ to represent standardized, asymptotic rate codes for both predictions and each posterior sample, with $E[x_i \mid u] = P(x_i = 1 \mid u)$, $E[y_j \mid v] = P(y_j = 1 \mid v)$, and

$$E[y_{j,s} \mid u] \approx P(y_j = 1 \mid \mathbf{W}, x_i = E[x_i \mid u], \mathbf{M}_s) \tag{4a}$$
$$= \frac{\left(\sum_i m_{i,j,s} w_{i,j} E[x_i \mid u]\right)^{\gamma}}{\sum_j \left(\sum_i m_{i,j,s} w_{i,j} E[x_i \mid u]\right)^{\gamma}}. \tag{4b}$$
The above gives the expected value of $y$ when $\gamma = 1$ and is a close approximation otherwise. With this model, we leave weight dropout mask $m$ as the sole source of internal random variability. We will show that, as a representation of synaptic failure, it is a sufficient mechanism to represent both Bayesian parameter uncertainty and residual uncertainty.
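The following sketch computes the expected output rate code of Equations 3 and 4 under a single random failure mask. The fixed transmission probability p_keep and the variable names are placeholders of ours; deriving weight-dependent transmission probabilities is the subject of the later sections.

```python
import numpy as np

def sample_output_rates(p_x, W, gamma=1.0, p_keep=0.8, rng=None):
    """E[y_{j,s} | u] for one dropout mask M_s (Eq. 4b)."""
    rng = np.random.default_rng() if rng is None else rng
    M = rng.random(W.shape) < p_keep        # m_{i,j,s}: which synapses transmit on this sample
    drive = (M * W).T @ p_x                 # sum_i m_{i,j,s} w_{i,j} E[x_i | u]
    powered = drive ** gamma                # lateral inhibition sharpens as gamma grows
    return powered / powered.sum()          # normalized over output neurons j

N = 20
rng = np.random.default_rng(1)
W = rng.random((N, N))                      # weights bounded in [0, 1]
mu = np.linspace(-6.0, 6.0, N)
p_x = np.exp(-(2.0 - mu) ** 2)              # E[x_i | u] for u = 2, sigma = 1
y_rates = sample_output_rates(p_x, W, gamma=4.0, rng=rng)  # one posterior sample's rate code
```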
Depending on the role of each layer in a multi-layer network, the few neurons that
remain active after lateral inhibition may perform local lateral excitation. This can be
represented by Gaussian kernel smoothing of magnitude ζ,
$$LE_j(\mathbf{z}) = \zeta \sum_k \exp\left[-\frac{(\mu_k - \mu_j)^2}{\sigma^2}\right] + (1 - \zeta)\, z_j, \tag{5}$$
[Figure 2 schematic: external stimuli u are encoded as input firing rates P(x|u); the internal model applies random synaptic failure, global inhibition with local excitation, and maximal activation to produce output firing rates P(y|w,x); the process is repeated several times for each stimulus value u, and each final sample is decoded as a sample response v, building up P(v|u).]
Figure 2. The neural network learns a distribution of responses or predictions $\hat{v}$ that follow stimuli $u$ by encoding them as neural activations $x$ and $y$. To sample from the distribution of $\hat{v}$, synapses ($w$) relating $x$ to $y$ randomly fail. Second, lateral inhibition results in selection of the most active neuron from the resulting subset. Third, local lateral excitation results in sustained activation of the nearest neighbors to the maximum, making $y$ a naturalistic population code for $\hat{v}$. We decode the samples by inferring $\hat{v}$ from $y$. By repeating this process, the whole distribution of $\hat{v}$ is represented over time.
which can then be applied to our previously defined expected output as $E'[y_{j,s} \mid u] = LE_j(E[\mathbf{y}_s \mid u])$. Lateral excitation of this form produces a naturalistic population code around
the few remaining maxima among the output neurons. If the receiving layer represents a
previously observed stimulus that mediates the input and output layers, then in the absence
of the mediating stimulus, the network may use lateral excitation to produce naturalistic
samples in the mediating layer that have the same expected effect on the output layer as
an observed mediator. If the receiving layer is a hidden layer in a multilayer network, then
lateral excitation induces spatial continuity among the learned representations.
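Below is a short sketch of the lateral excitation step in Equation 5 applied to an expected output vector, under our assumption that the sum over $k$ ranges over the units still active after inhibition (the "few remaining maxima" described above); the function name, activity threshold, and values are illustrative only.

```python
import numpy as np

def lateral_excitation(z, mu, sigma=1.0, zeta=0.5, active_thresh=1e-3):
    """LE_j(z): Gaussian bumps around active units, mixed with the original activation z_j."""
    bumps = np.zeros_like(z)
    for k in np.flatnonzero(z > active_thresh):     # the few neurons surviving inhibition
        bumps += np.exp(-(mu[k] - mu) ** 2 / sigma ** 2)
    return zeta * bumps + (1.0 - zeta) * z

mu = np.linspace(-6.0, 6.0, 20)
z = np.zeros(20)
z[12] = 1.0                                         # near one-hot output after strong inhibition
y_smoothed = lateral_excitation(z, mu)              # naturalistic population code around the max
```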
Neural decoding scheme. To test our hypothesis that the brain maps real input stimuli $u$ to implied real predictions $\hat{v}$ in accordance with Bayes' theorem, it is necessary to obtain sample predictions $\hat{v}$ by decoding each sample of neural activity. In practice, decoding has no correlate in the biological theory per se. The brain only operates on its internal representations in terms of action potentials ($y$) and has no other way to represent