ADAPTIVE SYNAPTIC FAILURE 2
and Staras, 2009, Huang and Stevens, 1997], leading to speculation that synaptic failure
constitutes a mechanism of variational inference in the brain [Llera-Montero et al., 2019,
Maass and Zador, 1999, Aitchison and Latham, 2015, Aitchison et al., 2021]. In turn, some have
demonstrated the plausibility of Bayesian neural computation by implementing generative
neural architectures that emulate synaptic failure among other biological constraints [Guo
et al., 2019, Neftci et al., 2016, Mostafa and Cauwenberghs, 2018]. Whereas such models
have previously focused on using dropout to sample over uncertainty in model parameters,
we demonstrate that synaptic failure can also sample over posterior predictive distributions,
of which parameter uncertainty is only one component.
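This style of stochastic sampling can be made concrete with a short sketch: repeatedly running a forward pass in which each weight (synapse) is independently zeroed with some probability yields a distribution of outputs rather than a point prediction. The toy single linear layer, its weights, and the failure probability below are illustrative assumptions, not the model studied here:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "trained" weights for a single linear layer (illustrative values only).
W = rng.normal(0.0, 1.0, size=(1, 8))

def forward(u, drop_p=0.5):
    """One stochastic forward pass: each weight independently fails
    (is zeroed) with probability drop_p, emulating synaptic failure."""
    mask = rng.random(W.shape) > drop_p
    # Rescale by 1/(1 - drop_p) so the expected output is unchanged.
    return ((W * mask) @ u / (1.0 - drop_p)).item()

u = rng.normal(size=8)

# Repeated stochastic passes yield samples from an approximate
# predictive distribution instead of a single deterministic output.
samples = np.array([forward(u) for _ in range(5000)])
print(samples.mean(), samples.std())
```

The spread of `samples` reflects the uncertainty injected by the random failures, while the sample mean tracks the deterministic output of the intact layer.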
To understand the basic structure of a posterior predictive distribution, consider a model relating two observed variables $u_t, v_t \in \mathbb{R}$ at time $t$ that are jointly distributed $P(u_t, v_t)$. The model takes a new observed input $u_{t+1}$ and uses parameters $\theta_t$, trained on all previous data up to time $t$, to generate a corresponding prediction $\hat{v}_{t+1}$ according to $P(\hat{v}_{t+1} | \theta_t, u_{t+1})$. In Bayesian models, $\theta$ is randomly distributed conditional on finite vectors of previously observed inputs and outputs, $P(\theta_t | u_0 \ldots u_t, v_0 \ldots v_t)$, i.e., model training is synonymous with inference of $\theta$ from past observations. As such, $\theta$ is known imprecisely, with precision depending on its role in the model and the number of relevant observations up to the present, $t$. As we are only interested in the distribution of the final prediction given some novel input, $P(\hat{v}_{t+1} | u_{t+1})$, any model parameters $\theta$ are known as nuisance variables. To obtain $P(\hat{v}_{t+1} | u_{t+1})$, we marginalize out $\theta$, meaning we integrate over all of its possible values, each weighted by its respective likelihood. If observations are independent and identically distributed, i.e., $P(u_t, u_{t-h}) = P(u_t)P(u_{t-h})$ and $P(v_t, v_{t-h}) = P(v_t)P(v_{t-h})$, $\forall h \neq t$, then the posterior predictive distribution is given as

$$P(\hat{v}_{t+1} | u_{t+1}) = \int P(\theta_t, \hat{v}_{t+1} | u_{t+1}, u_0 \ldots u_t, v_0 \ldots v_t)\, d\theta, \tag{1a}$$

$$= \int P(\hat{v}_{t+1} | \theta_t, u_{t+1})\, P(\theta_t | u_0 \ldots u_t, v_0 \ldots v_t)\, d\theta. \tag{1b}$$
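In practice this integral is typically approximated by Monte Carlo sampling: draw values of $\theta$ from the parameter posterior, then draw a prediction conditional on each draw, and treat the resulting collection as samples from the posterior predictive. A minimal sketch, assuming a scalar parameter with a Gaussian posterior and Gaussian residual noise (all numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy posterior over a scalar parameter theta after training:
# P(theta | data) = N(m, s^2)  (values illustrative).
m, s = 2.0, 0.3
sigma = 0.5    # residual (observation) noise std
u_new = 1.5    # novel input u_{t+1}

# Monte Carlo marginalization: sample theta from its posterior,
# then sample a prediction v_hat from P(v_hat | theta, u_new).
theta = rng.normal(m, s, size=200_000)
v_hat = rng.normal(theta * u_new, sigma)

# The sample mean and variance approximate the posterior predictive,
# here N(m * u_new, u_new^2 * s^2 + sigma^2).
print(v_hat.mean(), v_hat.var())
```

Each prediction sample thus carries both parameter uncertainty (via the draw of `theta`) and residual uncertainty (via the conditional draw of `v_hat`), which is exactly the decomposition in Eq. (1b).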
The total imprecision of a predicted output $\hat{v}_{t+1}$, defined by $P(\hat{v}_{t+1} | u_{t+1})$, thus includes internal sources of uncertainty, i.e., imprecision in $\theta$ defined by the parameter distribution, $P(\theta_t | u_0 \ldots u_t, v_0 \ldots v_t)$, and external sources of uncertainty, or the residual distribution, $P(\hat{v}_{t+1} | \theta_t, u_{t+1})$, so named to denote random variation that remains after the outcome is conditioned on all available inputs. For instance, if $P(\theta_t | u_0 \ldots u_t, v_0 \ldots v_t)$ and $P(\hat{v}_{t+1} | \theta_t, u_{t+1})$ are both Gaussian, then the variance of the prediction is

$$\mathrm{var}(\hat{v}_{t+1} | u_{t+1}) = \mathrm{var}(\hat{v}_{t+1} | \theta_t, u_{t+1}) + \mathrm{var}(\theta_t | u_0 \ldots u_t, v_0 \ldots v_t).$$
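This additive relation can be checked numerically in the simplest case where the prediction depends on $\theta$ additively, $\hat{v}_{t+1} = \theta_t + u_{t+1} + \varepsilon$; the standard deviations below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Gaussian parameter posterior and Gaussian residual noise
# (illustrative values; prediction depends additively on theta,
#  so the two variances simply add).
s_theta = 0.4    # std of P(theta | past data)
s_resid = 0.3    # std of P(v_hat | theta, u)
u_new = 1.0

theta = rng.normal(2.0, s_theta, size=500_000)
v_hat = rng.normal(theta + u_new, s_resid)

total_var = v_hat.var()
print(total_var, s_theta**2 + s_resid**2)  # the two should agree closely
```

The empirical predictive variance matches the sum of the parameter and residual variances, as the equation above states for the Gaussian case.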
Approximate integration over posterior predictive distributions is likely to be particu-
larly important to the field of reinforcement learning and models of human decision-making.
For instance, the method of Monte Carlo Tree Search (MCTS) involves estimating the
value of a possible action by averaging over the expected returns from many simulated
trajectories [Sutton and Barto, 2018]. The expected returns are weighted by the likelihood
of their respective states. To assign accurate values, the agent must be able to average over
not only the range of possible state predictions, but any uncertainty in the model parameters
used to make those predictions. The addition of parameter uncertainty would result in
modulation of the breadth of the search based on prior knowledge, allowing a wider range