BAYESIAN INFERENCE IS FACILITATED BY MODULAR NEURAL
NETWORKS WITH DIFFERENT TIME SCALES
A PREPRINT
Kohei Ichikawa
Graduate School of Arts and Sciences
The University of Tokyo
Meguro-ku, Tokyo 153-8902, Japan
Kunihiko Kaneko
The Niels Bohr Institute
University of Copenhagen
Blegdamsvej 17, Copenhagen, 2100-DK, Denmark
October 25, 2022
ABSTRACT
Various animals, including humans, have been suggested to perform Bayesian inferences to handle
noisy, time-varying external information. In performing Bayesian inference, the prior distribution
must be shaped by sampling noisy external inputs. However, the mechanism by which neural activities
represent such distributions has not yet been elucidated. In this study, we demonstrated that neural
networks with a modular structure consisting of fast and slow modules effectively represented the prior
distribution when performing accurate Bayesian inference. Using a recurrent neural network consisting
of a main module connected with input and output layers and a sub-module connected only with
the main module and having slower neural activity, we demonstrated that the modular network with
distinct time scales performed more accurate Bayesian inference compared with the neural networks
with uniform time scales. Prior information was represented selectively by the slow sub-module,
which could integrate observed signals over an appropriate period and represent input means and
variances. Accordingly, the network could effectively predict the time-varying inputs. Furthermore,
by training the time scales of neurons starting from networks with uniform time scales and without
modular structure, the above slow-fast modular network structure spontaneously emerged as a result
of learning wherein prior information was selectively represented in the slower sub-module. These
results explain how the prior distribution for Bayesian inference is represented in the brain, provide
insight into the relevance of modular structure with time scale hierarchy to information processing,
and elucidate the significance of brain areas with slower time scales.
Keywords Recurrent Neural Network · Bayesian inference · Neural dynamics
1 Introduction
In the brains of humans and various other animals, information processing involves inference based on inputs received from the external
world through the sensory systems, and the information obtained in this way carries uncertainty due to noise. Previous studies have suggested
that animals such as humans and monkeys process inputs according to a Bayesian inference framework to deal with
such uncertainty [Knill and Pouget, 2004, Angelaki et al., 2009, Haefner et al., 2016, Ernst and Banks, 2002, Friston,
2012, Merfeld et al., 1999, Doya et al., 2007, Pouget et al., 2013, Beck et al., 2011, Geisler and Kersten, 2002, Honig
et al., 2020].
Bayesian inference is performed by calculating the posterior from the prior, which refers to information about the signal possessed
in advance, and the likelihood estimated by observing the input signal. Hence, the prior is believed to be represented in the brain
first, but how prior information is shaped in the brain remains unclear. In previous
studies, the prior has often been treated as a given value [Echeveste et al., 2020], and the mechanism for shaping the
prior by learning has not been considered. Evolutionary acquisition of the prior has been proposed [Campbell, 2016,
Lo and Zhang, 2021], but it is naturally expected that such information should also be shaped within one generation
through observing and learning time-dependent signals. Experimental results suggest that the prior and the likelihood
for Bayesian inference are encoded in different brain areas[Vilares et al., 2012, Chan et al., 2016, d’Acremont et al.,
2013]. Still, the validity and the mechanisms underlying the results remain controversial, and how area differentiation is
relevant to the accuracy of Bayesian inference is not well understood. A simulation study [Quax et al., 2021] suggested that
the gain of the activation function encodes the prior. However, because the prior was fixed in that study, how the prior is shaped
when it varies over time was not considered.
In general, to obtain the prior, it is necessary to estimate the prior distribution based on previous observations, and the
population of neurons that represents the prior must integrate observed inputs over time. One possible mechanism
for achieving such integration may be two neural modules functioning at distinct time scales: a downstream neuron
population with slower activity changes separated from an upstream neuron population that processes input information.
In this structure, the slow module that does not directly receive inputs may facilitate integration. Some experimental
reports have suggested that the time scale of neural activities in downstream areas of the brain that do not directly receive
external input is slow[Murray et al., 2014, Cavanagh et al., 2020, Golesorkhi et al., 2021]. On this basis, we evaluated
recurrent neural networks (RNNs) with two modules: a main module directly connected to the input and output layers,
and a sub-module connected only to the main module, with no connections to the input or output layers (i.e., a
hierarchical structure) (Fig. 1). We then examined the role of the modular structure and the relevance of the time scale
difference between the main and sub-modules for the prior representation in Bayesian inference.
We found that RNNs with a modular structure shape the prior more appropriately than regular RNNs. Further, Bayesian
inference is more accurate when the time scale of the sub-module is appropriately slow. When the time scale is uniform,
prior information is maintained in both the main module and sub-module. In contrast, when the time scales are different,
prior information is represented selectively by the slow sub-module. Comparing these two cases revealed that the variance
of the prior encoded on the neural manifold was easier to decode in the model with distinct time scales, which made it easier
to distinguish changes in the input mean from noise.
In addition, we examined if the modular structure with distinct time scales would emerge from a homogeneous neural
network. We trained the network on the Bayesian inference task while the time scale of each neuron was also trained. As
the training progressed, we observed that the time scales of neurons differentiated into slower and faster scales. A
modular structure arose in which slow neurons were separated from the input/output layers, which were predominantly
connected to the fast neurons, and a sub-module with slow neurons represented the prior information.
These results are crucial for understanding the prior representation mechanism in Bayesian inference and provide insight
into the relationships between neural network structure, neural dynamics[Amunts et al., 2022, Mastrogiuseppe and
Ostojic, 2018, Vyas et al., 2020, Beiran et al., 2021], and time scales[Papo, 2013] underlying information processing in
the brain, which is considered a central issue in computational neuroscience.
2 Model
2.1 Recurrent Neural Networks with/without modular structure
To investigate the effect of structure and time scale on Bayesian inference, we considered the following RNNs[Barak,
2017].
First, we established a regular RNN consisting of an input layer, a recurrent (hidden) layer, and an output layer, as shown
in Fig. 1(a). The following equation represents the dynamics of the recurrent layer:
$$x(t+1) = (1 - \alpha) \odot x(t) + \alpha \odot \mathrm{ReLU}(W_{\mathrm{in}} u(t) + W x(t)) + \alpha \odot \xi, \qquad (1)$$
where α = (α_1, α_2, ..., α_200)^T is a vector that introduces the time scale of each neuron,
$$\alpha_i = \begin{cases} \alpha_m & (1 \le i \le 150) \\ \alpha_s & (150 < i \le 200), \end{cases} \qquad (2)$$
where the standard homogeneous network is given by α_s = α_m; the case with α_s < α_m was also studied to investigate
the effect of the time scale difference. Although we mainly studied systems with 150 fast and 50 slow neurons, the
results to be discussed are not altered as long as both numbers are sufficiently large (say, 100 vs. 50, or 150 vs. 150 fast and
slow neurons). Here, u(t) is the input signal, and x is the state of the neurons in the recurrent layer.
Figure 1: Schematic of RNNs. (a) Standard RNN without modular structure. (b) RNN with modular structure.
We adopted the ReLU activation function (ReLU(z) = 0 for z ≤ 0 and ReLU(z) = z for z > 0) [Nair and Hinton, 2010]. Then, the output of the
RNN was determined by a linear combination of the internal states:
$$y(t) = W_{\mathrm{out}} x(t) \qquad (3)$$
In Eq. 1, ξ accounts for noise in the dynamics; it is a random variable that follows a normal distribution with
mean 0 and standard deviation 0.05.
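To make the update rule concrete, the following is a minimal NumPy sketch of Eqs. 1-3. The network and input sizes follow Table 1 (200 recurrent neurons, 100 input neurons), but the weight initialization, the example value of α_s, and all variable names are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

N, N_IN = 200, 100            # recurrent neurons and PPC input neurons (Table 1)
ALPHA_M, ALPHA_S = 1.0, 0.1   # example time scales; alpha_s is varied in the paper

# Per-neuron time-scale vector of Eq. 2: the first 150 neurons are fast, the last 50 slow.
alpha = np.concatenate([np.full(150, ALPHA_M), np.full(50, ALPHA_S)])

rng = np.random.default_rng(0)
W_in = rng.normal(0, 1 / np.sqrt(N_IN), (N, N_IN))  # input weights (illustrative init)
W = rng.normal(0, 1 / np.sqrt(N), (N, N))           # recurrent weights (illustrative init)
W_out = rng.normal(0, 1 / np.sqrt(N), (1, N))       # readout weights of Eq. 3

def relu(z):
    return np.maximum(z, 0.0)

def step(x, u):
    """One step of Eq. 1: a leaky update with per-neuron rate alpha and dynamics noise xi."""
    xi = rng.normal(0.0, 0.05, N)
    return (1 - alpha) * x + alpha * relu(W_in @ u + W @ x) + alpha * xi

x = np.zeros(N)
u = rng.poisson(1.0, N_IN)    # placeholder spike-count input; the PPC encoding is described below
x = step(x, u)
y = W_out @ x                 # output y(t) of Eq. 3
```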
Next, we introduced a modular structure into the above RNN to establish the distinction between main and sub-modules (Fig. 1(b)).
Only the main module was connected to the input/output layers. Thus, the dynamics of the recurrent layer are given by
$$x_m(t+1) = (1 - \alpha_m)\, x_m(t) + \alpha_m\, \mathrm{ReLU}(W_{\mathrm{in}} u(t) + W_{\mathrm{main}} x_m(t) + W_{\mathrm{sm}} x_s(t)) + \alpha_m \xi_m \qquad (4)$$
$$x_s(t+1) = (1 - \alpha_s)\, x_s(t) + \alpha_s\, \mathrm{ReLU}(W_{\mathrm{sub}} x_s(t) + W_{\mathrm{ms}} x_m(t)) + \alpha_s \xi_s, \qquad (5)$$
where x_m and x_s represent the firing rates of neurons in the main and sub-modules, respectively. Here, α_m and α_s
represent the time scales of the main and the sub-module, respectively. α_m is fixed at 1, while we varied α_s from 1 to
0.01 to examine the effect of the time scale difference. The RNN output was determined by a linear combination of the
internal states of the main module:
$$y(t) = W_{\mathrm{out}} x_m(t) \qquad (6)$$
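A corresponding sketch of the modular dynamics of Eqs. 4-6 is shown below: only the main module receives the input and feeds the readout, while the sub-module interacts only with the main module. As above, the initialization, the chosen α_s, and the variable names are illustrative assumptions.

```python
import numpy as np

N_MAIN, N_SUB, N_IN = 150, 50, 100   # module sizes and input dimension (Table 1)
ALPHA_M, ALPHA_S = 1.0, 0.1          # alpha_m fixed at 1; alpha_s varied from 1 to 0.01

rng = np.random.default_rng(0)
W_in = rng.normal(0, 1 / np.sqrt(N_IN), (N_MAIN, N_IN))
W_main = rng.normal(0, 1 / np.sqrt(N_MAIN), (N_MAIN, N_MAIN))
W_sm = rng.normal(0, 1 / np.sqrt(N_SUB), (N_MAIN, N_SUB))    # sub-module -> main module
W_ms = rng.normal(0, 1 / np.sqrt(N_MAIN), (N_SUB, N_MAIN))   # main module -> sub-module
W_sub = rng.normal(0, 1 / np.sqrt(N_SUB), (N_SUB, N_SUB))
W_out = rng.normal(0, 1 / np.sqrt(N_MAIN), (1, N_MAIN))      # readout from the main module only

def relu(z):
    return np.maximum(z, 0.0)

def step(x_m, x_s, u):
    """One step of Eqs. 4-5; only the main module receives the input u."""
    xi_m = rng.normal(0.0, 0.05, N_MAIN)
    xi_s = rng.normal(0.0, 0.05, N_SUB)
    x_m_next = (1 - ALPHA_M) * x_m + ALPHA_M * relu(W_in @ u + W_main @ x_m + W_sm @ x_s) + ALPHA_M * xi_m
    x_s_next = (1 - ALPHA_S) * x_s + ALPHA_S * relu(W_sub @ x_s + W_ms @ x_m) + ALPHA_S * xi_s
    return x_m_next, x_s_next

x_m, x_s = np.zeros(N_MAIN), np.zeros(N_SUB)
u = rng.poisson(1.0, N_IN)
x_m, x_s = step(x_m, x_s, u)
y = W_out @ x_m               # Eq. 6: the output reads only the main module
```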
2.2 Task
In this study, we considered a task in which Bayesian inference improves estimation accuracy. Specifically, the RNN
was tasked with estimating the true value from a noisy observed signal. We generated the external input as follows.
First, the true value y_true was randomly sampled from a generator (cause) distribution, given by a normal distribution
with mean µ_g and variance σ_g². Next, the observed signal s was generated from y_true by adding noise, so that the
input follows a normal distribution with mean y_true and variance σ_s². The generator did not remain constant: it
changed with probability p_t over time. When the generator changed, µ_g and σ_g were sampled uniformly from
µ_g ∈ [−0.5, 0.5] and σ_g ∈ [0, 0.8], respectively.
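A sketch of this generative process is given below, using the ranges and the switching probability p_t = 0.03 from Table 1. Whether y_true is redrawn at every time step while the generator persists, and the particular observation-noise level σ_s used here, are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
T, P_T = 120, 0.03            # trial length and generator switching probability (Table 1)

def sample_generator():
    """Draw a new generator (cause) distribution: mu_g ~ U[-0.5, 0.5], sigma_g ~ U[0, 0.8]."""
    return rng.uniform(-0.5, 0.5), rng.uniform(0.0, 0.8)

def generate_trial(sigma_s=0.5):
    """Return (y_true, s): true values and noisy observations for one trial."""
    mu_g, sigma_g = sample_generator()
    y_true, s = np.empty(T), np.empty(T)
    for t in range(T):
        if rng.random() < P_T:                 # the generator switches with probability p_t
            mu_g, sigma_g = sample_generator()
        y_true[t] = rng.normal(mu_g, sigma_g)  # true value drawn from the current generator
        s[t] = rng.normal(y_true[t], sigma_s)  # observation = true value + noise
    return y_true, s

y_true, s = generate_trial()
```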
As mentioned in the Introduction, the prior distribution needed for Bayesian estimation must be estimated from the
observed signal so that it is close to the generator distribution. Then, u(t) for Eq. 1 (or Eqs. 4 and 5) is given by using the
Probabilistic Population Code (PPC), which has been proposed as a neural basis for Bayesian inference [Ma et al.,
2006]. PPC assumes that the information in a signal is encoded by a population of neurons with position-based
preferred stimuli that fire probabilistically according to a Poisson distribution. It has been shown that neural networks
with a PPC population of neurons as the input layer can learn probabilistic inference effectively [Orhan and
Ma, 2017]. Therefore, in this study, we also assumed that the activity u of the input-layer neurons encoding the observed
signal followed the PPC model. u was sampled from the following Poisson distribution [Ichikawa and Kataoka, 2022]:
$$p(u \mid s) = \prod_i \frac{e^{-f_i(s)} f_i(s)^{u_i}}{u_i!} \qquad (7)$$
Table 1: Hyperparameters

Attribute                             Value
Range of µ_p                          −0.5 ≤ µ_p ≤ 0.5
Range of σ_p                          0 ≤ σ_p ≤ 0.8
Range of σ_l                          √(1/5) ≤ σ_l ≤ 1
Switching probability of prior        p_t = 0.03
Length of u(t)                        100
σ²_PPC                                0.5
Lasting time of u(t)                  T = 120
#Neurons in the main module           150
#Neurons in the sub-module            50
α_m                                   1
α_s                                   1, 0.5, 0.2, 0.1, 0.05, 0.01
Batch size                            50
Optimization algorithm                Adam
Learning rate                         0.001
Iterations                            6000
Weight decay                          0.0001
Here, s is the observed signal generated from y_true by adding noise, and f_i is the tuning curve of the i-th neuron. This
selective firing occurs in proportion to the gain when the observed signal is generated. The gain is inversely proportional
to the noise variance, g = 1/σ_l², and corresponds to signal clarity: the gain decreases as the noise increases
due to uncertainty in observations [Tolhurst et al., 1983]. Considering the gain, we obtain:
$$f_i(s) = g \exp\left(-\frac{(s - \varphi_i)^2}{2\sigma_{\mathrm{PPC}}^2}\right), \qquad (8)$$
where φ_i represents the preferred stimulus of the i-th neuron in the input layer. It was assumed that φ_i follows an arithmetic
sequence in i (φ_i = −1/2 + i/m when the number of neurons in the input layer is m) [Swindale, 1998]. Also, σ²_PPC
is a constant that represents the ease of firing and was set to σ²_PPC = 1/2 in this study.
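The PPC input encoding of Eqs. 7-8 can be sketched as follows, with m = 100 input neurons and σ²_PPC = 1/2 as in the text; the particular value of σ_l passed in the example is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 100                                # number of input-layer neurons (Table 1)
SIGMA2_PPC = 0.5                       # sigma^2_PPC = 1/2
phi = -0.5 + np.arange(1, M + 1) / M   # preferred stimuli on an arithmetic grid

def ppc_encode(s, sigma_l):
    """Sample the PPC activity u ~ Poisson(f(s)) for an observed signal s (Eqs. 7-8)."""
    g = 1.0 / sigma_l**2                                # gain, inverse of the noise variance
    f = g * np.exp(-(s - phi)**2 / (2.0 * SIGMA2_PPC))  # tuning curves f_i(s)
    return rng.poisson(f)

u = ppc_encode(s=0.2, sigma_l=0.5)     # example: encode a single observation
```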
In this task, the true value y_true was to be estimated based on the input signal u. Therefore, training was performed to
minimize the mean squared error (MSE) between the neural network output y(t) and the true value y_true(t). Note that
the loss function was not based on the Bayesian optimal value calculated from the generator distribution and the noise
in the observed signal, but was calculated based only on the true value:
$$L = \frac{1}{T} \sum_t \left(y(t) - y_{\mathrm{true}}(t)\right)^2 \qquad (9)$$
Training was performed by the backpropagation method [Rumelhart et al., 1986, Werbos, 1990]. An efficient stochastic
gradient descent method, Adam [Kingma and Ba, 2014], was used for optimization. The batch size of training samples
was set to 50 and the weight decay rate to 0.0001; training was performed for 6000 iterations (see Table 1 for
the hyperparameters used in the experiment).
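As an illustration of the training setup, the following self-contained PyTorch sketch trains a scaled-down modular RNN with the MSE objective of Eq. 9 and the Adam settings of Table 1. The reduced network size, the one-dimensional raw-signal input (instead of the 100-neuron PPC encoding), and all class and function names are simplifying assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class TinyModularRNN(nn.Module):
    """Scaled-down sketch of Eqs. 4-6 (20 main / 10 sub neurons, 1-d input instead of PPC)."""
    def __init__(self, n_main=20, n_sub=10, n_in=1, alpha_s=0.1):
        super().__init__()
        self.alpha_s = alpha_s                               # alpha_m is fixed at 1
        self.w_in = nn.Linear(n_in, n_main, bias=False)
        self.w_main = nn.Linear(n_main, n_main, bias=False)
        self.w_sm = nn.Linear(n_sub, n_main, bias=False)     # sub-module -> main module
        self.w_ms = nn.Linear(n_main, n_sub, bias=False)     # main module -> sub-module
        self.w_sub = nn.Linear(n_sub, n_sub, bias=False)
        self.w_out = nn.Linear(n_main, 1, bias=False)        # readout from the main module only
        self.n_main, self.n_sub = n_main, n_sub

    def forward(self, u):                                    # u: (batch, T, n_in)
        b, T, _ = u.shape
        x_m = u.new_zeros(b, self.n_main)
        x_s = u.new_zeros(b, self.n_sub)
        out = []
        for t in range(T):
            # alpha_m = 1: the main module is fully updated each step (Eq. 4).
            x_m = torch.relu(self.w_in(u[:, t]) + self.w_main(x_m) + self.w_sm(x_s)) \
                  + 0.05 * torch.randn_like(x_m)
            # The slow sub-module is a leaky update with rate alpha_s (Eq. 5).
            x_s = (1 - self.alpha_s) * x_s + self.alpha_s * (
                torch.relu(self.w_sub(x_s) + self.w_ms(x_m)) + 0.05 * torch.randn_like(x_s))
            out.append(self.w_out(x_m))                      # Eq. 6
        return torch.cat(out, dim=1)                         # (batch, T)

def make_batch(batch=50, T=120, p_t=0.03, sigma_s=0.3):
    """Simplified task batch: raw noisy observations as a 1-d input (PPC encoding omitted)."""
    mu = torch.rand(batch, 1) - 0.5
    sg = 0.8 * torch.rand(batch, 1)
    y_true, s = [], []
    for t in range(T):
        switch = torch.rand(batch, 1) < p_t                  # generator switches with prob. p_t
        mu = torch.where(switch, torch.rand(batch, 1) - 0.5, mu)
        sg = torch.where(switch, 0.8 * torch.rand(batch, 1), sg)
        yt = mu + sg * torch.randn(batch, 1)                 # true value from the current generator
        y_true.append(yt)
        s.append(yt + sigma_s * torch.randn(batch, 1))       # noisy observation
    return torch.stack(s, dim=1), torch.cat(y_true, dim=1)   # u: (batch, T, 1), y_true: (batch, T)

model = TinyModularRNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
for it in range(200):                                        # the paper trains for 6000 iterations
    u, y_true = make_batch()
    loss = ((model(u) - y_true) ** 2).mean()                 # MSE loss of Eq. 9
    opt.zero_grad()
    loss.backward()
    opt.step()
```

At the scale used in the paper, the input would instead be the 100-neuron PPC encoding sketched above, with 150 main and 50 sub-module neurons and 6000 training iterations.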
3 Results 1: Fixed structure and time scales
Bayesian optimality
Because the generated signal s was observed under noise, the neural network was required to estimate the true value
sampled from the generator. If the information from the generator were known, y_true would be estimated by minimizing
the long-term MSE, which gives the optimal value of y as follows (maximum a posteriori (MAP) estimation [Bishop,
2006]):
$$y_{\mathrm{opt}} = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_s^2}\, s + \frac{\sigma_s^2}{\sigma_g^2 + \sigma_s^2}\, \mu_g \qquad (10)$$
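For reference, Eq. 10 is simply a precision-weighted average of the observation and the prior mean; a small sketch with an illustrative numerical example:

```python
def bayes_opt(s, mu_g, sigma_g2, sigma_s2):
    """MAP (posterior-mean) estimate of Eq. 10: a precision-weighted mix of s and mu_g."""
    return (sigma_g2 * s + sigma_s2 * mu_g) / (sigma_g2 + sigma_s2)

# Example: observation s = 0.4, prior N(0.0, 0.04), observation-noise variance 0.09.
# The estimate 0.04/0.13 * 0.4 + 0.09/0.13 * 0.0 ≈ 0.123 is pulled toward the prior mean.
print(bayes_opt(0.4, 0.0, 0.04, 0.09))
```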