BAYESIAN INFERENCE IS FACILITATED BY MODULAR NEURAL
NETWORKS WITH DIFFERENT TIME SCALES
A PREPRINT
Kohei Ichikawa
Graduate School of Arts and Sciences
The University of Tokyo
Meguro-ku, Tokyo 153-8902, Japan
Kunihiko Kaneko
The Niels Bohr Institute
University of Copenhagen
Blegdamsvej 17, Copenhagen, 2100-DK, Denmark
October 25, 2022
ABSTRACT
Various animals, including humans, have been suggested to perform Bayesian inferences to handle
noisy, time-varying external information. In performing Bayesian inference, the prior distribution
must be shaped by sampling noisy external inputs. However, the mechanism by which neural activities
represent such distributions has not yet been elucidated. In this study, we demonstrated that neural
networks with a modular structure consisting of fast and slow modules effectively represented the prior
distribution when performing accurate Bayesian inference. Using a recurrent neural network consisting
of a main module connected with input and output layers and a sub-module connected only with
the main module and having slower neural activity, we demonstrated that the modular network with
distinct time scales performed more accurate Bayesian inference compared with the neural networks
with uniform time scales. Prior information was represented selectively by the slow sub-module,
which could integrate observed signals over an appropriate period and represent input means and
variances. Accordingly, the network could effectively predict the time-varying inputs. Furthermore,
by training the time scales of neurons starting from networks with uniform time scales and without
modular structure, the above slow-fast modular network structure spontaneously emerged as a result
of learning wherein prior information was selectively represented in the slower sub-module. These
results explain how the prior distribution for Bayesian inference is represented in the brain, provide
insight into the relevance of modular structure with time scale hierarchy to information processing,
and elucidate the significance of brain areas with slower time scales.
Keywords Recurrent Neural Network · Bayesian inference · Neural dynamics
1 Introduction
In the brains of humans and various other animals, information processing involves inference based on inputs received from the external
world through the sensory systems, and the information obtained in this way carries uncertainty due to noise. Previous studies have suggested
that animals such as humans and monkeys process inputs according to a Bayesian inference framework to deal with
such uncertainty [Knill and Pouget, 2004, Angelaki et al., 2009, Haefner et al., 2016, Ernst and Banks, 2002, Friston,
2012, Merfeld et al., 1999, Doya et al., 2007, Pouget et al., 2013, Beck et al., 2011, Geisler and Kersten, 2002, Honig
et al., 2020].
Bayesian inference is performed by calculating the posterior from the prior, which refers to information about the signal possessed
in advance, and the likelihood estimated by observing the input signal. Hence, the prior is believed to be represented in the brain
first, but how prior information is shaped in the brain remains unclear. In previous
studies, the prior has often been treated as a given value [Echeveste et al., 2020], and the mechanism for shaping the
prior by learning has not been considered. Evolutionary acquisition of the prior has been proposed [Campbell, 2016,
Lo and Zhang, 2021], but it is naturally expected that such information should also be shaped within one generation
through observing and learning time-dependent signals. Experimental results suggest that the prior and the likelihood
for Bayesian inference are encoded in different brain areas[Vilares et al., 2012, Chan et al., 2016, d’Acremont et al.,
2013]. Still, the validity and the mechanisms underlying the results remain controversial, and how area differentiation is
relevant to the accuracy of Bayesian inference is not well understood. A simulation study [Quax et al., 2021] suggested that
the gain of the activation function encodes the prior. However, because the prior was fixed in that study, how the prior is shaped
when it varies over time was not considered.
In general, to obtain the prior, it is necessary to estimate the prior distribution based on previous observations, and the
population of neurons that represents the prior must integrate observed inputs over time. One possible mechanism
for achieving such integration may be two neural modules functioning at distinct time scales: a downstream neuron
population with slower activity changes separated from an upstream neuron population that processes input information.
In this structure, the slow module that does not directly receive inputs may facilitate integration. Some experimental
reports have suggested that the time scale of neural activities in downstream areas of the brain that do not directly receive
external input is slow[Murray et al., 2014, Cavanagh et al., 2020, Golesorkhi et al., 2021]. On this basis, we evaluated
recurrent neural networks (RNNs) with two modules: a main module directly connected to the input and output layers,
and a sub-module connected only to the main module, with no connections to the input or output layers (i.e., a
hierarchical structure) (Fig. 1). We then examined the role of the modular structure and the relevance of the time scale
difference between the main and sub-modules for the prior representation in Bayesian inference.
We found that RNNs with a modular structure shape the prior more appropriately than regular RNNs. Further, Bayesian
inference is more accurate when the time scale of the sub-module is appropriately slow. When the time scale is uniform,
prior information is maintained in both the main module and sub-module. In contrast, when the time scales are different,
prior information is represented selectively by the slow sub-module. Comparing these two cases revealed that the variance
of the prior encoded on the neural manifold was easier to decode in the model with distinct time scales, which made it easier
to distinguish changes in the input mean from noise.
In addition, we examined if the modular structure with distinct time scales would emerge from a homogeneous neural
network. We trained the network on the Bayesian inference task while the time scale of each neuron was also trained. As
the training progressed, we observed that the time scales of neurons differentiated into slower and faster scales. A
modular structure arose in which slow neurons were separated from the input/output layers, which were predominantly
connected to the fast neurons, and a sub-module with slow neurons represented the prior information.
These results are crucial for understanding the prior representation mechanism in Bayesian inference and provide insight
into the relationships between neural network structure, neural dynamics[Amunts et al., 2022, Mastrogiuseppe and
Ostojic, 2018, Vyas et al., 2020, Beiran et al., 2021], and time scales[Papo, 2013] underlying information processing in
the brain, which is considered a central issue in computational neuroscience.
2 Model
2.1 Recurrent Neural Networks with/without modular structure
To investigate the effect of structure and time scale on Bayesian inference, we considered the following RNNs[Barak,
2017].
First, we established a regular RNN consisting of an input layer, a recurrent (hidden) layer, and an output layer, as shown
in Fig. 1(a). The following equation represents the dynamics of the recurrent layer:
$$x(t+1) = (1 - \alpha) \odot x(t) + \alpha \odot \mathrm{ReLU}(W_{\mathrm{in}} u(t) + W x(t)) + \alpha \odot \xi, \qquad (1)$$
where α = (α_1, α_2, ..., α_200)^T is a vector that introduces the time scale of each neuron,
$$\alpha_i = \begin{cases} \alpha_m & (1 \le i \le 150) \\ \alpha_s & (150 < i \le 200), \end{cases} \qquad (2)$$
where the standard homogeneous network is given by α_s = α_m; the case with α_s < α_m was also studied to investigate
the effect of the time scale difference. Although we mainly studied systems with 150 fast and 50 slow neurons, the
results to be discussed are not altered as long as both numbers are sufficiently large (say, 100 vs. 50, or 150 vs. 150 fast and
slow neurons). Here, u(t) is the input signal, and x is the state of the neurons in the recurrent layer.
Figure 1: Schematic of RNNs. (a) Standard RNN without modular structure. (b) RNN with modular structure.
We adopted the ReLU activation function (ReLU(z) = 0 for z ≤ 0 and ReLU(z) = z for z > 0) [Nair and Hinton, 2010]. Then, the output of the
RNN was determined by a linear combination of the internal states:
$$y(t) = W_{\mathrm{out}} x(t) \qquad (3)$$
In Eq. 1, ξ accounts for noise in the dynamics; it is a random variable that follows a normal distribution with
mean 0 and standard deviation 0.05.
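To make the update rule concrete, the following is a minimal NumPy sketch of Eqs. 1-3. The network and input sizes follow Table 1 (200 recurrent neurons, 100 input neurons), but the weight initialization, the example value of α_s, and all variable names are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

N, N_IN = 200, 100            # recurrent neurons and PPC input neurons (Table 1)
ALPHA_M, ALPHA_S = 1.0, 0.1   # example time scales; alpha_s is varied in the paper

# Per-neuron time-scale vector of Eq. 2: the first 150 neurons are fast, the last 50 slow.
alpha = np.concatenate([np.full(150, ALPHA_M), np.full(50, ALPHA_S)])

rng = np.random.default_rng(0)
W_in = rng.normal(0, 1 / np.sqrt(N_IN), (N, N_IN))  # input weights (illustrative init)
W = rng.normal(0, 1 / np.sqrt(N), (N, N))           # recurrent weights (illustrative init)
W_out = rng.normal(0, 1 / np.sqrt(N), (1, N))       # readout weights of Eq. 3

def relu(z):
    return np.maximum(z, 0.0)

def step(x, u):
    """One step of Eq. 1: a leaky update with per-neuron rate alpha and dynamics noise xi."""
    xi = rng.normal(0.0, 0.05, N)
    return (1 - alpha) * x + alpha * relu(W_in @ u + W @ x) + alpha * xi

x = np.zeros(N)
u = rng.poisson(1.0, N_IN)    # placeholder spike-count input; the PPC encoding is described below
x = step(x, u)
y = W_out @ x                 # output y(t) of Eq. 3
```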
Next, we introduced a modular structure into the above RNN to establish the distinction between main and sub-modules (Fig. 1(b)).
Only the main module was connected to the input/output layers. Thus, the dynamics of the recurrent layer are given by
$$x_m(t+1) = (1 - \alpha_m)\, x_m(t) + \alpha_m\, \mathrm{ReLU}(W_{\mathrm{in}} u(t) + W_{\mathrm{main}} x_m(t) + W_{\mathrm{sm}} x_s(t)) + \alpha_m \xi_m \qquad (4)$$
$$x_s(t+1) = (1 - \alpha_s)\, x_s(t) + \alpha_s\, \mathrm{ReLU}(W_{\mathrm{sub}} x_s(t) + W_{\mathrm{ms}} x_m(t)) + \alpha_s \xi_s, \qquad (5)$$
where x_m and x_s represent the firing rates of neurons in the main and sub-modules, respectively. Here, α_m and α_s
represent the time scales of the main and the sub-module, respectively. α_m is fixed at 1, while we varied α_s from 1 to
0.01 to examine the effect of the time scale difference. The RNN output was determined by a linear combination of the
internal states of the main module:
$$y(t) = W_{\mathrm{out}} x_m(t) \qquad (6)$$
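A corresponding sketch of the modular dynamics of Eqs. 4-6 is shown below: only the main module receives the input and feeds the readout, while the sub-module interacts only with the main module. As above, the initialization, the chosen α_s, and the variable names are illustrative assumptions.

```python
import numpy as np

N_MAIN, N_SUB, N_IN = 150, 50, 100   # module sizes and input dimension (Table 1)
ALPHA_M, ALPHA_S = 1.0, 0.1          # alpha_m fixed at 1; alpha_s varied from 1 to 0.01

rng = np.random.default_rng(0)
W_in = rng.normal(0, 1 / np.sqrt(N_IN), (N_MAIN, N_IN))
W_main = rng.normal(0, 1 / np.sqrt(N_MAIN), (N_MAIN, N_MAIN))
W_sm = rng.normal(0, 1 / np.sqrt(N_SUB), (N_MAIN, N_SUB))    # sub-module -> main module
W_ms = rng.normal(0, 1 / np.sqrt(N_MAIN), (N_SUB, N_MAIN))   # main module -> sub-module
W_sub = rng.normal(0, 1 / np.sqrt(N_SUB), (N_SUB, N_SUB))
W_out = rng.normal(0, 1 / np.sqrt(N_MAIN), (1, N_MAIN))      # readout from the main module only

def relu(z):
    return np.maximum(z, 0.0)

def step(x_m, x_s, u):
    """One step of Eqs. 4-5; only the main module receives the input u."""
    xi_m = rng.normal(0.0, 0.05, N_MAIN)
    xi_s = rng.normal(0.0, 0.05, N_SUB)
    x_m_next = (1 - ALPHA_M) * x_m + ALPHA_M * relu(W_in @ u + W_main @ x_m + W_sm @ x_s) + ALPHA_M * xi_m
    x_s_next = (1 - ALPHA_S) * x_s + ALPHA_S * relu(W_sub @ x_s + W_ms @ x_m) + ALPHA_S * xi_s
    return x_m_next, x_s_next

x_m, x_s = np.zeros(N_MAIN), np.zeros(N_SUB)
u = rng.poisson(1.0, N_IN)
x_m, x_s = step(x_m, x_s, u)
y = W_out @ x_m               # Eq. 6: the output reads only the main module
```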
2.2 Task
In this study, we considered a task in which Bayesian inference improves estimation accuracy. Specifically, the RNN
was tasked with estimating the true value from a noisy observed signal. We generated the external input as follows.
First, the true value y_true was randomly sampled from a generator (cause) distribution, given by a normal distribution
with mean µ_g and variance σ_g². Next, the observed signal s was generated from y_true by adding noise, so that the
input follows a normal distribution with mean y_true and variance σ_s². The generator did not remain constant: it
changed with probability p_t over time. When the generator changed, µ_g and σ_g were sampled uniformly from
µ_g ∈ [−0.5, 0.5] and σ_g ∈ [0, 0.8], respectively.
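A sketch of this generative process is given below, using the ranges and the switching probability p_t = 0.03 from Table 1. Whether y_true is redrawn at every time step while the generator persists, and the particular observation-noise level σ_s used here, are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
T, P_T = 120, 0.03            # trial length and generator switching probability (Table 1)

def sample_generator():
    """Draw a new generator (cause) distribution: mu_g ~ U[-0.5, 0.5], sigma_g ~ U[0, 0.8]."""
    return rng.uniform(-0.5, 0.5), rng.uniform(0.0, 0.8)

def generate_trial(sigma_s=0.5):
    """Return (y_true, s): true values and noisy observations for one trial."""
    mu_g, sigma_g = sample_generator()
    y_true, s = np.empty(T), np.empty(T)
    for t in range(T):
        if rng.random() < P_T:                 # the generator switches with probability p_t
            mu_g, sigma_g = sample_generator()
        y_true[t] = rng.normal(mu_g, sigma_g)  # true value drawn from the current generator
        s[t] = rng.normal(y_true[t], sigma_s)  # observation = true value + noise
    return y_true, s

y_true, s = generate_trial()
```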
As mentioned in the Introduction, the prior distribution needed for Bayesian estimation must be estimated from the
observed signal so that it is close to the generator distribution. Then, u(t) for Eq. 1 (or Eqs. 4 and 5) is given by using the
Probabilistic Population Code (PPC), which has been proposed as a neural basis for Bayesian inference [Ma et al.,
2006]. PPC assumes that the information in a signal is encoded by a population of neurons with position-based
preferred stimuli that fire probabilistically according to a Poisson distribution. It has been shown that neural networks
with a PPC population of neurons as the input layer can learn probabilistic inference effectively [Orhan and
Ma, 2017]. Therefore, in this study, we also assumed that the activity u of the input-layer neurons encoding the observed
signal followed the PPC model. u was sampled from the following Poisson distribution [Ichikawa and Kataoka, 2022]:
$$p(u \mid s) = \prod_i \frac{e^{-f_i(s)} f_i(s)^{u_i}}{u_i!} \qquad (7)$$
Table 1: Hyperparameters

Attribute                             Value
Range of µ_p                          −0.5 ≤ µ_p ≤ 0.5
Range of σ_p                          0 ≤ σ_p ≤ 0.8
Range of σ_l                          √(1/5) ≤ σ_l ≤ 1
Switching probability of prior        p_t = 0.03
Length of u(t)                        100
σ²_PPC                                0.5
Lasting time of u(t)                  T = 120
#Neurons in the main module           150
#Neurons in the sub-module            50
α_m                                   1
α_s                                   1, 0.5, 0.2, 0.1, 0.05, 0.01
Batch size                            50
Optimization algorithm                Adam
Learning rate                         0.001
Iterations                            6000
Weight decay                          0.0001
Here, s is the observed signal generated from y_true by adding noise, and f_i is the tuning curve of the i-th neuron. This
selective firing occurs in proportion to the gain when the observed signal is generated. The gain is inversely proportional
to the noise variance, g = 1/σ_l², and corresponds to signal clarity: the gain decreases as the noise increases
due to uncertainty in observations [Tolhurst et al., 1983]. Considering the gain, we obtain:
$$f_i(s) = g \exp\left(-\frac{(s - \varphi_i)^2}{2\sigma_{\mathrm{PPC}}^2}\right), \qquad (8)$$
where φ_i represents the preferred stimulus of the i-th neuron in the input layer. It was assumed that φ_i follows an arithmetic
sequence in i (φ_i = −1/2 + i/m when the number of neurons in the input layer is m) [Swindale, 1998]. Also, σ²_PPC
is a constant that represents the ease of firing and was set to σ²_PPC = 1/2 in this study.
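The PPC input encoding of Eqs. 7-8 can be sketched as follows, with m = 100 input neurons and σ²_PPC = 1/2 as in the text; the particular value of σ_l passed in the example is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 100                                # number of input-layer neurons (Table 1)
SIGMA2_PPC = 0.5                       # sigma^2_PPC = 1/2
phi = -0.5 + np.arange(1, M + 1) / M   # preferred stimuli on an arithmetic grid

def ppc_encode(s, sigma_l):
    """Sample the PPC activity u ~ Poisson(f(s)) for an observed signal s (Eqs. 7-8)."""
    g = 1.0 / sigma_l**2                                # gain, inverse of the noise variance
    f = g * np.exp(-(s - phi)**2 / (2.0 * SIGMA2_PPC))  # tuning curves f_i(s)
    return rng.poisson(f)

u = ppc_encode(s=0.2, sigma_l=0.5)     # example: encode a single observation
```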
In this task, the true value y_true was to be estimated based on the input signal u. Therefore, training was performed to
minimize the mean squared error (MSE) between the neural network output y(t) and the true value y_true(t). Note that
the loss function was not based on the Bayesian optimal value calculated from the generator distribution and the noise
in the observed signal, but was calculated based only on the true value:
$$L = \frac{1}{T} \sum_t \left(y(t) - y_{\mathrm{true}}(t)\right)^2 \qquad (9)$$
Training was performed by the backpropagation method [Rumelhart et al., 1986, Werbos, 1990]. An efficient stochastic
gradient descent method, Adam [Kingma and Ba, 2014], was used for optimization. The batch size of training samples
was set to 50 and the weight decay rate to 0.0001; training was performed for 6000 iterations (see Table 1 for
the hyperparameters used in the experiment).
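As an illustration of the training setup, the following self-contained PyTorch sketch trains a scaled-down modular RNN with the MSE objective of Eq. 9 and the Adam settings of Table 1. The reduced network size, the one-dimensional raw-signal input (instead of the 100-neuron PPC encoding), and all class and function names are simplifying assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class TinyModularRNN(nn.Module):
    """Scaled-down sketch of Eqs. 4-6 (20 main / 10 sub neurons, 1-d input instead of PPC)."""
    def __init__(self, n_main=20, n_sub=10, n_in=1, alpha_s=0.1):
        super().__init__()
        self.alpha_s = alpha_s                               # alpha_m is fixed at 1
        self.w_in = nn.Linear(n_in, n_main, bias=False)
        self.w_main = nn.Linear(n_main, n_main, bias=False)
        self.w_sm = nn.Linear(n_sub, n_main, bias=False)     # sub-module -> main module
        self.w_ms = nn.Linear(n_main, n_sub, bias=False)     # main module -> sub-module
        self.w_sub = nn.Linear(n_sub, n_sub, bias=False)
        self.w_out = nn.Linear(n_main, 1, bias=False)        # readout from the main module only
        self.n_main, self.n_sub = n_main, n_sub

    def forward(self, u):                                    # u: (batch, T, n_in)
        b, T, _ = u.shape
        x_m = u.new_zeros(b, self.n_main)
        x_s = u.new_zeros(b, self.n_sub)
        out = []
        for t in range(T):
            # alpha_m = 1: the main module is fully updated each step (Eq. 4).
            x_m = torch.relu(self.w_in(u[:, t]) + self.w_main(x_m) + self.w_sm(x_s)) \
                  + 0.05 * torch.randn_like(x_m)
            # The slow sub-module is a leaky update with rate alpha_s (Eq. 5).
            x_s = (1 - self.alpha_s) * x_s + self.alpha_s * (
                torch.relu(self.w_sub(x_s) + self.w_ms(x_m)) + 0.05 * torch.randn_like(x_s))
            out.append(self.w_out(x_m))                      # Eq. 6
        return torch.cat(out, dim=1)                         # (batch, T)

def make_batch(batch=50, T=120, p_t=0.03, sigma_s=0.3):
    """Simplified task batch: raw noisy observations as a 1-d input (PPC encoding omitted)."""
    mu = torch.rand(batch, 1) - 0.5
    sg = 0.8 * torch.rand(batch, 1)
    y_true, s = [], []
    for t in range(T):
        switch = torch.rand(batch, 1) < p_t                  # generator switches with prob. p_t
        mu = torch.where(switch, torch.rand(batch, 1) - 0.5, mu)
        sg = torch.where(switch, 0.8 * torch.rand(batch, 1), sg)
        yt = mu + sg * torch.randn(batch, 1)                 # true value from the current generator
        y_true.append(yt)
        s.append(yt + sigma_s * torch.randn(batch, 1))       # noisy observation
    return torch.stack(s, dim=1), torch.cat(y_true, dim=1)   # u: (batch, T, 1), y_true: (batch, T)

model = TinyModularRNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
for it in range(200):                                        # the paper trains for 6000 iterations
    u, y_true = make_batch()
    loss = ((model(u) - y_true) ** 2).mean()                 # MSE loss of Eq. 9
    opt.zero_grad()
    loss.backward()
    opt.step()
```

At the scale used in the paper, the input would instead be the 100-neuron PPC encoding sketched above, with 150 main and 50 sub-module neurons and 6000 training iterations.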
3 Results 1: Fixed structure and time scales
Bayesian optimality
Because the generated signal s was observed under noise, the neural network was required to estimate the true value
sampled from the generator. If the information from the generator were known, y_true would be estimated by minimizing
the long-term MSE, which gives the optimal value of y as follows (maximum a posteriori (MAP) estimation [Bishop,
2006]):
$$y_{\mathrm{opt}} = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_s^2}\, s + \frac{\sigma_s^2}{\sigma_g^2 + \sigma_s^2}\, \mu_g \qquad (10)$$
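For reference, Eq. 10 is simply a precision-weighted average of the observation and the prior mean; a small sketch with an illustrative numerical example:

```python
def bayes_opt(s, mu_g, sigma_g2, sigma_s2):
    """MAP (posterior-mean) estimate of Eq. 10: a precision-weighted mix of s and mu_g."""
    return (sigma_g2 * s + sigma_s2 * mu_g) / (sigma_g2 + sigma_s2)

# Example: observation s = 0.4, prior N(0.0, 0.04), observation-noise variance 0.09.
# The estimate 0.04/0.13 * 0.4 + 0.09/0.13 * 0.0 ≈ 0.123 is pulled toward the prior mean.
print(bayes_opt(0.4, 0.0, 0.04, 0.09))
```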