
through observing and learning time-dependent signals. Experimental results suggest that the prior and the likelihood
for Bayesian inference are encoded in different brain areas [Vilares et al., 2012, Chan et al., 2016, d’Acremont et al.,
2013]. Still, the validity of and the mechanisms underlying these results remain controversial, and how area differentiation
relates to the accuracy of Bayesian inference is not well understood. A simulation study [Quax et al., 2021] suggested that
the gain of the activation function encodes the prior. However, because the prior was fixed in that study, how the prior
representation is shaped when the prior varies over time was not considered.
In general, to obtain the prior, it is necessary to estimate the prior distribution from previous observations, so the
population of neurons that represents the prior must integrate observed inputs over time. One possible mechanism
for achieving such integration is a pair of neural modules operating at distinct time scales: a downstream neuron
population with slower activity changes, separated from an upstream neuron population that processes input information.
In this structure, the slow module, which does not directly receive inputs, may facilitate the integration. Indeed, some
experimental reports have suggested that the time scale of neural activity in downstream brain areas that do not directly
receive external input is slow [Murray et al., 2014, Cavanagh et al., 2020, Golesorkhi et al., 2021]. On this basis, we
evaluated recurrent neural networks (RNNs) with two modules: a main module with a direct connection to the input-output
layer and a sub-module with a direct connection to the main module but no connections to the input-output layer (i.e., a
hierarchical structure) (Fig. 1). We then examined the role of the modular structure and the relevance of the time scale
difference between the main and sub-modules for the prior representation in Bayesian inference.
We found that RNNs with a modular structure shape the prior more appropriately than regular RNNs. Further, Bayesian
inference is more accurate when the time scale of the sub-module is appropriately slow. When the time scale is uniform,
prior information is maintained in both the main module and the sub-module. In contrast, when the time scales differ,
prior information is represented by the slow sub-module. Comparing these two cases revealed that the variance of the
prior encoded on the neural manifold was easier to decode in the model with distinct time scales, which facilitated
distinguishing changes in the mean input from noise.
In addition, we examined whether a modular structure with distinct time scales would emerge from a homogeneous neural
network. We trained the network on a Bayesian inference task in which the time scale of each neuron varied over time. As
the training progressed, we observed that the time scales of the neurons differentiated into slower and faster scales. A
modular structure arose in which the slow neurons were separated from the input/output layers, which connected predominantly
to the fast neurons, and the sub-module of slow neurons represented the prior information.
These results are crucial for understanding the mechanism of prior representation in Bayesian inference and provide insight
into the relationships between neural network structure, neural dynamics [Amunts et al., 2022, Mastrogiuseppe and
Ostojic, 2018, Vyas et al., 2020, Beiran et al., 2021], and time scales [Papo, 2013] underlying information processing in
the brain, a central issue in computational neuroscience.
2 Model
2.1 Recurrent Neural Networks with/without modular structure
To investigate the effect of structure and time scale on Bayesian inference, we considered the following RNNs [Barak,
2017].
First, we established a regular RNN consisting of an input layer, a recurrent (hidden) layer, and an output layer, as shown
in Fig. 1(a). The following equation represents the dynamics of the recurrent layer:
\[
x(t+1) = (I - \alpha)\,x(t) + \alpha\,\mathrm{ReLU}\big(W_{\mathrm{in}}\,u(t) + W x(t)\big) + \sqrt{\alpha}\,\xi, \qquad (1)
\]
where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_{200})^{\mathrm{T}}$ is a vector that introduces the time scale of each neuron as
\[
\alpha_i =
\begin{cases}
\alpha_m & (1 \le i \le 150) \\
\alpha_s & (151 \le i \le 200),
\end{cases}
\qquad (2)
\]
where the standard homogeneous network is given by $\alpha_s = \alpha_m$; the case with $\alpha_s < \alpha_m$ was also studied to investigate
the effect of the time scale difference. Although we mainly studied systems with 150 fast and 50 slow neurons, the
results to be discussed are not altered as long as both numbers are sufficiently large (say, 100 vs. 50, or 150 vs. 150, fast and
slow neurons). Here, $u(t)$ is the input signal, and $x$ is the state of the neurons in the recurrent layer.
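To make the update rule concrete, the following is a minimal NumPy sketch of Eqs. (1) and (2), including the hierarchical variant in which only the main module receives the input directly. The population sizes follow the text (150 fast/main and 50 slow/sub neurons); the input dimensionality, the noise amplitude, the specific values of $\alpha_m$ and $\alpha_s$, and the random weight initialization are illustrative assumptions rather than the trained setup of the paper.

```python
import numpy as np

N_FAST, N_SLOW = 150, 50      # main (fast) and sub (slow) populations
N = N_FAST + N_SLOW
DIM_IN = 1                    # assumed input dimensionality

def make_alpha(alpha_m, alpha_s):
    """Time-scale vector of Eq. (2): alpha_m for neurons 1..150, alpha_s for 151..200."""
    alpha = np.empty(N)
    alpha[:N_FAST] = alpha_m
    alpha[N_FAST:] = alpha_s
    return alpha

def step(x, u, W, W_in, alpha, rng, noise_std=0.1):
    """One update of Eq. (1):
    x(t+1) = (1 - alpha)*x(t) + alpha*ReLU(W_in u(t) + W x(t)) + sqrt(alpha)*xi,
    with xi taken here as Gaussian noise (an assumption)."""
    xi = noise_std * rng.standard_normal(N)
    pre = W_in @ u + W @ x
    return (1.0 - alpha) * x + alpha * np.maximum(pre, 0.0) + np.sqrt(alpha) * xi

rng = np.random.default_rng(0)
alpha = make_alpha(alpha_m=0.5, alpha_s=0.1)      # alpha_s < alpha_m: slow sub-module
W = rng.standard_normal((N, N)) / np.sqrt(N)      # recurrent weights (trained in practice)
W_in = rng.standard_normal((N, DIM_IN))
W_in[N_FAST:, :] = 0.0                            # hierarchical case: sub-module gets no direct input

x = np.zeros(N)
for t in range(100):
    u = np.array([np.sin(0.1 * t)])               # placeholder time-dependent input signal
    x = step(x, u, W, W_in, alpha, rng)
```

An output layer reading out from the main module only (as in the hierarchical structure of Fig. 1) would be applied on top of $x$; it is omitted from this sketch.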
We adopted the