1.2 Helmholtz Machines’ Biological Inspiration
We propose to revisit an older generative model called the Helmholtz Machine (HM) [12], which is trained with the Wake-Sleep (WS) algorithm [21] (details in Appendix A) instead of Back-propagation.
Wake-Sleep is an unsupervised learning algorithm that uses two different networks to simultaneously learn a Recognition Model, which infers hidden causes from data, and a Generation Model, which produces data from those causes. Despite not being a completely Hebbian algorithm, its activation and learning rules are as local as the Hebb rule [33].
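As a concrete illustration, the following sketch runs one Wake-Sleep iteration in Python. It is our simplification, not a procedure prescribed by [12, 21]: a single hidden layer of binary stochastic units, a uniform prior over the hidden causes, and hypothetical sizes and learning rate.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sample(p):
        # Binary stochastic units: each unit fires with probability p.
        return (rng.random(p.shape) < p).astype(float)

    # R recognizes hidden causes from data; G generates data from causes.
    n_visible, n_hidden, lr = 784, 64, 0.01
    R = rng.normal(0.0, 0.1, (n_visible, n_hidden))  # recognition weights
    G = rng.normal(0.0, 0.1, (n_hidden, n_visible))  # generative weights

    def wake_sleep_step(d):
        global R, G
        # Wake phase: infer hidden causes h for the real datum d, then
        # nudge the Generation Model toward reconstructing d from h.
        h = sample(sigmoid(d @ R))
        p_d = sigmoid(h @ G)
        G += lr * np.outer(h, d - p_d)              # local delta rule

        # Sleep phase: dream a fantasy (here from a uniform prior over h),
        # then nudge the Recognition Model toward recovering its causes.
        h_dream = sample(np.full(n_hidden, 0.5))
        d_dream = sample(sigmoid(h_dream @ G))
        p_h = sigmoid(d_dream @ R)
        R += lr * np.outer(d_dream, h_dream - p_h)  # local delta rule

Note that each weight update combines only the pre-synaptic activity with a post-synaptic prediction error, which is the locality property discussed next.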
Hebbian learning algorithms respect the original proposition made by Hebb [19] that learning and memory in the brain arise from increased synaptic efficacy, triggered by the coordinated firing of the pre- and post-synaptic neurons [37]. More importantly, they solve the previously mentioned locality problem, because each synaptic weight update depends only on information available at the synapse, namely the activities of the pre- and post-synaptic neurons. The locality of WS avoids that problem in the same way the Hebbian rule does.
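In symbols (our notation, purely illustrative, and not the exact update of [21]), the two rules share the same local form, with j indexing the pre-synaptic and i the post-synaptic neuron:

\[
\Delta w_{ij} = \eta \, y_i \, x_j \quad \text{(Hebb rule)},
\qquad
\Delta w_{ij} = \eta \, s_j \, (s_i - p_i) \quad \text{(WS delta rule)},
\]

where s_j is the pre-synaptic activity, s_i the post-synaptic activity fixed by the current phase, and p_i the post-synaptic prediction computed from the incoming weights; every term is available at the synapse itself.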
The unsupervised nature of the algorithm also contributes to its plausibility, since most of the human brain's learning is done without supervision. Moreover, unlike Back-propagation, for which it is very difficult to find an implementation that works by propagating neuron activations instead of firing probabilities, the WS algorithm works effectively with both options, answering the third back-propagation implausibility argument mentioned earlier.
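To make the contrast concrete, a hypothetical helper in the style of the sketch above (reusing its sigmoid and sample) could expose both options:

    def propagate(x, W, stochastic=True):
        # stochastic=True passes on sampled binary spikes (activations);
        # stochastic=False passes on the firing probabilities themselves.
        p = sigmoid(x @ W)
        return sample(p) if stochastic else p

Either return value can be fed to the next layer, which is the flexibility referred to above.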
Furthermore, the learning algorithm of these machines is based on the biological notion of being awake and asleep. The intuition is that after we experience an event, we also produce our own variations of that event. On a daily scale, we experience reality during our waking hours and then recreate it in our sleep; there is, however, a shorter-scale example that perhaps compares better to the actual behavior of the model: the interaction between the human eyes and the brain. The brain receives the continuous stream of images that the eyes capture and, while receiving them, subconsciously tries to predict what will happen in the next frame; when reality does not match our expectation, for example, when a magician pulls a rabbit out of a hat, we become surprised. The HM mimics this behavior: after receiving an observation from the world, it produces a dream, and then adjusts its weights to create more plausible dreams, reducing the surprise of experiencing the next event. Likewise, if we see the same magic trick performed enough times, we learn to expect what was previously unexpected.
This “reduction of surprise” corresponds to minimizing a quantity prominent in neuroscientific research called Free Energy [14, 16], which is “an information theory measure that bounds the surprise on sampling some data, given a generative model” [15]. Thus, the minimization of Free Energy corroborates the hypothesis that “a biological agent resists the tendency toward disorder through a minimization of uncertainty” [37, 15, 13], alluded to in the previous example.
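Concretely, in standard variational notation (ours; the formula is not quoted from [15]): for a datum d with hidden causes h, generative model p, and recognition distribution q,

\[
F(d) = \mathbb{E}_{q(h \mid d)}\!\left[ \log \frac{q(h \mid d)}{p(h, d)} \right]
     = -\log p(d) + D_{\mathrm{KL}}\bigl( q(h \mid d) \,\Vert\, p(h \mid d) \bigr)
     \;\geq\; -\log p(d),
\]

so minimizing the Free Energy F tightens an upper bound on the surprise −log p(d), and the Wake-Sleep updates can be read as a descent on an approximation of this bound.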