1.2 Helmholtz Machines’ Biological Inspiration
We propose to revisit an older generative model called the Helmholtz Machine (HM) [12], which is trained with the Wake-Sleep (WS) algorithm [21] (details in Appendix A) instead of Back-propagation.
Wake-Sleep is an unsupervised learning algorithm that uses two different networks to simultaneously learn a Recognition Model, which infers hidden causes from data, and a Generation Model, which produces data from those causes. Despite not being a completely Hebbian algorithm, its activation and learning rules are as local as the Hebb rule [33].
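As a concrete illustration, the following sketch runs one Wake-Sleep iteration in Python. It is our simplification, not a procedure prescribed by [12, 21]: a single hidden layer of binary stochastic units, a uniform prior over the hidden causes, and hypothetical sizes and learning rate.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sample(p):
        # Binary stochastic units: each unit fires with probability p.
        return (rng.random(p.shape) < p).astype(float)

    # R recognizes hidden causes from data; G generates data from causes.
    n_visible, n_hidden, lr = 784, 64, 0.01
    R = rng.normal(0.0, 0.1, (n_visible, n_hidden))  # recognition weights
    G = rng.normal(0.0, 0.1, (n_hidden, n_visible))  # generative weights

    def wake_sleep_step(d):
        global R, G
        # Wake phase: infer hidden causes h for the real datum d, then
        # nudge the Generation Model toward reconstructing d from h.
        h = sample(sigmoid(d @ R))
        p_d = sigmoid(h @ G)
        G += lr * np.outer(h, d - p_d)              # local delta rule

        # Sleep phase: dream a fantasy (here from a uniform prior over h),
        # then nudge the Recognition Model toward recovering its causes.
        h_dream = sample(np.full(n_hidden, 0.5))
        d_dream = sample(sigmoid(h_dream @ G))
        p_h = sigmoid(d_dream @ R)
        R += lr * np.outer(d_dream, h_dream - p_h)  # local delta rule

Note that each weight update combines only the pre-synaptic activity with a post-synaptic prediction error, which is the locality property discussed next.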
Hebbian learning algorithms respect the original proposition made by Hebb [19] that learning and memory in the brain arise from increased synaptic efficacy, triggered by the coordinated firing of the pre- and post-synaptic neurons [37]. More importantly, they solve the previously mentioned locality problem, because each synaptic weight update depends only on information available at the synapse, namely the activities of the pre- and post-synaptic neurons. The locality of WS avoids that problem in the same way the Hebbian rule does.
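In symbols (our notation, purely illustrative, and not the exact update of [21]), the two rules share the same local form, with j indexing the pre-synaptic and i the post-synaptic neuron:

\[
\Delta w_{ij} = \eta \, y_i \, x_j \quad \text{(Hebb rule)},
\qquad
\Delta w_{ij} = \eta \, s_j \, (s_i - p_i) \quad \text{(WS delta rule)},
\]

where s_j is the pre-synaptic activity, s_i the post-synaptic activity fixed by the current phase, and p_i the post-synaptic prediction computed from the incoming weights; every term is available at the synapse itself.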
The unsupervised nature of the algorithm also contributes to its plausibility, since most of the human brain's learning is done without supervision. Moreover, unlike Back-propagation, for which it is very difficult to find an implementation that works by propagating neuron activations instead of firing probabilities, the WS algorithm works effectively with both options, answering the third back-propagation implausibility argument mentioned earlier.
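To make the contrast concrete, a hypothetical helper in the style of the sketch above (reusing its sigmoid and sample) could expose both options:

    def propagate(x, W, stochastic=True):
        # stochastic=True passes on sampled binary spikes (activations);
        # stochastic=False passes on the firing probabilities themselves.
        p = sigmoid(x @ W)
        return sample(p) if stochastic else p

Either return value can be fed to the next layer, which is the flexibility referred to above.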
Furthermore, the learning algorithm of these machines is based on the biological notion of being awake and asleep. The intuition is that after we experience an event, we also produce our own variations of that event. On a daily scale, we experience reality during our waking hours and then recreate it in our sleep; there is, however, a shorter-scale example that perhaps compares better to the actual behavior of the model: the interaction between the human eyes and the brain. The brain receives the continuous stream of images that the eyes capture and, while receiving them, subconsciously tries to predict what will happen in the next frame; when reality does not match our expectation, for example, when a magician pulls a rabbit out of a hat, we become surprised. The HM mimics this behavior: after receiving an observation from the world, it produces a dream, and then adjusts its weights to create more plausible dreams, reducing the surprise of experiencing the next event. Likewise, if we see the same magic trick performed enough times, we learn to expect what was previously unexpected.
This “reduction of surprise” corresponds to minimizing a quantity prominent in neuroscientific research called Free Energy [14, 16], which is “an information theory measure that bounds the surprise on sampling some data, given a generative model” [15]. Thus, the minimization of Free Energy corroborates the hypothesis that “a biological agent resists the tendency toward disorder through a minimization of uncertainty” [37, 15, 13], alluded to in the previous example.
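Concretely, in standard variational notation (ours; the formula is not quoted from [15]): for a datum d with hidden causes h, generative model p, and recognition distribution q,

\[
F(d) = \mathbb{E}_{q(h \mid d)}\!\left[ \log \frac{q(h \mid d)}{p(h, d)} \right]
     = -\log p(d) + D_{\mathrm{KL}}\bigl( q(h \mid d) \,\Vert\, p(h \mid d) \bigr)
     \;\geq\; -\log p(d),
\]

so minimizing the Free Energy F tightens an upper bound on the surprise −log p(d), and the Wake-Sleep updates can be read as a descent on an approximation of this bound.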