Toward an Over-parameterized Direct-Fit Model of
Visual Perception
Xin Li
xin.li@ieee.org
Abstract
In this paper, we revisit the problem of computational modeling of simple and complex cells for an over-parameterized and direct-fit model of visual perception. Unlike conventional wisdom, we highlight the difference between the parallel and sequential binding mechanisms of simple and complex cells. A new proposal for abstracting them into space partitioning and composition is developed as the foundation of our new hierarchical construction. Our construction can be interpreted as a product-topology-based generalization of the existing k-d tree, making it suitable for brute-force direct-fit in a high-dimensional space. The constructed model has been applied to several classical experiments in neuroscience and psychology. We provide an anti-sparse coding interpretation of the constructed vision model and show how it leads to a dynamic programming (DP)-like approximate nearest-neighbor search based on ℓ∞-optimization. We also briefly discuss two possible implementations based on an asymmetrical (decoder matters more) auto-encoder and spiking neural networks (SNN), respectively.
I. INTRODUCTION
How do we learn to see in the first six months after birth? To answer this question, David Hubel and
Torsten Wiesel conducted pioneering experiments in the 1950s, leading to the discovery of simple and
complex cells [43]. Inspired by their discovery, David Marr developed a theory of the neocortex [62] in
1970 and a theory of the hippocampus [63] in 1971. His computational investigation of vision [61] was
published in 1982 after his death. The construction of the neocognitron by Fukushima [31] and of connectionist models by LeCun [50] in the 1980s represented the continuing effort to construct biologically plausible
computational models for visual perception. Wavelet theory [57, 23] and sparse coding [70, 71] in the 1990s
further supplied mathematical formulations of multi-resolution analysis for scale-invariant representation
of images. Rapid advances in deep learning [33, 51], especially the class of over-parameterized models
[7, 6] have expedited both the theory and practice of data-driven/learning-based visual processing.
Despite the great progress of today, the gap between biological and artificial vision remains significant
in the following aspects. First, the network architecture of the convolutional neural network (CNN) is characterized by pooling layers, which reduce the dimensionality of the input data. This is in sharp contrast to the increase in the number of neurons and synapses as we move from lower layers (e.g., V1) to higher layers (e.g., V4) of the neocortex. This anatomical finding has inspired H. Barlow
to revise his redundancy reduction hypothesis into the redundancy exploitation hypothesis [9] in 2001.
Second, although recurrent neural networks including long short-term memory (LSTM) [42] take into
account temporal dynamics and have found successful applications in 1D signal analysis, the role of memory in visual perception has remained largely unexplored. In the human visual system, the hippocampus is known to play a critical role in various cognitive tasks, including memory consolidation and novelty detection [49]. Finally, it remains a mystery how the human brain manages to achieve the objectives of learning and memory with on the order of 100-1000 trillion synapses and a power budget of less than 20W.
The challenge of breaking the conventional von Neumann architecture built upon the Turing machine
remains a holy grail in neuromorphic computing.
The motivation behind this paper is two-fold. On the one hand, both the human brain and CNN are
characterized by the ability to optimize an astronomical number of synaptic weights [20]. The class of
over-parameterized models [7, 6] has shown some counterintuitive properties, such as double descent
[68]. Analytical tools such as neural tangent kernel (NTK) offer an approach to understanding over-
parameterization in the Hilbert space, but, like all kernel methods, they are not compatible with the
recursion strategy (e.g. dynamic programming that builds upon the optimality of substructures [13]).
We seek to understand overparameterization in the framework of optimizing hierarchical representations
[85]. On the other hand, an evolutionary perspective on biological and artificial neural networks [34] offers a brute-force direct-fit approach. Such a deceptively simple model, when combined with over-parameterized optimization, offers an appealing solution to increase the generalization (predictive) power
without explicitly modeling the unknown generative structure underlying sensory inputs.
In this paper, we construct an over-parameterized direct-fit model for visual perception. Unlike the
conventional wisdom of abstracting simple and complex cells, we use space partitioning and composition
as the building block of our hierarchical construction. In addition to biological plausibility, we offer
a geometric analysis of our construction in topological space (i.e., topological manifolds without the
definition of a distance metric or an inner product). Our construction can be interpreted as a product-
topology-based generalization of the existing k-d tree [14], making it suitable for brute-force direct-fit
in a high-dimensional space. In the presence of novelty/anomaly, a surrogate model that mimics the
escape mechanism of the hippocampus can be activated for unsupervised continual learning [98]. The
constructed model has been applied to several classical experiments in neuroscience and psychology. We
also provide an anti-sparse coding interpretation [46] of the constructed vision model and present a dynamic programming (DP)-like solution to approximate nearest-neighbor search in a high-dimensional space. Finally,
we briefly discuss two possible network implementations of the proposed model based on asymmetric
autoencoder [69] and spiking neural networks (SNN) [45], respectively.
II. NEUROSCIENCE FOUNDATION
A. Dichotomy: Excitatory and Inhibitory Neurons
In their work in the 1970s [92, 93], Wilson and Cowan made the crucial assumption that “all nervous processes of any complexity depend on the interaction of excitatory and inhibitory cells.” Using phase-plane methods, they demonstrated simple and multiple hysteresis phenomena and limit-cycle activity in localized populations of model neurons. Their results offer, more or less, a primitive basis for memory storage: stimulus intensity can be coded both in the average spike frequency and in the frequency of periodic variations of the average spike frequency [78]. However, such ad hoc sensory
encoding cannot explain the sophistication of learning, memory, and recognition associated with higher
functions.
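To make the excitatory-inhibitory interaction concrete, below is a minimal simulation sketch of the Wilson-Cowan rate equations. It is our illustration rather than code from [92, 93]; the coupling constants, gains, and thresholds are textbook-style values commonly quoted as producing limit-cycle activity.

```python
import numpy as np

def S(x, a, theta):
    """Logistic response function, shifted so that S(0) = 0."""
    return 1.0 / (1.0 + np.exp(-a * (x - theta))) - 1.0 / (1.0 + np.exp(a * theta))

def wilson_cowan(P=1.25, Q=0.0, T=100.0, dt=0.01):
    """Euler integration of the Wilson-Cowan equations for the mean activities
    E(t), I(t) of coupled excitatory/inhibitory populations. All constants are
    illustrative values known to yield a limit cycle, not fitted parameters.
    """
    c1, c2, c3, c4 = 16.0, 12.0, 15.0, 3.0  # E->E, I->E, E->I, I->I couplings
    a_e, th_e, a_i, th_i, tau = 1.3, 4.0, 2.0, 3.7, 1.0
    n = int(T / dt)
    E, I = np.zeros(n), np.zeros(n)
    for t in range(n - 1):
        E[t + 1] = E[t] + dt / tau * (-E[t] + (1 - E[t]) * S(c1 * E[t] - c2 * I[t] + P, a_e, th_e))
        I[t + 1] = I[t] + dt / tau * (-I[t] + (1 - I[t]) * S(c3 * E[t] - c4 * I[t] + Q, a_i, th_i))
    return E, I  # plotting I against E traces the limit-cycle orbit in the phase plane
```

With the external input P above, the E-I pair oscillates rather than settling to a fixed point, which is the phase-plane behavior referred to in the text.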
B. Hebbian Learning and Anti-Hebbian Learning
Hebbian learning [38] is a dogma that claims that an increase in synaptic efficacy arises from repeated and persistent stimulation of a postsynaptic cell by a presynaptic cell. The Hebbian learning rule is often summarized as “cells that fire together wire together”. The physical implementation of the Hebbian learning
rule has been well studied in the literature, for example, through spike timing-dependent plasticity (STDP)
[16]. The mechanism of STDP is to adjust connection strengths based on the relative timing of a neuron’s input and output action potentials. STDP as a Hebbian synaptic learning rule has been
demonstrated in various neural circuits, from insects to humans.
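To make the timing dependence concrete, here is a minimal sketch of the standard pair-based STDP window (a textbook idealization; the amplitudes and time constants below are illustrative, not measurements from [16]):

```python
import numpy as np

def stdp_dw(delta_t, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Pair-based STDP weight change for delta_t = t_post - t_pre (in ms).
    Pre-before-post (delta_t > 0) potentiates; post-before-pre depresses.
    """
    return np.where(delta_t > 0,
                    a_plus * np.exp(-delta_t / tau_plus),
                    -a_minus * np.exp(delta_t / tau_minus))

# A pre spike 5 ms before a post spike strengthens the synapse; the reverse
# ordering weakens it, implementing a temporally asymmetric Hebbian rule.
print(stdp_dw(np.array([5.0, -5.0])))  # [ 0.0078..., -0.0093...]
```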
By analogy to excitatory and inhibitory neurons, it has been suggested that a reversal of Hebb’s postulate,
named anti-Hebbian learning, dictates the reduction (rather than increase) of the synaptic connectivity
strength between neurons following a firing scenario. Synaptic plasticity that operates under the control
of an anti-Hebbian learning rule has been found to occur in the cerebellum [12]. More importantly, local
anti-Hebbian learning has been shown to be the foundation for forming sparse representations [27]. By
connecting a layer of simple Hebbian units with modifiable anti-Hebbian feedback connections, one can
learn to encode a set of patterns into a sparse representation in which statistical dependency between
the elements is reduced while preserving the information. However, sparse coding represents only a local approximation of the sensory processing machinery. To extend it to global (nonlocal) integration, we have to assume an additional postulate, called the “hierarchical organization principle”, which we will introduce in the next section.
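Before moving on, the following single-step sketch illustrates the cooperation of the two rules just described, in the spirit of Földiák's network [27]. The one-pass approximation of the recurrent settling, the threshold, the learning rates, and the target firing probability p are our simplifying assumptions, not the original formulation.

```python
import numpy as np

def foldiak_step(x, W, A, p=0.05, beta=0.02, alpha=0.02, theta=0.5):
    """One learning step in the spirit of Foldiak's network [27].
    W: feedforward weights (Hebbian rule); A: lateral feedback weights
    (anti-Hebbian rule, kept non-positive so co-active units inhibit each other).
    """
    # Approximate the recurrent settling of unit activities with one feedback pass.
    y = (W @ x > theta).astype(float)
    y = ((W @ x + A @ y) > theta).astype(float)
    # Hebbian: strengthen weights toward the inputs that drive each active unit.
    W = W + beta * y[:, None] * (x[None, :] - W)
    # Anti-Hebbian: make connections between co-active units more inhibitory,
    # decorrelating the code until pairwise co-activity matches the target p^2.
    A = A - alpha * (np.outer(y, y) - p ** 2)
    np.fill_diagonal(A, 0.0)
    A = np.minimum(A, 0.0)
    return W, A, y
```

Iterating this step over a pattern set drives the units toward a sparse code in which statistical dependencies between elements are reduced, as described above.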
C. Simple and Complex Cells in V1
These two classes of cells were discovered by Torsten Wiesel and David Hubel in the early 1960s [43].
Simple cells respond primarily to oriented edges and gratings, which can be mathematically characterized
by Gabor filters [24]. Complex cells also respond to oriented structures; unlike simple cells, they have a
degree of spatial invariance. The difference in receptive fields and response characteristics between simple and complex cells inspired the invention of the neocognitron by Fukushima in 1979 [31], which foreshadowed subsequent convolutional neural networks. The hierarchical convergent nature of visual processing has
also inspired the construction of the HMAX model in 1999 [81].
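For concreteness, a minimal sketch of such a Gabor receptive field is given below; the isotropic Gaussian envelope and all parameter values are illustrative simplifications of the characterization in [24].

```python
import numpy as np

def gabor_rf(size=31, wavelength=8.0, theta=0.0, sigma=5.0, phase=0.0):
    """2D Gabor receptive field: a sinusoidal grating with the given
    wavelength and orientation theta, windowed by a Gaussian envelope.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)  # rotate into the filter frame
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * xr / wavelength + phase)
    return envelope * carrier

# A "simple cell" response is the rectified inner product with an image patch:
# response = max(0.0, np.sum(gabor_rf(theta=np.pi / 4) * patch))
```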
An important observation about the difference between simple and complex cells, as described in [43], concerns their neural circuits and the corresponding temporal dynamics. Simple cells are built from center-surround cells, whose outputs they sum simultaneously. In contrast, activation of a complex cell by a moving stimulus requires the successive activation of many simple cells. Therefore, the spatial
invariance of complex cells is achieved by summation and integration of the receptive fields of simple
cells. Mathematical modeling of complex cells has been extensively studied in the literature (e.g., energy
model [3]). However, the abstraction strategies taken in this paper will be different from those in the open
literature.
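For comparison, the sketch below shows the classical energy model [3] of a complex cell, which sums the squared responses of a quadrature pair of simple cells; it reuses the illustrative gabor_rf defined above, assumes the patch has the same shape as the filter, and is not the abstraction pursued in this paper.

```python
import numpy as np

def complex_cell_energy(patch, theta=0.0):
    """Energy model of a complex cell: sum of squared responses of two simple
    cells in quadrature (90-degree phase offset). The output is invariant to
    the spatial phase of the stimulus within the Gaussian envelope.
    """
    even = np.sum(gabor_rf(theta=theta, phase=0.0) * patch)
    odd = np.sum(gabor_rf(theta=theta, phase=np.pi / 2) * patch)
    return even ** 2 + odd ** 2
```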
D. Mountcastle’s Universal Principle
In 1978 Mountcastle suggested a universal processing principle that has been acclaimed as the Rosetta
Stone of neuroscience. According to Mountcastle, all parts of the neocortex operate according to a
common principle, the cortical column being the unit of computation [67]. If Mountcastle were correct,
the “simple discovery” made by Hubel and Wiesel might have deeper implications in the mechanism of
visual processing beyond V1. Along this line of reasoning, the striking difference between the simultaneous and successive activation of simple and complex cells might illustrate a fundamental contrast between two classes of binding mechanisms among neurons.
In visual perception, it has been hypothesized that the characteristics of individual objects are bound/segregated by Hebbian/anti-Hebbian learning of different groups of neurons [66]. We conjecture that there exist two types of binding mechanisms (parallel vs. sequential) that are analogous to combinational and sequential logic in digital circuits. The former plays the role of integrating spatially overlapping parts
into a whole (e.g., a horizontal edge and a vertical edge form a letter “T”) or multiple features of the
same object into a coherent perception (e.g., the age and gender of a face) in object recognition. Parallel
binding can be interpreted as an extension of von der Malsburg’s correlation theory [58]. This is the
mechanism adopted by the dorsal stream to support the task of object vision. The latter is at the core of
integrating spatially non-overlapped parts into a whole (e.g., the concatenation of letters into a word) in
spatial vision, which belongs to the ventral stream/pathway. Sequential binding is closely related to the
formation of short-term memory (e.g., Miller’s law [65]) and long-term memory (e.g., Atkinson–Shiffrin
model [8]) in the brain. The fundamental difference between parallel and sequential binding is that the
former is invariant to permutation (the ordering of parts does not affect the perception of the whole),
while the latter is sensitive to the ordering of the neuronal groups.
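The following toy sketch (our illustration, not a model from the cited literature) makes the contrast concrete: a parallel binding is any permutation-invariant aggregation of part activations, whereas a sequential binding is order-sensitive, so that permuting the parts changes the whole.

```python
import numpy as np

rng = np.random.default_rng(0)
parts = [rng.normal(size=8) for _ in range(3)]  # activations of three "parts"

def bind_parallel(parts):
    """Parallel binding: permutation-invariant superposition of the parts."""
    return np.sum(parts, axis=0)

def bind_sequential(parts):
    """Sequential binding: order-sensitive, position-coded composition."""
    return np.concatenate(parts)  # like letters in a word: order matters

# Reordering parts leaves the parallel binding unchanged...
assert np.allclose(bind_parallel(parts), bind_parallel(parts[::-1]))
# ...but produces a different sequential binding.
assert not np.allclose(bind_sequential(parts), bind_sequential(parts[::-1]))
```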
Binding by neuronal synchrony has been widely recognized in the literature; however, the binding
problem is often thought to suffer from the so-called “superposition catastrophe” [91]. The combination
coding argument often faces the dilemma of the curse of dimensionality, and it has been suggested that a hierarchically structured representation can at least partially overcome this barrier to the practicality of combination coding. More importantly, we argue that our intuition about the capacity of the cortex might be misleading and our understanding of the power of hierarchical structures is inadequate [35]. If the curse of dimensionality can become a blessing [18], combination coding can be made compatible with the redundancy exploitation hypothesis [9].
III. CONSTRUCTION OF AN OVER-PARAMETERIZED DIRECT-FIT MODEL
In neuroscience, the principle of hierarchical organization can be roughly stated as follows: The nested
structure of the physical world is mirrored by the hierarchical organization of the neocortex [35]. Unlike
the mathematical construction of wavelets [23, 57], we envision that nature has discovered an elegant
“elementary” solution in topological space (without the extra structure of distance metric or inner product)
to manage the complexity of sensory stimuli in the physical world. We propose to study the following
problem as the fruit-fly problem in visual perception [39].
Problem Formulation of Visual Perception
Given visual stimuli (e.g., a sequence of images) as input, group/cluster them into different classes in an unsupervised manner.
The solution, as manifested by an infant’s development of the visual cortex (primarily the ventral stream for object vision) during the first six months after birth, lies in a novel construction of a hierarchical direct-fit model based on simple and complex cells. As argued by Jean Piaget [76], the ordering of mathematical spaces in early childhood cognitive development (topology before geometry) is the opposite of what we learn in school (topology after geometry). Therefore, we attempt to construct our visual
perception model in topological space with the least amount of assumed mathematical structures.
A. Preliminary on Topological Space
To facilitate our abstraction of simple and complex cells by subspace and product topology, we briefly
review the basics of topological space as follows. We will follow the axiomatization of Felix Hausdorff to
construct the topological space using the neighborhood as the building block. Let N denote the neighborhood function assigning to each point x ∈ X a non-empty set N(x) of subsets of X. Then the following axioms must be satisfied for X with N to be called a topological space.
1) If N is a neighborhood of x (i.e., N ∈ N(x)), then x ∈ N;
2) If N is a subset of X and includes a neighborhood of x, then N is a neighborhood of x;
3) The intersection of two neighborhoods of x is a neighborhood of x;
4) Any neighborhood N of x includes a neighborhood M of x such that N is a neighborhood of each point in M.
Note that the fourth axiom plays the role of linking the neighborhoods of different points in X together. Since no distance metric is defined, we need to define a basis as the starting point for defining a topology.
Basis of a Topology: Let X be a set, and suppose that B is a collection of subsets of X. Then B is a basis for some topology on X if and only if the following two conditions are satisfied: a) the union of all basis elements covers X, i.e., ⋃{B : B ∈ B} = X; b) if B1, B2 ∈ B and x ∈ B1 ∩ B2, there exists an element B3 ∈ B such that x ∈ B3 ⊆ B1 ∩ B2.
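For a finite set, the two conditions above can be checked mechanically. The following is an illustrative sketch (the helper is_basis is our own naming, not a standard library function):

```python
from itertools import combinations

def is_basis(X, B):
    """Check whether the collection B of subsets is a basis for some
    topology on the finite set X (conditions (a) and (b) above).
    """
    X, B = set(X), [frozenset(b) for b in B]
    # (a) the basis elements must cover X
    if set().union(*B) != X:
        return False
    # (b) every point of a pairwise intersection must lie in a basis
    #     element contained in that intersection
    for B1, B2 in combinations(B, 2):
        for x in B1 & B2:
            if not any(x in B3 and B3 <= B1 & B2 for B3 in B):
                return False
    return True

# Example: open "intervals" on a 4-point line
print(is_basis({1, 2, 3, 4}, [{1, 2}, {2, 3}, {3, 4}, {2}, {3}]))  # True
```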
In this work, we will only consider the neighborhood basis, which is defined as follows.
Neighborhood Basis: A neighborhood basis at x is a subset B ⊆ N(x) such that for all V ∈ N(x), there exists some B ∈ B such that B ⊆ V. In other words, for any neighborhood V we can find a neighborhood B in the neighborhood basis contained in V.
With the above setup, the objective is to construct a hierarchical direct-fit model in the Hausdorff space
(a.k.a. topological manifold), which generalizes the existing multi-resolution analysis in the Hilbert space
[57]. Following our intuition above, simple and complex cells will be abstracted into subspace and product
topology [54], respectively. Formally, we have the following.
Subspace Topology: Let X be a topological space and let S ⊆ X be any subset. Then TS = {U ⊆ S : U = S ∩ V for some open subset V ⊆ X} is the subspace topology.
Product Topology: Suppose that X1, ..., Xn are arbitrary topological spaces. On their Cartesian product X1 × ... × Xn, the product topology is generated by the following basis: B = {U1 × ... × Un : Ui is an open subset of Xi, i = 1, ..., n}.
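On finite examples, both constructions can be computed directly. The sketch below (with our own helper names, assuming open sets are given as Python sets) builds a subspace topology by intersection and a product-topology basis by Cartesian products:

```python
from itertools import product

def subspace_topology(topology, S):
    """Partitioning direction: restrict the open sets of X to the subset S."""
    S = frozenset(S)
    return {S & frozenset(V) for V in topology}

def product_basis(topology1, topology2):
    """Composition direction: basis of the product topology, i.e., all
    Cartesian products U1 x U2 of open sets from the two factor spaces."""
    return {frozenset(product(U1, U2)) for U1 in topology1 for U2 in topology2}

# Example: X = {a, b} with open sets {}, {a}, {a, b}
T = [set(), {"a"}, {"a", "b"}]
print(subspace_topology(T, {"b"}))  # {frozenset(), frozenset({'b'})}
print(len(product_basis(T, T)))     # 5 distinct basis elements on X x X
```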
Both the subspace and product topologies are uniquely determined by their respective characteristic properties [48]. The geometric intuition behind our construction of the new hierarchical model is best illustrated by the duality between space partitioning (i.e., subspace topology) and composition (i.e., product topology).