Toward an Over-parameterized Direct-Fit Model of
Visual Perception
Xin Li
xin.li@ieee.org
Abstract
In this paper, we revisit the problem of computational modeling of simple and complex cells for an over-parameterized and direct-fit model of visual perception. Unlike conventional wisdom, we highlight the difference between the parallel and sequential binding mechanisms of simple and complex cells. A new proposal for abstracting them into space partitioning and composition is developed as the foundation of our new hierarchical construction. Our construction can be interpreted as a product-topology-based generalization of the existing k-d tree, making it suitable for brute-force direct-fit in a high-dimensional space. The constructed model has been applied to several classical experiments in neuroscience and psychology. We provide an anti-sparse coding interpretation of the constructed vision model and show how it leads to a dynamic programming (DP)-like approximate nearest-neighbor search based on ℓ∞-optimization. We also briefly discuss two possible implementations based on an asymmetrical (decoder matters more) auto-encoder and spiking neural networks (SNN), respectively.
I. INTRODUCTION
How do we learn to see in the first six months after birth? To answer this question, David Hubel and
Torsten Wiesel conducted pioneering experiments in the 1950s, leading to the discovery of simple and
complex cells [43]. Inspired by their discovery, David Marr developed a theory of the neocortex [62] in
1970 and a theory of the hippocampus [63] in 1971. His computational investigation of vision [61] was
published in 1982 after his death. The construction of the neocognitron by Fukushima [31] and of connectionist models by LeCun [50] in the 1980s represented the continuing effort to construct biologically plausible
computational models for visual perception. Wavelet theory [57, 23] and sparse coding [70, 71] in the 1990s
further supplied mathematical formulations of multi-resolution analysis for scale-invariant representation
of images. Rapid advances in deep learning [33, 51], especially the class of over-parameterized models
[7, 6] have expedited both the theory and practice of data-driven/learning-based visual processing.
Despite the great progress of today, the gap between biological and artificial vision remains significant
in the following aspects. First, the network architecture of the convolutional neural network (CNN) is characterized by pooling layers, which reduce the dimensionality of the input data. This is in sharp contrast to the increase in the number of neurons and synapses as we move from lower layers (e.g., V1) to higher layers (e.g., V4) of the neocortex. This anatomical finding has inspired H. Barlow
to revise his redundancy reduction hypothesis into the redundancy exploitation hypothesis [9] in 2001.
Second, although recurrent neural networks including long short-term memory (LSTM) [42] take into
account temporal dynamics and have found successful applications in 1D signal analysis, the role of memory in visual perception has remained largely unexplored. In the human visual system, the hippocampus is known to play a critical role in various cognitive tasks, including memory consolidation and novelty detection [49]. Finally, it remains a mystery how the human brain manages to achieve the objectives of learning and memory with on the order of 100-1000 trillion synapses and a power budget of less than 20W.
The challenge of breaking the conventional von Neumann architecture built upon the Turing machine
remains a holy grail in neuromorphic computing.
The motivation behind this paper is two-fold. On the one hand, both the human brain and CNN are
characterized by the ability to optimize an astronomical number of synaptic weights [20]. The class of
over-parameterized models [7, 6] has shown some counterintuitive properties, such as double descent
[68]. Analytical tools such as neural tangent kernel (NTK) offer an approach to understanding over-
parameterization in the Hilbert space, but, like all kernel methods, they are not compatible with the
recursion strategy (e.g. dynamic programming that builds upon the optimality of substructures [13]).
We seek to understand overparameterization in the framework of optimizing hierarchical representations
[85]. On the other hand, an evolutionary perspective on biological and artificial neural networks [34] offers a brute-force direct-fit approach. Such a deceptively simple model, when combined with over-parameterized optimization, offers an appealing solution to increase the generalization (predictive) power
without explicitly modeling the unknown generative structure underlying sensory inputs.
In this paper, we construct an over-parameterized direct-fit model for visual perception. Unlike the
conventional wisdom of abstracting simple and complex cells, we use space partitioning and composition
as the building block of our hierarchical construction. In addition to biological plausibility, we offer
a geometric analysis of our construction in topological space (i.e., topological manifolds without the
definition of a distance metric or an inner product). Our construction can be interpreted as a product-
topology-based generalization of the existing k-d tree [14], making it suitable for brute-force direct-fit
in a high-dimensional space. In the presence of novelty/anomaly, a surrogate model that mimics the
escape mechanism of the hippocampus can be activated for unsupervised continual learning [98]. The
constructed model has been applied to several classical experiments in neuroscience and psychology. We
also provide an anti-sparse coding interpretation [46] of the constructed vision model and present a dynamic programming (DP)-like solution to approximate nearest-neighbor search in a high-dimensional space. Finally,
we briefly discuss two possible network implementations of the proposed model based on asymmetric
autoencoder [69] and spiking neural networks (SNN) [45], respectively.
II. NEUROSCIENCE FOUNDATION
A. Dichotomy: Excitatory and Inhibitory Neurons
In their work in the 1970s [92, 93], Wilson and Cowan made the crucial assumption that “all nervous processes of any complexity depend on the interaction of excitatory and inhibitory cells.” Using phase-plane methods, they demonstrated simple and multiple hysteresis phenomena and limit-cycle activity in localized populations of model neurons. Their results offer, more or less, a primitive basis for memory storage: stimulus intensity can be coded both in the average spike frequency and in the frequency of periodic variations of the average spike frequency [78]. However, such ad hoc sensory
encoding cannot explain the sophistication of learning, memory, and recognition associated with higher
functions.
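To make the excitatory-inhibitory interaction concrete, below is a minimal simulation sketch of the Wilson-Cowan rate equations. It is our illustration rather than code from [92, 93]; the coupling constants, gains, and thresholds are textbook-style values commonly quoted as producing limit-cycle activity.

```python
import numpy as np

def S(x, a, theta):
    """Logistic response function, shifted so that S(0) = 0."""
    return 1.0 / (1.0 + np.exp(-a * (x - theta))) - 1.0 / (1.0 + np.exp(a * theta))

def wilson_cowan(P=1.25, Q=0.0, T=100.0, dt=0.01):
    """Euler integration of the Wilson-Cowan equations for the mean activities
    E(t), I(t) of coupled excitatory/inhibitory populations. All constants are
    illustrative values known to yield a limit cycle, not fitted parameters.
    """
    c1, c2, c3, c4 = 16.0, 12.0, 15.0, 3.0  # E->E, I->E, E->I, I->I couplings
    a_e, th_e, a_i, th_i, tau = 1.3, 4.0, 2.0, 3.7, 1.0
    n = int(T / dt)
    E, I = np.zeros(n), np.zeros(n)
    for t in range(n - 1):
        E[t + 1] = E[t] + dt / tau * (-E[t] + (1 - E[t]) * S(c1 * E[t] - c2 * I[t] + P, a_e, th_e))
        I[t + 1] = I[t] + dt / tau * (-I[t] + (1 - I[t]) * S(c3 * E[t] - c4 * I[t] + Q, a_i, th_i))
    return E, I  # plotting I against E traces the limit-cycle orbit in the phase plane
```

With the external input P above, the E-I pair oscillates rather than settling to a fixed point, which is the phase-plane behavior referred to in the text.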
B. Hebbian Learning and Anti-Hebbian Learning
Hebbian learning [38] is a dogma that claims that an increase in synaptic efficacy arises from repeated and persistent stimulation of a postsynaptic cell by a presynaptic cell. The Hebbian learning rule is often summarized as “cells that fire together wire together”. The physical implementation of the Hebbian learning
rule has been well studied in the literature, for example, through spike timing-dependent plasticity (STDP)
[16]. The mechanism of STDP is to adjust connection strengths based on the relative timing of a neuron’s input and output action potentials. STDP as a Hebbian synaptic learning rule has been
demonstrated in various neural circuits, from insects to humans.
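To make the timing dependence concrete, here is a minimal sketch of the standard pair-based STDP window (a textbook idealization; the amplitudes and time constants below are illustrative, not measurements from [16]):

```python
import numpy as np

def stdp_dw(delta_t, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Pair-based STDP weight change for delta_t = t_post - t_pre (in ms).
    Pre-before-post (delta_t > 0) potentiates; post-before-pre depresses.
    """
    return np.where(delta_t > 0,
                    a_plus * np.exp(-delta_t / tau_plus),
                    -a_minus * np.exp(delta_t / tau_minus))

# A pre spike 5 ms before a post spike strengthens the synapse; the reverse
# ordering weakens it, implementing a temporally asymmetric Hebbian rule.
print(stdp_dw(np.array([5.0, -5.0])))  # [ 0.0078..., -0.0093...]
```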
By analogy to excitatory and inhibitory neurons, it has been suggested that a reversal of Hebb’s postulate,
named anti-Hebbian learning, dictates the reduction (rather than increase) of the synaptic connectivity
strength between neurons following a firing scenario. Synaptic plasticity that operates under the control
of an anti-Hebbian learning rule has been found to occur in the cerebellum [12]. More importantly, local
anti-Hebbian learning has been shown to be the foundation for forming sparse representations [27]. By
connecting a layer of simple Hebbian units with modifiable anti-Hebbian feedback connections, one can
learn to encode a set of patterns into a sparse representation in which statistical dependency between
the elements is reduced while preserving the information. However, sparse coding represents only a local approximation of the sensory processing machinery. To extend it to global (nonlocal) integration, we have to assume an additional postulate, called the “hierarchical organization principle”, which we will introduce in the next section.
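Before moving on, the following single-step sketch illustrates the cooperation of the two rules just described, in the spirit of Földiák's network [27]. The one-pass approximation of the recurrent settling, the threshold, the learning rates, and the target firing probability p are our simplifying assumptions, not the original formulation.

```python
import numpy as np

def foldiak_step(x, W, A, p=0.05, beta=0.02, alpha=0.02, theta=0.5):
    """One learning step in the spirit of Foldiak's network [27].
    W: feedforward weights (Hebbian rule); A: lateral feedback weights
    (anti-Hebbian rule, kept non-positive so co-active units inhibit each other).
    """
    # Approximate the recurrent settling of unit activities with one feedback pass.
    y = (W @ x > theta).astype(float)
    y = ((W @ x + A @ y) > theta).astype(float)
    # Hebbian: strengthen weights toward the inputs that drive each active unit.
    W = W + beta * y[:, None] * (x[None, :] - W)
    # Anti-Hebbian: make connections between co-active units more inhibitory,
    # decorrelating the code until pairwise co-activity matches the target p^2.
    A = A - alpha * (np.outer(y, y) - p ** 2)
    np.fill_diagonal(A, 0.0)
    A = np.minimum(A, 0.0)
    return W, A, y
```

Iterating this step over a pattern set drives the units toward a sparse code in which statistical dependencies between elements are reduced, as described above.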
C. Simple and Complex Cells in V1
These two classes of cells were discovered by Torsten Wiesel and David Hubel in the early 1960s [43].
Simple cells respond primarily to oriented edges and gratings, which can be mathematically characterized
by Gabor filters [24]. Complex cells also respond to oriented structures; unlike simple cells, they have a
degree of spatial invariance. The difference in receptive fields and response characteristics between simple and complex cells inspired the invention of the neocognitron by Fukushima in 1979 [31], which foreshadowed subsequent convolutional neural networks. The hierarchical convergent nature of visual processing has
also inspired the construction of the HMAX model in 1999 [81].
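For concreteness, a minimal sketch of such a Gabor receptive field is given below; the isotropic Gaussian envelope and all parameter values are illustrative simplifications of the characterization in [24].

```python
import numpy as np

def gabor_rf(size=31, wavelength=8.0, theta=0.0, sigma=5.0, phase=0.0):
    """2D Gabor receptive field: a sinusoidal grating with the given
    wavelength and orientation theta, windowed by a Gaussian envelope.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)  # rotate into the filter frame
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * xr / wavelength + phase)
    return envelope * carrier

# A "simple cell" response is the rectified inner product with an image patch:
# response = max(0.0, np.sum(gabor_rf(theta=np.pi / 4) * patch))
```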
An important observation about the difference between simple and complex cells, as described in [43], concerns their neural circuits and the corresponding temporal dynamics. Simple cells are built from center-surround cells, whose outputs they sum simultaneously. In contrast, activation of a complex cell by a moving stimulus requires the successive activation of many simple cells. Therefore, the spatial
invariance of complex cells is achieved by summation and integration of the receptive fields of simple
cells. Mathematical modeling of complex cells has been extensively studied in the literature (e.g., energy
model [3]). However, the abstraction strategies taken in this paper will be different from those in the open
literature.
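For comparison, the sketch below shows the classical energy model [3] of a complex cell, which sums the squared responses of a quadrature pair of simple cells; it reuses the illustrative gabor_rf defined above, assumes the patch has the same shape as the filter, and is not the abstraction pursued in this paper.

```python
import numpy as np

def complex_cell_energy(patch, theta=0.0):
    """Energy model of a complex cell: sum of squared responses of two simple
    cells in quadrature (90-degree phase offset). The output is invariant to
    the spatial phase of the stimulus within the Gaussian envelope.
    """
    even = np.sum(gabor_rf(theta=theta, phase=0.0) * patch)
    odd = np.sum(gabor_rf(theta=theta, phase=np.pi / 2) * patch)
    return even ** 2 + odd ** 2
```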
D. Mountcastle’s Universal Principle
In 1978 Mountcastle suggested a universal processing principle that has been acclaimed as the Rosetta
Stone of neuroscience. According to Mountcastle, all parts of the neocortex operate according to a
common principle, the cortical column being the unit of computation [67]. If Mountcastle were correct,
the “simple discovery” made by Hubel and Wiesel might have deeper implications in the mechanism of
visual processing beyond V1. Along this line of reasoning, the striking difference between the simultaneous and successive activation of simple and complex cells might illustrate a fundamental contrast between two classes of binding mechanisms among neurons.
In visual perception, it has been hypothesized that the characteristics of individual objects are bound/segregated by Hebbian/anti-Hebbian learning of different groups of neurons [66]. We conjecture that there exist two types of binding mechanisms (parallel vs. sequential) that are analogous to combinational and sequential logic in digital circuits. The former plays the role of integrating spatially overlapping parts
into a whole (e.g., a horizontal edge and a vertical edge form a letter “T”) or multiple features of the
same object into a coherent perception (e.g., the age and gender of a face) in object recognition. Parallel
binding can be interpreted as an extension of von der Malsburg’s correlation theory [58]. This is the
mechanism adopted by the dorsal stream to support the task of object vision. The latter is at the core of
integrating spatially non-overlapped parts into a whole (e.g., the concatenation of letters into a word) in
spatial vision, which belongs to the ventral stream/pathway. Sequential binding is closely related to the
formation of short-term memory (e.g., Miller’s law [65]) and long-term memory (e.g., Atkinson–Shiffrin
model [8]) in the brain. The fundamental difference between parallel and sequential binding is that the
former is invariant to permutation (the ordering of parts does not affect the perception of the whole),
while the latter is sensitive to the ordering of the neuronal groups.
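The following toy sketch (our illustration, not a model from the cited literature) makes the contrast concrete: a parallel binding is any permutation-invariant aggregation of part activations, whereas a sequential binding is order-sensitive, so that permuting the parts changes the whole.

```python
import numpy as np

rng = np.random.default_rng(0)
parts = [rng.normal(size=8) for _ in range(3)]  # activations of three "parts"

def bind_parallel(parts):
    """Parallel binding: permutation-invariant superposition of the parts."""
    return np.sum(parts, axis=0)

def bind_sequential(parts):
    """Sequential binding: order-sensitive, position-coded composition."""
    return np.concatenate(parts)  # like letters in a word: order matters

# Reordering parts leaves the parallel binding unchanged...
assert np.allclose(bind_parallel(parts), bind_parallel(parts[::-1]))
# ...but produces a different sequential binding.
assert not np.allclose(bind_sequential(parts), bind_sequential(parts[::-1]))
```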
Binding by neuronal synchrony has been widely recognized in the literature; however, the binding
problem is often thought to suffer from the so-called “superposition catastrophe” [91]. The combination
coding argument often faces the dilemma of the curse of dimensionality, and it has been suggested that a hierarchically structured representation can at least partially overcome this barrier to the practicality of combination coding. More importantly, we argue that our intuition about the capacity of the cortex might be misleading and our understanding of the power of hierarchical structures is inadequate [35]. If the curse of dimensionality can become a blessing [18], combination coding can be made compatible with the redundancy exploitation hypothesis [9].
III. CONSTRUCTION OF AN OVER-PARAMETERIZED DIRECT-FIT MODEL
In neuroscience, the principle of hierarchical organization can be roughly stated as follows: The nested
structure of the physical world is mirrored by the hierarchical organization of the neocortex [35]. Unlike
the mathematical construction of wavelets [23, 57], we envision that nature has discovered an elegant
“elementary” solution in topological space (without the extra structure of distance metric or inner product)
to manage the complexity of sensory stimuli in the physical world. We propose to study the following
problem as the fruit-fly problem in visual perception [39].
Problem Formulation of Visual Perception
Given visual stimuli (e.g., a sequence of images) as input, group/cluster them into different classes in an unsupervised manner.
The solution, as manifested by an infant’s development of the visual cortex (primarily the ventral stream for object vision) during the first six months after birth, lies in a novel construction of a hierarchical direct-fit model based on simple and complex cells. As argued by Jean Piaget [76], the ordering of mathematical spaces in early childhood cognitive development (topology before geometry) is the opposite of what we learn in school (topology after geometry). Therefore, we attempt to construct our visual
perception model in topological space with the least amount of assumed mathematical structures.
A. Preliminary on Topological Space
To facilitate our abstraction of simple and complex cells by subspace and product topology, we briefly
review the basics of topological space as follows. We will follow the axiomatization of Felix Hausdorff to
construct the topological space using the neighborhood as the building block. Let N denote the neighborhood function assigning to each point x ∈ X a non-empty set N(x) of subsets of X. Then the following axioms must be satisfied for X with N to be called a topological space.
1) If N is a neighborhood of x (i.e., N ∈ N(x)), then x ∈ N;
2) If N is a subset of X and includes a neighborhood of x, then N is a neighborhood of x;
3) The intersection of two neighborhoods of x is a neighborhood of x;
4) Any neighborhood N of x includes a neighborhood M of x such that N is a neighborhood of each point in M.
Note that the fourth axiom plays the role of linking the neighborhoods of different points in X together. Since no distance metric is defined, we need to define a basis as the starting point for defining a topology.
Basis of a Topology: Let X be a set, and suppose that B is a collection of subsets of X. Then B is a basis for some topology on X if and only if the following two conditions are satisfied: a) the union of all basis elements covers X, i.e., ⋃{B : B ∈ B} = X; b) if B1, B2 ∈ B and x ∈ B1 ∩ B2, there exists an element B3 ∈ B such that x ∈ B3 ⊆ B1 ∩ B2.
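For a finite set, the two conditions above can be checked mechanically. The following is an illustrative sketch (the helper is_basis is our own naming, not a standard library function):

```python
from itertools import combinations

def is_basis(X, B):
    """Check whether the collection B of subsets is a basis for some
    topology on the finite set X (conditions (a) and (b) above).
    """
    X, B = set(X), [frozenset(b) for b in B]
    # (a) the basis elements must cover X
    if set().union(*B) != X:
        return False
    # (b) every point of a pairwise intersection must lie in a basis
    #     element contained in that intersection
    for B1, B2 in combinations(B, 2):
        for x in B1 & B2:
            if not any(x in B3 and B3 <= B1 & B2 for B3 in B):
                return False
    return True

# Example: open "intervals" on a 4-point line
print(is_basis({1, 2, 3, 4}, [{1, 2}, {2, 3}, {3, 4}, {2}, {3}]))  # True
```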
In this work, we will only consider the neighborhood basis, which is defined as follows.
Neighborhood Basis: A neighborhood basis at x is a subset B ⊆ N(x) such that for all V ∈ N(x), there exists some B ∈ B such that B ⊆ V. In other words, for any neighborhood V we can find a neighborhood B in the neighborhood basis contained in V.
With the above setup, the objective is to construct a hierarchical direct-fit model in the Hausdorff space
(a.k.a. topological manifold), which generalizes the existing multi-resolution analysis in the Hilbert space
[57]. Following our intuition above, simple and complex cells will be abstracted into subspace and product
topology [54], respectively. Formally, we have the following.
Subspace Topology: Let X be a topological space and let S ⊆ X be any subset. Then TS = {U ⊆ S : U = S ∩ V for some open subset V ⊆ X} is the subspace topology.
Product Topology: Suppose that X1, ..., Xn are arbitrary topological spaces. On their Cartesian product X1 × ... × Xn, the product topology is generated by the following basis: B = {U1 × ... × Un : Ui is an open subset of Xi, i = 1, ..., n}.
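On finite examples, both constructions can be computed directly. The sketch below (with our own helper names, assuming open sets are given as Python sets) builds a subspace topology by intersection and a product-topology basis by Cartesian products:

```python
from itertools import product

def subspace_topology(topology, S):
    """Partitioning direction: restrict the open sets of X to the subset S."""
    S = frozenset(S)
    return {S & frozenset(V) for V in topology}

def product_basis(topology1, topology2):
    """Composition direction: basis of the product topology, i.e., all
    Cartesian products U1 x U2 of open sets from the two factor spaces."""
    return {frozenset(product(U1, U2)) for U1 in topology1 for U2 in topology2}

# Example: X = {a, b} with open sets {}, {a}, {a, b}
T = [set(), {"a"}, {"a", "b"}]
print(subspace_topology(T, {"b"}))  # {frozenset(), frozenset({'b'})}
print(len(product_basis(T, T)))     # 5 distinct basis elements on X x X
```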
Both the subspace and product topologies are uniquely determined by their respective characteristic properties [48]. The geometric intuition behind our construction of the new hierarchical model is best illustrated by the duality between space partitioning (i.e., subspace topology) and composition (i.e., product topology).