beyond conventional generalisation theory is, therefore,
needed.
Here we bring to completion our previous conjecture
[19] that DNN architectures can be considered the semi-
classical limit of a generalised quantum neural-network
(QNN) architecture, the topological quantum neural net-
work (TQNN). To this end, we elaborate on the
theoretical framework we proposed in [19], and provide
empirical evidence to support our thesis, ultimately de-
riving a novel answer to the problem of generalisation.
TQNNs differ from conventional QNNs [20,21] by allow-
ing the number of “layers” and their connection-topology
to vary arbitrarily, provided only that the input and out-
put boundary conditions are preserved. The full gen-
erality of the TQNN framework as a representation of
computational processes acting on classical data to yield
classical outputs has been recently proven [22]. We pro-
pose, in particular, that DNNs generalise because they
are classical limits of TQNNs. DNNs are, from this per-
spective, implementations of distributed, high-entropy,
error-correcting codes that can be seen as classical lim-
its of the quantum error-correcting codes (QECCs) [23]
implemented by TQNNs [24]. Increasing the model ca-
pacity of a DNN increases its generalisation ability be-
cause it increases the redundancy available in the code
space. We demonstrate, using the perceptron architec-
ture as a simplified example, that a TQNN can be used
to directly compute the parameters needed to generalise
from a training set, in the absence of any actual train-
ing of the network. This raises the possibility of replacing
data-intensive training of DNNs with quantum computa-
tions, with a significant increase in efficiency and decrease
in operational costs.
The plan of the paper is the following. In §II, we
provide a formal definition of generalisation (§II A), and
briefly review some expectations of conventional general-
isation theory and how they have been observed to fail
in DNNs (§II B). We then show, via a simple thought
experiment, how generalisation can be expected to im-
prove as parameters are added when a QECC is employed
(§II C). We provide a brief review of topological quantum
field theory (TQFT), the theoretical framework under-
lying TQNNs, in §III, and summarise the TQNN con-
struction in §IV. In §V we discuss the theoretical foun-
dations of the correspondence between generalisation and
the semi-classical limit of TQNNs. Specifically, in §V A
we introduce the concept of generalisation as the semi-
classical limit of TQNNs. In §V B we discuss the paradig-
matic case of the sum over quantum histories of a one-
particle state, considered in analogy with a perceptron.
We show that generalisation arises naturally from the
path-integral formalism. In §V C we extend this construction to any TQNN by rephrasing the path integral that realises the sum over quantum histories for the perceptron in terms of generic spin-network states. In §VI we computationally showcase the theory of the perceptron as a semi-classical limit of a TQNN. In §VII we discuss possible directions for developing the current theoretical framework.
Finally, in §VIII we spell out our conclusions.
II. BACKGROUND AND MOTIVATION
A. The generalisation bound problem
Informally, generalisation is the ability to identify members of some set (or class) $D$ after being exposed to the members of some proper subset $S \subset D$. The generalisation ability of machine learning (ML) systems is tested by exposing them to new members of $D$ after training via backpropagation (or similar algorithm) on the members of $S$. Whether they have identified new members of $D$ correctly is determined by human knowledge of $D$. This knowledge is, however, not explicitly characterised: if we had an algorithmic specification of the members of $D$, we could encode that specification directly and ML would not be necessary. We can make this dependence on non-algorithmic human knowledge explicit by defining generalisation as follows.
We can, without loss of generality, treat $D$ as a set of $N$-bit strings, i.e. $D = \{0,1\}^N$. Let $S \subset D$ be the training set. Let $Y = \{0,1\}^M$, with $M \leq N$, be the set of possible "classes" or "answers" to be obtained by acting on $D$. In most cases of interest, $M \ll N$, e.g. classifying handwritten characters, distinguishing cats from dogs, or identifying individual people from photographs.
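As a purely illustrative rendering of this setup, the following minimal Python sketch enumerates a toy $D$, samples a training subset $S$, and lists the answer set $Y$. The values of $N$ and $M$ are hypothetical, and for realistic $N$ the set $D$ would of course never be enumerated explicitly.

```python
import itertools
import random

# Toy instantiation of the formal setup (hypothetical N and M).
N, M = 8, 2

# D = {0,1}^N: all N-bit strings.
D = list(itertools.product((0, 1), repeat=N))

# S ⊂ D: a small training set sampled from D.
random.seed(0)
S = random.sample(D, k=16)

# Y = {0,1}^M: the set of possible "classes" or "answers", with M ≤ N.
Y = list(itertools.product((0, 1), repeat=M))

print(len(D), len(S), len(Y))  # 256 16 4
```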
Now consider two agents $A$ and $B$, who implement functions $f_A, f_B: D \to Y$ and $g_A, g_B: S \to Y$ respectively, where we assume $g_A = f_A|_S$ and $g_B = f_B|_S$. This latter assumption is true even if $A$ and/or $B$ is trained on a set $S' \supset S$ that also contains noise or distractors, as in [2]. We can call $A$ the "ground truth" observer (e.g. a human) and $B$ the "test" observer (e.g. an ML system). The "training error" is $d(g_A, g_B)$, where $d$ is some metric on the relevant function space, and the "test error" or "generalisation error" is $d(f_A, f_B)$. We can measure $d(g_A, g_B)$ because we can explicitly list every instance of $g_A$ and $g_B$ acting on $S$. Both $f_A$ and $f_B$ are unknown, so we cannot compute $d(f_A, f_B)$.
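As a minimal sketch, under the assumption (not made in the text, which leaves $d$ generic) that $d$ simply counts disagreements over a given set of inputs, the training error is computable by exhaustive enumeration over $S$, whereas the analogous sum over all of $D$ would require $f_A$ and $f_B$ themselves:

```python
def d(u, v, inputs):
    """Assumed counting metric: number of inputs on which u and v disagree."""
    return sum(1 for x in inputs if u(x) != v(x))

# Purely illustrative stand-ins for g_A = f_A|_S and g_B = f_B|_S:
g_A = lambda x: x[0]           # toy "ground truth" labels on S
g_B = lambda x: x[0] ^ x[1]    # toy "test" observer labels on S
S = [(0, 0), (0, 1), (1, 0), (1, 1)]

print(d(g_A, g_B, S))  # training error d(g_A, g_B) = 2 on this toy S

# The test error d(f_A, f_B) would be the same sum taken over all of D,
# but f_A and f_B are not available as explicit functions on D, so it
# cannot be computed directly.
```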
We can now state the generalisation bound problem (GBP) as: given $d(g_A, g_B) \leq \epsilon$, determine an upper bound $\beta \geq \epsilon$ such that $d(f_A, f_B) \leq \beta$. There is an a priori upper bound on $\beta$: $2^N$, corresponding to $A$ and $B$ disagreeing on every instance in $D$. In this case, $\epsilon = 2^M$. Our goal is to make $\beta \to \epsilon$ as $\epsilon \to 0$, which corresponds to both optimal learning and $S$ being optimally representative of $D$.
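For concreteness, the hypothetical toy run below uses a ground truth $f_A$ that is known by construction, so that both $d(g_A, g_B)$ and $d(f_A, f_B)$ can be evaluated and the ordering $\epsilon \leq \beta \leq 2^N$ inspected directly; in the actual GBP only the left-hand quantity is available. The choices of $f_A$, $f_B$, and the counting metric are assumptions made only for this illustration.

```python
import itertools
import random

def d(u, v, inputs):
    """Assumed counting metric: number of inputs on which u and v disagree."""
    return sum(1 for x in inputs if u(x) != v(x))

N = 8
D = list(itertools.product((0, 1), repeat=N))   # |D| = 2**N
random.seed(1)
S = random.sample(D, k=32)                      # training set S ⊂ D

# Hypothetical ground truth and learner (toy choices, not from the text):
f_A = lambda x: sum(x) % 2          # parity of the full string
f_B = lambda x: sum(x[:-1]) % 2     # parity that ignores the last bit

eps = d(f_A, f_B, S)    # training error: computable because S can be listed
beta = d(f_A, f_B, D)   # test error: computable here only because f_A and
                        # f_B are known by construction in this toy
print(eps, beta, 2**N)  # eps <= beta <= 2**N
```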
It is worth noting that the GBP cannot be solved in the general case. The set $D$ may not be computable; we know, for example, that the generalisation from some set $S$ of programs that halt to the set $D$ of all programs that halt is not computable [25]. There is, moreover, every reason to expect that human judges may disagree about edge cases for some classes of objects, and hence about membership in some candidate sets $D$, rendering test error in such cases observer-dependent. It is also worth