
Belief propagation generalizes backpropagation
Frederik Eaton
frederik@gmail.com
September 29, 2022
Abstract
The two most important algorithms in artificial intelligence are backpropagation and belief propagation. In spite of their importance, the connection between them is poorly characterized. We show that when an input to backpropagation is converted into an input to belief propagation, so that (loopy) belief propagation can be run on it, the result of belief propagation encodes the result of backpropagation; thus backpropagation is recovered as a special case of belief propagation. In other words, we prove, apparently for the first time, that belief propagation generalizes backpropagation. Our analysis is a theoretical contribution, which we motivate with the expectation that it might reconcile our understandings of these two algorithms and serve as a guide to engineering researchers seeking to improve the behavior of systems that use one or the other.
1 Introduction
In this paper we consider a connection between two algorithms that are arguably the most fundamental in the fields of computer science concerned with the numerical modeling of real-world systems, fields variously known as artificial intelligence, machine learning, control theory, statistical modeling, or approximate inference. The two algorithms that form our subject matter are usually referred to as backpropagation and belief propagation, respectively, although these are modern terms for concepts that go back several hundred years in Western thought [1, 2, 3].
Backpropagation is another name for the chain rule of differential calculus [4], applied iteratively to a network of functions, that is, to a function of functions of multiple independent variables and of other such functions; the input to backpropagation is also known in the field as a “deep network” or a “neural network” [5].
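To make the iterated chain rule concrete, the following minimal Python sketch (our own illustration, not drawn from any particular framework; the functions f1, f2, f3 are invented for the example) runs a forward pass through a small composition of functions and then multiplies the local derivatives in reverse order, which is what backpropagation automates at scale.

    import math

    # A tiny "network" of functions: y = f3(f2(f1(x))), with
    # f1(x) = x**2, f2(u) = sin(u), f3(v) = 3*v + 1.
    def forward(x):
        u = x ** 2           # f1
        v = math.sin(u)      # f2
        y = 3 * v + 1        # f3
        return y, (x, u, v)

    def backward(cache):
        # Reverse pass: the chain rule applied iteratively, multiplying
        # local derivatives from the output back toward the input.
        x, u, v = cache
        dy_dv = 3.0            # derivative of f3 at v
        dv_du = math.cos(u)    # derivative of f2 at u
        du_dx = 2 * x          # derivative of f1 at x
        return dy_dv * dv_du * du_dx

    y, cache = forward(1.5)
    print(y, backward(cache))  # value and gradient dy/dx at x = 1.5

In a deep network the same products of local derivatives are accumulated over every path from the output back to each input or parameter, but the principle is the same as in this sketch.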
Belief propagation, by contrast, takes as its input a network of probability distributions, also called a probabilistic network. Belief propagation is equivalent to an iterative application of Bayes’ rule, which is the rule for inferring the posterior distribution P(X|Y) of a random variable X from its prior P(X) and some table of conditional probabilities P(Y|X) describing the possible observations of some related variable Y: P(X|Y) = (1/Z(Y)) P(X) P(Y|X), often stated as P(X|Y) = P(X) P(Y|X) / P(Y) [6, 7].
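As a concrete instance of the rule just stated (with all numbers invented for the illustration), the following Python lines compute a posterior over a binary variable X from a prior P(X) and a conditional table P(Y|X), normalizing by Z(Y):

    # Hypothetical prior over a binary X and conditional table for Y = 1.
    prior = {0: 0.7, 1: 0.3}          # P(X)
    likelihood = {0: 0.2, 1: 0.9}     # P(Y = 1 | X)

    # Bayes' rule: multiply prior by likelihood, then normalize by Z(Y).
    unnorm = {x: prior[x] * likelihood[x] for x in prior}
    Z = sum(unnorm.values())          # Z(Y) = P(Y = 1)
    posterior = {x: p / Z for x, p in unnorm.items()}
    print(posterior)                  # roughly {0: 0.34, 1: 0.66}

Belief propagation amounts to applying this multiply-and-normalize step iteratively across the network.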
Belief propagation forbids the sharing of variables among multiple tables of conditionals in its input, implying that there must be no loops in the network, but the symmetry between the posteriors P(X|Y) and the possible observations P(Y|X) allows this dynamic programming algorithm to be recast as a sequence of “message updates” which apply equally well to networks with variable sharing and therefore loops, as was observed by Pearl in the 1980s [8].
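The message-update form admits an equally small sketch. The following Python lines (structure and numbers invented for the illustration; on this tiny loop-free graph the computation is exact) perform one sum-product update, sending a message from a variable through a pairwise factor to a neighboring variable:

    import numpy as np

    # Two binary variables X1 -- f -- X2: a unary factor on X1 and a
    # pairwise factor f(x1, x2), with made-up entries.
    prior_x1 = np.array([0.7, 0.3])
    pairwise = np.array([[0.9, 0.1],
                         [0.2, 0.8]])

    # X1 passes its evidence into the factor; the factor multiplies it in
    # and sums X1 out (the "sum-product" step) to send a message to X2.
    msg_x1_to_f = prior_x1
    msg_f_to_x2 = pairwise.T @ msg_x1_to_f   # sum_x1 f(x1, x2) * msg(x1)

    belief_x2 = msg_f_to_x2 / msg_f_to_x2.sum()
    print(belief_x2)                         # marginal of X2 on this tree

In a network with loops the same update is simply iterated until the messages (one hopes) converge, which is the generalization discussed next.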
The generalized, message-based form of belief propagation is called “loopy belief propagation” [9] (it is also called the “sum-product algorithm” [10]), and while this generalization can no longer be said to compute exact posteriors for its variables, its numerical behavior and its usefulness in approximation have been the object of