Belief propagation generalizes backpropagation
Frederik Eaton
frederik@gmail.com
September 29, 2022
Abstract
The two most important algorithms in artificial intelligence are backpropagation and belief propagation. In spite of their importance, the connection between them is poorly characterized. We show that when an input to backpropagation is converted into an input to belief propagation so that (loopy) belief propagation can be run on it, then the result of belief propagation encodes the result of backpropagation; thus backpropagation is recovered as a special case of belief propagation. In other words, we prove for apparently the first time that belief propagation generalizes backpropagation. Our analysis is a theoretical contribution, which we motivate with the expectation that it might reconcile our understandings of each of these algorithms, and serve as a guide to engineering researchers seeking to improve the behavior of systems that use one or the other.
1 Introduction
In this paper we consider a connection between two algorithms which could be said to have the status of being the two most fundamental algorithms in the various fields of computer science concerned with the numerical modeling of real world systems, these fields being sometimes known as artificial intelligence or machine learning, sometimes called control theory or statistical modeling or approximate inference. The two algorithms that form our subject matter are usually referred to as backpropagation and belief propagation, respectively, although these are modern terms for concepts that go back several hundred years in Western thought [1, 2, 3].

Back-propagation is another name for the chain rule of differential calculus [4], applied iteratively to a network of functions, or in other words to a function of functions of multiple independent variables and other such functions; the input to backpropagation may also be known in the field as a “deep network” or a “neural network” [5].
Belief propagation, by contrast, takes as its input a network of probability distributions, also called a probabilistic network. Belief propagation is equivalent to an iterative application of Bayes’ rule, which is the rule for inferring the posterior distribution P(X|Y) of a random variable X from its prior P(X) and some table of conditional probabilities P(Y|X) describing the possible observations of some related variable Y: P(X|Y) = (1/Z(Y)) P(X) P(Y|X), often stated as P(X|Y) = P(X)P(Y|X)/P(Y) [6, 7]. Belief propagation forbids the sharing of variables among multiple tables of conditionals in its input, implying that there must be no loops in the network, but the symmetry between the posteriors P(X|Y) and the possible observations P(Y|X) allows this dynamic programming algorithm to be recast as a sequence of “message updates” which apply equally well to networks with variable sharing and therefore loops, as was observed by Pearl in the 1980s [8]. The generalized, message-based form of belief propagation is called “loopy belief propagation” [9] (it is also called the “sum-product algorithm” [10]), and while this generalization can no longer be said to compute exact posteriors for its variables, its numerical behavior and its usefulness in approximation have been the object of much study, and it enjoys widespread application in certain specialized domains such as error correcting codes [11, 12]. Loopy belief propagation, to restate the above definition, is “exact on trees”, a tree being a network with no loops, in which case loopy belief propagation reduces to belief propagation. Its theoretical properties are otherwise difficult to characterize [13], and much research has been directed towards improving the accuracy of belief propagation by accounting for the presence of loops in the input model [14, 15, 16, 17, 18].
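Returning to the Bayes-rule update described above, a minimal numerical sketch (ours, not from the paper; the two-state prior and likelihood are invented purely for illustration) shows the local computation that each belief-propagation message update performs:

```python
import numpy as np

# One Bayes-rule update, the local computation that belief propagation
# applies repeatedly: posterior is proportional to prior times likelihood.
prior = np.array([0.7, 0.3])        # P(X) over two states of X
likelihood = np.array([0.1, 0.8])   # P(Y = y_obs | X) for an observed Y

unnormalized = prior * likelihood              # P(X) P(Y|X)
posterior = unnormalized / unnormalized.sum()  # divide by Z(Y) = P(Y)
print(posterior)                               # approximately [0.226, 0.774]
```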
[Figure 1: Example network — variable nodes t, w, x, y, u, v, z and function nodes j, g, h, f, with z at the top and t at the bottom.]
This paper considers another class of inputs for which loopy belief propagation computes exact quantities, namely probabilistic networks that arise through a straightforward “lifting”¹ of function networks. It is simple to show that the “delta function” posteriors computed by loopy belief propagation on these networks are exact, as they are just a probabilistic representation of the deterministic computation embodied in the original function network. Moreover, we show for the first time that when the output node of the lifted network is attached to a Boltzmann distribution [25, 26] prior, the messages that propagate backwards through the network encode a representation of the exact derivatives of the output variable with respect to each other variable, making loopy belief propagation on the lifted network an extended or lifted form of backpropagation on the original function network. This second result is the main contribution of the paper, establishing that belief propagation is a generalization of backpropagation.

¹ Our terminology. For a similar use of the term “lifting” in probabilistic inference see [19]; this is connected to type-theoretic lifting [20], but not to be confused with lifted inference [21, 22], type lifting in compilers [23], or von Neumann’s concept of lifting in measure theory [24].
2 Example model
We find it useful to introduce the concepts of this paper through a small example function network. We define a network containing a single loop, and a single shared variable x, from the following equations:

    z = f(u, v), \quad u = g(w, x), \quad v = h(x, y), \quad x = j(t)    (1)

The network is illustrated in figure 1.
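To keep the running example concrete in what follows, here is a small Python sketch of the network in equation (1). The particular choices of f, g, h and j are our own illustrative assumptions; the paper treats them as abstract functions:

```python
# Illustrative (hypothetical) choices for the functions in equation (1).
def f(u, v): return u * v
def g(w, x): return w + x ** 2
def h(x, y): return x * y
def j(t):    return 2.0 * t

def forward(w, y, t):
    """Evaluate the network of equation (1); note that x = j(t) is shared by g and h."""
    x = j(t)
    u = g(w, x)
    v = h(x, y)
    return f(u, v)   # z

print(forward(1.5, -0.5, 0.25))   # z = -0.4375 for these inputs
```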
3 Backpropagation
Running backpropagation on this function network means calculating the derivative of z with respect to the six other variables recursively using the chain rule of differential calculus, starting with the variable u:

    \frac{dz}{du} = \frac{\partial z}{\partial u} \equiv \frac{\partial f}{\partial u} \equiv f^{(u)}(u, v)    (2)

where we have used “≡” to show the equivalence of alternate notations for partial derivatives. Then for v we have

    \frac{dz}{dv} = f^{(v)}(u, v)    (3)

and for w

    \frac{dz}{dw} = \frac{dz}{du} \frac{\partial u}{\partial w} \equiv \frac{dz}{du}\, g^{(w)}(w, x)    (4)
and so on. The term “adjoint” is used as a shorthand: since dz always appears in the numerator in backpropagation, rather than write “the derivative of z with respect to w” we call this quantity “the adjoint of w [with respect to z]” [27]. Calculating the adjoint of x requires our first addition:

    \frac{dz}{dx} = \frac{dz}{du} \frac{\partial u}{\partial x} + \frac{dz}{dv} \frac{\partial v}{\partial x} \equiv \frac{dz}{du}\, g^{(x)}(w, x) + \frac{dz}{dv}\, h^{(x)}(x, y)    (5)

The last adjoint to be calculated is that of t:

    \frac{dz}{dt} = \frac{dz}{dx}\, j^{(t)}(t)    (6)
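The following sketch spells out equations (2)–(6) for the illustrative functions chosen in section 2 (repeated here so the block is self-contained), together with a finite-difference check of the adjoint of t; both the functions and the test point are our own assumptions:

```python
# Illustrative functions from the section-2 sketch, repeated for self-containment.
def f(u, v): return u * v
def g(w, x): return w + x ** 2
def h(x, y): return x * y
def j(t):    return 2.0 * t

def z_of(w, y, t):
    x = j(t)
    return f(g(w, x), h(x, y))

def adjoints(w, y, t):
    # Forward pass over the network of equation (1).
    x = j(t); u = g(w, x); v = h(x, y)
    # Equations (2)-(6), with hand-coded partials of the illustrative functions.
    dz_du = v                               # (2): f^(u)(u, v) = v
    dz_dv = u                               # (3): f^(v)(u, v) = u
    dz_dw = dz_du * 1.0                     # (4): g^(w)(w, x) = 1
    dz_dx = dz_du * (2.0 * x) + dz_dv * y   # (5): x feeds both g and h
    dz_dy = dz_dv * x                       #      analogous adjoint for y
    dz_dt = dz_dx * 2.0                     # (6): j^(t)(t) = 2
    return {"u": dz_du, "v": dz_dv, "w": dz_dw, "x": dz_dx, "y": dz_dy, "t": dz_dt}

# Finite-difference check of dz/dt at an arbitrary point.
w0, y0, t0, eps = 1.5, -0.5, 0.25, 1e-6
numeric = (z_of(w0, y0, t0 + eps) - z_of(w0, y0, t0 - eps)) / (2 * eps)
print(adjoints(w0, y0, t0)["t"], numeric)   # both approximately -2.25
```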
For a general function network, the chain rule would be written [4, 28]

    \frac{dz}{dx_i} = \sum_{k \ni i} \frac{dz}{dx_k} \frac{\partial f_k}{\partial x_i}    (7)

where “k ∋ i” means iterating over the parents k of i, and where the general network is defined as a collection of functions and variables

    x_k = f_k(\{x_i \mid i \in k\})    (8)

The adjoints of parent variables are calculated and recorded before their children in a backward pass over the network, giving rise to the term “backpropagation” [5].
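A generic rendering of equations (7)–(8) in code may help. The sketch below is our own formulation of reverse accumulation, with invented container conventions: it evaluates a function network given as a dictionary and then accumulates adjoints in a single backward pass.

```python
# Reverse pass over a general function network x_k = f_k({x_i : i in k}), eq. (8).
# nodes: {name: (argument_names, function, {argument: partial_derivative_function})}
# leaves: {name: value} for the independent variables.
def backpropagate(nodes, leaves, output):
    values = dict(leaves)
    order = []                    # non-leaf nodes in topological order

    def evaluate(name):
        if name in values:
            return values[name]
        args, fn, _ = nodes[name]
        values[name] = fn(*[evaluate(a) for a in args])
        order.append(name)
        return values[name]

    evaluate(output)
    adjoint = {name: 0.0 for name in values}
    adjoint[output] = 1.0         # dz/dz = 1
    for k in reversed(order):     # parents recorded before their children
        args, _, partials = nodes[k]
        for i in args:            # eq. (7): dz/dx_i += dz/dx_k * (d f_k / d x_i)
            adjoint[i] += adjoint[k] * partials[i](*[values[a] for a in args])
    return adjoint

# The example network of equation (1), using the illustrative functions from section 2.
net = {
    "x": (("t",), lambda t: 2.0 * t, {"t": lambda t: 2.0}),
    "u": (("w", "x"), lambda w, x: w + x**2,
          {"w": lambda w, x: 1.0, "x": lambda w, x: 2.0 * x}),
    "v": (("x", "y"), lambda x, y: x * y,
          {"x": lambda x, y: y, "y": lambda x, y: x}),
    "z": (("u", "v"), lambda u, v: u * v,
          {"u": lambda u, v: v, "v": lambda u, v: u}),
}
print(backpropagate(net, {"w": 1.5, "y": -0.5, "t": 0.25}, "z"))
# the adjoint of t is -2.25, matching the check in section 3
```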
4 Lifting
We are interested in knowing what happens when we try to run belief propagation on our network, but first we have to convert the function network into a probabilistic network with continuous real-valued variables. To use belief propagation in this setting, we must represent the variables in our network as probability density functions. This requires that we first define a probability distribution over the real numbers which places all of its mass on a single value:

    P(x = 0) = 1, \quad P(x \neq 0) = 0    (9)
The density for this distribution is called the “Dirac delta function” [29], written δ(x). This is not a true function since it is infinite at x = 0, but we can think of it as a limit of functions, for example a limit of Gaussians whose standard deviation tends to zero (see figure 2):

    \delta(x) = \lim_{\sigma \to 0} \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x}{\sigma}\right)^2}    (10)

Although this limit itself is not well-defined, it tells us symbolically how to treat the delta function when it appears inside an integral, namely by doing the integral first and then taking the limit:

    \int f(x)\, \delta(x)\, dx \equiv \lim_{\sigma \to 0} \int f(x)\, \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x}{\sigma}\right)^2} dx = f(0)    (11)
It may be that an algorithmic implementation of our
proposed lifting would approximate delta functions
with very narrow Gaussians, in which case we still
expect belief propagation to be well-behaved, but we
do not go into an analysis of that behavior here.
Figure 2: Gaussian distributions getting narrower
The lifting operation simply replaces each function node z = f(u, v) with a positive-valued “factor” defined on all three variables:

    F(u, v, z) = \delta(f(u, v) - z)    (12)
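As noted above, an implementation of the lifting might approximate the delta function with a very narrow Gaussian. The sketch below (ours; the grid, the width σ and the choice f(u, v) = u·v are arbitrary assumptions) builds such a discretized lifted factor F(u, v, z) ≈ δ(f(u, v) − z) as a table that a standard discrete loopy-BP implementation could consume:

```python
import numpy as np

def lifted_factor(f, u_grid, v_grid, z_grid, sigma=0.05):
    """Discretized factor F(u, v, z) = delta(f(u, v) - z) of equation (12),
    with the delta function replaced by a narrow Gaussian (equation (10))."""
    U, V, Z = np.meshgrid(u_grid, v_grid, z_grid, indexing="ij")
    residual = f(U, V) - Z
    return np.exp(-0.5 * (residual / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Example: lift the function node z = f(u, v) = u * v onto a coarse grid.
grid = np.linspace(-1.0, 1.0, 21)
F = lifted_factor(lambda u, v: u * v, grid, grid, grid)
print(F.shape)   # (21, 21, 21): F is concentrated near the surface z = u * v
```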