Belief propagation generalizes backpropagation
Frederik Eaton
frederik@gmail.com
September 29, 2022
Abstract
The two most important algorithms in artificial intelligence are backpropagation and belief propagation. In spite of their importance, the connection between them is poorly characterized. We show that when an input to backpropagation is converted into an input to belief propagation so that (loopy) belief propagation can be run on it, then the result of belief propagation encodes the result of backpropagation; thus backpropagation is recovered as a special case of belief propagation. In other words, we prove for apparently the first time that belief propagation generalizes backpropagation. Our analysis is a theoretical contribution, which we motivate with the expectation that it might reconcile our understandings of each of these algorithms, and serve as a guide to engineering researchers seeking to improve the behavior of systems that use one or the other.
1 Introduction
In this paper we consider a connection between two algorithms which could be said to have the status of being the two most fundamental algorithms in the various fields of computer science concerned with the numerical modeling of real world systems, these fields being sometimes known as artificial intelligence or machine learning, sometimes called control theory or statistical modeling or approximate inference. The two algorithms that form our subject matter are usually referred to as backpropagation and belief propagation, respectively, although these are modern terms for concepts that go back several hundred years in Western thought [1, 2, 3].

Back-propagation is another name for the chain rule of differential calculus [4], applied iteratively to a network of functions, or in other words to a function of functions of multiple independent variables and other such functions; the input to backpropagation may also be known in the field as a “deep network” or a “neural network” [5].
Belief propagation, by contrast, takes as its input a network of probability distributions, also called a probabilistic network. Belief propagation is equivalent to an iterative application of Bayes’ rule, which is the rule for inferring the posterior distribution P(X|Y) of a random variable X from its prior P(X) and some table of conditional probabilities P(Y|X) describing the possible observations of some related variable Y: P(X|Y) = (1/Z(Y)) P(X) P(Y|X), often stated as P(X|Y) = P(X)P(Y|X)/P(Y) [6, 7]. Belief propagation forbids the sharing of variables among multiple tables of conditionals in its input, implying that there must be no loops in the network, but the symmetry between the posteriors P(X|Y) and the possible observations P(Y|X) allows this dynamic programming algorithm to be recast as a sequence of “message updates” which apply equally well to networks with variable sharing and therefore loops, as was observed by Pearl in the 1980s [8]. The generalized, message-based form of belief propagation is called “loopy belief propagation” [9] (it is also called the “sum-product algorithm” [10]), and while this generalization can no longer be said to compute exact posteriors for its variables, its numerical behavior and its usefulness in approximation have been the object of much study, and it enjoys widespread application in certain specialized domains such as error correcting codes [11, 12]. Loopy belief propagation, to restate the above definition, is “exact on trees”, a tree being a network with no loops, in which case loopy belief propagation reduces to belief propagation. Its theoretical properties are otherwise difficult to characterize [13], and much research has been directed towards improving the accuracy of belief propagation by accounting for the presence of loops in the input model [14, 15, 16, 17, 18].
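Returning to the Bayes-rule update described above, a minimal numerical sketch (ours, not from the paper; the two-state prior and likelihood are invented purely for illustration) shows the local computation that each belief-propagation message update performs:

```python
import numpy as np

# One Bayes-rule update, the local computation that belief propagation
# applies repeatedly: posterior is proportional to prior times likelihood.
prior = np.array([0.7, 0.3])        # P(X) over two states of X
likelihood = np.array([0.1, 0.8])   # P(Y = y_obs | X) for an observed Y

unnormalized = prior * likelihood              # P(X) P(Y|X)
posterior = unnormalized / unnormalized.sum()  # divide by Z(Y) = P(Y)
print(posterior)                               # approximately [0.226, 0.774]
```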
[Figure 1: Example network — variable nodes t, w, x, y, u, v, z and function nodes j, g, h, f, with z at the top and t at the bottom.]
This paper considers another class of inputs for which loopy belief propagation computes exact quantities, namely probabilistic networks that arise through a straightforward “lifting”¹ of function networks. It is simple to show that the “delta function” posteriors computed by loopy belief propagation on these networks are exact, as they are just a probabilistic representation of the deterministic computation embodied in the original function network. Moreover, we show for the first time that when the output node of the lifted network is attached to a Boltzmann distribution [25, 26] prior, the messages that propagate backwards through the network encode a representation of the exact derivatives of the output variable with respect to each other variable, making loopy belief propagation on the lifted network an extended or lifted form of backpropagation on the original function network. This second result is the main contribution of the paper, establishing that belief propagation is a generalization of backpropagation.

¹ Our terminology. For a similar use of the term “lifting” in probabilistic inference see [19]; this is connected to type-theoretic lifting [20], but not to be confused with lifted inference [21, 22], type lifting in compilers [23], or von Neumann’s concept of lifting in measure theory [24].
2 Example model
We find it useful to introduce the concepts of this paper through a small example function network. We define a network containing a single loop, and a single shared variable x, from the following equations:

    z = f(u, v), \quad u = g(w, x), \quad v = h(x, y), \quad x = j(t)    (1)

The network is illustrated in figure 1.
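To keep the running example concrete in what follows, here is a small Python sketch of the network in equation (1). The particular choices of f, g, h and j are our own illustrative assumptions; the paper treats them as abstract functions:

```python
# Illustrative (hypothetical) choices for the functions in equation (1).
def f(u, v): return u * v
def g(w, x): return w + x ** 2
def h(x, y): return x * y
def j(t):    return 2.0 * t

def forward(w, y, t):
    """Evaluate the network of equation (1); note that x = j(t) is shared by g and h."""
    x = j(t)
    u = g(w, x)
    v = h(x, y)
    return f(u, v)   # z

print(forward(1.5, -0.5, 0.25))   # z = -0.4375 for these inputs
```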
3 Backpropagation
Running backpropagation on this function network means calculating the derivative of z with respect to the six other variables recursively using the chain rule of differential calculus, starting with the variable u:

    \frac{dz}{du} = \frac{\partial z}{\partial u} \equiv \frac{\partial f}{\partial u} \equiv f^{(u)}(u, v)    (2)

where we have used “≡” to show the equivalence of alternate notations for partial derivatives. Then for v we have

    \frac{dz}{dv} = f^{(v)}(u, v)    (3)

and for w

    \frac{dz}{dw} = \frac{dz}{du} \frac{\partial u}{\partial w} \equiv \frac{dz}{du}\, g^{(w)}(w, x)    (4)
and so on. The term “adjoint” is used as a shorthand: since dz always appears in the numerator in backpropagation, rather than write “the derivative of z with respect to w” we call this quantity “the adjoint of w [with respect to z]” [27]. Calculating the adjoint of x requires our first addition:

    \frac{dz}{dx} = \frac{dz}{du} \frac{\partial u}{\partial x} + \frac{dz}{dv} \frac{\partial v}{\partial x} \equiv \frac{dz}{du}\, g^{(x)}(w, x) + \frac{dz}{dv}\, h^{(x)}(x, y)    (5)

The last adjoint to be calculated is that of t:

    \frac{dz}{dt} = \frac{dz}{dx}\, j^{(t)}(t)    (6)
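The following sketch spells out equations (2)–(6) for the illustrative functions chosen in section 2 (repeated here so the block is self-contained), together with a finite-difference check of the adjoint of t; both the functions and the test point are our own assumptions:

```python
# Illustrative functions from the section-2 sketch, repeated for self-containment.
def f(u, v): return u * v
def g(w, x): return w + x ** 2
def h(x, y): return x * y
def j(t):    return 2.0 * t

def z_of(w, y, t):
    x = j(t)
    return f(g(w, x), h(x, y))

def adjoints(w, y, t):
    # Forward pass over the network of equation (1).
    x = j(t); u = g(w, x); v = h(x, y)
    # Equations (2)-(6), with hand-coded partials of the illustrative functions.
    dz_du = v                               # (2): f^(u)(u, v) = v
    dz_dv = u                               # (3): f^(v)(u, v) = u
    dz_dw = dz_du * 1.0                     # (4): g^(w)(w, x) = 1
    dz_dx = dz_du * (2.0 * x) + dz_dv * y   # (5): x feeds both g and h
    dz_dy = dz_dv * x                       #      analogous adjoint for y
    dz_dt = dz_dx * 2.0                     # (6): j^(t)(t) = 2
    return {"u": dz_du, "v": dz_dv, "w": dz_dw, "x": dz_dx, "y": dz_dy, "t": dz_dt}

# Finite-difference check of dz/dt at an arbitrary point.
w0, y0, t0, eps = 1.5, -0.5, 0.25, 1e-6
numeric = (z_of(w0, y0, t0 + eps) - z_of(w0, y0, t0 - eps)) / (2 * eps)
print(adjoints(w0, y0, t0)["t"], numeric)   # both approximately -2.25
```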
For a general function network, the chain rule would be written [4, 28]

    \frac{dz}{dx_i} = \sum_{k \ni i} \frac{dz}{dx_k} \frac{\partial f_k}{\partial x_i}    (7)

where “k ∋ i” means iterating over the parents k of i, and where the general network is defined as a collection of functions and variables

    x_k = f_k(\{x_i \mid i \in k\})    (8)

The adjoints of parent variables are calculated and recorded before their children in a backward pass over the network, giving rise to the term “backpropagation” [5].
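A generic rendering of equations (7)–(8) in code may help. The sketch below is our own formulation of reverse accumulation, with invented container conventions: it evaluates a function network given as a dictionary and then accumulates adjoints in a single backward pass.

```python
# Reverse pass over a general function network x_k = f_k({x_i : i in k}), eq. (8).
# nodes: {name: (argument_names, function, {argument: partial_derivative_function})}
# leaves: {name: value} for the independent variables.
def backpropagate(nodes, leaves, output):
    values = dict(leaves)
    order = []                    # non-leaf nodes in topological order

    def evaluate(name):
        if name in values:
            return values[name]
        args, fn, _ = nodes[name]
        values[name] = fn(*[evaluate(a) for a in args])
        order.append(name)
        return values[name]

    evaluate(output)
    adjoint = {name: 0.0 for name in values}
    adjoint[output] = 1.0         # dz/dz = 1
    for k in reversed(order):     # parents recorded before their children
        args, _, partials = nodes[k]
        for i in args:            # eq. (7): dz/dx_i += dz/dx_k * (d f_k / d x_i)
            adjoint[i] += adjoint[k] * partials[i](*[values[a] for a in args])
    return adjoint

# The example network of equation (1), using the illustrative functions from section 2.
net = {
    "x": (("t",), lambda t: 2.0 * t, {"t": lambda t: 2.0}),
    "u": (("w", "x"), lambda w, x: w + x**2,
          {"w": lambda w, x: 1.0, "x": lambda w, x: 2.0 * x}),
    "v": (("x", "y"), lambda x, y: x * y,
          {"x": lambda x, y: y, "y": lambda x, y: x}),
    "z": (("u", "v"), lambda u, v: u * v,
          {"u": lambda u, v: v, "v": lambda u, v: u}),
}
print(backpropagate(net, {"w": 1.5, "y": -0.5, "t": 0.25}, "z"))
# the adjoint of t is -2.25, matching the check in section 3
```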
4 Lifting
We are interested in knowing what happens when we try to run belief propagation on our network, but first we have to convert the function network into a probabilistic network with continuous real-valued variables. To use belief propagation in this setting, we must represent the variables in our network as probability density functions. This requires that we first define a probability distribution over the real numbers which places all of its mass on a single value:

    P(x = 0) = 1, \quad P(x \neq 0) = 0    (9)
The density for this distribution is called the “Dirac delta function” [29], written δ(x). This is not a true function since it is infinite at x = 0, but we can think of it as a limit of functions, for example a limit of Gaussians whose standard deviation tends to zero (see figure 2):

    \delta(x) = \lim_{\sigma \to 0} \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x}{\sigma}\right)^2}    (10)

Although this limit itself is not well-defined, it tells us symbolically how to treat the delta function when it appears inside an integral, namely by doing the integral first and then taking the limit:

    \int f(x)\, \delta(x)\, dx \equiv \lim_{\sigma \to 0} \int f(x)\, \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x}{\sigma}\right)^2} dx = f(0)    (11)
It may be that an algorithmic implementation of our
proposed lifting would approximate delta functions
with very narrow Gaussians, in which case we still
expect belief propagation to be well-behaved, but we
do not go into an analysis of that behavior here.
Figure 2: Gaussian distributions getting narrower
The lifting operation simply replaces each function node z = f(u, v) with a positive-valued “factor” defined on all three variables:

    F(u, v, z) = \delta(f(u, v) - z)    (12)
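As noted above, an implementation of the lifting might approximate the delta function with a very narrow Gaussian. The sketch below (ours; the grid, the width σ and the choice f(u, v) = u·v are arbitrary assumptions) builds such a discretized lifted factor F(u, v, z) ≈ δ(f(u, v) − z) as a table that a standard discrete loopy-BP implementation could consume:

```python
import numpy as np

def lifted_factor(f, u_grid, v_grid, z_grid, sigma=0.05):
    """Discretized factor F(u, v, z) = delta(f(u, v) - z) of equation (12),
    with the delta function replaced by a narrow Gaussian (equation (10))."""
    U, V, Z = np.meshgrid(u_grid, v_grid, z_grid, indexing="ij")
    residual = f(U, V) - Z
    return np.exp(-0.5 * (residual / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Example: lift the function node z = f(u, v) = u * v onto a coarse grid.
grid = np.linspace(-1.0, 1.0, 21)
F = lifted_factor(lambda u, v: u * v, grid, grid, grid)
print(F.shape)   # (21, 21, 21): F is concentrated near the surface z = u * v
```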