arXiv:2210.13121v2 [math.PR] 3 Nov 2022
The Entropy Method in Large Deviation Theory
Lei Yu*
November 4, 2022

*L. Yu is with the School of Statistics and Data Science, LPMC, KLMDASR, and LEBPS, Nankai University, Tianjin 300071, China (e-mail: leiyu@nankai.edu.cn). This work was supported by the NSFC grant 62101286 and the Fundamental Research Funds for the Central Universities of China (Nankai University).
Abstract
This paper illustrates the power of the entropy method in addressing problems from large deviation theory. We provide and review entropy proofs for the most fundamental results in large deviation theory, including Cramér's theorem, the Gärtner–Ellis theorem, and Sanov's theorem. Moreover, by the entropy method, we also strengthen Sanov's theorem to its strong version.
1 Introduction
Information theory was fundamentally established by the works of Harry Nyquist and Ralph Hartley in the 1920s and of Claude Shannon in the 1940s, which successfully addressed the theoretical limits of information communication. Since then, information theory has gradually influenced other branches of mathematics. For example, the notion of (information) entropy, introduced by Shannon after he consulted von Neumann, the father of the computer, has been widely employed in probability theory, functional analysis, ergodic theory, graph theory, and so on.
Information theory concerns the reliable and efficient communication of a source over a noisy channel. Entropy and mutual information were introduced by Shannon to determine the theoretical limit of such communication. Entropy and mutual information are special cases of a more general concept, known as the relative entropy. For a nonnegative $\sigma$-finite measure $\mu$ and a probability measure $Q$ defined on the same space such that $Q \ll \mu$, the relative entropy of $Q$ with respect to (w.r.t.) $\mu$ is
$$D(Q\|\mu) = \int \log \frac{dQ}{d\mu}\, dQ,$$
which is the expectation of the information density $\imath_{Q\|\mu} = \log \frac{dQ}{d\mu}$. When $\mu$ is the counting measure on a countable set, $H(X) = -D(Q\|\mu)$ is the Shannon entropy of $X \sim Q$. When $\mu$ is the Lebesgue measure on a Euclidean space, $h(X) = -D(Q\|\mu)$ is the differential entropy of $X \sim Q$. The relative entropy is also widely used in the case where $\mu$ is a probability measure. In this case, for two probability measures $Q \ll P$,
$$D(Q\|P) = \int \log \frac{dQ}{dP}\, dQ. \tag{1}$$
If $Q$ is not absolutely continuous w.r.t. $P$, then $D(Q\|P) := +\infty$. The mutual information between $X$ and $Y$ with $(X, Y) \sim P_{XY}$ is $I(X;Y) = D(P_{XY} \| P_X P_Y)$. So, the relative entropy is a general concept incorporating the entropy and mutual information. The relative varentropy of $Q$ with respect to (w.r.t.) $\mu$ is
$$V(Q\|\mu) = \mathrm{Var}_Q\big[\imath_{Q\|\mu}(X)\big] = \mathrm{Var}_Q\Big[\log \frac{dQ}{d\mu}(X)\Big].$$
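For concreteness, the following minimal Python sketch (with hypothetical toy distributions on a finite alphabet, taking the reference measure $\mu = P$ to be a probability measure) evaluates the relative entropy and the relative varentropy directly from the definitions above.

```python
import numpy as np

def relative_entropy(q, p):
    """D(Q||P) in nats for finite distributions; returns +inf if Q is not
    absolutely continuous w.r.t. P (i.e., q puts mass where p does not)."""
    q, p = np.asarray(q, float), np.asarray(p, float)
    s = q > 0
    if np.any(p[s] == 0):
        return np.inf
    return float(np.sum(q[s] * np.log(q[s] / p[s])))

def relative_varentropy(q, p):
    """V(Q||P) = Var_Q[log dQ/dP] for finite distributions with Q << P."""
    q, p = np.asarray(q, float), np.asarray(p, float)
    s = q > 0
    i = np.log(q[s] / p[s])        # information density on the support of Q
    d = np.sum(q[s] * i)           # D(Q||P)
    return float(np.sum(q[s] * (i - d) ** 2))

# Toy example: Q = Bernoulli(0.3) versus P = Bernoulli(0.5)
q, p = [0.7, 0.3], [0.5, 0.5]
print(relative_entropy(q, p), relative_varentropy(q, p))
```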
1.1 Organization
In Section 2, we introduce the concept of information projection (or, shortly, I-projection), which will then be used to characterize the asymptotic exponents in large deviation theory. In Section 3, we prove several versions of Cramér's theorem by the entropy method. In particular, the entropy proof for the simple version of Cramér's theorem in Section 3.1 serves as a paradigm illustrating the power of the entropy method. Still using the entropy method, we next generalize Cramér's theorem to the non-i.i.d. setting and obtain the Gärtner–Ellis theorem in Section 4. We then turn to another fundamental result in large deviation theory, Sanov's theorem, in Section 5. The entropy proof of this theorem is given in Section 5.1, and the entropy proof of its strong version is given in Section 5.2. As an application, an entropy proof of the Gibbs conditioning principle is presented in Section 6.
1.2 Notations
Let $\mathcal{X}$ be a Hausdorff topological space, and let $\mathcal{B}_{\mathcal{X}}$ be the Borel $\sigma$-algebra on $\mathcal{X}$. Let $P_X$ (or, shortly, $P$) be a probability measure on $\mathcal{X}$. We also use $Q_X, R_X$ (or, shortly, $Q, R$) to denote another two probability measures on $\mathcal{X}$. The probability measures $P_X, Q_X, R_X$ can be thought of as the push-forward measures (or the distributions) induced jointly by the same measurable function $X$ (random variable) from an underlying measurable space to $\mathcal{X}$ and by different probability measures P, Q, R defined on the underlying measurable space. Without loss of generality, we assume that $X$ is the identity map and that P, Q, R are the same as $P_X, Q_X, R_X$. So, $P_X, Q_X, R_X$ can be independently specified to be arbitrary probability measures. We say that all probability measures induced by the underlying measure P, together with the corresponding measurable spaces, constitute the P-system. So, $P_X$ is in fact the distribution of the random variable $X$ in the P-system, where the letter "P" in the notation $P_X$ refers to the P-system and the subscript "X" refers to the random variable. When emphasizing the random variables, we write $X \sim P_X$ to indicate that $X$ follows the distribution $P_X$ in the P-system.

We use $P_X^n$ to denote the $n$-fold product of $P_X$. For a probability measure $P_X$ and a regular conditional distribution (transition probability or Markov kernel) $P_{Y|X}$ from $\mathcal{X}$ to $\mathcal{Y}$, we denote by $P_X P_{Y|X}$ the joint probability measure induced by $P_X$ and $P_{Y|X}$. For a distribution $P_X$ on $\mathcal{X}$ and a measurable subset $A \subseteq \mathcal{X}$, $P_X(\cdot|A)$ denotes the conditional probability measure given $A$. For brevity, we write $P_X(x) := P_X(\{x\})$, $x \in \mathcal{X}$. In particular, if $X \sim P_X$ is discrete, the restriction of $P_X$ to the set of singletons corresponds to the probability mass function of $X$ in the P-system. We use $Q_X \ll P_X$ to denote that the distribution $Q_X$ is absolutely continuous w.r.t. $P_X$. We use $X^n$ to denote a random vector $(X_1, X_2, \ldots, X_n)$ taking values in $(\mathcal{X}^n, \mathcal{B}_{\mathcal{X}}^n)$, and use $x^n := (x_1, x_2, \ldots, x_n)$ to denote its realization. For an $n$-length vector $x^n$, we use $x^i$ to denote the subvector consisting of the first $i$ components of $x^n$, and $x_{i+1}^n$ to denote the subvector consisting of the last $n - i$ components. For a probability measure $P_{X^n}$ on $\mathcal{X}^n$, we use $P_{X_k|X^{k-1}}$ to denote the regular conditional distribution of $X_k$ given $X^{k-1}$ induced by $P_{X^n}$. For a measurable function $f : \mathcal{X} \to \mathbb{R}$, we sometimes adopt the notation $P_X(f) := \mathbb{E}_{P_X}[f(X)] := \int_{\mathcal{X}} f \, dP_X$. For a conditional probability measure $P_{X|Y}$, define the conditional expectation operator induced by $P_{X|Y}$ as $P_{X|Y}(f)(y) := \int f \, dP_{X|Y=y}$ for any measurable function $f : (\mathcal{X}, \mathcal{B}_{\mathcal{X}}) \to (\mathbb{R}, \mathcal{B}_{\mathbb{R}})$, provided the integral is well-defined for every $y$. When the random variable $X$ is clear from the context, we briefly denote $P_X$ as $P$. For example, we denote the expectation $\mathbb{E}_{P_X}[X]$ as $\mathbb{E}_P[X]$.
The relative entropy is defined in (1). The conditional relative entropy is defined as
$$D(Q_{X|W} \| P_{X|W} \,|\, Q_W) = D(Q_{X|W} Q_W \,\|\, P_{X|W} Q_W).$$
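As a small computational check (a sketch with hypothetical finite-alphabet kernels), the conditional relative entropy can be computed either from the two joint measures appearing in the definition or, equivalently, as the $Q_W$-average of the pointwise divergences $D(Q_{X|W=w} \| P_{X|W=w})$.

```python
import numpy as np

def kl(q, p):
    """D(Q||P) in nats for finite distributions with Q << P."""
    q, p = np.asarray(q, float), np.asarray(p, float)
    s = q > 0
    return float(np.sum(q[s] * np.log(q[s] / p[s])))

# Hypothetical toy kernels: row w is the conditional distribution of X given W = w.
q_w = np.array([0.6, 0.4])                         # Q_W
q_x_given_w = np.array([[0.9, 0.1], [0.3, 0.7]])   # Q_{X|W}
p_x_given_w = np.array([[0.5, 0.5], [0.2, 0.8]])   # P_{X|W}

# Definition: divergence between the joint measures Q_{X|W} Q_W and P_{X|W} Q_W.
joint_q = q_x_given_w * q_w[:, None]
joint_p = p_x_given_w * q_w[:, None]
d_joint = kl(joint_q.ravel(), joint_p.ravel())

# Equivalent form: Q_W-average of the per-w divergences.
d_avg = sum(q_w[w] * kl(q_x_given_w[w], p_x_given_w[w]) for w in range(len(q_w)))

assert np.isclose(d_joint, d_avg)
```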
Given $n \ge 1$, the empirical measure of a sequence $x^n \in \mathcal{X}^n$ is
$$L_{x^n} := \frac{1}{n} \sum_{i=1}^{n} \delta_{x_i},$$
where $\delta_x$ is the Dirac mass at the point $x \in \mathcal{X}$.
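For instance, on a finite alphabet the empirical measure is simply the vector of normalized symbol counts; a minimal Python sketch with a hypothetical toy sequence:

```python
from collections import Counter

def empirical_measure(xs):
    """Return L_{x^n} as a dict mapping each symbol to its relative frequency."""
    n = len(xs)
    return {x: c / n for x, c in Counter(xs).items()}

# Toy sequence x^5 = (a, b, a, c, a)
print(empirical_measure(["a", "b", "a", "c", "a"]))  # {'a': 0.6, 'b': 0.2, 'c': 0.2}
```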
Denote $d_P(P, Q) := \inf\{\delta > 0 : P(A) \le Q(A^\delta) + \delta \text{ for all closed } A \subseteq \mathcal{Z}\}$ with $A^\delta := \bigcup_{z \in A} \{z' \in \mathcal{Z} : d(z, z') < \delta\}$, where $\mathcal{Z}$ is $\mathcal{X}$ or $\mathcal{Y}$. Here, $d_P$ is known as the Lévy–Prokhorov metric on $\mathcal{P}(\mathcal{Z})$, which is compatible with the weak topology on $\mathcal{P}(\mathcal{Z})$. We use $B_\delta(R) := \{Q \in \mathcal{P}(\mathcal{Z}) : d_P(R, Q) < \delta\}$ and $B_{\delta]}(R) := \{Q \in \mathcal{P}(\mathcal{Z}) : d_P(R, Q) \le \delta\}$ to respectively denote an open ball and a closed ball under the Lévy–Prokhorov metric. We use $\overline{A}$, $A^o$, and $A^c := \mathcal{Z} \setminus A$ to respectively denote the closure, interior, and complement of a set $A \subseteq \mathcal{Z}$. Denote the sublevel set of the relative entropy (or the divergence "ball") as $D_{\epsilon]}(P_X) := \{Q_X : D(Q_X \| P_X) \le \epsilon\}$ for $\epsilon \ge 0$. The Lévy–Prokhorov metric, the total variation (TV) distance, and the relative entropy admit the following relation: for any $Q_X, P_X$,
$$\sqrt{2 D(Q_X \| P_X)} \ge \|Q_X - P_X\|_{\mathrm{TV}} \ge d_P(Q_X, P_X), \tag{2}$$
which implies, for $\epsilon \ge 0$,
$$D_{\epsilon^2/2]}(P_X) \subseteq B_{\epsilon]}(P_X).$$
The first inequality in (2) is known as Pinsker's inequality, and the second inequality follows by definition [7].
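As a numerical sanity check of the first inequality in (2), the following Python sketch compares $\sqrt{2 D(Q\|P)}$ with the total variation norm, taken here in the unnormalized convention $\|Q - P\|_{\mathrm{TV}} = \sum_x |Q(x) - P(x)|$; the toy distributions and this convention are illustrative assumptions.

```python
import numpy as np

def kl(q, p):
    """D(Q||P) in nats for finite distributions with Q << P."""
    q, p = np.asarray(q, float), np.asarray(p, float)
    s = q > 0
    return float(np.sum(q[s] * np.log(q[s] / p[s])))

def tv_norm(q, p):
    """Total variation norm ||Q - P||_TV = sum_x |Q(x) - P(x)|."""
    return float(np.sum(np.abs(np.asarray(q, float) - np.asarray(p, float))))

rng = np.random.default_rng(0)
for _ in range(5):
    q = rng.dirichlet(np.ones(4))
    p = rng.dirichlet(np.ones(4))
    # Pinsker's inequality, first inequality in (2)
    assert np.sqrt(2 * kl(q, p)) >= tv_norm(q, p)
```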
We denote $\inf \emptyset := +\infty$ and $\sup \emptyset := -\infty$. Given two positive sequences $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$, we say that $a_n \lesssim b_n$ if $\limsup_{n\to\infty} a_n/b_n \le 1$, and $a_n \sim b_n$ if $a_n \lesssim b_n$ and $b_n \lesssim a_n$.
2 Preliminaries: Information Projection
For a set $\Gamma \subseteq \mathcal{P}(\mathcal{X})$, define
$$D(\Gamma \| P) := \inf_{R \in \Gamma} D(R \| P).$$
Any probability measure $R$ attaining the infimum above is called the information projection, or I-projection, of $P$ on $\Gamma$.
Lemma 1. If $\Gamma$ is closed in the weak topology, then the I-projection exists.

Proof. This lemma follows from the lower semicontinuity of $R \mapsto D(R\|P)$ and the compactness of the sublevel sets of $R \mapsto D(R\|P)$ under the weak topology.

If $\Gamma$ is convex and such an $R$ exists, the convexity of $\Gamma$ guarantees its uniqueness, since $D(R\|P)$ is strictly convex in $R$. Another important property of the I-projection is the following equivalence.
Theorem 1 ([3, Theorem 2.2]). A probability measure $Q \in \Gamma \cap \mathcal{D}(P)$ is the I-projection of $P$ on the convex set $\Gamma$ of probability measures iff every $R \in \Gamma$ satisfies
$$D(R\|P) \ge D(R\|Q) + D(Q\|P). \tag{3}$$
The proof of this theorem is based on differentiating $D(P_\lambda \| P)$ w.r.t. $\lambda$ at $\lambda = 0$, where $P_\lambda = \lambda R + (1 - \lambda) Q$ with $\lambda \in [0, 1]$; see the details in [3, Theorem 2.2]. The inequality (3) can be written as
$$\int \log \frac{dQ}{dP}\, dR \ge D(Q\|P),$$
or
$$\int \log \frac{dQ}{dP}\, d(R - Q) \ge 0.$$
So, for the I-projection $Q$, the set
$$\mathcal{H} := \left\{ R : \int \log \frac{dQ}{dP}\, d(R - Q) = 0 \right\}$$
constitutes a "supporting hyperplane" of $\Gamma$. Moreover, the function $1 + \log \frac{dQ}{dP}$ can be seen as the "gradient" of $Q \mapsto D(Q\|P)$, and $\mathcal{H}$ is the "tangent hyperplane" of $Q \mapsto D(Q\|P)$. Analogously to Euclidean spaces, considering $D(Q\|P)$ as a "distance", it holds that $Q$ is the projection of $P$ onto $\mathcal{H}$, and
$$D(R\|P) = D(R\|Q) + D(Q\|P), \quad \forall R \in \mathcal{H}.$$
As a consequence of Theorem 1, when $\Gamma$ is specified by linear constraints, the I-projection can be written out explicitly. To illustrate this point, we let $f : \mathcal{X} \to \mathbb{R}$ be a measurable function and consider the following I-projection problem:
$$\gamma(\alpha) := \inf_{Q \,:\, Q(f) = \alpha} D(Q\|P).$$
To give the I-projection for $\gamma(\alpha)$, we first introduce the following lemma, which can be easily verified by definition.
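Ahead of that lemma, here is a minimal numerical sketch (Python, on a hypothetical finite alphabet) that assumes the standard exponential-tilting form $\frac{dQ_\lambda}{dP} \propto e^{\lambda f}$ for the minimizer, solves the constraint $Q_\lambda(f) = \alpha$ for $\lambda$, and then checks the Pythagorean relation of Theorem 1 (which holds with equality here, since the constraint set is affine) for a few feasible $R$.

```python
import numpy as np
from scipy.optimize import brentq

# Hypothetical finite-alphabet example: base distribution P and score function f.
p = np.array([0.4, 0.3, 0.2, 0.1])
f = np.array([0.0, 1.0, 2.0, 3.0])
alpha = 2.0  # target mean; must lie strictly between min(f) and max(f)

def tilt(lam):
    """Exponentially tilted distribution with dQ_lam/dP proportional to exp(lam * f)."""
    w = p * np.exp(lam * f)
    return w / w.sum()

def kl(q, r):
    """D(Q||R) in nats; assumes Q << R on this finite alphabet."""
    s = q > 0
    return float(np.sum(q[s] * np.log(q[s] / r[s])))

# Solve Q_lam(f) = alpha for lam, giving the candidate I-projection q_star.
lam = brentq(lambda l: tilt(l) @ f - alpha, -50.0, 50.0)
q_star = tilt(lam)

# For feasible R (R(f) = alpha, R << P), check D(R||P) = D(R||Q*) + D(Q*||P).
for r in [np.array([0.1, 0.2, 0.3, 0.4]), np.array([0.2, 0.1, 0.2, 0.5])]:
    assert np.isclose(r @ f, alpha)
    assert np.isclose(kl(r, p), kl(r, q_star) + kl(q_star, p))
```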