Pontryagins Minimum Principle and Forward-Backward Sweep Method for the System of HJB-FP Equations in Memory-Limited Partially Observable Stochastic Control

2025-05-02 1 0 9.14MB 27 页 10玖币

侵权投诉

Pontryagin’s Minimum Principle and Forward-Backward Sweep

Method for the System of HJB-FP Equations in

Memory-Limited Partially Observable Stochastic Control

Takehiro Tottori1and Tetsuya J. Kobayashi1,2,3,4

Abstract

Memory-limited partially observable stochastic control (ML-POSC) is the stochastic optimal control problem

under incomplete information and memory limitation. In order to obtain the optimal control function of ML-POSC,

a system of the forward Fokker-Planck (FP) equation and the backward Hamilton-Jacobi-Bellman (HJB) equation

needs to be solved. In this work, we ﬁrstly show that the system of HJB-FP equations can be interpreted via

the Pontryagin’s minimum principle on the probability density function space. Based on this interpretation, we

then propose the forward-backward sweep method (FBSM) to ML-POSC, which has been used in the Pontryagin’s

minimum principle. FBSM is an algorithm to compute the forward FP equation and the backward HJB equation

alternately. Although the convergence of FBSM is generally not guaranteed, it is guaranteed in ML-POSC because

the coupling of HJB-FP equations is limited to the optimal control function in ML-POSC.

I. Introduction

In many practical applications of the stochastic optimal control theory, several constraints need to be considered.

Especially in small devices [1], [2] and in biological systems [3], [4], [5], [6], [7], [8], incomplete information and

memory limitation become predominant because their sensors are extremely noisy and their memory resources are

severely limited. In order to account these constraints, memory-limited partially observable stochastic control (ML-

POSC) has recently been proposed [9]. Because ML-POSC formulates the noisy observation and the limited memory

explicitly, ML-POSC can directly take incomplete information and memory limitation into account in the stochastic

optimal control problem.

However, ML-POSC cannot be solved in the similar way as the conventional stochastic control, which is also

called completely observable stochastic control (COSC). In COSC, the optimal control function depends only on the

Hamilton-Jacobi-Bellman (HJB) equation, which is a time-backward partial diﬀerential equation given the terminal

condition (Figure 1(a)) [10]. Therefore, the optimal control function of COSC can be obtained by solving the HJB

equation backward in time from the terminal condition, which is called the value iteration method [11], [12], [13]. In

contrast, the optimal control function of ML-POSC depends not only on the HJB equation but also on the Fokker-

Planck (FP) equation, which is a time-forward partial diﬀerential equation given the initial condition (Figure 1(b))

[9]. Because the HJB equation and the FP equation interact with each other through the optimal control function

in ML-POSC, the optimal control function of ML-POSC cannot be obtained by the value iteration method.

In order to propose an algorithm to ML-POSC, we ﬁrstly show that the system of HJB-FP equations can be

interpreted via the Pontryagin’s minimum principle on the probability density function space. The Pontryagin’s

minimum principle is one of the most representative approaches to the deterministic control, which converts the

optimal control problem into the two-point boundary value problem of the forward state equation and the backward

adjoint equation [14], [15], [16]. We show that the system of HJB-FP equations is an extension of the system of the

state and adjoint equations from the deterministic control to the stochastic control.

The system of HJB-FP equations also appears in the mean-ﬁeld stochastic control (MFSC) [17], [18], [19].

Although the relationship between the system of HJB-FP equations and the Pontryagin’s minimum principle has

been mentioned brieﬂy in MFSC [20], [21], [22], its details have not yet been investigated. We resolve this problem

by deriving the system of HJB-FP equations in the similar way as the Pontryagin’s minimum principle.

We then propose the forward-backward sweep method (FBSM) to ML-POSC. FBSM is an algorithm to compute

the forward FP equation and the backward HJB equation alternately, which can be interpreted as an extension of

the value iteration method. FBSM has been proposed in the Pontryagin’s minimum principle of the deterministic

control, which computes the forward state equation and the backward adjoint equation alternately [23], [24], [25].

Because FBSM is easy to implement, it has been used in many applications [26], [27]. However, the convergence of

FBSM is not guaranteed in the deterministic control except for special cases [28], [29] because the coupling of the

1Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo

113-8654, Japan

2Institute of Industrial Science, The University of Tokyo, Tokyo 153-8505, Japan

3Department of Electrical Engineering and Information Systems, Graduate School of Engineering, The University of Tokyo, Tokyo

113-8654, Japan

4Universal Biology Institute, The University of Tokyo, Tokyo 113-8654, Japan

arXiv:2210.13040v3 [math.OC] 8 Nov 2022

state and adjoint equations is not limited to the optimal control function (Figure 1(c)). In contrast, we show that

the convergence of FBSM is generally guaranteed in ML-POSC because the coupling of HJB-FP equations is limited

only to the optimal control function (Figure 1(b)).

FBSM corresponds to the ﬁxed-point iteration method in MFSC [30], [31], [32], [33]. Although the ﬁxed-point

iteration method is the most basic algorithm in MFSC, its convergence is not guaranteed for the same reason as the

deterministic control (Figure 1(d)). Therefore, ML-POSC is a special class where FBSM or the ﬁxed-point iteration

method is guaranteed to converge.

This paper is organized as follows: In Section II, we formulate ML-POSC. In Section III, we derive the system

of HJB-FP equations from the viewpoint of the Pontryagin’s minimum principle. In Section IV, we propose FBSM

to ML-POSC and prove the convergence. In Section V, we apply FBSM to the linear-quadratic-Gaussian (LQG)

problem. In Section VI, we verify the convergence of FBSM by numerical experiments. In Section VII, we discuss our

work. In Appendix I, we brieﬂy review the Pontryagin’s minimum principle of the deterministic control. In Appendix

II, we show the Pontryagin’s minimum principle of MFSC. In Appendix III, we show the proof in the main text.

II. Memory-Limited Partially Observable Stochastic Control

In this section, we brieﬂy review the formulation of ML-POSC [9], which is the stochastic optimal control problem

under incomplete information and memory limitation.

A. Problem Formulation

In this subsection, we formulate ML-POSC [9]. The state of the system xt∈Rdxat time t∈[0, T ] evolves by the

following stochastic diﬀerential equation (SDE):

dxt=b(t, xt, ut)dt +σ(t, xt, ut)dωt,(1)

where x0obeys p0(x0), ut∈Rduis the control, and ωt∈Rdωis the standard Wiener process. In ML-POSC, because

the controller cannot completely observe the state xt, the observation yt∈Rdyis obtained instead of the state xt,

which evolves by the following SDE:

dyt=h(t, xt)dt +γ(t)dνt,(2)

where y0obeys p0(y0), and νt∈Rdνis the standard Wiener process. Furthermore, because the controller cannot

completely memorize the observation history y0:t:= {yτ|τ∈[0, t]}, the observation history y0:tis compressed into

the ﬁnite-dimensional memory zt∈Rdz, which evolves by the following SDE:

dzt=c(t, zt, vt)dt +κ(t, zt, vt)dyt+η(t, zt, vt)dξt,(3)

where z0obeys p0(z0), vt∈Rdvis the control, and ξt∈Rdξis the standard Wiener process. The controller determines

the state control utand the memory control vtbased on the memory ztas follows:

ut=u(t, zt), vt=v(t, zt).(4)

The objective function of ML-POSC is given by the following expected cumulative cost function:

J[u, v] := Ep(x0:T,y0:T,z0:T;u,v)"ZT

f(t, xt, ut, vt)dt +g(xT)#,(5)

where fis the cost function, gis the terminal cost function, p(x0:T, y0:T, z0:T;u, v) is the probability of x0:T,y0:T,

and z0:Tgiven uand vas parameters, and Ep[·] is the expectation with respect to the probability p.

ML-POSC is the problem to ﬁnd the optimal control functions u∗and v∗that minimize the expected cumulative

cost function J[u, v] as follows:

u∗, v∗= argmin

u,v

J[u, v].(6)

B. Problem Reformulation

Although the formulation of ML-POSC in the previous subsection is intuitive, it is inconvenient for further

mathematical investigations. In order to resolve this problem, we reformulate ML-POSC in this subsection. The

formulation in this subsection is simpler and more general than that in the previous subsection.

We ﬁrst deﬁne the extended state stas follows:

st:= xt

zt∈Rds,(7)

Fig. 1. Schematic diagram of the relationship between the backward dynamics, the optimal control function, and the forward dynamics

in (a) COSC, (b) ML-POSC, (c) deterministic control, and (d) MFSC. w∗,p∗,λ∗, and s∗are the solutions of the HJB equation, the FP

equation, the adjoint equation, and the state equation, respectively. u∗is the optimal control function. (a) In COSC, because the optimal

control function u∗depends only on the HJB equation w∗, it can be obtained by solving the HJB equation w∗backward in time from the

terminal condition, which is called the value iteration method. (b) In ML-POSC, because the optimal control function u∗depends on the

FP equation p∗as well as the HJB equation w∗, it cannot be obtained by the value iteration method. In this paper, we propose FBSM

to ML-POSC, which computes the HJB equation w∗and the FP equation p∗alternately. Because the coupling of the HJB equation w∗

and the FP equation p∗is limited only to the optimal control function u∗, the convergence of FBSM is guaranteed in ML-POSC. (c)

In deterministic control, because the coupling of the adjoint equation λ∗and the state equation s∗is not limited to the optimal control

function u∗, the convergence of FBSM is not guaranteed. (d) In MFSC, because the coupling of the HJB equation w∗and the FP equation

p∗is not limited to the optimal control function u∗, the convergence of FBSM is not guaranteed.

where ds=dx+dz. The extended state stevolves by the following SDE:

dst=˜

b(t, st,˜ut)dt + ˜σ(t, st,˜ut)d˜ωt,(8)

where s0obeys p0(s0), ˜ut∈Rd˜uis the control, and ˜ωt∈Rd˜ωis the standard Wiener process. ML-POSC determines

the control ˜ut∈Rd˜ubased on the memory ztas follows:

˜ut= ˜u(t, zt).(9)

The extended state SDE (8) includes the previous SDEs (1), (2), (3) as a special case because they can be represented

as follows:

dst=b(t, xt, ut)

c(t, zt, vt) + κ(t, zt, vt)h(t, xt)dt +σ(t, xt, ut)O O

O κ(t, zt, vt)γ(t)η(t, zt, vt)



dωt

dνt

dξt

,(10)

Fig. 2. The relationship between the Bellman’s dynamic programming principle (top) and the Pontryagin’s minimum principle (bottom)

on the state space (left) and on the probability density function space (right). The left-hand side corresponds to the deterministic control,

which is brieﬂy reviewed in Appendix I. The right-hand side corresponds to ML-POSC and MFSC, which are shown in Section III and

Appendix II, respectively. The conventional approach in ML-POSC [9] and MFSC [21], [22] can be interpreted as the conversion from the

Bellman’s dynamic programming principle into the Pontryagin’s minimum principle on the probability density function space.

where p0(s0) = p0(x0)p0(z0).

The objective function of ML-POSC is given by the following expected cumulative cost function:

J[˜u] := Ep(s0:T;˜u)"ZT

f(t, st,˜ut)dt + ˜g(sT)#.(11)

where ˜

fis the cost function, and ˜gis the terminal cost function. It is obvious that this objective function (11) is

more general than that in the previous subsection (5).

ML-POSC is the problem to ﬁnd the optimal control function ˜u∗that minimizes the expected cumulative cost

function J[˜u] as follows:

˜u∗= argmin

˜u

J[˜u].(12)

In the following sections, we mainly consider the formulation of this subsection because it is simpler and more

general than that in the previous subsection. Moreover, we omit ˜

·for the notational simplicity.

III. Pontryagin’s Minimum Principle

If the control utis determined based on the extended state st, i.e., ut=u(t, st), ML-POSC is the same problem

with COSC of the extended state, and its optimality conditions can be obtained in the conventional way [10]. In

reality, however, because ML-POSC determines the control utbased only on the memory zt, i.e., ut=u(t, zt), its

optimality conditions cannot be obtained in the similar way as COSC. In the previous work [9], the optimality

conditions of ML-POSC were obtained by employing a mathematical technique of MFSC [21], [22].

In this section, we obtain the optimality conditions of ML-POSC by employing the Pontryagin’s minimum principle

[10], [14], [15], [16] on the probability density function space (Figure 2). The conventional approach in ML-POSC

[9] and MFSC [21], [22] can be interpreted as the conversion from the Bellman’s dynamic programming principle to

the Pontryagin’s minimum principle on the probability density function space.

In Appendix I, we brieﬂy review the Pontryagin’s minimum principle of the deterministic control. In this section,

we obtain the optimality conditions of ML-POSC in the similar way as Appendix I. Furthermore, in Appendix II, we

obtain the optimality conditions of MFSC in the similar way as Appendix I. MFSC is more general than ML-POSC

except for the partial observability. Especially, the expected Hamiltonian is non-linear with respect to the probability

density function in MFSC while it is linear in ML-POSC.

A. Hamiltonian

Before we show the optimality conditions, we deﬁne the Hamiltonian as follows:

H(t, s, u, w) := f(t, s, u) + Luw(t, s),(13)

where ∀w: [0, T ]×Rds→R, and Luis the backward diﬀusion operator, which is deﬁned as follows:

Luw(t, s) :=

i=1

bi(t, s, u)∂w(t, s)

∂si

i,j=1

Dij (t, s, u)∂2w(t, s)

∂si∂sj

,(14)

where D(t, s, u) := σ(t, s, u)σ>(t, s, u). The extended state SDE (8) can be converted into the following Fokker-Planck

(FP) equation:

∂p(t, s)

∂t =L†

up(t, s),(15)

where p(0, s) = p0(s), and L†

uis the forward diﬀusion operator, which is deﬁned as follows:

L†

up(t, s) := −

i=1

∂(bi(t, s, u)p(t, s))

∂si

i,j=1

∂2(Dij (t, s, u)p(t, s))

∂si∂sj

.(16)

We note that L†

uis the conjugate of Luas follows:

Zw(t, s)L†

up(t, s)ds =Zp(t, s)Luw(t, s)ds, (17)

The following lemma shows the relationship between the objective function J[u] and the Hamiltonian H(t, s, u, w),

which is signiﬁcant for the optimality conditions.

Lemma 1: Let ∀u: [0, T ]×Rdz→Rduand ∀u0: [0, T ]×Rdz→Rdube the arbitrary control functions, and let p

and p0be the probability density functions of the extended state driven by uand u0, respectively. Then J[u]−J[u0]

satisﬁes the following equation:

J[u]−J[u0] = ZT

0Ep(t,s)[H(t, s, u, w0)] −Ep(t,s)[H(t, s, u0, w0)]dt, (18)

where w0is the solution of the following Hamilton-Jacobi-Bellman (HJB) equation:

−∂w0(t, s)

∂t =H(t, s, u0, w0),(19)

where w0(T, s) = g(s).

Proof: The proof is shown in Appendix III-A.

B. Necessary Condition

We show the necessary condition of the optimal control function of ML-POSC that corresponds to the Pontryagin’s

minimum principle on the probability density function space.

Theorem 1: In ML-POSC, the optimal control function u∗satisﬁes the following equation:

u∗(t, z) = argmin

Ep∗

t(x|z)[H(t, s, u, w∗)] , a.s. ∀t∈[0, T ],∀z∈Rdz,(20)

where p∗

t(x|z) := p∗(t, s)/Rp∗(t, s)dx is the conditional probability density function of the state xgiven the memory

z, and p∗(t, s) is the solution of the FP equation:

∂p∗(t, s)

∂t =L†

u∗p∗(t, s),(21)

where p∗(0, s) = p0(s). w∗(t, s) is the solution of the following HJB equation:

−∂w∗(t, s)

∂t =H(t, s, u∗, w∗),(22)

where w∗(T, s) = g(s).

Proof: The proof is shown in Appendix III-B.

In the Pontryagin’s minimum principle of the deterministic control, the state equation (70) and the adjoint equation

(71) can be expressed by the derivatives of the Hamiltonian (Appendix I). Similarly, the system of HJB-FP equations

(21) and (22) can be expressed by the variations of the expected Hamiltonian

H(t, p, u, w) := Ep(s)[H(t, s, u, w)] ,(23)

as follows:

∂p∗(t, s)

∂t =δ¯

H(t, p∗, u∗, w∗)

δw (s),(24)

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

Pontryagin'sMinimumPrincipleandForward-BackwardSweepMethodfortheSystemofHJB-FPEquationsinMemory-LimitedPartiallyObservableStochasticControlTakehiroTottori1andTetsuyaJ.Kobayashi1;2;3;4AbstractMemory-limitedpartiallyobservablestochasticcontrol(ML-POSC)isthestochasticoptimalcontrolproblemunderincomplet...

展开>> 收起<<

Pontryagins Minimum Principle and Forward-Backward Sweep Method for the System of HJB-FP Equations in Memory-Limited Partially Observable Stochastic Control.pdf

共27页,预览5页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

Pontryagins Minimum Principle and Forward-Backward Sweep Method for the System of HJB-FP Equations in Memory-Limited Partially Observable Stochastic Control

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: