Pontryagins Minimum Principle and Forward-Backward Sweep Method for the System of HJB-FP Equations in Memory-Limited Partially Observable Stochastic Control

2025-05-02 0 0 9.14MB 27 页 10玖币
侵权投诉
Pontryagin’s Minimum Principle and Forward-Backward Sweep
Method for the System of HJB-FP Equations in
Memory-Limited Partially Observable Stochastic Control
Takehiro Tottori1and Tetsuya J. Kobayashi1,2,3,4
Abstract
Memory-limited partially observable stochastic control (ML-POSC) is the stochastic optimal control problem
under incomplete information and memory limitation. In order to obtain the optimal control function of ML-POSC,
a system of the forward Fokker-Planck (FP) equation and the backward Hamilton-Jacobi-Bellman (HJB) equation
needs to be solved. In this work, we firstly show that the system of HJB-FP equations can be interpreted via
the Pontryagin’s minimum principle on the probability density function space. Based on this interpretation, we
then propose the forward-backward sweep method (FBSM) to ML-POSC, which has been used in the Pontryagin’s
minimum principle. FBSM is an algorithm to compute the forward FP equation and the backward HJB equation
alternately. Although the convergence of FBSM is generally not guaranteed, it is guaranteed in ML-POSC because
the coupling of HJB-FP equations is limited to the optimal control function in ML-POSC.
I. Introduction
In many practical applications of the stochastic optimal control theory, several constraints need to be considered.
Especially in small devices [1], [2] and in biological systems [3], [4], [5], [6], [7], [8], incomplete information and
memory limitation become predominant because their sensors are extremely noisy and their memory resources are
severely limited. In order to account these constraints, memory-limited partially observable stochastic control (ML-
POSC) has recently been proposed [9]. Because ML-POSC formulates the noisy observation and the limited memory
explicitly, ML-POSC can directly take incomplete information and memory limitation into account in the stochastic
optimal control problem.
However, ML-POSC cannot be solved in the similar way as the conventional stochastic control, which is also
called completely observable stochastic control (COSC). In COSC, the optimal control function depends only on the
Hamilton-Jacobi-Bellman (HJB) equation, which is a time-backward partial differential equation given the terminal
condition (Figure 1(a)) [10]. Therefore, the optimal control function of COSC can be obtained by solving the HJB
equation backward in time from the terminal condition, which is called the value iteration method [11], [12], [13]. In
contrast, the optimal control function of ML-POSC depends not only on the HJB equation but also on the Fokker-
Planck (FP) equation, which is a time-forward partial differential equation given the initial condition (Figure 1(b))
[9]. Because the HJB equation and the FP equation interact with each other through the optimal control function
in ML-POSC, the optimal control function of ML-POSC cannot be obtained by the value iteration method.
In order to propose an algorithm to ML-POSC, we firstly show that the system of HJB-FP equations can be
interpreted via the Pontryagin’s minimum principle on the probability density function space. The Pontryagin’s
minimum principle is one of the most representative approaches to the deterministic control, which converts the
optimal control problem into the two-point boundary value problem of the forward state equation and the backward
adjoint equation [14], [15], [16]. We show that the system of HJB-FP equations is an extension of the system of the
state and adjoint equations from the deterministic control to the stochastic control.
The system of HJB-FP equations also appears in the mean-field stochastic control (MFSC) [17], [18], [19].
Although the relationship between the system of HJB-FP equations and the Pontryagin’s minimum principle has
been mentioned briefly in MFSC [20], [21], [22], its details have not yet been investigated. We resolve this problem
by deriving the system of HJB-FP equations in the similar way as the Pontryagin’s minimum principle.
We then propose the forward-backward sweep method (FBSM) to ML-POSC. FBSM is an algorithm to compute
the forward FP equation and the backward HJB equation alternately, which can be interpreted as an extension of
the value iteration method. FBSM has been proposed in the Pontryagin’s minimum principle of the deterministic
control, which computes the forward state equation and the backward adjoint equation alternately [23], [24], [25].
Because FBSM is easy to implement, it has been used in many applications [26], [27]. However, the convergence of
FBSM is not guaranteed in the deterministic control except for special cases [28], [29] because the coupling of the
1Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo
113-8654, Japan
2Institute of Industrial Science, The University of Tokyo, Tokyo 153-8505, Japan
3Department of Electrical Engineering and Information Systems, Graduate School of Engineering, The University of Tokyo, Tokyo
113-8654, Japan
4Universal Biology Institute, The University of Tokyo, Tokyo 113-8654, Japan
arXiv:2210.13040v3 [math.OC] 8 Nov 2022
state and adjoint equations is not limited to the optimal control function (Figure 1(c)). In contrast, we show that
the convergence of FBSM is generally guaranteed in ML-POSC because the coupling of HJB-FP equations is limited
only to the optimal control function (Figure 1(b)).
FBSM corresponds to the fixed-point iteration method in MFSC [30], [31], [32], [33]. Although the fixed-point
iteration method is the most basic algorithm in MFSC, its convergence is not guaranteed for the same reason as the
deterministic control (Figure 1(d)). Therefore, ML-POSC is a special class where FBSM or the fixed-point iteration
method is guaranteed to converge.
This paper is organized as follows: In Section II, we formulate ML-POSC. In Section III, we derive the system
of HJB-FP equations from the viewpoint of the Pontryagin’s minimum principle. In Section IV, we propose FBSM
to ML-POSC and prove the convergence. In Section V, we apply FBSM to the linear-quadratic-Gaussian (LQG)
problem. In Section VI, we verify the convergence of FBSM by numerical experiments. In Section VII, we discuss our
work. In Appendix I, we briefly review the Pontryagin’s minimum principle of the deterministic control. In Appendix
II, we show the Pontryagin’s minimum principle of MFSC. In Appendix III, we show the proof in the main text.
II. Memory-Limited Partially Observable Stochastic Control
In this section, we briefly review the formulation of ML-POSC [9], which is the stochastic optimal control problem
under incomplete information and memory limitation.
A. Problem Formulation
In this subsection, we formulate ML-POSC [9]. The state of the system xtRdxat time t[0, T ] evolves by the
following stochastic differential equation (SDE):
dxt=b(t, xt, ut)dt +σ(t, xt, ut)t,(1)
where x0obeys p0(x0), utRduis the control, and ωtRdωis the standard Wiener process. In ML-POSC, because
the controller cannot completely observe the state xt, the observation ytRdyis obtained instead of the state xt,
which evolves by the following SDE:
dyt=h(t, xt)dt +γ(t)t,(2)
where y0obeys p0(y0), and νtRdνis the standard Wiener process. Furthermore, because the controller cannot
completely memorize the observation history y0:t:= {yτ|τ[0, t]}, the observation history y0:tis compressed into
the finite-dimensional memory ztRdz, which evolves by the following SDE:
dzt=c(t, zt, vt)dt +κ(t, zt, vt)dyt+η(t, zt, vt)t,(3)
where z0obeys p0(z0), vtRdvis the control, and ξtRdξis the standard Wiener process. The controller determines
the state control utand the memory control vtbased on the memory ztas follows:
ut=u(t, zt), vt=v(t, zt).(4)
The objective function of ML-POSC is given by the following expected cumulative cost function:
J[u, v] := Ep(x0:T,y0:T,z0:T;u,v)"ZT
0
f(t, xt, ut, vt)dt +g(xT)#,(5)
where fis the cost function, gis the terminal cost function, p(x0:T, y0:T, z0:T;u, v) is the probability of x0:T,y0:T,
and z0:Tgiven uand vas parameters, and Ep[·] is the expectation with respect to the probability p.
ML-POSC is the problem to find the optimal control functions uand vthat minimize the expected cumulative
cost function J[u, v] as follows:
u, v= argmin
u,v
J[u, v].(6)
B. Problem Reformulation
Although the formulation of ML-POSC in the previous subsection is intuitive, it is inconvenient for further
mathematical investigations. In order to resolve this problem, we reformulate ML-POSC in this subsection. The
formulation in this subsection is simpler and more general than that in the previous subsection.
We first define the extended state stas follows:
st:= xt
ztRds,(7)
Fig. 1. Schematic diagram of the relationship between the backward dynamics, the optimal control function, and the forward dynamics
in (a) COSC, (b) ML-POSC, (c) deterministic control, and (d) MFSC. w,p,λ, and sare the solutions of the HJB equation, the FP
equation, the adjoint equation, and the state equation, respectively. uis the optimal control function. (a) In COSC, because the optimal
control function udepends only on the HJB equation w, it can be obtained by solving the HJB equation wbackward in time from the
terminal condition, which is called the value iteration method. (b) In ML-POSC, because the optimal control function udepends on the
FP equation pas well as the HJB equation w, it cannot be obtained by the value iteration method. In this paper, we propose FBSM
to ML-POSC, which computes the HJB equation wand the FP equation palternately. Because the coupling of the HJB equation w
and the FP equation pis limited only to the optimal control function u, the convergence of FBSM is guaranteed in ML-POSC. (c)
In deterministic control, because the coupling of the adjoint equation λand the state equation sis not limited to the optimal control
function u, the convergence of FBSM is not guaranteed. (d) In MFSC, because the coupling of the HJB equation wand the FP equation
pis not limited to the optimal control function u, the convergence of FBSM is not guaranteed.
where ds=dx+dz. The extended state stevolves by the following SDE:
dst=˜
b(t, st,˜ut)dt + ˜σ(t, st,˜ut)d˜ωt,(8)
where s0obeys p0(s0), ˜utRd˜uis the control, and ˜ωtRd˜ωis the standard Wiener process. ML-POSC determines
the control ˜utRd˜ubased on the memory ztas follows:
˜ut= ˜u(t, zt).(9)
The extended state SDE (8) includes the previous SDEs (1), (2), (3) as a special case because they can be represented
as follows:
dst=b(t, xt, ut)
c(t, zt, vt) + κ(t, zt, vt)h(t, xt)dt +σ(t, xt, ut)O O
O κ(t, zt, vt)γ(t)η(t, zt, vt)
t
t
t
,(10)
Fig. 2. The relationship between the Bellman’s dynamic programming principle (top) and the Pontryagin’s minimum principle (bottom)
on the state space (left) and on the probability density function space (right). The left-hand side corresponds to the deterministic control,
which is briefly reviewed in Appendix I. The right-hand side corresponds to ML-POSC and MFSC, which are shown in Section III and
Appendix II, respectively. The conventional approach in ML-POSC [9] and MFSC [21], [22] can be interpreted as the conversion from the
Bellman’s dynamic programming principle into the Pontryagin’s minimum principle on the probability density function space.
where p0(s0) = p0(x0)p0(z0).
The objective function of ML-POSC is given by the following expected cumulative cost function:
J[˜u] := Ep(s0:T;˜u)"ZT
0
˜
f(t, st,˜ut)dt + ˜g(sT)#.(11)
where ˜
fis the cost function, and ˜gis the terminal cost function. It is obvious that this objective function (11) is
more general than that in the previous subsection (5).
ML-POSC is the problem to find the optimal control function ˜uthat minimizes the expected cumulative cost
function J[˜u] as follows:
˜u= argmin
˜u
J[˜u].(12)
In the following sections, we mainly consider the formulation of this subsection because it is simpler and more
general than that in the previous subsection. Moreover, we omit ˜
·for the notational simplicity.
III. Pontryagin’s Minimum Principle
If the control utis determined based on the extended state st, i.e., ut=u(t, st), ML-POSC is the same problem
with COSC of the extended state, and its optimality conditions can be obtained in the conventional way [10]. In
reality, however, because ML-POSC determines the control utbased only on the memory zt, i.e., ut=u(t, zt), its
optimality conditions cannot be obtained in the similar way as COSC. In the previous work [9], the optimality
conditions of ML-POSC were obtained by employing a mathematical technique of MFSC [21], [22].
In this section, we obtain the optimality conditions of ML-POSC by employing the Pontryagin’s minimum principle
[10], [14], [15], [16] on the probability density function space (Figure 2). The conventional approach in ML-POSC
[9] and MFSC [21], [22] can be interpreted as the conversion from the Bellman’s dynamic programming principle to
the Pontryagin’s minimum principle on the probability density function space.
In Appendix I, we briefly review the Pontryagin’s minimum principle of the deterministic control. In this section,
we obtain the optimality conditions of ML-POSC in the similar way as Appendix I. Furthermore, in Appendix II, we
obtain the optimality conditions of MFSC in the similar way as Appendix I. MFSC is more general than ML-POSC
except for the partial observability. Especially, the expected Hamiltonian is non-linear with respect to the probability
density function in MFSC while it is linear in ML-POSC.
A. Hamiltonian
Before we show the optimality conditions, we define the Hamiltonian as follows:
H(t, s, u, w) := f(t, s, u) + Luw(t, s),(13)
where w: [0, T ]×RdsR, and Luis the backward diffusion operator, which is defined as follows:
Luw(t, s) :=
ds
X
i=1
bi(t, s, u)w(t, s)
si
+1
2
ds
X
i,j=1
Dij (t, s, u)2w(t, s)
sisj
,(14)
where D(t, s, u) := σ(t, s, u)σ>(t, s, u). The extended state SDE (8) can be converted into the following Fokker-Planck
(FP) equation:
p(t, s)
t =L
up(t, s),(15)
where p(0, s) = p0(s), and L
uis the forward diffusion operator, which is defined as follows:
L
up(t, s) :=
ds
X
i=1
(bi(t, s, u)p(t, s))
si
+1
2
ds
X
i,j=1
2(Dij (t, s, u)p(t, s))
sisj
.(16)
We note that L
uis the conjugate of Luas follows:
Zw(t, s)L
up(t, s)ds =Zp(t, s)Luw(t, s)ds, (17)
The following lemma shows the relationship between the objective function J[u] and the Hamiltonian H(t, s, u, w),
which is significant for the optimality conditions.
Lemma 1: Let u: [0, T ]×RdzRduand u0: [0, T ]×RdzRdube the arbitrary control functions, and let p
and p0be the probability density functions of the extended state driven by uand u0, respectively. Then J[u]J[u0]
satisfies the following equation:
J[u]J[u0] = ZT
0Ep(t,s)[H(t, s, u, w0)] Ep(t,s)[H(t, s, u0, w0)]dt, (18)
where w0is the solution of the following Hamilton-Jacobi-Bellman (HJB) equation:
w0(t, s)
t =H(t, s, u0, w0),(19)
where w0(T, s) = g(s).
Proof: The proof is shown in Appendix III-A.
B. Necessary Condition
We show the necessary condition of the optimal control function of ML-POSC that corresponds to the Pontryagin’s
minimum principle on the probability density function space.
Theorem 1: In ML-POSC, the optimal control function usatisfies the following equation:
u(t, z) = argmin
u
Ep
t(x|z)[H(t, s, u, w)] , a.s. t[0, T ],zRdz,(20)
where p
t(x|z) := p(t, s)/Rp(t, s)dx is the conditional probability density function of the state xgiven the memory
z, and p(t, s) is the solution of the FP equation:
p(t, s)
t =L
up(t, s),(21)
where p(0, s) = p0(s). w(t, s) is the solution of the following HJB equation:
w(t, s)
t =H(t, s, u, w),(22)
where w(T, s) = g(s).
Proof: The proof is shown in Appendix III-B.
In the Pontryagin’s minimum principle of the deterministic control, the state equation (70) and the adjoint equation
(71) can be expressed by the derivatives of the Hamiltonian (Appendix I). Similarly, the system of HJB-FP equations
(21) and (22) can be expressed by the variations of the expected Hamiltonian
¯
H(t, p, u, w) := Ep(s)[H(t, s, u, w)] ,(23)
as follows:
p(t, s)
t =δ¯
H(t, p, u, w)
δw (s),(24)
摘要:

Pontryagin'sMinimumPrincipleandForward-BackwardSweepMethodfortheSystemofHJB-FPEquationsinMemory-LimitedPartiallyObservableStochasticControlTakehiroTottori1andTetsuyaJ.Kobayashi1;2;3;4AbstractMemory-limitedpartiallyobservablestochasticcontrol(ML-POSC)isthestochasticoptimalcontrolproblemunderincomplet...

展开>> 收起<<
Pontryagins Minimum Principle and Forward-Backward Sweep Method for the System of HJB-FP Equations in Memory-Limited Partially Observable Stochastic Control.pdf

共27页,预览5页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:27 页 大小:9.14MB 格式:PDF 时间:2025-05-02

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 27
客服
关注