Pontryagin’s Minimum Principle and Forward-Backward Sweep
Method for the System of HJB-FP Equations in
Memory-Limited Partially Observable Stochastic Control
Takehiro Tottori1 and Tetsuya J. Kobayashi1,2,3,4
Abstract
Memory-limited partially observable stochastic control (ML-POSC) is the stochastic optimal control problem
under incomplete information and memory limitation. In order to obtain the optimal control function of ML-POSC,
a system of the forward Fokker-Planck (FP) equation and the backward Hamilton-Jacobi-Bellman (HJB) equation
needs to be solved. In this work, we first show that the system of HJB-FP equations can be interpreted via
Pontryagin’s minimum principle on the probability density function space. Based on this interpretation, we
then propose the forward-backward sweep method (FBSM) for ML-POSC, which has been used with Pontryagin’s
minimum principle. FBSM is an algorithm that computes the forward FP equation and the backward HJB equation
alternately. Although the convergence of FBSM is generally not guaranteed, it is guaranteed in ML-POSC because
the coupling of HJB-FP equations is limited to the optimal control function in ML-POSC.
I. Introduction
In many practical applications of stochastic optimal control theory, several constraints need to be considered.
Especially in small devices [1], [2] and in biological systems [3], [4], [5], [6], [7], [8], incomplete information and
memory limitation become predominant because their sensors are extremely noisy and their memory resources are
severely limited. To account for these constraints, memory-limited partially observable stochastic control
(ML-POSC) has recently been proposed [9]. Because ML-POSC formulates the noisy observation and the limited memory
explicitly, ML-POSC can directly take incomplete information and memory limitation into account in the stochastic
optimal control problem.
However, ML-POSC cannot be solved in the same way as conventional stochastic control, which is also
called completely observable stochastic control (COSC). In COSC, the optimal control function depends only on the
Hamilton-Jacobi-Bellman (HJB) equation, which is a time-backward partial differential equation given the terminal
condition (Figure 1(a)) [10]. Therefore, the optimal control function of COSC can be obtained by solving the HJB
equation backward in time from the terminal condition, which is called the value iteration method [11], [12], [13]. In
contrast, the optimal control function of ML-POSC depends not only on the HJB equation but also on the
Fokker-Planck (FP) equation, which is a time-forward partial differential equation given the initial condition (Figure 1(b))
[9]. Because the HJB equation and the FP equation interact with each other through the optimal control function
in ML-POSC, the optimal control function of ML-POSC cannot be obtained by the value iteration method.
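To make the backward solve used in COSC concrete, the following is a minimal sketch for a scalar linear-quadratic problem, where the HJB equation reduces to ordinary differential equations. It assumes dynamics dx = (a x + b u) dt + sigma dW, running cost q x^2 + r u^2, terminal cost q_T x(T)^2, and the quadratic ansatz V(x, t) = p(t) x^2 + c(t). The function name `solve_hjb_lq`, all parameter names and values, and the explicit Euler discretization are illustrative assumptions, not taken from this paper.

```python
import math


def solve_hjb_lq(a=0.0, b=1.0, q=1.0, r=1.0, q_T=0.0, sigma=1.0, T=1.0, n=1000):
    """Integrate the HJB equation of a scalar LQ problem backward in time.

    With V(x, t) = p(t) x^2 + c(t), the HJB equation reduces to
        -dp/dt = q + 2 a p - b^2 p^2 / r,   p(T) = q_T,
        -dc/dt = sigma^2 p,                 c(T) = 0,
    and the optimal control is u*(x, t) = -(b p(t) / r) x.
    """
    dt = T / n
    p = [0.0] * (n + 1)
    c = [0.0] * (n + 1)
    p[n] = q_T  # terminal condition
    # Explicit Euler steps from t = T down to t = 0.
    for k in range(n, 0, -1):
        p[k - 1] = p[k] + dt * (q + 2.0 * a * p[k] - (b * p[k]) ** 2 / r)
        c[k - 1] = c[k] + dt * sigma ** 2 * p[k]
    gain = [b * pk / r for pk in p]  # feedback gain: u* = -gain(t) * x
    return p, c, gain


# Illustrative run: for a = 0, b = q = r = 1, q_T = 0 the exact solution is
# p(t) = tanh(T - t), so the two printed numbers should be close.
p, c, gain = solve_hjb_lq()
print(p[0], math.tanh(1.0))
```

A single backward pass suffices here because nothing in the equations depends on the state distribution. In ML-POSC the analogous backward equation also depends on the solution of the forward FP equation, so one pass is not enough.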
To develop an algorithm for ML-POSC, we first show that the system of HJB-FP equations can be
interpreted via Pontryagin’s minimum principle on the probability density function space. Pontryagin’s
minimum principle is one of the most representative approaches to deterministic control; it converts the
optimal control problem into a two-point boundary value problem of the forward state equation and the backward
adjoint equation [14], [15], [16]. We show that the system of HJB-FP equations is an extension of the system of
state and adjoint equations from deterministic control to stochastic control.
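For reference, the two-point boundary value problem mentioned above can be written out for a generic deterministic problem: minimize $J[u] = \int_0^T \ell(t, x, u)\,dt + g(x(T))$ subject to $\dot{x} = f(t, x, u)$, $x(0) = x_0$. The symbols $x$, $u$, $\lambda$, $f$, $\ell$, $g$, and $H$ are notation assumed here for illustration and may differ from the notation used elsewhere in the paper.

```latex
\begin{align}
  H(t, x, u, \lambda) &= \ell(t, x, u) + \lambda^{\top} f(t, x, u), \\
  \dot{x}^{*}(t) &= f\bigl(t, x^{*}(t), u^{*}(t)\bigr),
    \qquad x^{*}(0) = x_{0}, \\
  -\dot{\lambda}^{*}(t) &= \frac{\partial H}{\partial x}\bigl(t, x^{*}(t), u^{*}(t), \lambda^{*}(t)\bigr),
    \qquad \lambda^{*}(T) = \frac{\partial g}{\partial x}\bigl(x^{*}(T)\bigr), \\
  u^{*}(t) &= \operatorname*{arg\,min}_{u} H\bigl(t, x^{*}(t), u, \lambda^{*}(t)\bigr).
\end{align}
```

The state equation carries an initial condition and runs forward in time, while the adjoint equation carries a terminal condition and runs backward. The two are coupled through $u^{*}$, which is the same structure as the forward FP equation and the backward HJB equation described above.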
The system of HJB-FP equations also appears in mean-field stochastic control (MFSC) [17], [18], [19].
Although the relationship between the system of HJB-FP equations and Pontryagin’s minimum principle has
been mentioned briefly in MFSC [20], [21], [22], its details have not yet been investigated. We address this gap
by deriving the system of HJB-FP equations in a manner similar to Pontryagin’s minimum principle.
We then propose the forward-backward sweep method (FBSM) for ML-POSC. FBSM is an algorithm that computes
the forward FP equation and the backward HJB equation alternately, which can be interpreted as an extension of
the value iteration method. FBSM has been proposed for Pontryagin’s minimum principle in deterministic
control, where it computes the forward state equation and the backward adjoint equation alternately [23], [24], [25].
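The alternation described above can be sketched for the classical deterministic case. This is not the HJB-FP algorithm for ML-POSC; it is a minimal illustration of the forward-backward sweep on an assumed scalar linear-quadratic problem: minimize the integral of q x^2 + r u^2 over [0, T] subject to dx/dt = a x + b u, x(0) = x0, with no terminal cost. The function name `fbsm_lq`, the parameter values, the explicit Euler discretization, and the relaxation weight `relax` are all illustrative assumptions.

```python
import math


def fbsm_lq(a=0.0, b=1.0, q=1.0, r=1.0, x0=1.0, T=1.0, n=1000,
            relax=0.5, tol=1e-8, max_iter=500):
    """Forward-backward sweep for a deterministic scalar LQ problem.

    Hamiltonian: H = q x^2 + r u^2 + lam (a x + b u)
    State:       dx/dt   = a x + b u,        x(0)   = x0   (forward)
    Adjoint:     -dlam/dt = 2 q x + a lam,   lam(T) = 0    (backward)
    Control:     u = -b lam / (2 r)          (minimizer of H in u)
    """
    dt = T / n
    u = [0.0] * (n + 1)  # initial guess for the control
    for it in range(1, max_iter + 1):
        # Forward sweep: integrate the state under the current control.
        x = [x0] * (n + 1)
        for k in range(n):
            x[k + 1] = x[k] + dt * (a * x[k] + b * u[k])
        # Backward sweep: integrate the adjoint along the current state.
        lam = [0.0] * (n + 1)
        for k in range(n, 0, -1):
            lam[k - 1] = lam[k] + dt * (2.0 * q * x[k] + a * lam[k])
        # Control update from the minimum condition, with relaxation.
        u_new = [-b * l / (2.0 * r) for l in lam]
        change = max(abs(un - uo) for un, uo in zip(u_new, u))
        u = [relax * uo + (1.0 - relax) * un for un, uo in zip(u_new, u)]
        if change < tol:
            return x, lam, u, it
    raise RuntimeError("FBSM did not converge within max_iter sweeps")


# Illustrative run: for the default parameters the exact optimum satisfies
# x(T) = 1 / cosh(T) and u(0) = -tanh(T), so each printed pair should be close.
x, lam, u, iters = fbsm_lq()
print(iters)
print(x[-1], 1.0 / math.cosh(1.0))
print(u[0], -math.tanh(1.0))
```

The short horizon T = 1 is chosen so that the sweep contracts in this example; for longer horizons or other problems the same iteration may fail to converge, in which case the sketch raises an error instead of returning a result.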
Because FBSM is easy to implement, it has been used in many applications [26], [27]. However, the convergence of
FBSM is not guaranteed in deterministic control except for special cases [28], [29] because the coupling of the
1 Department of Mathematical Informatics, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo
113-8654, Japan
2 Institute of Industrial Science, The University of Tokyo, Tokyo 153-8505, Japan
3 Department of Electrical Engineering and Information Systems, Graduate School of Engineering, The University of Tokyo, Tokyo
113-8654, Japan
4 Universal Biology Institute, The University of Tokyo, Tokyo 113-8654, Japan
arXiv:2210.13040v3 [math.OC] 8 Nov 2022