Safety Embedded Stochastic Optimal Control of Networked
Multi-Agent Systems via Barrier States
Lin Song¹, Pan Zhao¹, Neng Wan¹, and Naira Hovakimyan¹
Abstract— This paper presents a novel approach for achieving safe stochastic optimal control in networked multi-agent systems (MASs). The proposed method incorporates barrier states (BaSs) into the system dynamics to embed safety constraints. To accomplish this, the networked MAS is factorized into multiple subsystems, and each subsystem is augmented with BaSs for its central agent. The optimal control law is obtained by solving the joint Hamilton-Jacobi-Bellman (HJB) equation on the augmented subsystem, which guarantees safety via the boundedness of the BaSs. The BaS-based optimal control technique yields safe control actions while maintaining optimality. The safe optimal control solution is approximated using path integrals. To validate the effectiveness of the proposed approach, numerical simulations are conducted on a cooperative UAV team in two different scenarios.
I. INTRODUCTION
Optimal control has achieved remarkable success in both theory and applications [1], [2]. Obtaining an optimal control law usually requires solving a nonlinear, second-order partial differential equation (PDE) known as the Hamilton-Jacobi-Bellman (HJB) equation. Stochastic optimal control (SOC) problems seek control policies that minimize expected costs [3]. By applying an exponential transformation to the value function [4], the HJB PDE can be rendered linear (under a standard condition that ties the noise covariance to the control-cost weight), which has enabled related research including linearly-solvable optimal control (LSOC) [5] and path-integral control (PIC) [3], [6]. The benefits of LSOC problems include compositionality [7], [8] and a path-integral representation of the optimal control solution. However, solving SOC problems for large-scale systems is challenging due to the curse of dimensionality [9]. To overcome these computational challenges, many approximation-based approaches have been developed, such as the path-integral (PI) formulation [10], value function approximation [11], and policy approximation [12]. In [13], a PI approach is used to approximate optimal control actions for multi-agent systems (MASs), and the optimal path distribution is predicted via graphical-model inference. A distributed PIC algorithm is proposed in [14], in which a networked MAS is partitioned into multiple subsystems and local optimal control actions are determined from local observations. However, these approaches seldom consider safety in the problem formulation, which may limit their real-world applicability.
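To make the path-integral representation concrete, the following Python sketch (an illustration under stated assumptions, not the method developed in this paper) treats a hypothetical scalar problem, dx = u dt + σ dw, with cost E[½x_T² + ∫ ½ r u² dt]. Under the assumed linearizing condition λ = rσ², the desirability Ψ(x) = exp(−V(x)/λ) satisfies a linear PDE. By the Feynman-Kac formula, Ψ is then an expectation over rollouts of the uncontrolled dynamics dx = σ dw, so it can be estimated by plain Monte Carlo. The optimal control follows as u*(x) = (λ/r) ∂ₓ log Ψ(x). The function names `estimate_desirability` and `optimal_control`, the cost weights, and the sample sizes are all hypothetical choices.

```python
import numpy as np


def estimate_desirability(x0, tau, sigma, r, q=None, phi=None,
                          n_samples=200_000, n_steps=50, seed=0):
    """Monte Carlo (path-integral) estimate of Psi(x0) = exp(-V(x0)/lam).

    Feynman-Kac: Psi(x0) = E[exp(-(phi(x_T) + int_0^tau q(x_s) ds) / lam)],
    with the expectation over rollouts of the UNCONTROLLED diffusion
    dx = sigma dw started at x0. lam = r * sigma**2 is the assumed
    linearizing condition linking the noise level and control-cost weight r.
    """
    lam = r * sigma**2
    if q is None:
        q = lambda x: np.zeros_like(x)   # default: no running state cost
    if phi is None:
        phi = lambda x: 0.5 * x**2       # default: quadratic terminal cost
    rng = np.random.default_rng(seed)    # fixed seed -> common random numbers
    dt = tau / n_steps
    x = np.full(n_samples, float(x0))
    running_cost = np.zeros(n_samples)
    for _ in range(n_steps):             # Euler-Maruyama rollout
        running_cost += q(x) * dt        # left-point rule for int q ds
        x = x + sigma * np.sqrt(dt) * rng.standard_normal(n_samples)
    return float(np.mean(np.exp(-(phi(x) + running_cost) / lam)))


def optimal_control(x0, tau, sigma, r, h=1e-3, **kwargs):
    """u*(x0) = (lam / r) * d/dx log Psi(x0), via a central difference.

    Both evaluations reuse the same seed, so identical noise samples are used
    at x0 + h and x0 - h, which keeps the derivative estimate low-variance.
    """
    lam = r * sigma**2
    log_plus = np.log(estimate_desirability(x0 + h, tau, sigma, r, **kwargs))
    log_minus = np.log(estimate_desirability(x0 - h, tau, sigma, r, **kwargs))
    return (lam / r) * (log_plus - log_minus) / (2.0 * h)


# Hypothetical check: x0 = 1, horizon tau = 1, sigma = 0.5, r = 1.
print(estimate_desirability(1.0, 1.0, 0.5, 1.0))  # close to exp(-1)/sqrt(2)
print(optimal_control(1.0, 1.0, 0.5, 1.0))        # close to -0.5
```

This linear-quadratic case was chosen because its exact solution is known. For x = 1, σ = 0.5, r = 1, and horizon τ = 1, Ψ = e⁻¹/√2 ≈ 0.260 and the Riccati solution gives u* = −x/(r + τ) = −0.5. The Monte Carlo estimates should match these values to within sampling error. Only forward simulation of the uncontrolled system is needed; no PDE is solved, which is what makes PI methods attractive in high dimensions.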
*This work is supported by the Air Force Office of Scientific Research (AFOSR) (award #FA9550-21-1-0411) and the National Aeronautics and Space Administration (NASA) (awards #80NSSC22M0070 and #80NSSC17M0051).
¹Lin Song, Pan Zhao, Neng Wan, and Naira Hovakimyan are with the Department of Mechanical Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA {linsong2, panzhao2, nengwan2, nhovakim}@illinois.edu
Safety refers to ensuring that a system's states remain within a prescribed safe region at all times for deterministic systems, or with high probability for stochastic systems. Reachability analysis is a formal verification approach used to prove safety and performance guarantees for dynamical systems [15], [16]. Hamilton-Jacobi (HJ) reachability analysis identifies the initial states that the system must avoid, as well as the associated optimal control for remaining safe [17]. However, computing reachable sets is typically expensive, which makes reachability analysis difficult to apply to multi-agent and high-dimensional systems. To enable safe optimal control, safety metrics can be incorporated into the optimal control framework, either as objectives or as constraints. In [18], temporal logic specifications are used as constraints to enforce safety in optimal control design. The control barrier function (CBF) is a powerful tool for enforcing system safety: it solves a constrained optimization problem that modifies the control input in a minimally invasive fashion [19]. CBF-based methods have also been extended to stochastic systems with high-probability guarantees [20]–[22]. Multi-agent CBF frameworks that generate collision-free controllers are discussed in [23], [24]. Furthermore, guaranteed satisfaction of safety constraints in networked systems is achieved in [25] under a valid assume-guarantee contract, with CBFs implemented on the subsystems. However, using CBFs as safety filters on top of optimal control inputs may compromise optimality, and such filters are typically reactive to the given constraints. Additionally, the feasibility of the quadratic program (QP) introduced by CBF-based methods was not always guaranteed until the recent work in [26]. The barrier state (BaS) method is a novel methodology studied in [27], in which the stability analysis of a BaS-augmented system encodes both the stabilization and the safety of the original system, so that potential conflicts between control objectives and safety enforcement are avoided. In [28], discrete BaSs (DBaSs) are combined with differential dynamic programming (DDP) for trajectory optimization, and it is shown that bounded DBaSs imply safe trajectories. DBaSs have also been integrated into importance sampling in [29] to improve sample efficiency in safety-constrained, sampling-based control problems.
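The difference between a CBF safety filter and a barrier state can be sketched on a hypothetical toy problem: a planar single integrator ẋ = u that must stay outside a disk. The Python sketch below is a minimal illustration under stated assumptions, not the constructions of [19], [27], or [28]. It (i) applies the closed-form solution of the single-constraint CBF-QP, min ‖u − u_nom‖² subject to ∇h(x)·u + αh(x) ≥ 0, to a go-to-goal nominal input, and (ii) augments each state with a simplified barrier state β = 1/h(x), which stays finite and positive exactly when the state is safe. The obstacle, gains, the inverse barrier B(η) = 1/η, and all helper names are illustrative assumptions. The BaS and DBaS dynamics in [27], [28] are richer than this static construction.

```python
import numpy as np

# Hypothetical toy setup (not from the paper): planar single integrator
# x_dot = u that must stay outside a disk of radius RHO centered at C.
C, RHO = np.array([0.0, 0.0]), 0.5
GOAL, K_NOM, ALPHA, DT = np.array([2.0, 0.0]), 1.0, 2.0, 0.01


def h(x):
    """Safety function: h(x) > 0 exactly when x is outside the obstacle."""
    return float(np.sum((x - C) ** 2) - RHO**2)


def u_nominal(x):
    """Go-to-goal controller that ignores the obstacle."""
    return K_NOM * (GOAL - x)


def cbf_filter(x, u_nom):
    """Closed-form solution of the single-constraint CBF-QP
        min ||u - u_nom||^2  s.t.  grad_h(x) . u + ALPHA * h(x) >= 0
    (for x_dot = u the drift term L_f h vanishes)."""
    a = 2.0 * (x - C)                       # grad h(x)
    slack = a @ u_nom + ALPHA * h(x)
    if slack >= 0.0:                        # nominal input already satisfies it
        return u_nom
    return u_nom - slack / (a @ a) * a      # minimal correction onto a.u = -ALPHA*h


def barrier_state(x):
    """Simplified barrier state beta = B(h(x)) with B(eta) = 1/eta.
    beta is finite and positive iff h(x) > 0."""
    return 1.0 / h(x)


def rollout(x0, use_cbf, n_steps=600):
    """Forward-Euler simulation, optionally passing u through the CBF filter."""
    x = np.asarray(x0, dtype=float)
    xs = [x]
    for _ in range(n_steps):
        u = u_nominal(x)
        if use_cbf:
            u = cbf_filter(x, u)
        x = x + DT * u
        xs.append(x)
    return np.array(xs)


def augmented(xs):
    """Stack each state with its barrier state: z = (x, beta)."""
    betas = np.array([barrier_state(x) for x in xs])
    return np.column_stack([xs, betas])


safe = augmented(rollout([-2.0, 0.2], use_cbf=True))
unsafe = augmented(rollout([-2.0, 0.2], use_cbf=False))
print(safe[:, 2].min(), safe[:, 2].max())   # beta stays positive and finite
print((unsafe[:, 2] < 0).any())             # nominal path enters the disk
```

Because h is quadratic here, h(x + Δt u) = h(x) + Δt ∇h(x)·u + Δt²‖u‖² holds exactly. Any input accepted by the filter therefore gives h_{k+1} ≥ (1 − αΔt)h_k, so the filtered rollout stays safe whenever αΔt < 1. The unfiltered rollout cuts through the disk, and its barrier state changes sign, signaling the violation. In a BaS-based design, the same signal appears inside the dynamics and the cost rather than in an external filter.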
Compared with CBF-based methods, which solve constrained optimization problems to determine certifiably safe control actions, BaS-based safe control formulates the problem without explicit constraints: safety is embedded in the boundedness of the solution, which prevents potential conflicts between control performance and safety requirements.
However, the methodology of addressing safety issues with-
arXiv:2210.03855v2 [eess.SY] 3 Apr 2023