words based on the fact that any infinite word that satisfies a
scLTL formula ϕ contains a finite “good” prefix such that all
infinite words that contain the prefix satisfy ϕ [2].
Example 1 (Reach-avoid Specification): Consider a robot
that navigates a workspace W = {Xgoal, O1, . . . , Oc}, where
Xgoal ⊂ X is a set of goal states that the robot would like to
reach and O1, . . . , Oc ⊂ X are obstacles that the robot needs
to avoid. The set of atomic propositions is given by AP =
{x ∈ Xgoal, x ∈ O1, . . . , x ∈ Oc}, where x is the state of
the robot. Then, a reach-avoid specification can be expressed
as ϕ = ϕliveness ∧ ϕsafety, where ϕliveness = ♦[0,H](x ∈ Xgoal)
requires the robot to reach the goal Xgoal within H time steps and
ϕsafety = □[0,H] ⋀i=1,...,c ¬(x ∈ Oi) requires the robot to avoid all
obstacles during the time horizon H. Let ξ = x(0)x(1) . . . x(H)
be a trajectory of the robot; then the reach-avoid specification
ϕ is interpreted as:
ξ |= ϕliveness ⇐⇒ ∃k ∈ {0, . . . , H}, x(k) ∈ Xgoal,
ξ |= ϕsafety ⇐⇒ ∀k ∈ {0, . . . , H}, ∀i ∈ {1, . . . , c}, x(k) ∉ Oi.
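The two membership tests above translate directly into code. The following is a minimal sketch (assuming a 2D state and axis-aligned box regions, which are illustrative choices not fixed by the example) that checks a finite trajectory against the reach-avoid specification:

```python
import numpy as np

def satisfies_reach_avoid(xi, in_goal, in_obstacles):
    """Check a finite trajectory xi = [x(0), ..., x(H)] against
    phi = phi_liveness AND phi_safety.

    in_goal(x)   -> True iff x is in X_goal
    in_obstacles -> list of predicates; in_obstacles[i](x) True iff x in O_i
    """
    # phi_liveness: there exists k with x(k) in X_goal
    liveness = any(in_goal(x) for x in xi)
    # phi_safety: for all k and all i, x(k) not in O_i
    safety = all(not in_obs(x) for x in xi for in_obs in in_obstacles)
    return liveness and safety

# Hypothetical workspace: a goal box and one obstacle box in [0, 1]^2
in_goal = lambda x: 0.8 <= x[0] <= 1.0 and 0.8 <= x[1] <= 1.0
obstacle = lambda x: 0.4 <= x[0] <= 0.6 and 0.0 <= x[1] <= 0.3

# Straight-line trajectory from the origin to the goal corner
xi = [np.array([0.1 * k, 0.1 * k]) for k in range(11)]
print(satisfies_reach_avoid(xi, in_goal, [obstacle]))  # → True
```

The diagonal trajectory passes above the obstacle box and enters the goal set at k = 8, so both conjuncts hold.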
E. Neural Network
To account for the stochastic behavior of a robot, we aim to
design a state-feedback neural network NN : X → U that can
achieve temporal motion and task specifications ϕ. An F-layer
Rectified Linear Unit (ReLU) NN is specified by composing
F layer functions (or just layers). A layer l with il inputs and
ol outputs is specified by a weight matrix W(l) ∈ R^(ol×il) and
a bias vector b(l) ∈ R^(ol) as follows:
Lθ(l) : z ↦ max{W(l)z + b(l), 0},  (3)
where the max function is taken element-wise, and θ(l) ≜
(W(l), b(l)) for brevity. Thus, an F-layer ReLU NN is speci-
fied by F layer functions {Lθ(l) : l = 1, . . . , F} whose input
and output dimensions are composable: that is, they satisfy
il = ol−1, l = 2, . . . , F. Specifically:
NNθ(x) = (Lθ(F) ◦ Lθ(F−1) ◦ · · · ◦ Lθ(1))(x),  (4)
where we index a ReLU NN function by a list of parameters
θ ≜ (θ(1), . . . , θ(F)). As a common practice, we allow the
output layer Lθ(F) to omit the max function. For simplicity
of notation, we drop the subscript θ in NNθ whenever the
dependence on θ is obvious.
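Equations (3) and (4) can be evaluated directly. The following NumPy sketch (with hypothetical layer sizes) implements an F-layer ReLU network whose output layer omits the max:

```python
import numpy as np

def layer(theta, z):
    """One ReLU layer L_theta(z) = max(W z + b, 0), as in (3); max is element-wise."""
    W, b = theta
    return np.maximum(W @ z + b, 0.0)

def nn(thetas, x):
    """F-layer ReLU network (4): compose the layers, with an affine output layer."""
    z = x
    for theta in thetas[:-1]:
        z = layer(theta, z)
    W, b = thetas[-1]  # output layer omits the max function
    return W @ z + b

# Hypothetical 2-layer network R^2 -> R^3 -> R^1; note i_2 = o_1 = 3 (composability)
rng = np.random.default_rng(0)
thetas = [(rng.standard_normal((3, 2)), np.zeros(3)),
          (rng.standard_normal((1, 3)), np.zeros(1))]
u = nn(thetas, np.array([0.5, -0.2]))  # control input u = NN_theta(x)
```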
F. Main Problem
We consider training a finite set (or a library) of ReLU NNs
(during the offline phase) and designing a selection algorithm
(during the online phase) that can select the correct NNs once
the exact task T = (t, ϕ, W, X0) is revealed at runtime. Before
formalizing the problem under consideration, we introduce the
following notion of neural network composition.
Definition II.1. Given a set (or a library) of neural networks
NN = {NN1, NN2, . . . , NNd} along with an activation map
Γ : X → {1, . . . , d}, the composed neural network NN[NN,Γ]
is defined as: NN[NN,Γ](x) = NNΓ(x)(x).
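Definition II.1 can be illustrated with a small sketch. The two controllers and the switching rule below are hypothetical, chosen only to show how Γ determines which network in the library is evaluated at a given state:

```python
import numpy as np

# Hypothetical library of d = 2 state-feedback controllers NN = {NN_1, NN_2}
nn_library = [lambda x: -0.5 * x,   # NN_1: mild feedback
              lambda x: -2.0 * x]   # NN_2: aggressive feedback

def activation_map(x):
    """Gamma : X -> {1, ..., d}; here an assumed rule: use NN_2 near the origin."""
    return 2 if np.linalg.norm(x) < 1.0 else 1

def composed_nn(x):
    """NN[NN, Gamma](x) = NN_{Gamma(x)}(x), as in Definition II.1."""
    return nn_library[activation_map(x) - 1](x)

print(composed_nn(np.array([2.0, 0.0])))  # norm >= 1, so NN_1 is active
```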
In other words, the activation map Γ selects the NN that
needs to be activated at each state x ∈ X. In addition
to achieving the motion and task specifications, the neural
network needs to minimize a given cost functional J, defined
as:
J(NN[NN,Γ]) = ∫X c(x, NN[NN,Γ](x)) dµNN(x),  (5)
where c : X × U → R is a state-action cost function and µNN
is the distribution of states induced by the nominal dynamics
f in (2) under the control of NN[NN,Γ]. As an example, the
cost functional can be a controller’s energy J(NN[NN,Γ]) =
∫X ∥NN[NN,Γ](x)∥² dµNN(x).
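Since µNN is generally unavailable in closed form, the integral in (5) is naturally estimated by Monte Carlo sampling. A sketch of the energy-cost example, assuming access to samples of the closed-loop state distribution (replaced here by a Gaussian surrogate and a hypothetical linear controller for illustration):

```python
import numpy as np

def energy_cost(controller, state_samples):
    """Monte Carlo estimate of J = integral of ||NN(x)||^2 d mu_NN(x), as in (5),
    given samples drawn (approximately) from the closed-loop distribution mu_NN."""
    return np.mean([np.sum(controller(x) ** 2) for x in state_samples])

controller = lambda x: -0.5 * x                 # hypothetical controller
rng = np.random.default_rng(1)
samples = rng.standard_normal((1000, 2))        # surrogate for samples of mu_NN
J_hat = energy_cost(controller, samples)        # approx 0.25 * E[||x||^2] = 0.5
```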
Let ξ^x_{NN[NN,Γ]} be a closed-loop trajectory of a robot that
starts from the state x ∈ X0 and evolves under the composed
neural network NN[NN,Γ]. We define the problem of interest
as follows:
Problem II.2. Given a cost functional J, train a library of
ReLU neural networks NN, and compute an activation map
Γ at runtime when a task T = (t, ϕ, W, X0) is revealed,
such that the composed neural network minimizes the cost
J(NN[NN,Γ]) and satisfies the specification ϕ with probability
at least p, i.e., Pr(ξ^x_{NN[NN,Γ]} |= ϕ) ≥ p for any x ∈ X0.
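The chance constraint in Problem II.2 can be checked empirically by repeated rollouts of the stochastic closed-loop system. The sketch below assumes hypothetical noisy dynamics and a simple reachability predicate standing in for ϕ; none of these are specified by the problem statement:

```python
import numpy as np

def estimate_satisfaction(rollout, satisfies, X0_samples, n_traj=100):
    """Estimate Pr(xi^x |= phi) for each initial state x by sampling
    n_traj closed-loop trajectories and averaging the indicator of phi."""
    probs = []
    for x0 in X0_samples:
        sat = [satisfies(rollout(x0)) for _ in range(n_traj)]
        probs.append(np.mean(sat))
    return np.array(probs)

rng = np.random.default_rng(2)

def rollout(x0, H=20):
    """Hypothetical stochastic closed-loop dynamics: noisy contraction to the origin."""
    xi, x = [np.asarray(x0, dtype=float)], np.asarray(x0, dtype=float)
    for _ in range(H):
        x = 0.7 * x + 0.01 * rng.standard_normal(2)
        xi.append(x.copy())
    return xi

# Stand-in for phi: eventually reach a small ball around the goal (the origin)
satisfies = lambda xi: any(np.linalg.norm(x) < 0.1 for x in xi)
probs = estimate_satisfaction(rollout, satisfies, [np.array([1.0, 1.0])])
```

If the estimated probability falls below p for some initial state, the candidate activation map Γ would be rejected.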
G. Overview of the Neurosymbolic Framework
Our approach to designing the NN-based planner NN[NN,Γ]
can be split into two stages: offline training and runtime
selection. During the offline training phase, our algorithm
obtains a library of networks NN. At runtime, and to fulfill
unseen tasks using a finite set of neural networks NN, our
neurosymbolic framework bridges ideas from symbolic LTL-
based planning and machine learning. Similar to symbolic
LTL-based planning, our framework uses a hierarchical ap-
proach that consists of a “high-level” discrete planner and a
“low-level” continuous controller [8], [9], [12]. The “high-
level” discrete planner focuses on ensuring satisfaction of
the LTL specification, while the “low-level” controllers compute
control actions that steer the robot to satisfy
the “high-level” plan. Unlike symbolic LTL-based planners,
our framework uses neural networks as low-level controllers,
thanks to their ability to handle complex nonlinear dynamic
constraints. In particular, the “high-level” planner chooses the
activation map Γto activate particular neural networks.
Nevertheless, to guarantee the correctness of the proposed
framework, it is essential that each neural network in
NN satisfies some “formal” property. This property
allows the high-level planner to abstract the capabilities of
each of the neural networks in NN and hence choose the
correct activation map Γ. To that end, in Section III, we formu-
late the sub-problem of “formal NN training,” which guarantees
that the trained NNs satisfy certain formal properties, and solve
it efficiently by introducing an NN weight projection operator.
The solution to the formal training is used in Section IV-A
to obtain the library of networks NN offline. The associated
formal property of each NN is used in Section IV-B to design
the activation map Γ.
III. FORMAL TRAINING OF NNS
In this section, we study the sub-problem of training NNs
that are guaranteed to obey certain behaviors. In addition to the