Neurosymbolic Motion and Task Planning for
Linear Temporal Logic Tasks
Xiaowu Sun, Graduate Student Member, IEEE and Yasser Shoukry, Senior Member, IEEE
Abstract—This paper presents a neurosymbolic framework
to solve motion planning problems for mobile robots involving
temporal goals. The temporal goals are described using temporal
logic formulas such as Linear Temporal Logic (LTL) to capture
complex tasks. The proposed framework trains Neural Network
(NN)-based planners that enjoy strong correctness guarantees when applied to unseen tasks, i.e., the exact task (including the workspace, LTL formula, and dynamic constraints of the robot) is unknown during the training of the NNs. Our approach to achieving
theoretical guarantees and computational efficiency is based
on two insights. First, we incorporate a symbolic model into
the training of NNs such that the resulting NN-based planner
inherits the interpretability and correctness guarantees of the
symbolic model. Moreover, the symbolic model serves as a
discrete “memory”, which is necessary for satisfying temporal
logic formulas. Second, we train a library of neural networks
offline and combine a subset of the trained NNs into a single NN-
based planner at runtime when a task is revealed. In particular,
we develop a novel constrained NN training procedure, named
formal NN training, to enforce that each neural network in
the library represents a “symbol” in the symbolic model. As
a result, our neurosymbolic framework enjoys the scalability
and flexibility benefits of machine learning and inherits the
provable guarantees from control-theoretic and formal-methods
techniques. We demonstrate the effectiveness of our framework
in both simulations and on an actual robotic vehicle, and show
that our framework can generalize to unknown tasks where state-
of-the-art meta-reinforcement learning techniques fail.
Index Terms—Formal methods, neural networks, meta-
reinforcement learning.
I. INTRODUCTION
DEVELOPING intelligent machines with a considerable level of cognition dates back to the early 1950s. With the
current rise of machine learning (ML) techniques, robotic
platforms are witnessing a breakthrough in their cognition.
Nevertheless, regardless of how many environments they were
trained (or programmed) to consider, such intelligent machines
will always face new environments which the human designer
failed to examine during the training phase. To address the inability of autonomous systems to adapt to new environments,
several researchers asked whether we could build autonomous
agents that can learn how to learn. In other words, while
conventional machine learning focuses on designing agents
that can perform one task, the so-called meta-learning aims
instead to solve the problem of designing agents that can
generalize to different tasks that were not considered during
the design or the training of these agents. For example, in the context of meta-Reinforcement Learning (meta-RL), given data collected from a multitude of tasks (e.g., changes in the environments, goals, and robot dynamics), meta-RL aims to combine all such experiences and use them to design agents that can quickly adapt to unseen tasks.

(The authors are with the Department of Electrical Engineering and Computer Science, University of California, Irvine, CA 92697, USA; e-mail: {xiaowus,yshoukry}@uci.edu.)

While the current
successes of meta-RL are undeniable, significant drawbacks
of meta-RL in its current form are (i) the lack of formal
guarantees on its ability to generalize to unseen tasks, (ii) the lack of formal guarantees with regard to its safety, and (iii) the lack of interpretability due to the use of black-box
deep learning techniques.
In this paper, we focus on the problem of designing Neu-
ral Network (NN)-based task and motion planners that are
guaranteed to generalize to unseen tasks, enjoy strong safety
guarantees, and are interpretable. We consider agents that
need to accomplish temporal goals captured by temporal logic
formulas such as Linear Temporal Logic (LTL) [1], [2]. The
use of LTL in task and motion planning has been widely
studied (e.g., [3]–[14]) due to the ability of LTL formulas
to capture complex goals such as “eventually visit region A
followed by a visit to region B or region C while always
avoiding hitting obstacle D.” On the one hand, motion and task planning using symbolic techniques enjoys the guarantee of satisfying task specifications given in temporal logic. Nevertheless,
these algorithms need an explicit model of the dynamic con-
straints of the robot and suffer from computational complexity
whenever such dynamic constraints are highly nonlinear and
complex. On the other hand, machine learning approaches
are capable of training NN planners without the explicit
knowledge of the dynamic constraints and scale favorably to
highly nonlinear and complex dynamics. Nevertheless, these
data-driven approaches suffer from the lack of safety and
generalization guarantees. Therefore, in this work, we aim to
design a novel neurosymbolic framework for motion and task
planning by combining the benefits of symbolic control and
machine learning techniques.
At the heart of the proposed framework is using a symbolic
model to guide the training of NNs and restricting the behavior
of NNs to “symbols” in the symbolic model. Specifically,
our framework consists of offline (or training) and online
(or runtime) phases. During the offline phase, we assume
access to a “nominal” simulator that approximates the dynamic
constraints of a robot. We assume no knowledge of the
exact task (e.g., workspace, LTL formula, and exact dynamic
constraints of a robot). We use this nominal simulator to train a
“library” of NNs through a novel NN training procedure,
named formal NN training, which enforces each trained NN
to represent a continuous piece-wise affine (CPWA) function
arXiv:2210.05180v1 [cs.RO] 11 Oct 2022
from a chosen family of CPWA functions. The exact task
becomes available only during the online (or runtime) phase.
Given the dynamic constraints of a robot, we compute a finite-
state Markov decision process (MDP) as our symbolic model.
Thanks to the formal NN training procedure, the symbolic
model can be constructed so that each of the trained NNs in
the library represents a transition in the MDP (and hence a
symbol in this MDP). By analyzing this symbolic model, our
framework selects NNs from the library and combines them
into a single NN-based planner to perform the task and motion
planning.
In summary, the main contributions of this article are:
1) We propose a neurosymbolic framework that integrates
machine learning and symbolic techniques in training
NN-based planners for an agent to accomplish unseen
tasks. Thanks to the use of a symbolic model, the
resulting NN-based planners are guaranteed to satisfy the
temporal goals described in linear temporal logic formu-
las, which cannot be satisfied by existing NN training
algorithms.
2) We develop a formal training algorithm that restricts
the trained NNs to specific local behavior. The training
procedure combines classical gradient descent training
of NNs with a novel NN weight projection operator
that modifies the NN weights as little as possible to
ensure the trained NN belongs to a chosen family of
CPWA functions. We provide theoretical guarantees on
the proposed NN weight projection operator in terms of
correctness and upper bounds on the error between the
NN before and after the projection.
3) We provide a theoretical analysis of the overall neu-
rosymbolic framework. We show theoretical guarantees
that govern the correctness of the resulting NN-based
planners when generalizing to unseen tasks, including
unknown workspaces, unknown temporal logic formulas,
and uncertain dynamic constraints.
4) We improve the performance of the proposed framework, enabling fast adaptation to unseen tasks with efficient training. For example, we accelerate the training of NNs
by employing ideas from transfer learning and construct-
ing the symbolic model using a data-driven approach. We
validate the effectiveness of the proposed framework on
an actual robotic vehicle and demonstrate that our frame-
work can generalize to unknown tasks where state-of-the-
art meta-RL techniques are known to fail (e.g., when the
tasks are chosen from across homotopy classes [15]).
The remainder of the paper is organized as follows. After the
problem formulation in Section II, we present the formal NN
training algorithm in Section III. In Section IV, we introduce
the neurosymbolic framework that uses the formal NN training
algorithm to obtain a library of NNs and combines them
into a single NN-based planner at runtime. In Section V, we
provide theoretical guarantees of the proposed framework. In
Section VI, we present some key elements for performance
improvement while maintaining the same theoretical guaran-
tees. Experimental results are given in Section VII, and all
proofs can be found in the appendix.
Comparison with the preliminary results: A preliminary
version of this article was presented in [16]. In [16], we
confined our goal to generating collision-free trajectories,
whereas in this work we consider agents that need to satisfy
general temporal logic formulas such as LTL. Also, we allow
the temporal logic formulas and the exact robot dynamics to
be unknown during the training of NNs. In this article, we
present for the first time the formal NN training algorithm (see
Section III). Moreover, we present a theoretical analysis of the
proposed framework (see Section V). All the speedup tech-
niques in Section VI, the implementation of our framework
on an actual robotic vehicle, and the performance comparison
with meta-RL algorithms are also new in this article.
Related work: The literature on the safe design of ML-
based motion and task planners can be classified according to
three broad approaches, namely (i) incorporating safety in the
training of ML-based planners, (ii) post-training verification of
ML models, and (iii) online validation of safety and control
intervention. Representative examples of the first approach
include reward-shaping [17], [18], Bayesian and robust regres-
sion [19]–[21], and policy optimization with constraints [22]–
[24]. Unfortunately, these approaches do not provide provable
guarantees about the safety of the trained ML-based planners.
To provide strong safety and reliability guarantees, several
works in the literature focus on applying formal verification
techniques (e.g., model checking) to verify pre-trained ML
models against formal safety properties. Representative exam-
ples of this approach include the use of SMT-like solvers [25]–
[30] and hybrid-system verification [31]–[33]. However, these
techniques only assess a given ML-based planner’s safety
rather than design or train a safe agent.
Due to the lack of safety guarantees on the resulting
ML-based planners, researchers proposed several techniques
to restrict the output of the ML models to a set of safe
control actions. Such a set of safe actions can be obtained
through Hamilton-Jacobi analysis [34], [35] and barrier cer-
tificates [36]–[42]. Unfortunately, methods of this type suffer
from being computationally expensive, specific to certain
controller structures, or requiring assumptions on the system
model. Other techniques in this domain include synthesizing
a safety layer (shield) based on model predictive control
with the assumption of safe terminal sets [43]–[45], logically-
constrained reinforcement learning [46]–[48], and Lyapunov
methods [49]–[51] that focus on providing stability guarantees
rather than safety or general temporal logic guarantees.
The idea of learning neurosymbolic models is studied in
works [52]–[54] that use NNs to guide the synthesis of control
policies represented as short programs. The algorithms in [52]–
54] train an NN controller, project it onto the space of programming languages, analyze the short programs, and lift the programs
back to the space of NNs for further training. These works
focus on tasks given during the training of NNs, and the final
controller is a short program. Another line of related work is
reported in [55], [56], which study the problem of extracting a
finite-state controller from a recurrent neural network. Unlike
the above works, we consider temporal logic specifications and
unseen tasks, and our final planner is NNs in tandem with a
finite-state MDP.
II. PROBLEM FORMULATION
A. Notations
Let ℝ, ℝ⁺, and ℕ be the sets of real numbers, positive real numbers, and natural numbers, respectively. For a non-empty set S, let 2^S be the power set of S, 1_S be the indicator function of S, and Int(S) be the interior of S. Furthermore, we use S^n to denote the set of all finite sequences of length n ∈ ℕ of elements in S. The product of two sets is defined as S_1 × S_2 := {(s_1, s_2) | s_1 ∈ S_1, s_2 ∈ S_2}. Let ||x|| be the Euclidean norm of a vector x ∈ ℝ^n, ||A|| be the induced 2-norm of a matrix A ∈ ℝ^{m×n}, and ||A||_max = max_{i,j} |A_{ij}| be the max norm of a matrix A. Any Borel space X is assumed to be endowed with a Borel σ-algebra denoted by B(X).
B. Assumptions and Information Structure
We consider a meta-RL setting that aims to train neural networks for controlling a robot to achieve tasks that were unseen during training. To be specific, we denote a task by a tuple T = (t, ϕ, W, X_0), where t captures the dynamic constraints of a robot (see Section II-C), ϕ is a Linear Temporal Logic (LTL) formula that defines the mission for the robot to accomplish (see Section II-D), W is a workspace (or an environment) in which the robot operates, and X_0 contains the initial states of the robot. Furthermore, we use J to denote a cost functional of controllers, and the cost of using a neural network NN is given by J(NN) (see Section II-F).

During training, we assume the availability of the cost functional J and an approximation of the dynamical model t (see Section II-C for details). The mission specification ϕ, the workspace W, and the set of initial states X_0 are unknown during training and only become available at runtime. Despite the limited knowledge of tasks during training, we aim to design provably correct NNs for unseen tasks T while minimizing the given cost J.
C. Dynamical Model
We consider robotic systems that can be modeled as stochastic, discrete-time, nonlinear dynamical systems with a transition probability of the form:

Pr(x′ ∈ A | x, u) = ∫_A t(dx′ | x, u),    (1)

where the states of a robot x ∈ X and the control actions u ∈ U are drawn from continuous state and action spaces X ⊂ ℝ^n and U ⊂ ℝ^m, respectively. In (1), we use t : B(X) × X × U → [0, 1] to denote a stochastic kernel that assigns to any state x ∈ X and action u ∈ U a probability measure t(·|x, u). Then, Pr(x′ ∈ A | x, u) is the probability of reaching a subset A ∈ B(X) in one time step from state x ∈ X under action u ∈ U. We assume that t consists of an a priori known nominal model f and an unknown model-error g capturing unmodeled dynamics. As a well-studied technique to learn unknown functions from data, we assume the model-error g can be learned by a Gaussian Process (GP) regression model GP(µ_g, σ²_g), where µ_g and σ²_g are the posterior mean and variance functions, respectively [57]. Hence, we can rewrite (1) as:

Pr(x′ ∈ A | x, u) = f(x, u) + ∫_A g(dx′ | x, u),    (2)

which is an integral of a normal distribution and hence can be easily computed.
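To make this concrete: when the model error is Gaussian, the transition probability over an interval reduces to a difference of normal CDFs. The sketch below is our own one-dimensional illustration under that interpretation; `f_nominal`, `mu_g`, and `sigma_g` are hypothetical stand-ins for the nominal model and the GP posterior, not functions defined in this paper.

```python
from math import erf, sqrt

def normal_cdf(z):
    # Standard normal CDF expressed via the error function.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def transition_probability(x, u, a, b, f_nominal, mu_g, sigma_g):
    """Pr(x' in [a, b] | x, u) for a scalar system whose next state is
    f(x, u) + g(x, u), with model error g ~ GP(mu_g, sigma_g^2):
    the integral of the normal density over [a, b] is a CDF difference."""
    mean = f_nominal(x, u) + mu_g(x, u)   # posterior mean of the next state
    std = sigma_g(x, u)                   # posterior std of the model error
    return normal_cdf((b - mean) / std) - normal_cdf((a - mean) / std)

# Toy instance: linear nominal model, zero-mean GP posterior with std 0.05.
f = lambda x, u: x + 0.1 * u
mu_g = lambda x, u: 0.0
sigma_g = lambda x, u: 0.05
p = transition_probability(0.0, 1.0, 0.05, 0.15, f, mu_g, sigma_g)
# [0.05, 0.15] is the +/- 1 sigma interval around the mean 0.1, so p is about 0.68
```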
We assume the nominal model f is given during the NN training phase, while the model-error g is evaluated at runtime, and hence the exact stochastic kernel t only becomes known at runtime. This allows us to apply the trained NNs to various robotic systems with different dynamics captured by the model-error g.
Remark: We note that our algorithm does not require the knowledge of the function f in a closed-form/symbolic representation. Access to a simulator would suffice.
D. Temporal Logic Specification and Workspace
A well-known weakness of RL and meta-RL algorithms
is the difficulty in designing reward functions that capture
the exact intent of designers [46], [47], [58]. Agent behavior
that scores high according to a user-defined reward function
may not be aligned with the user’s intention, which is often
referred to as “specification gaming” [59]. To that end, we
adopt temporal logic specifications to represent an agent's mission; the capability of temporal logic to capture complex behaviors of robotic systems has been extensively demonstrated.
In particular, we consider mission specifications defined
in either bounded linear temporal logic (BLTL) [1] or syn-
tactically co-safe linear temporal logic (scLTL) [2]. Let AP
be a finite set of atomic propositions that describe a robotic
system’s states with respect to a workspace W. For example,
these atomic propositions can describe the location of a robot
with respect to the obstacles to avoid and the goal location
to achieve. Given AP, any BLTL formula can be generated according to the following grammar:

ϕ ::= σ | ¬ϕ | ϕ_1 ∧ ϕ_2 | ϕ_1 U[k1,k2] ϕ_2

where σ ∈ AP and the time steps satisfy k_1 < k_2. Given the above grammar, we can define ϕ_1 ∨ ϕ_2 = ¬(¬ϕ_1 ∧ ¬ϕ_2), false = ϕ ∧ ¬ϕ, and true = ¬false. Furthermore, the bounded-time eventually operator can be derived as ◊[k1,k2] ϕ = true U[k1,k2] ϕ, and the bounded-time always operator is given by □[k1,k2] ϕ = ¬◊[k1,k2] ¬ϕ.
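The bounded-until semantics can be made executable for finite words. The following sketch encodes formulas as nested tuples (an encoding of our own choosing, not from the paper) and checks satisfaction by direct recursion on the grammar above:

```python
def sat(word, i, phi):
    """Does the suffix of `word` at position i satisfy the BLTL formula `phi`?
    Formulas are nested tuples: ('ap', s), ('not', p), ('and', p, q),
    and ('until', k1, k2, p, q) for p U[k1,k2] q."""
    op = phi[0]
    if op == 'ap':                      # atomic proposition: s in the letter w(i)
        return phi[1] in word[i]
    if op == 'not':
        return not sat(word, i, phi[1])
    if op == 'and':
        return sat(word, i, phi[1]) and sat(word, i, phi[2])
    if op == 'until':                   # exists k in [i+k1, i+k2] with q at k,
        _, k1, k2, p, q = phi           # and p holding at every j in [i, k)
        for k in range(i + k1, min(i + k2, len(word) - 1) + 1):
            if sat(word, k, q) and all(sat(word, j, p) for j in range(i, k)):
                return True
        return False
    raise ValueError(f"unknown operator: {op}")

def true_():                            # true = not false; no letter contains '_false_'
    return ('not', ('ap', '_false_'))

def eventually(k1, k2, phi):            # <>[k1,k2] phi = true U[k1,k2] phi
    return ('until', k1, k2, true_(), phi)

# A word over the alphabet 2^AP in which `goal` holds at step 3.
word = [set(), set(), set(), {'goal'}, set()]
sat(word, 0, eventually(0, 4, ('ap', 'goal')))   # True
```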
Given a set of atomic propositions AP, the corresponding alphabet is defined as A := 2^AP, and a finite (infinite) word ω is a finite (infinite) sequence of letters from the alphabet A, i.e., ω = ω(0)ω(1)…ω(H) ∈ A^{H+1}. The satisfaction of a word ω with respect to a specification ϕ can be determined based on the semantics of BLTL [1]. Given a robotic system and an alphabet A, let L : X → A be a labeling function that assigns to each state x ∈ X the subset of atomic propositions L(x) ∈ A that evaluate to true at x. Then, a robotic system's trajectory ξ satisfies a specification ϕ, denoted by ξ |= ϕ, if the corresponding word satisfies ϕ, i.e., L(ξ) |= ϕ, where ξ = x(0)x(1)…x(H) ∈ X^{H+1} and L(ξ) = L(x(0))L(x(1))…L(x(H)) ∈ A^{H+1}. Similarly, we can consider scLTL specifications interpreted over infinite words, based on the fact that any infinite word that satisfies an scLTL formula ϕ contains a finite “good” prefix such that all infinite words containing that prefix satisfy ϕ [2].
Example 1 (Reach-avoid Specification): Consider a robot that navigates a workspace W = {X_goal, O_1, …, O_c}, where X_goal ⊂ X is a set of goal states that the robot would like to reach and O_1, …, O_c ⊂ X are obstacles that the robot needs to avoid. The set of atomic propositions is given by AP = {x ∈ X_goal, x ∈ O_1, …, x ∈ O_c}, where x is the state of the robot. Then, a reach-avoid specification can be expressed as ϕ = ϕ_liveness ∧ ϕ_safety, where ϕ_liveness = ◊[0,H] (x ∈ X_goal) requires the robot to reach the goal X_goal within H time steps and ϕ_safety = □[0,H] ⋀_{i=1,…,c} ¬(x ∈ O_i) specifies that the robot avoid all the obstacles during the time horizon H. Let ξ = x(0)x(1)…x(H) be a trajectory of the robot; then the reach-avoid specification ϕ is interpreted as:

ξ |= ϕ_liveness ⟺ ∃k ∈ {0, …, H}, x(k) ∈ X_goal,
ξ |= ϕ_safety ⟺ ∀k ∈ {0, …, H}, ∀i ∈ {1, …, c}, x(k) ∉ O_i.
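The reach-avoid interpretation above can be checked directly on a finite trajectory. The following sketch tests the two conjuncts; the region shapes and the one-dimensional workspace are illustrative assumptions, not from the paper.

```python
def satisfies_reach_avoid(xi, in_goal, in_obstacles):
    """Check xi |= phi_liveness AND phi_safety for a finite trajectory
    xi = x(0) x(1) ... x(H): some state lies in X_goal (liveness) and
    no state lies in any obstacle O_i (safety)."""
    liveness = any(in_goal(x) for x in xi)
    safety = all(not in_o(x) for x in xi for in_o in in_obstacles)
    return liveness and safety

# Toy 1-D workspace: goal region [0.9, 1.0], one obstacle [0.4, 0.5].
in_goal = lambda x: 0.9 <= x <= 1.0
obstacles = [lambda x: 0.4 <= x <= 0.5]
satisfies_reach_avoid([0.0, 0.2, 0.6, 0.8, 0.95], in_goal, obstacles)   # True
satisfies_reach_avoid([0.0, 0.45, 0.95], in_goal, obstacles)            # False: hits O_1
```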
E. Neural Network
To account for the stochastic behavior of a robot, we aim to design a state-feedback neural network NN : X → U that can achieve the temporal motion and task specification ϕ. An F-layer Rectified Linear Unit (ReLU) NN is specified by composing F layer functions (or just layers). A layer l with i_l inputs and o_l outputs is specified by a weight matrix W^(l) ∈ ℝ^{o_l × i_l} and a bias vector b^(l) ∈ ℝ^{o_l} as follows:

L_θ(l) : z ↦ max{W^(l) z + b^(l), 0},    (3)

where the max function is taken element-wise, and θ^(l) ≜ (W^(l), b^(l)) for brevity. Thus, an F-layer ReLU NN is specified by F layer functions {L_θ(l) : l = 1, …, F} whose input and output dimensions are composable: that is, they satisfy i_l = o_{l−1} for l = 2, …, F. Specifically:

NN_θ(x) = (L_θ(F) ∘ L_θ(F−1) ∘ ⋯ ∘ L_θ(1))(x),    (4)

where we index a ReLU NN function by a list of parameters θ ≜ (θ^(1), …, θ^(F)). As is common practice, we allow the output layer L_θ(F) to omit the max function. For simplicity of notation, we drop the subscript θ in NN_θ whenever the dependence on θ is obvious.
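The layer composition in (3)–(4) can be written in a few lines of NumPy. The weights below are random placeholders, not trained parameters:

```python
import numpy as np

def layer(theta, z):
    """One ReLU layer L_theta : z -> max{W z + b, 0} (element-wise max)."""
    W, b = theta
    return np.maximum(W @ z + b, 0.0)

def relu_nn(thetas, x):
    """F-layer ReLU network: compose the layers; the output layer omits
    the max, as allowed in the text, so the final map is affine."""
    z = x
    for theta in thetas[:-1]:
        z = layer(theta, z)
    W, b = thetas[-1]
    return W @ z + b

# A 2-layer network from R^2 to R, with composable dimensions i_2 = o_1 = 3.
rng = np.random.default_rng(0)
thetas = [(rng.standard_normal((3, 2)), rng.standard_normal(3)),
          (rng.standard_normal((1, 3)), rng.standard_normal(1))]
y = relu_nn(thetas, np.array([0.5, -1.0]))   # a vector of shape (1,)
```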
F. Main Problem
We consider training a finite set (or a library) of ReLU NNs (during the offline phase) and designing a selection algorithm (during the online phase) that can select the correct NNs once the exact task T = (t, ϕ, W, X_0) is revealed at runtime. Before formalizing the problem under consideration, we introduce the following notion of neural network composition.

Definition II.1. Given a set (or a library) of neural networks NN = {NN_1, NN_2, …, NN_d} along with an activation map Γ : X → {1, …, d}, the composed neural network NN[NN, Γ] is defined as: NN[NN, Γ](x) = NN_{Γ(x)}(x).
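Definition II.1 amounts to a state-dependent switch among the library networks. A minimal sketch, with scalar functions as stand-ins for trained networks:

```python
def compose(library, gamma):
    """Definition II.1: NN[library, Gamma](x) = NN_{Gamma(x)}(x), where
    gamma(x) is the 1-based index of the network activated at state x."""
    def composed(x):
        return library[gamma(x) - 1](x)
    return composed

# Two toy 'networks' and an activation map that switches at x = 0.
nn1 = lambda x: -1.0                    # e.g. steer left
nn2 = lambda x: +1.0                    # e.g. steer right
gamma = lambda x: 1 if x < 0 else 2
planner = compose([nn1, nn2], gamma)
planner(-0.3), planner(0.7)             # (-1.0, 1.0)
```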
In other words, the activation map Γ selects the NN that needs to be activated at each state x ∈ X. In addition to achieving the motion and task specifications, the neural network needs to minimize a given cost functional J. The cost functional J is defined as:

J(NN[NN, Γ]) = ∫_X c(x, NN[NN, Γ](x)) dµ_NN(x),    (5)

where c : X × U → ℝ is a state-action cost function and µ_NN is the distribution of states induced by the nominal dynamics f in (2) under the control of NN[NN, Γ]. As an example, the cost functional can be a controller's energy J(NN[NN, Γ]) = ∫_X ||NN[NN, Γ](x)||² dµ_NN(x).
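In practice, an integral such as (5) can be approximated empirically by rolling out the nominal dynamics under the composed planner and averaging the state-action cost over the visited states. The Monte Carlo estimator below is our own sketch, not a procedure prescribed by the paper, and the planner and dynamics are toy stand-ins:

```python
def estimate_cost(planner, f_nominal, cost, x0_samples, horizon):
    """Monte Carlo estimate of J in (5): average the state-action cost
    c(x, u) over the states visited when rolling out the nominal
    dynamics f under the planner from sampled initial states."""
    total, count = 0.0, 0
    for x in x0_samples:
        for _ in range(horizon):
            u = planner(x)
            total += cost(x, u)
            count += 1
            x = f_nominal(x, u)           # nominal one-step dynamics
    return total / count

# Energy cost c(x, u) = ||u||^2 for a toy scalar system x' = x + 0.1 u.
planner = lambda x: -0.5 * x              # a stand-in proportional controller
f = lambda x, u: x + 0.1 * u
energy = lambda x, u: u * u
J = estimate_cost(planner, f, energy, x0_samples=[1.0, -1.0], horizon=10)
```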
Let ξ^x_{NN[NN,Γ]} be a closed-loop trajectory of a robot that starts from the state x ∈ X_0 and evolves under the composed neural network NN[NN, Γ]. We define the problem of interest as follows:

Problem II.2. Given a cost functional J, train a library of ReLU neural networks NN, and compute an activation map Γ at runtime when a task T = (t, ϕ, W, X_0) is revealed, such that the composed neural network minimizes the cost J(NN[NN, Γ]) and satisfies the specification ϕ with probability at least p, i.e., Pr(ξ^x_{NN[NN,Γ]} |= ϕ) ≥ p for any x ∈ X_0.
G. Overview of the Neurosymbolic Framework
Our approach to designing the NN-based planner NN[NN,Γ]
can be split into two stages: offline training and runtime
selection. During the offline training phase, our algorithm
obtains a library of networks NN. At runtime, and to fulfill
unseen tasks using a finite set of neural networks NN, our
neurosymbolic framework bridges ideas from symbolic LTL-
based planning and machine learning. Similar to symbolic
LTL-based planning, our framework uses a hierarchical ap-
proach that consists of a “high-level” discrete planner and a
“low-level” continuous controller [8], [9], [12]. The “high-
level” discrete planner focuses on ensuring the satisfaction of
the LTL specification. At the same time, the “low-level” con-
trollers compute control actions that steer the robot to satisfy
the “high-level” plan. Unlike symbolic LTL-based planners,
our framework uses neural networks as low-level controllers,
thanks to their ability to handle complex nonlinear dynamic
constraints. In particular, the “high-level” planner chooses the
activation map Γto activate particular neural networks.
Nevertheless, to ensure the correctness of the proposed
framework, it is essential to ensure that each neural network in
NN satisfies some “formal” property. This “formal” property
allows the high-level planner to abstract the capabilities of
each of the neural networks in NN and hence choose the
correct activation map Γ. To that end, in Section III, we formu-
late the sub-problem of “formal NN training” that guarantees
the trained NNs satisfy certain formal properties, and solve
it efficiently by introducing an NN weight projection operator.
The solution to the formal training is used in Section IV-A
to obtain the library of networks NN offline. The associated
formal property of each NN is used in Section IV-B to design
the activation map Γ.
III. FORMAL TRAINING OF NNS
In this section, we study the sub-problem of training NNs
that are guaranteed to obey certain behaviors. In addition to the