Neurosymbolic Motion and Task Planning for
Linear Temporal Logic Tasks
Xiaowu Sun, Graduate Student Member, IEEE and Yasser Shoukry, Senior Member, IEEE
Abstract—This paper presents a neurosymbolic framework
to solve motion planning problems for mobile robots involving
temporal goals. The temporal goals are described using temporal
logic formulas such as Linear Temporal Logic (LTL) to capture
complex tasks. The proposed framework trains Neural Network
(NN)-based planners that enjoy strong correctness guarantees when applied to unseen tasks, i.e., the exact task (including the workspace, LTL formula, and dynamic constraints of the robot) is unknown during the training of the NNs. Our approach to achieving
theoretical guarantees and computational efficiency is based
on two insights. First, we incorporate a symbolic model into
the training of NNs such that the resulting NN-based planner
inherits the interpretability and correctness guarantees of the
symbolic model. Moreover, the symbolic model serves as a
discrete “memory”, which is necessary for satisfying temporal
logic formulas. Second, we train a library of neural networks
offline and combine a subset of the trained NNs into a single NN-
based planner at runtime when a task is revealed. In particular,
we develop a novel constrained NN training procedure, named
formal NN training, to enforce that each neural network in
the library represents a “symbol” in the symbolic model. As
a result, our neurosymbolic framework enjoys the scalability
and flexibility benefits of machine learning and inherits the
provable guarantees from control-theoretic and formal-methods
techniques. We demonstrate the effectiveness of our framework
in both simulations and on an actual robotic vehicle, and show
that our framework can generalize to unknown tasks where state-
of-the-art meta-reinforcement learning techniques fail.
Index Terms—Formal methods, neural networks, meta-
reinforcement learning.
I. INTRODUCTION
DEVELOPING intelligent machines with a considerable level of cognition dates back to the early 1950s. With the
current rise of machine learning (ML) techniques, robotic
platforms are witnessing a breakthrough in their cognition.
Nevertheless, regardless of how many environments they were
trained (or programmed) to consider, such intelligent machines
will always face new environments which the human designer
failed to examine during the training phase. To address the inability of autonomous systems to adapt to new environments,
several researchers asked whether we could build autonomous
agents that can learn how to learn. In other words, while
conventional machine learning focuses on designing agents
that can perform one task, the so-called meta-learning aims
instead to solve the problem of designing agents that can
generalize to different tasks that were not considered during
the design or the training of these agents. For example, in the context of meta-Reinforcement Learning (meta-RL), given data collected from a multitude of tasks (e.g., changes in the environments, goals, and robot dynamics), meta-RL aims to combine all such experiences and use them to design agents that can quickly adapt to unseen tasks.

(The authors are with the Department of Electrical Engineering and Computer Science, University of California, Irvine, CA 92697, USA; e-mail: {xiaowus,yshoukry}@uci.edu.)

While the current
successes of meta-RL are undeniable, significant drawbacks
of meta-RL in its current form are (i) the lack of formal
guarantees on its ability to generalize to unseen tasks, (ii) the lack of formal guarantees with regard to its safety, and (iii) the lack of interpretability due to the use of black-box
deep learning techniques.
In this paper, we focus on the problem of designing Neu-
ral Network (NN)-based task and motion planners that are
guaranteed to generalize to unseen tasks, enjoy strong safety
guarantees, and are interpretable. We consider agents that
need to accomplish temporal goals captured by temporal logic
formulas such as Linear Temporal Logic (LTL) [1], [2]. The
use of LTL in task and motion planning has been widely
studied (e.g., [3]–[14]) due to the ability of LTL formulas
to capture complex goals such as “eventually visit region A
followed by a visit to region B or region C while always
avoiding hitting obstacle D.” On the one hand, motion and task planning using symbolic techniques enjoys the guarantee of satisfying task specifications given in temporal logic. Nevertheless,
these algorithms need an explicit model of the dynamic con-
straints of the robot and suffer from computational complexity
whenever such dynamic constraints are highly nonlinear and
complex. On the other hand, machine learning approaches
are capable of training NN planners without the explicit
knowledge of the dynamic constraints and scale favorably to
highly nonlinear and complex dynamics. Nevertheless, these
data-driven approaches suffer from the lack of safety and
generalization guarantees. Therefore, in this work, we aim to
design a novel neurosymbolic framework for motion and task
planning by combining the benefits of symbolic control and
machine learning techniques.
At the heart of the proposed framework is using a symbolic
model to guide the training of NNs and restricting the behavior
of NNs to “symbols” in the symbolic model. Specifically,
our framework consists of offline (or training) and online
(or runtime) phases. During the offline phase, we assume
access to a “nominal” simulator that approximates the dynamic
constraints of a robot. We assume no knowledge of the
exact task (e.g., workspace, LTL formula, and exact dynamic
constraints of a robot). We use this nominal simulator to train a
“library” of NNs through a novel NN training procedure,
named formal NN training, which enforces each trained NN
to represent a continuous piece-wise affine (CPWA) function
arXiv:2210.05180v1 [cs.RO] 11 Oct 2022
from a chosen family of CPWA functions. The exact task
becomes available only during the online (or runtime) phase.
Given the dynamic constraints of a robot, we compute a finite-
state Markov decision process (MDP) as our symbolic model.
Thanks to the formal NN training procedure, the symbolic
model can be constructed so that each of the trained NNs in
the library represents a transition in the MDP (and hence a
symbol in this MDP). By analyzing this symbolic model, our
framework selects NNs from the library and combines them
into a single NN-based planner to perform the task and motion
planning.
In summary, the main contributions of this article are:
1) We propose a neurosymbolic framework that integrates
machine learning and symbolic techniques in training
NN-based planners for an agent to accomplish unseen
tasks. Thanks to the use of a symbolic model, the
resulting NN-based planners are guaranteed to satisfy the
temporal goals described in linear temporal logic formu-
las, which cannot be satisfied by existing NN training
algorithms.
2) We develop a formal training algorithm that restricts
the trained NNs to specific local behavior. The training
procedure combines classical gradient descent training
of NNs with a novel NN weight projection operator
that modifies the NN weights as little as possible to
ensure the trained NN belongs to a chosen family of
CPWA functions. We provide theoretical guarantees on
the proposed NN weight projection operator in terms of
correctness and upper bounds on the error between the
NN before and after the projection.
3) We provide a theoretical analysis of the overall neu-
rosymbolic framework. We show theoretical guarantees
that govern the correctness of the resulting NN-based
planners when generalizing to unseen tasks, including
unknown workspaces, unknown temporal logic formulas,
and uncertain dynamic constraints.
4) We improve the performance of the proposed framework, enabling fast adaptation to unseen tasks with efficient training. For example, we accelerate the training of NNs
by employing ideas from transfer learning and construct-
ing the symbolic model using a data-driven approach. We
validate the effectiveness of the proposed framework on
an actual robotic vehicle and demonstrate that our frame-
work can generalize to unknown tasks where state-of-the-
art meta-RL techniques are known to fail (e.g., when the
tasks are chosen from across homotopy classes [15]).
The remainder of the paper is organized as follows. After the
problem formulation in Section II, we present the formal NN
training algorithm in Section III. In Section IV, we introduce
the neurosymbolic framework that uses the formal NN training
algorithm to obtain a library of NNs and combines them
into a single NN-based planner at runtime. In Section V, we
provide theoretical guarantees of the proposed framework. In
Section VI, we present some key elements for performance
improvement while maintaining the same theoretical guaran-
tees. Experimental results are given in Section VII, and all
proofs can be found in the appendix.
Comparison with the preliminary results: A preliminary
version of this article was presented in [16]. In [16], we
confined our goal to generating collision-free trajectories,
whereas in this work we consider agents that need to satisfy
general temporal logic formulas such as LTL. Also, we allow
the temporal logic formulas and the exact robot dynamics to
be unknown during the training of NNs. In this article, we
present for the first time the formal NN training algorithm (see
Section III). Moreover, we present a theoretical analysis of the
proposed framework (see Section V). All the speedup tech-
niques in Section VI, the implementation of our framework
on an actual robotic vehicle, and the performance comparison
with meta-RL algorithms are also new in this article.
Related work: The literature on the safe design of ML-
based motion and task planners can be classified according to
three broad approaches, namely (i) incorporating safety in the
training of ML-based planners, (ii) post-training verification of
ML models, and (iii) online validation of safety and control
intervention. Representative examples of the first approach
include reward-shaping [17], [18], Bayesian and robust regres-
sion [19]–[21], and policy optimization with constraints [22]–
[24]. Unfortunately, these approaches do not provide provable
guarantees about the safety of the trained ML-based planners.
To provide strong safety and reliability guarantees, several
works in the literature focus on applying formal verification
techniques (e.g., model checking) to verify pre-trained ML
models against formal safety properties. Representative exam-
ples of this approach include the use of SMT-like solvers [25]–
[30] and hybrid-system verification [31]–[33]. However, these
techniques only assess a given ML-based planner’s safety
rather than design or train a safe agent.
Due to the lack of safety guarantees on the resulting
ML-based planners, researchers proposed several techniques
to restrict the output of the ML models to a set of safe
control actions. Such a set of safe actions can be obtained
through Hamilton-Jacobi analysis [34], [35] and barrier cer-
tificates [36]–[42]. Unfortunately, methods of this type suffer
from being computationally expensive, specific to certain
controller structures, or requiring assumptions on the system
model. Other techniques in this domain include synthesizing
a safety layer (shield) based on model predictive control
with the assumption of safe terminal sets [43]–[45], logically-
constrained reinforcement learning [46]–[48], and Lyapunov
methods [49]–[51] that focus on providing stability guarantees
rather than safety or general temporal logic guarantees.
The idea of learning neurosymbolic models is studied in
works [52]–[54] that use NNs to guide the synthesis of control
policies represented as short programs. The algorithms in [52]–
54] train an NN controller, project it onto the space of programming languages, analyze the short programs, and lift the programs
back to the space of NNs for further training. These works
focus on tasks given during the training of NNs, and the final
controller is a short program. Another line of related work is
reported in [55], [56], which study the problem of extracting a
finite-state controller from a recurrent neural network. Unlike
the above works, we consider temporal logic specifications and
unseen tasks, and our final planner is NNs in tandem with a
finite-state MDP.
II. PROBLEM FORMULATION
A. Notations
Let ℝ, ℝ⁺, and ℕ be the sets of real numbers, positive real numbers, and natural numbers, respectively. For a non-empty set S, let 2^S be the power set of S, 1_S be the indicator function of S, and Int(S) be the interior of S. Furthermore, we use S^n to denote the set of all finite sequences of length n ∈ ℕ of elements in S. The product of two sets is defined as S_1 × S_2 := {(s_1, s_2) | s_1 ∈ S_1, s_2 ∈ S_2}. Let ||x|| be the Euclidean norm of a vector x ∈ ℝ^n, ||A|| be the induced 2-norm of a matrix A ∈ ℝ^{m×n}, and ||A||_max = max_{i,j} |A_{ij}| be the max norm of a matrix A. Any Borel space X is assumed to be endowed with a Borel σ-algebra denoted by B(X).
B. Assumptions and Information Structure
We consider a meta-RL setting that aims to train neural networks for controlling a robot to achieve tasks that were unseen during training. To be specific, we denote a task by a tuple T = (t, ϕ, W, X_0), where t captures the dynamic constraints of a robot (see Section II-C), ϕ is a Linear Temporal Logic (LTL) formula that defines the mission for the robot to accomplish (see Section II-D), W is a workspace (or an environment) in which the robot operates, and X_0 contains the initial states of the robot. Furthermore, we use J to denote a cost functional of controllers, and the cost of using a neural network NN is given by J(NN) (see Section II-F).

During training, we assume the availability of the cost functional J and an approximation of the dynamical model t (see Section II-C for details). The mission specification ϕ, the workspace W, and the set of initial states X_0 are unknown during training and only become available at runtime. Despite the limited knowledge of tasks during training, we aim to design provably correct NNs for unseen tasks T while minimizing the given cost J.
C. Dynamical Model
We consider robotic systems that can be modeled as stochastic, discrete-time, nonlinear dynamical systems with a transition probability of the form:

Pr(x′ ∈ A | x, u) = ∫_A t(dx′ | x, u),    (1)

where the states of a robot x ∈ X and the control actions u ∈ U are drawn from continuous state and action spaces X ⊂ ℝ^n and U ⊂ ℝ^m, respectively. In (1), we use t : B(X) × X × U → [0, 1] to denote a stochastic kernel that assigns to any state x ∈ X and action u ∈ U a probability measure t(·|x, u). Then, Pr(x′ ∈ A | x, u) is the probability of reaching a subset A ∈ B(X) in one time step from state x ∈ X under action u ∈ U. We assume that t consists of an a priori known nominal model f and an unknown model-error g capturing unmodeled dynamics. As a well-studied technique to learn unknown functions from data, we assume the model-error g can be learned by a Gaussian Process (GP) regression model GP(µ_g, σ²_g), where µ_g and σ²_g are the posterior mean and variance functions, respectively [57]. Hence, we can rewrite (1) as:

Pr(x′ ∈ A | x, u) = f(x, u) + ∫_A g(dx′ | x, u),    (2)

which is an integral of a normal distribution and hence can be easily computed.
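To make this concrete: when the model error is Gaussian, the transition probability over an interval reduces to a difference of normal CDFs. The sketch below is our own one-dimensional illustration under that interpretation; `f_nominal`, `mu_g`, and `sigma_g` are hypothetical stand-ins for the nominal model and the GP posterior, not functions defined in this paper.

```python
from math import erf, sqrt

def normal_cdf(z):
    # Standard normal CDF expressed via the error function.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def transition_probability(x, u, a, b, f_nominal, mu_g, sigma_g):
    """Pr(x' in [a, b] | x, u) for a scalar system whose next state is
    f(x, u) + g(x, u), with model error g ~ GP(mu_g, sigma_g^2):
    the integral of the normal density over [a, b] is a CDF difference."""
    mean = f_nominal(x, u) + mu_g(x, u)   # posterior mean of the next state
    std = sigma_g(x, u)                   # posterior std of the model error
    return normal_cdf((b - mean) / std) - normal_cdf((a - mean) / std)

# Toy instance: linear nominal model, zero-mean GP posterior with std 0.05.
f = lambda x, u: x + 0.1 * u
mu_g = lambda x, u: 0.0
sigma_g = lambda x, u: 0.05
p = transition_probability(0.0, 1.0, 0.05, 0.15, f, mu_g, sigma_g)
# [0.05, 0.15] is the +/- 1 sigma interval around the mean 0.1, so p is about 0.68
```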
We assume the nominal model f is given during the NN training phase, while the model-error g is evaluated at runtime, and hence the exact stochastic kernel t only becomes known at runtime. This allows us to apply the trained NNs to various robotic systems with different dynamics captured by the model-error g.
Remark: We note that our algorithm does not require the knowledge of the function f in a closed-form/symbolic representation. Access to a simulator would suffice.
D. Temporal Logic Specification and Workspace
A well-known weakness of RL and meta-RL algorithms
is the difficulty in designing reward functions that capture
the exact intent of designers [46], [47], [58]. Agent behavior
that scores high according to a user-defined reward function
may not be aligned with the user’s intention, which is often
referred to as “specification gaming” [59]. To that end, we
adopt temporal logic specifications to represent an agent's mission; the capability of temporal logic to capture complex behaviors of robotic systems has been extensively demonstrated.
In particular, we consider mission specifications defined
in either bounded linear temporal logic (BLTL) [1] or syn-
tactically co-safe linear temporal logic (scLTL) [2]. Let AP
be a finite set of atomic propositions that describe a robotic
system’s states with respect to a workspace W. For example,
these atomic propositions can describe the location of a robot
with respect to the obstacles to avoid and the goal location
to achieve. Given AP, any BLTL formula can be generated according to the following grammar:

ϕ ::= σ | ¬ϕ | ϕ_1 ∧ ϕ_2 | ϕ_1 U[k1,k2] ϕ_2

where σ ∈ AP and the time steps satisfy k_1 < k_2. Given the above grammar, we can define ϕ_1 ∨ ϕ_2 = ¬(¬ϕ_1 ∧ ¬ϕ_2), false = ϕ ∧ ¬ϕ, and true = ¬false. Furthermore, the bounded-time eventually operator can be derived as ◊[k1,k2] ϕ = true U[k1,k2] ϕ, and the bounded-time always operator is given by □[k1,k2] ϕ = ¬◊[k1,k2] ¬ϕ.
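The bounded-until semantics can be made executable for finite words. The following sketch encodes formulas as nested tuples (an encoding of our own choosing, not from the paper) and checks satisfaction by direct recursion on the grammar above:

```python
def sat(word, i, phi):
    """Does the suffix of `word` at position i satisfy the BLTL formula `phi`?
    Formulas are nested tuples: ('ap', s), ('not', p), ('and', p, q),
    and ('until', k1, k2, p, q) for p U[k1,k2] q."""
    op = phi[0]
    if op == 'ap':                      # atomic proposition: s in the letter w(i)
        return phi[1] in word[i]
    if op == 'not':
        return not sat(word, i, phi[1])
    if op == 'and':
        return sat(word, i, phi[1]) and sat(word, i, phi[2])
    if op == 'until':                   # exists k in [i+k1, i+k2] with q at k,
        _, k1, k2, p, q = phi           # and p holding at every j in [i, k)
        for k in range(i + k1, min(i + k2, len(word) - 1) + 1):
            if sat(word, k, q) and all(sat(word, j, p) for j in range(i, k)):
                return True
        return False
    raise ValueError(f"unknown operator: {op}")

def true_():                            # true = not false; no letter contains '_false_'
    return ('not', ('ap', '_false_'))

def eventually(k1, k2, phi):            # <>[k1,k2] phi = true U[k1,k2] phi
    return ('until', k1, k2, true_(), phi)

# A word over the alphabet 2^AP in which `goal` holds at step 3.
word = [set(), set(), set(), {'goal'}, set()]
sat(word, 0, eventually(0, 4, ('ap', 'goal')))   # True
```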
Given a set of atomic propositions AP, the corresponding alphabet is defined as A := 2^AP, and a finite (infinite) word ω is a finite (infinite) sequence of letters from the alphabet A, i.e., ω = ω(0)ω(1)…ω(H) ∈ A^{H+1}. The satisfaction of a word ω with respect to a specification ϕ can be determined based on the semantics of BLTL [1]. Given a robotic system and an alphabet A, let L : X → A be a labeling function that assigns to each state x ∈ X the subset of atomic propositions L(x) ∈ A that evaluate to true at x. Then, a robotic system's trajectory ξ satisfies a specification ϕ, denoted by ξ |= ϕ, if the corresponding word satisfies ϕ, i.e., L(ξ) |= ϕ, where ξ = x(0)x(1)…x(H) ∈ X^{H+1} and L(ξ) = L(x(0))L(x(1))…L(x(H)) ∈ A^{H+1}. Similarly, we can consider scLTL specifications interpreted over infinite words, based on the fact that any infinite word that satisfies an scLTL formula ϕ contains a finite “good” prefix such that all infinite words containing that prefix satisfy ϕ [2].
Example 1 (Reach-avoid Specification): Consider a robot that navigates a workspace W = {X_goal, O_1, …, O_c}, where X_goal ⊂ X is a set of goal states that the robot would like to reach and O_1, …, O_c ⊂ X are obstacles that the robot needs to avoid. The set of atomic propositions is given by AP = {x ∈ X_goal, x ∈ O_1, …, x ∈ O_c}, where x is the state of the robot. Then, a reach-avoid specification can be expressed as ϕ = ϕ_liveness ∧ ϕ_safety, where ϕ_liveness = ◊[0,H] (x ∈ X_goal) requires the robot to reach the goal X_goal within H time steps and ϕ_safety = □[0,H] ⋀_{i=1,…,c} ¬(x ∈ O_i) specifies that the robot avoid all the obstacles during the time horizon H. Let ξ = x(0)x(1)…x(H) be a trajectory of the robot; then the reach-avoid specification ϕ is interpreted as:

ξ |= ϕ_liveness ⟺ ∃k ∈ {0, …, H}, x(k) ∈ X_goal,
ξ |= ϕ_safety ⟺ ∀k ∈ {0, …, H}, ∀i ∈ {1, …, c}, x(k) ∉ O_i.
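The reach-avoid interpretation above can be checked directly on a finite trajectory. The following sketch tests the two conjuncts; the region shapes and the one-dimensional workspace are illustrative assumptions, not from the paper.

```python
def satisfies_reach_avoid(xi, in_goal, in_obstacles):
    """Check xi |= phi_liveness AND phi_safety for a finite trajectory
    xi = x(0) x(1) ... x(H): some state lies in X_goal (liveness) and
    no state lies in any obstacle O_i (safety)."""
    liveness = any(in_goal(x) for x in xi)
    safety = all(not in_o(x) for x in xi for in_o in in_obstacles)
    return liveness and safety

# Toy 1-D workspace: goal region [0.9, 1.0], one obstacle [0.4, 0.5].
in_goal = lambda x: 0.9 <= x <= 1.0
obstacles = [lambda x: 0.4 <= x <= 0.5]
satisfies_reach_avoid([0.0, 0.2, 0.6, 0.8, 0.95], in_goal, obstacles)   # True
satisfies_reach_avoid([0.0, 0.45, 0.95], in_goal, obstacles)            # False: hits O_1
```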
E. Neural Network
To account for the stochastic behavior of a robot, we aim to design a state-feedback neural network NN : X → U that can achieve the temporal motion and task specification ϕ. An F-layer Rectified Linear Unit (ReLU) NN is specified by composing F layer functions (or just layers). A layer l with i_l inputs and o_l outputs is specified by a weight matrix W^(l) ∈ ℝ^{o_l × i_l} and a bias vector b^(l) ∈ ℝ^{o_l} as follows:

L_θ(l) : z ↦ max{W^(l) z + b^(l), 0},    (3)

where the max function is taken element-wise, and θ^(l) ≜ (W^(l), b^(l)) for brevity. Thus, an F-layer ReLU NN is specified by F layer functions {L_θ(l) : l = 1, …, F} whose input and output dimensions are composable: that is, they satisfy i_l = o_{l−1} for l = 2, …, F. Specifically:

NN_θ(x) = (L_θ(F) ∘ L_θ(F−1) ∘ ⋯ ∘ L_θ(1))(x),    (4)

where we index a ReLU NN function by a list of parameters θ ≜ (θ^(1), …, θ^(F)). As is common practice, we allow the output layer L_θ(F) to omit the max function. For simplicity of notation, we drop the subscript θ in NN_θ whenever the dependence on θ is obvious.
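The layer composition in (3)–(4) can be written in a few lines of NumPy. The weights below are random placeholders, not trained parameters:

```python
import numpy as np

def layer(theta, z):
    """One ReLU layer L_theta : z -> max{W z + b, 0} (element-wise max)."""
    W, b = theta
    return np.maximum(W @ z + b, 0.0)

def relu_nn(thetas, x):
    """F-layer ReLU network: compose the layers; the output layer omits
    the max, as allowed in the text, so the final map is affine."""
    z = x
    for theta in thetas[:-1]:
        z = layer(theta, z)
    W, b = thetas[-1]
    return W @ z + b

# A 2-layer network from R^2 to R, with composable dimensions i_2 = o_1 = 3.
rng = np.random.default_rng(0)
thetas = [(rng.standard_normal((3, 2)), rng.standard_normal(3)),
          (rng.standard_normal((1, 3)), rng.standard_normal(1))]
y = relu_nn(thetas, np.array([0.5, -1.0]))   # a vector of shape (1,)
```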
F. Main Problem
We consider training a finite set (or a library) of ReLU NNs (during the offline phase) and designing a selection algorithm (during the online phase) that can select the correct NNs once the exact task T = (t, ϕ, W, X_0) is revealed at runtime. Before formalizing the problem under consideration, we introduce the following notion of neural network composition.

Definition II.1. Given a set (or a library) of neural networks NN = {NN_1, NN_2, …, NN_d} along with an activation map Γ : X → {1, …, d}, the composed neural network NN[NN, Γ] is defined as: NN[NN, Γ](x) = NN_{Γ(x)}(x).
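Definition II.1 amounts to a state-dependent switch among the library networks. A minimal sketch, with scalar functions as stand-ins for trained networks:

```python
def compose(library, gamma):
    """Definition II.1: NN[library, Gamma](x) = NN_{Gamma(x)}(x), where
    gamma(x) is the 1-based index of the network activated at state x."""
    def composed(x):
        return library[gamma(x) - 1](x)
    return composed

# Two toy 'networks' and an activation map that switches at x = 0.
nn1 = lambda x: -1.0                    # e.g. steer left
nn2 = lambda x: +1.0                    # e.g. steer right
gamma = lambda x: 1 if x < 0 else 2
planner = compose([nn1, nn2], gamma)
planner(-0.3), planner(0.7)             # (-1.0, 1.0)
```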
In other words, the activation map Γ selects the NN that needs to be activated at each state x ∈ X. In addition to achieving the motion and task specifications, the neural network needs to minimize a given cost functional J. The cost functional J is defined as:

J(NN[NN, Γ]) = ∫_X c(x, NN[NN, Γ](x)) dµ_NN(x),    (5)

where c : X × U → ℝ is a state-action cost function and µ_NN is the distribution of states induced by the nominal dynamics f in (2) under the control of NN[NN, Γ]. As an example, the cost functional can be a controller's energy J(NN[NN, Γ]) = ∫_X ||NN[NN, Γ](x)||² dµ_NN(x).
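In practice, an integral such as (5) can be approximated empirically by rolling out the nominal dynamics under the composed planner and averaging the state-action cost over the visited states. The Monte Carlo estimator below is our own sketch, not a procedure prescribed by the paper, and the planner and dynamics are toy stand-ins:

```python
def estimate_cost(planner, f_nominal, cost, x0_samples, horizon):
    """Monte Carlo estimate of J in (5): average the state-action cost
    c(x, u) over the states visited when rolling out the nominal
    dynamics f under the planner from sampled initial states."""
    total, count = 0.0, 0
    for x in x0_samples:
        for _ in range(horizon):
            u = planner(x)
            total += cost(x, u)
            count += 1
            x = f_nominal(x, u)           # nominal one-step dynamics
    return total / count

# Energy cost c(x, u) = ||u||^2 for a toy scalar system x' = x + 0.1 u.
planner = lambda x: -0.5 * x              # a stand-in proportional controller
f = lambda x, u: x + 0.1 * u
energy = lambda x, u: u * u
J = estimate_cost(planner, f, energy, x0_samples=[1.0, -1.0], horizon=10)
```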
Let ξ^x_{NN[NN,Γ]} be a closed-loop trajectory of a robot that starts from the state x ∈ X_0 and evolves under the composed neural network NN[NN, Γ]. We define the problem of interest as follows:

Problem II.2. Given a cost functional J, train a library of ReLU neural networks NN, and compute an activation map Γ at runtime when a task T = (t, ϕ, W, X_0) is revealed, such that the composed neural network minimizes the cost J(NN[NN, Γ]) and satisfies the specification ϕ with probability at least p, i.e., Pr(ξ^x_{NN[NN,Γ]} |= ϕ) ≥ p for any x ∈ X_0.
G. Overview of the Neurosymbolic Framework
Our approach to designing the NN-based planner NN[NN,Γ]
can be split into two stages: offline training and runtime
selection. During the offline training phase, our algorithm
obtains a library of networks NN. At runtime, and to fulfill
unseen tasks using a finite set of neural networks NN, our
neurosymbolic framework bridges ideas from symbolic LTL-
based planning and machine learning. Similar to symbolic
LTL-based planning, our framework uses a hierarchical ap-
proach that consists of a “high-level” discrete planner and a
“low-level” continuous controller [8], [9], [12]. The “high-
level” discrete planner focuses on ensuring the satisfaction of
the LTL specification. At the same time, the “low-level” con-
trollers compute control actions that steer the robot to satisfy
the “high-level” plan. Unlike symbolic LTL-based planners,
our framework uses neural networks as low-level controllers,
thanks to their ability to handle complex nonlinear dynamic
constraints. In particular, the “high-level” planner chooses the
activation map Γto activate particular neural networks.
Nevertheless, to ensure the correctness of the proposed
framework, it is essential to ensure that each neural network in
NN satisfies some “formal” property. This “formal” property
allows the high-level planner to abstract the capabilities of
each of the neural networks in NN and hence choose the
correct activation map Γ. To that end, in Section III, we formu-
late the sub-problem of “formal NN training” that guarantees
the trained NNs satisfy certain formal properties, and solve
it efficiently by introducing an NN weight projection operator.
The solution to the formal training is used in Section IV-A
to obtain the library of networks NN offline. The associated
formal property of each NN is used in Section IV-B to design
the activation map Γ.
III. FORMAL TRAINING OF NNS
In this section, we study the sub-problem of training NNs
that are guaranteed to obey certain behaviors. In addition to the