Policy-Guided Lazy Search with Feedback for Task and Motion Planning
Mohamed Khodeir¹, Atharv Sonwane², Ruthrash Hari³, Florian Shkurti¹
Abstract: PDDLStream solvers have recently emerged as viable solutions for Task and Motion Planning (TAMP) problems, extending PDDL to problems with continuous action spaces. Prior work has shown how PDDLStream problems can be reduced to a sequence of PDDL planning problems, which can then be solved using off-the-shelf planners. However, this approach can suffer from long runtimes. In this paper we propose LAZY, a solver for PDDLStream problems that maintains a single integrated search over action skeletons, which becomes progressively more geometrically informed as samples of possible motions are lazily drawn during motion planning. We explore how learned models of goal-directed policies and current motion sampling data can be incorporated into LAZY to adaptively guide the task planner. We show that this leads to significant speed-ups in the search for a feasible solution, evaluated over unseen test environments with varying numbers of objects, goals, and initial conditions. We evaluate our TAMP approach by comparing to existing solvers for PDDLStream problems on a range of simulated 7DoF rearrangement/manipulation problems. Code can be found at https://rvl.cs.toronto.edu/learning-based-tamp.
I. INTRODUCTION
Task and motion planning (TAMP) problems are challenging because they require reasoning about both discrete and continuous decisions that are interdependent. TAMP solvers typically decompose the problem by using a symbolic task planner that searches over discrete abstract actions, such as which object to interact with or what operations are applicable, and a motion planner which attempts to find the continuous parameters that ground those abstract actions, for instance grasp poses and robot configurations. The motion planner informs the task planner when backtracking is necessary. Thus, the interplay between abstract task planning and low-level motion planning has a significant effect on both runtime and percentage of problems solved.
In this work, we provide a significantly improved PDDLStream [1] solver (LAZY) for task and motion planning problems, which learns to plan from experience and adapts based on current execution data. The motion planner of our solver provides feasibility updates to a priority/guidance function that is used to inform action selection by the symbolic task planner. LAZY plans optimistically and lazily (deferring motion sampling until an action skeleton is found), and maintains a single unified search tree, as opposed to solving a sequence of PDDL problems over a growing set of facts, as was done in [1] and its current variants.
¹Robot Vision and Learning Lab, University of Toronto Robotics Institute. ²Dept. of Computer Science, BITS Pilani. ³Dept. of Mathematical and Computational Sciences, University of Toronto.
{m.khodeir, ruthrath.hari}@mail.utoronto.ca, atharvs.twm@gmail.com, florian@cs.toronto.edu
Fig. 1: Top Left: Simulated evaluation tasks in the Clutter environment. Top Right: Real-world manipulation problems using two 7DoF robot arms. Bottom: A flowchart illustrating the high-level components of our approach.
A core component of our method is a goal-conditioned policy over high-level actions, which we learn using behaviour cloning on past planning demonstrations. This policy is treated as a priority function, which guides the action skeleton search performed by the task planner towards promising abstract action sequences. While this can often eliminate the need for backtracking altogether, the policy may still predict geometrically infeasible actions in more challenging TAMP problems. We therefore show how the predictions of this priority function can be updated online in response to failed samples in motion planning, allowing successive iterations of the task planner to focus the search on more feasible action sequences. The result is a policy-guided bi-level search for TAMP problems, which improves online from experience and past data, and demonstrates strong planning performance on unseen environments from a test distribution, despite being trained with only a few hundred demonstrations.
Our main contributions are: (1) A lazy search framework for PDDLStream problems, which maintains a single search tree over symbolic plan skeletons. (2) A method for incorporating a learned policy over symbolic actions into sampling-based bi-level search, and efficiently updating it online using feedback from motion planning. (3) A concrete parametrization of this goal-conditioned policy as a Graph Attention Network (GAT) which incorporates both high-level and low-level state. We empirically evaluate our proposed method against existing approaches for sampling-based TAMP, such as [1], and show a significant (37%) improvement in the number of unseen problems solved within the allotted planning time.
II. BACKGROUND: PDDL AND PDDLSTREAM
We adopt PDDLStream [1] as the formalism for expressing TAMP problems. A PDDLStream domain (P, A, Ψ) is defined by predicates P, actions A, and streams Ψ.
At a high level, predicates are boolean-valued n-ary functions which indicate the presence of particular relations among their variables. For instance, the predicate isOn may indicate that the object in its first argument is on top of the object in its second. When a predicate is applied to specific objects (e.g. isOn(A, B)), we refer to it as a "fact".
Actions define the legal state transitions in the planning problem. An action is defined by a set of parameters; a set of preconditions, i.e. facts over those parameters which must hold in order for the action to be applicable; and a set of effects, which determine which facts about the parameters are added or removed when the action is applied.
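To make this concrete, the sketch below encodes an action schema as a plain Python structure. It is a minimal illustration, not PDDLStream's actual API; the pick action, its parameter names, and the fact spellings are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    parameters: tuple         # variable names, e.g. ("?obj", "?grasp")
    preconditions: frozenset  # facts that must hold for the action to apply
    effects_add: frozenset    # facts added by the action
    effects_del: frozenset    # facts removed by the action

# Hypothetical "pick" action: grasp ?obj at ?grasp if the hand is empty
# and ?obj currently rests on ?surf.
pick = Action(
    name="pick",
    parameters=("?obj", "?surf", "?grasp"),
    preconditions=frozenset({
        ("handEmpty",),
        ("isOn", "?obj", "?surf"),
        ("isGraspPose", "?obj", "?grasp"),
    }),
    effects_add=frozenset({("holding", "?obj", "?grasp")}),
    effects_del=frozenset({("handEmpty",), ("isOn", "?obj", "?surf")}),
)
```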
The set of streams, Ψ, distinguishes a PDDLStream domain from traditional PDDL. Streams are conditional generators which yield objects that satisfy specific constraints conditioned on their inputs. Formally, a stream s is defined by input and output parameters x̄ and ō, a set of facts domain(s), and a set of facts certified(s). domain(s) is the set of facts that must evaluate to true for an input tuple x̄ to be valid. This ensures the correct types of objects (e.g., configurations, poses, etc.) are provided to the generators. certified(s) are facts about x̄ and ō that will be true of any outputs ō that the stream generators produce. Streams can be applied recursively to generate a potentially infinite set of objects and their associated facts, starting from those in the initial state I. They can also be thought of as declaratively specifying constraints between their inputs and outputs. Finally, each stream comes with a black-box procedure which, given input values x̄, produces samples ō which satisfy those constraints. We use the term stream evaluation to refer to the act of querying this sampler.
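The sketch below mirrors this definition in Python: a stream bundles its input/output parameters, domain facts, certified facts, and a black-box sampler. This is a hedged illustration in the same hypothetical domain as above, not PDDLStream's actual interface.

```python
import random
from dataclasses import dataclass
from typing import Callable, Iterator

@dataclass
class Stream:
    name: str
    inputs: tuple         # input parameters x̄
    outputs: tuple        # output parameters ō
    domain: frozenset     # facts that must hold for an input tuple x̄ to be valid
    certified: frozenset  # facts guaranteed to hold for any outputs ō
    gen_fn: Callable[..., Iterator]  # black-box sampling procedure

# Hypothetical stream: given a graspable object ?obj, sample grasp poses ?g.
def sample_grasps(obj) -> Iterator[tuple]:
    while True:  # a conditional generator may be queried indefinitely
        yield (("grasp", obj, random.random()),)  # placeholder pose encoding

grasp_stream = Stream(
    name="sample-grasp",
    inputs=("?obj",),
    outputs=("?g",),
    domain=frozenset({("graspable", "?obj")}),
    certified=frozenset({("isGraspPose", "?obj", "?g")}),
    gen_fn=sample_grasps,
)

# A "stream evaluation" queries the sampler once:
(grasp,) = next(grasp_stream.gen_fn("blockA"))
```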
The PDDLStream domain (P, A, Ψ) defines a language in which to pose specific problems. An instance of a planning problem in this domain is defined by specifying the initial state I, which is simply a set of facts using predicates P that describe the initial scene, and the goal G. I and G implicitly define a set of initial objects over which the facts in those sets are stated. A solution to a problem instance consists of a sequence of action instances which result in a state in which G is satisfied. Note that many of the parameters in a solution may need to be produced using the streams and initial objects.
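A problem instance then amounts to two sets of facts. A hypothetical blocks-on-a-table instance in the same style as the earlier sketches:

```python
# Hypothetical instance: two blocks on a table; the goal is to stack blockA on blockB.
initial_state = {
    ("isTable", "table"),
    ("graspable", "blockA"), ("graspable", "blockB"),
    ("isOn", "blockA", "table"), ("isOn", "blockB", "table"),
    ("handEmpty",),
}
goal = {("isOn", "blockA", "blockB")}
```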
Predicates in classical PDDL problems can be classified as either "static" or "fluent" depending on whether they appear in the effects of any action. Static predicates are used to define types (e.g. isTable(x)) or immutable relations between objects (e.g. isSmaller(x, y)). Fluent predicates, on the other hand, are those which can be changed by actions (e.g. isOn(x, y)). By definition, streams are only allowed to certify "static" predicates (e.g. isGraspPose(x)). Therefore, in PDDLStream problems, we can further categorize static predicates based on whether they are produced by streams or simply given in the initial conditions I. We call the former stream-certified preconditions.
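Since the static/fluent split depends only on whether a predicate ever appears in an action's effects, it can be computed mechanically. A minimal sketch, reusing the hypothetical Action structure and pick action from above:

```python
def is_static(predicate_name: str, actions) -> bool:
    """A predicate is static iff it never appears in any action's effects."""
    return all(
        fact[0] != predicate_name
        for act in actions
        for fact in act.effects_add | act.effects_del
    )

assert is_static("isGraspPose", [pick])  # stream-certified, never changed by actions
assert not is_static("isOn", [pick])     # fluent: pick deletes an isOn fact
```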
We use the notation ā to refer to an "action skeleton", which is a sequence of discrete, high-level action instances with continuous parameters left as variables (for instance, grasp poses and placement poses). See Fig. 3 for an example of a two-step action skeleton. We denote a specific assignment/grounding of the continuous parameters as θ̄, and refer to the grounded plan as ā(θ̄). Similarly, we use a, θ, and a(θ) to refer to individual actions and their groundings.
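As a concrete (and hypothetical) illustration, here is a two-step pick-and-place skeleton with its continuous parameters left as free variables, along with one possible grounding θ̄:

```python
# Action skeleton ā: "?"-prefixed entries are unbound continuous parameters.
skeleton = [
    ("pick",  "blockA", "table",  "?grasp1"),
    ("place", "blockA", "blockB", "?grasp1", "?pose1"),
]

# A grounding θ̄ assigns each free variable a sampled value (from stream evaluations).
theta = {
    "?grasp1": ("grasp", "blockA", 0.42),
    "?pose1":  ("pose", "blockB", 0.17),
}

# The grounded plan ā(θ̄) substitutes θ̄ into the skeleton.
grounded = [tuple(theta.get(p, p) for p in step) for step in skeleton]
```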
III. OUR APPROACH
A. Lazy Bi-Level Search
Our overall framework is a bi-level search, similar to prior work on task and motion planning ([1], [2]). In every iteration, we search for an action skeleton ā. This outer search for an action skeleton is guided by a priority function f, which assigns a lower value to more desirable actions. We describe possible choices for how f is defined in section III-B and elaborate on the details of skeleton search in section III-C.

Once an action skeleton is found in the outer search, we perform the inner search for grounding its continuous parameters θ̄. We refer to this as skeleton refinement, and elaborate on it in section III-D. The overall procedure terminates when refinement is successful, in which case a complete trajectory is returned. Otherwise, the result of the previous refinement is used to update the priority function f, and the next iteration begins, yielding a potentially different action skeleton. We refer to the process of incorporating the result of refinement into the priority function used by the outer search as feedback, and detail a number of possible implementations in section III-E. The search fails to solve a given problem if the allotted planning time runs out before a trajectory is found. This overall framework is summarized in Algorithm 1 and illustrated in Figure 1.
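A minimal sketch of this loop as just described. The routines search, refine, and update_priority stand in for the procedures of sections III-B through III-E, and the explicit time budget is our own simplification of Algorithm 1:

```python
import time

def lazy_bilevel_search(problem, f, search, refine, update_priority, budget_s=60.0):
    """Alternate outer skeleton search and inner refinement until a plan grounds."""
    deadline = time.monotonic() + budget_s
    while time.monotonic() < deadline:
        skeleton = search(problem, f)       # outer search over skeletons, guided by f
        if skeleton is None:
            return None                     # search space exhausted
        theta, failures = refine(skeleton)  # inner search: sample continuous parameters
        if theta is not None:
            return skeleton, theta          # success: the grounded plan ā(θ̄)
        f = update_priority(f, failures)    # feedback: bias f away from infeasible actions
    return None                             # planning budget exhausted
```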
B. Skeleton Search Routines and their Priority Functions
There are many possible choices for the skeleton search routine and its associated priority function f, leading to algorithms with different characteristics. In this work, we consider two implementations of search: the first is a simple best-first search and the second is a beam search. Intuitively, decreasing the value of the beam width parameter in beam search allows us to create greedier search algorithms at the cost of potentially pruning out solution branches.
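A sketch of the beam-search variant over partial skeletons; the successor and goal-test callbacks are placeholders. With beam_width = 1 this degenerates to purely greedy search, while widening the beam trades greediness for coverage:

```python
def beam_search(initial_state, f, successors, is_goal, beam_width=4, max_depth=20):
    """Expand skeleton prefixes level by level, keeping the beam_width best under f."""
    beam = [(initial_state, [])]  # pairs of (logical state, action prefix)
    for _ in range(max_depth):
        candidates = []
        for state, prefix in beam:
            if is_goal(state):
                return prefix                 # a complete action skeleton ā
            for action, next_state in successors(state):
                candidates.append((next_state, prefix + [action]))
        candidates.sort(key=lambda c: f(*c))  # lower f = more desirable
        beam = candidates[:beam_width]        # prune: may discard solution branches
    return None
```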
We also consider two implementations of f. The first is the familiar A* priority function f(n) = g(n) + h(n), which we use to incorporate off-the-shelf domain-agnostic heuristics from prior work [3]. Note that this option allows our algorithm to work well without a learned policy, using existing domain-agnostic search heuristics in place of h. We make use of this for data collection, and as a baseline in evaluation.
Second, we build on ideas from Levin Tree Search (LevinTS) [4] as a way to incorporate a policy that guides the search while maintaining guarantees about the completeness and search effort of the symbolic planner which relate to the quality of the policy. We assume that we are given a policy π(a | s, G) which predicts a probability distribution over applicable discrete actions (i.e. logical state transitions) conditioned on a logical state s ∈ S and goal G ⊂ S, where S is the set of all logical states.
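Following our reading of LevinTS [4], one standard choice is to expand nodes in increasing order of depth divided by the probability the policy assigns to the path. A sketch, computed in log space for numerical stability (the node.path() accessor is a hypothetical API):

```python
import math

def levin_priority(node, policy, goal):
    """LevinTS-style priority: d(n) / π(n), where π(n) is the product of policy
    probabilities of the actions along the path from the root to n.
    Nodes with lower priority are expanded first."""
    log_prob, depth = 0.0, 0
    for state, action in node.path():  # hypothetical: (state, action) pairs from root
        log_prob += math.log(policy(action, state, goal))  # log π(a | s, G)
        depth += 1
    return math.exp(math.log(max(depth, 1)) - log_prob)    # = d(n) / π(n)
```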
We distinguish between a state in the search space and a node in the search tree by using the symbol s to denote the former and n to denote the latter. A node corresponds to a