At a high level, predicates are Boolean-valued n-ary functions which indicate the presence of particular relations among their arguments. For instance, the predicate isOn may indicate that the object in its first argument is on top of the object in its second. When a predicate is applied to specific objects (e.g. isOn(A, B)), we refer to it as a "fact".
Actions define the legal state transitions in the planning problem. Each action is defined by a set of parameters, a set of preconditions (facts over those parameters which must hold for the action to be applicable), and a set of effects that determine which facts about the parameters are added or removed when the action is applied.
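As an illustrative sketch (the predicate and action names are hypothetical, not taken from any particular domain file), an action schema with parameters, preconditions, and add/delete effects can be represented and applied as follows:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    parameters: tuple
    preconditions: frozenset
    add_effects: frozenset
    delete_effects: frozenset

def ground(action, binding):
    """Substitute concrete objects for parameters in every fact."""
    sub = lambda fact: tuple(binding.get(t, t) for t in fact)
    return Action(
        action.name,
        tuple(binding.get(p, p) for p in action.parameters),
        frozenset(map(sub, action.preconditions)),
        frozenset(map(sub, action.add_effects)),
        frozenset(map(sub, action.delete_effects)),
    )

def apply(state, action):
    """Apply a ground action: state' = (state - delete) | add."""
    assert action.preconditions <= state, "action not applicable"
    return (state - action.delete_effects) | action.add_effects

# Hypothetical "place" schema: requires holding(?obj); afterwards the
# object is on the region and the hand is empty.
place = Action(
    "place", ("?obj", "?region"),
    preconditions=frozenset({("holding", "?obj")}),
    add_effects=frozenset({("isOn", "?obj", "?region"), ("handEmpty",)}),
    delete_effects=frozenset({("holding", "?obj")}),
)
```

Representing facts as tuples keeps states as plain Python sets, so applicability and effect application reduce to subset tests and set algebra.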
The set of streams, Ψ, distinguishes a PDDLStream domain from traditional PDDL. Streams are conditional generators which yield objects that satisfy specific constraints conditioned on their inputs. Formally, a stream s is defined by input and output parameters x̄ and ō, a set of facts domain(s), and a set of facts certified(s). domain(s) is the set of facts that must evaluate to true for an input tuple x̄ to be valid. This ensures the correct types of objects (e.g., configurations, poses, etc.) are provided to the generators. certified(s) are facts about x̄ and ō that will be true of any outputs ō that the stream generators produce. Streams can be applied recursively to generate a potentially infinite set of objects and their associated facts, starting from those in I. They can also be thought of as declaratively specifying constraints between their inputs and outputs. Finally, each stream comes with a black-box procedure which, given input values x̄, produces samples ō which satisfy those constraints. We use the term stream evaluation to refer to the act of querying this sampler.
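A stream can be sketched in Python as a conditional generator; the names below (sample_placement, surface_extent, the isStablePose certificate) are illustrative placeholders, not the actual PDDLStream API:

```python
import itertools
import random

def surface_extent(surface):
    # Stub geometry for the sketch: a unit-square table top.
    return (0.0, 1.0, 0.0, 1.0)

# Illustrative stream: given a block and a surface (inputs x̄), generate
# placement poses (outputs ō) certified to satisfy, say,
# isStablePose(block, pose, surface). domain(s) would require, e.g.,
# isBlock(block) and isSurface(surface) so only valid inputs arrive here.
def sample_placement(block, surface, rng=random.Random(0)):
    x0, x1, y0, y1 = surface_extent(surface)
    while True:  # conditional generator: a potentially infinite output stream
        yield (rng.uniform(x0, x1), rng.uniform(y0, y1), 0.0)

# "Stream evaluation" = drawing the next sample from the generator.
gen = sample_placement("blockA", "table")
poses = list(itertools.islice(gen, 3))  # three sampled placement poses
```

Each `next()` call on the generator is one stream evaluation; the planner can keep drawing samples lazily, which is what makes the object universe potentially infinite.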
The PDDLStream domain (P, A, Ψ) defines a language in which to pose specific problems. An instance of a planning problem in this domain is defined by specifying the initial state I, which is simply a set of facts using predicates P that describe the initial scene, and the goal G. I and G implicitly define a set of initial objects over which facts in those sets are stated. A solution to a problem instance consists of a sequence of action instances which result in a state in which G is satisfied. Note that many of the parameters in a solution may need to be produced using the streams and initial objects.
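Concretely, a small problem instance in this style (with hypothetical predicates) is just two sets of facts, from which the initial objects can be read off:

```python
# Initial state I: a set of ground facts describing the scene.
I = {
    ("isBlock", "A"), ("isBlock", "B"), ("isTable", "table"),
    ("isOn", "A", "table"), ("isOn", "B", "table"), ("handEmpty",),
}
# Goal G: the facts that must hold in the final state.
G = {("isOn", "A", "B")}

# The initial objects are implicitly those mentioned in I and G.
objects = {term for fact in (I | G) for term in fact[1:]}
```

Any further objects a plan needs (grasp poses, placements, configurations) would have to be produced by stream evaluations rather than listed up front.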
Predicates in classical PDDL problems can be classified as either "static" or "fluent" depending on whether they appear in the effects of any action. Static predicates are used to define types (e.g. isTable(x)) or immutable relations between objects (e.g. isSmaller(x, y)). Fluent predicates, on the other hand, are those which can be changed by actions (e.g. isOn(x, y)). By definition, streams are only allowed to certify "static" predicates (e.g. isGraspPose(x)). Therefore, in PDDLStream problems, we can further categorize static predicates based on whether they are produced by streams or are simply given in the initial conditions I. We call the former stream-certified predicates.
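This static/fluent split is mechanical: a predicate is fluent iff it appears in some action's effects. A minimal sketch, assuming actions store their add/delete effects as fact tuples:

```python
def classify_predicates(predicates, actions):
    """Split predicate names into fluent (appearing in some action's
    effects) and static (never modified by any action)."""
    fluent = set()
    for a in actions:
        for fact in a["add_effects"] | a["delete_effects"]:
            fluent.add(fact[0])
    static = set(predicates) - fluent
    return static, fluent

# Toy example with a single "place" action schema (hypothetical names).
place = {
    "add_effects": {("isOn", "?o", "?r"), ("handEmpty",)},
    "delete_effects": {("holding", "?o")},
}
static, fluent = classify_predicates(
    {"isOn", "holding", "handEmpty", "isTable", "isGraspPose"}, [place])
```

Under this split, isTable and isGraspPose come out static; the stream-certified ones among them are then exactly those appearing in some stream's certified(s).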
We use the notation ⃗a to refer to an "action skeleton", which is a sequence of discrete, high-level action instances with continuous parameters left as variables (for instance, grasp poses and placement poses). See Fig. 3 for an example of a two-step action skeleton. We denote a specific assignment/grounding of the continuous parameters as ⃗θ, and refer to the grounded plan as ⃗a(⃗θ). Similarly, we use a, θ, and a(θ) to refer to individual actions and their groundings.
III. OUR APPROACH
A. Lazy Bi-Level Search
Our overall framework is a bi-level search, similar to prior work on task and motion planning ([1], [2]). In every iteration, we search for an action skeleton ⃗a. This outer search for an action skeleton is guided by a priority function f, which assigns a lower value to more desirable actions. We describe possible choices for how f is defined in section III-B and elaborate on the details of skeleton search in section III-C. Once an action skeleton is found in the outer search, we perform the inner search for grounding its continuous parameters ⃗θ. We refer to this as skeleton refinement, and elaborate on it in section III-D. The overall procedure terminates when refinement is successful, in which case a complete trajectory is returned. Otherwise, the result of the previous refinement is used to update the priority function f, and the next iteration begins, yielding a potentially different action skeleton. We refer to the process of incorporating the result of refinement into the priority function used by the outer search as feedback, and detail a number of possible implementations in section III-E. The search fails to solve a given problem if the allotted planning time runs out before a trajectory is found. This overall framework is summarized in Algorithm 1 and illustrated in Figure 1.
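The outer loop can be sketched as follows; the function names (search, refine, updated) and the feedback interface are schematic stand-ins for the components described above, not a definitive implementation:

```python
import time

def bilevel_plan(problem, search, refine, f, timeout=60.0):
    """Lazy bi-level search: an outer symbolic skeleton search guided by a
    priority function f, an inner continuous refinement, and feedback."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        skeleton = search(problem, f)           # outer: find a skeleton
        if skeleton is None:
            return None                         # no skeleton under current f
        ok, theta, feedback = refine(skeleton)  # inner: ground parameters
        if ok:
            return skeleton, theta              # grounded plan found
        f = f.updated(feedback)                 # fold refinement result into f
    return None                                 # planning time exhausted
```

The key design point is laziness: continuous refinement is only attempted on skeletons the outer search already considers promising, and failed refinements reshape f rather than being discarded.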
B. Skeleton Search Routines and their Priority Functions
There are many possible choices for the skeleton search routine and its associated priority function f, leading to algorithms with different characteristics. In this work, we consider two implementations of search: the first is a simple best-first search and the second is a beam search. Intuitively, decreasing the value of the beam-width parameter in beam search allows us to create greedier search algorithms at the cost of potentially pruning out solution branches.
We also consider two implementations of f: first, the familiar A* priority function (f(n) = g(n) + h(n)), which we use to incorporate off-the-shelf domain-agnostic heuristics from prior work [3]. Note that this option allows our algorithm to work well without a learned policy, using existing domain-agnostic search heuristics in place of h. We make use of this for data collection, and as a baseline in evaluation.
Second, we build on ideas from Levin Tree Search (LevinTS) [4] as a way to incorporate a policy to guide the search while maintaining guarantees about completeness and search effort of the symbolic planner that relate to the quality of the policy. We assume that we are given a policy π(a | s, G) which predicts a probability distribution over applicable discrete actions (i.e. logical state transitions) conditioned on a logical state s ∈ S and goal G ⊂ S, where S is the set of all logical states.
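In LevinTS, nodes are expanded in order of increasing ratio of depth to path probability, where the path probability π(n) is the product of policy probabilities along the path from the root; one common form of this priority (an assumption here, with a +1 offset so the root has finite cost) can be sketched in log space for numerical stability:

```python
import math

def levin_cost(depth, log_prob_path):
    """LevinTS-style priority, roughly (depth + 1) / pi(n), computed as
    log(depth + 1) - log pi(n); lower values are expanded first."""
    return math.log(depth + 1) - log_prob_path

# Two nodes at depth 2: path probabilities 0.5 * 0.5 vs. 0.9 * 0.9.
a = levin_cost(2, math.log(0.25))
b = levin_cost(2, math.log(0.81))
# The path the policy prefers (probability 0.81) receives the lower cost,
# so it is expanded first.
```

Because the cost grows as depth increases and path probability shrinks, low-probability branches are not pruned, only delayed, which is what preserves completeness while letting a good policy cut search effort.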
We distinguish between a state in the search space and a node in the search tree by using the symbol s to denote the former and n to denote the latter. A node corresponds to a