Goal Recognition as a Deep Learning Task: the GRNet Approach
Mattia Chiari,1 Alfonso E. Gerevini,1 Luca Putelli,1 Francesco Percassi,2 Ivan Serina1
1Dipartimento di Ingegneria dell’Informazione, Università degli Studi di Brescia, Via Branze 38, Brescia, Italy
2School of Computing and Engineering, University of Huddersfield, Queensgate, Huddersfield HD1 3DH, United Kingdom
{m.chiari017, alfonso.gerevini, luca.putelli1, ivan.serina}@unibs.it
f.percassi@hud.ac.uk
Abstract
In automated planning, recognising the goal of an agent from a trace of observations is an important task with many applications. The state-of-the-art approaches to goal recognition rely on the application of planning techniques, which requires a model of the domain actions and of the initial domain state (written, e.g., in PDDL). We study an alternative approach where goal recognition is formulated as a classification task addressed by machine learning. Our approach, called GRNet, is primarily aimed at making goal recognition more accurate as well as faster by learning how to solve it in a given domain. Given a planning domain specified by a set of propositions and a set of action names, the goal classification instances in the domain are solved by a Recurrent Neural Network (RNN). A run of the RNN processes a trace of observed actions to compute how likely it is that each domain proposition is part of the agent’s goal, for the problem instance under consideration. These predictions are then aggregated to choose one of the candidate goals. The only information required as input of the trained RNN is a trace of action labels, each one indicating just the name of an observed action. An experimental analysis confirms that GRNet achieves good performance in terms of both goal classification accuracy and runtime, obtaining better performance w.r.t. a state-of-the-art goal recognition system over the considered benchmarks.
Introduction
Goal Recognition is the task of recognising the goal that an agent is trying to achieve from observations about the agent’s behaviour in the environment (Van-Horenbeke and Peer 2021; Geffner 2018). Typically, such observations consist of a trace (sequence) of executed actions in an agent’s plan to achieve the goal, or a trace of world states generated by the agent’s actions, while an agent goal is specified by a set of propositions. Goal recognition has been studied in AI for many years, and it is an important task in several fields, including human-computer interaction (Batrinca et al. 2016), computer games (Min et al. 2016), network security (Mirsky et al. 2019), smart homes (Harman and Simoens 2019), financial applications (Borrajo, Gopalakrishnan, and Potluru 2020), and others.
In the literature, several systems to solve goal recognition problems have been proposed (Meneguzzi and Pereira 2021). The state-of-the-art approach is based on transforming a plan recognition problem into one or more plan generation problems solved by classical planning algorithms (Ramírez and Geffner 2009; Pereira, Oren, and Meneguzzi 2020; Sohrabi, Riabov, and Udrea 2016). In order to perform planning, this approach requires domain knowledge consisting of a model of each agent action, specified as a set of preconditions and effects, and a description of an initial state of the world in which the agent performs the actions. The computational efficiency (runtime) largely depends on the performance of the planning algorithm, which can be inadequate in contexts demanding fast goal recognition (e.g., real-time/online applications).[1]
In this paper, we investigate an alternative approach in which the goal recognition problem is formulated as a classification task, addressed through machine learning, where each candidate goal (a set of propositions) of the problem can be seen as a distinct class. The primary aim is to make goal recognition more accurate as well as faster by learning how to solve it in a given domain. Given a planning domain specified by a set of propositions and a set of action names, we tackle the goal classification instances in the domain through a Recurrent Neural Network (RNN). A run of our RNN processes a trace of observed actions to compute how likely it is that each domain proposition is part of the agent’s goal, for the problem instance under consideration. These predictions are then aggregated through a goal selection mechanism to choose one of the candidate goals.
The proposed approach, that we call GRNet, is generally faster than the model-based approach to goal recognition based on planning, since running a trained neural network can be much faster than plan generation. Moreover, GRNet operates with minimal information, since the only information required as input for the trained RNN is a trace of action labels (each one indicating just the name of an observed action), and the initial state can be completely unknown.
The RNN is trained only once for a given domain, i.e., the same trained network can be used to solve a large set of goal recognition instances in the domain. On the other hand, as usual in supervised learning, a (possibly large) dataset of solved goal recognition instances for the domain under consideration is needed for training. When such data are unavailable or scarce, they can be synthesised via planning.
[1] Deciding plan existence in classical planning is PSPACE-complete (Bylander 1994).
In such a case, the resulting overall system can be seen as a combined approach (model-based for generating the training data, and model-free for the goal classification task) that outperforms the pure model-based approach in terms of both classification accuracy and classification runtime. Indeed, an experimental analysis presented in the paper confirms that GRNet achieves good performance in terms of both goal classification accuracy and runtime, obtaining consistently better performance with respect to a state-of-the-art goal recognition system over a class of benchmarks in six planning domains.
In the rest of the paper, after giving background and preliminaries about goal recognition and LSTM networks, we describe the GRNet approach; then we present the experimental results; finally, we discuss related work and give the conclusions.
Preliminaries
We describe the problem of goal recognition, starting with its relation to activity/plan recognition, and we give the essential background on Long Short-Term Memory networks and the attention mechanism.
Activity, Goal and Plan Recognition
Activity, plan, and goal recognition are related tasks (Geib and Pynadath 2007). Since in the literature they are sometimes not clearly distinguished, we begin with an informal definition of them, following (Van-Horenbeke and Peer 2021).
Activity recognition concerns analyzing temporal sequences of (typically low-level) data generated by humans, or other autonomous agents acting in an environment, to identify the corresponding activity that they are performing. For instance, data can be collected from wearable sensors, accelerometers, or images to recognize human activities such as running, cooking, driving, etc. (Vrigkas, Nikou, and Kakadiaris 2015; Jobanputra, Bavishi, and Doshi 2019).
Goal recognition (GR) can be defined as the problem of identifying the intention (goal) of an agent from observations about the agent’s behaviour in an environment. These observations can be represented as an ordered sequence of discrete actions (each one possibly identified by activity recognition), while the agent’s goal can be expressed either as a set of propositions or as a probability distribution over alternative sets of propositions (each one forming a distinct candidate goal).
Finally, plan recognition is more general than GR, and concerns both recognising the goal of an agent and identifying the full ordered set of actions (plan) that have been, or will be, performed by the agent in order to reach that goal; like GR, plan recognition typically takes as input a set of observed actions performed by the agent (Carberry 2001).
Model-based and Model-free Goal Recognition
In the approach to GR known as “goal recognition over a domain theory” (Ramírez and Geffner 2010; Van-Horenbeke and Peer 2021; Santos et al. 2021; Sohrabi, Riabov, and Udrea 2016), the available knowledge consists of an underlying model of the behaviour of the agent and its environment. Such a model represents the agent/environment states and the set of actions A that the agent can perform; typically it is specified by a planning language such as PDDL (McDermott et al. 1998). The states of the agent and environment are formalised as subsets of a set of propositions F, called fluents or facts, and each domain action in A is modeled by a set of preconditions and a set of effects, both over F. An instance of the GR problem in a given domain is then specified by:
• an initial state I of the agent and environment (I ⊆ F);
• a sequence O = ⟨obs_1, ..., obs_n⟩ of observations (n ≥ 1), where each obs_i is an action in A performed by the agent;
• a set G = {G_1, ..., G_m} (m ≥ 1) of possible goals of the agent, where each G_i is a set of fluents over F.
The observations form a trace of the full sequence π of actions performed by the agent to achieve the goal. Such a plan trace is a selection of (possibly non-consecutive) actions in π, ordered as in π. Solving a GR instance consists of identifying the G ∈ G that is the (hidden) goal of the agent.
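To make the above definitions concrete, the following minimal Python sketch encodes a GR instance as plain data; the type aliases and field names are our own illustration, not a format prescribed by the paper.

from dataclasses import dataclass

Fluent = str         # e.g. "(On Block F Block C)"
ActionLabel = str    # e.g. "(Stack Block C Block B)"

@dataclass
class GRInstance:
    initial_state: frozenset     # I ⊆ F (may be empty/unknown in the model-free setting)
    observations: list           # O = <obs_1, ..., obs_n>, a trace of actions in A
    candidate_goals: list        # G = {G_1, ..., G_m}, each a frozenset of fluents

# Solving the instance means identifying the (hidden) goal among candidate_goals.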
The approach based on a model of the agent’s actions and of the agent/environment states, that we call model-based goal recognition (MBGR), defines GR as a reasoning task addressable by automated planning techniques (Meneguzzi and Pereira 2021; Ghallab, Nau, and Traverso 2016).
An alternative approach to MBGR is model-free goal recognition (MFGR) (Geffner 2018; Borrajo, Gopalakrishnan, and Potluru 2020). In this approach, GR is formulated as a classification task addressed through machine learning. The domain specification consists of a fluent set F and a set of possible actions A, where each action a ∈ A is specified by just a label (a unique identifier for each action).
An MFGR instance for a domain is specified by an observation sequence O formed by action labels and, as in MBGR, a goal set G formed by subsets of F. MFGR requires minimal information about the domain actions, and can operate without the specification of an initial state, which can be completely unknown. Moreover, since running a learned classification model is usually fast, an MFGR system is expected to run faster than an MBGR system based on planning algorithms. On the other hand, MFGR needs a dataset of solved GR instances from which to learn a classification model for the new GR instances of the domain.
Example 1. As a running example, we will use a very simple GR instance in the well-known BLOCKSWORLD domain. In this domain the agent has the goal of building one or more stacks of blocks, and only one block may be moved at a time. The agent can perform four types of actions: Pick-Up a block from the table, Put-Down a block on the table, Stack a block on top of another one, and Unstack a block that is on another one. We assume that a GR instance in the domain involves at most 22 blocks. In BLOCKSWORLD there are three types of facts (predicates): On, that has two blocks as arguments, plus On-Table and Clear that have one argument. Therefore, the fluent set F consists of 22 × 21 + 22 + 22 = 506 propositions. The goal set G of the instance example consists of the two goals G_1 = {(On Block F Block C), (On Block C Block B)} and G_2 = {(On Block G Block H), (On Block H Block F)}; the observation sequence O is ⟨(Pick-Up Block C), (Stack Block C Block B), (Pick-Up Block F)⟩.
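As a quick sanity check of the fluent count above, one can enumerate the ground facts for 22 blocks (illustrative Python; the block naming is ours):

from itertools import permutations

blocks = [f"block{i}" for i in range(22)]
fluents = (
    [f"(On {a} {b})" for a, b in permutations(blocks, 2)]  # 22 * 21 = 462 ordered pairs
    + [f"(On-Table {b})" for b in blocks]                  # 22
    + [f"(Clear {b})" for b in blocks]                     # 22
)
assert len(fluents) == 506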
LSTM and Attention Mechanism
A Long Short-Term Memory network (LSTM) is a particular kind of Recurrent Neural Network (RNN). This kind of deep learning architecture is particularly suitable for processing sequential data like signals or text documents (Hochreiter and Schmidhuber 1997). With respect to the standard RNN, LSTM deals with typical issues such as vanishing gradients and long-term dependencies, obtaining better predictive performance (Gers, Schmidhuber, and Cummins 2000). Let x_1, x_2, ..., x_m be an input time series, where x_t ∈ R^d is the feature vector representing the t-th element of the series, and d is the dimension of each feature vector of the sequence. The long-term and short-term memory states c_t ∈ R^N and h_t ∈ R^N at time step t of the series, respectively, are computed recursively from the values at the previous time step t−1 as follows:

ĉ_t = tanh(W_c [h_{t−1}, x_t] + b_c)    i_t = σ(W_i [h_{t−1}, x_t] + b_i)
f_t = σ(W_f [h_{t−1}, x_t] + b_f)       c_t = i_t ⊙ ĉ_t + f_t ⊙ c_{t−1}
o_t = σ(W_o [h_{t−1}, x_t] + b_o)       h_t = tanh(c_t) ⊙ o_t

where σ denotes the sigmoid activation function and ⊙ corresponds to the element-wise product; W_f, W_i, W_o, W_c ∈ R^{(N+d)×N} are the weight matrices and b_f, b_i, b_o, b_c ∈ R^N are the bias vectors; the vectors in square brackets are concatenated. The weight matrices and bias vectors are typically initialized with the Glorot uniform initializer (Glorot and Bengio 2010), and they are shared by all the cells in the LSTM layer. h_0 and c_0 are initialized as zero vectors.
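For concreteness, the recurrences above translate line by line into the following NumPy sketch (illustrative only; in practice a library implementation such as keras.layers.LSTM is used). We follow the document’s shape convention W ∈ R^{(N+d)×N}, so the concatenated vector multiplies each matrix on the left.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_c, W_i, W_f, W_o, b_c, b_i, b_f, b_o):
    z = np.concatenate([h_prev, x_t])   # [h_{t-1}, x_t], shape (N + d,)
    c_hat = np.tanh(z @ W_c + b_c)      # candidate memory content
    i_t = sigmoid(z @ W_i + b_i)        # input gate
    f_t = sigmoid(z @ W_f + b_f)        # forget gate
    o_t = sigmoid(z @ W_o + b_o)        # output gate
    c_t = i_t * c_hat + f_t * c_prev    # long-term memory update
    h_t = np.tanh(c_t) * o_t            # short-term memory (cell output)
    return h_t, c_t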
The attention mechanism (Bahdanau, Cho, and Bengio 2015) is another layer which computes weights representing the contribution of each element of the sequence, and provides a representation of the sequence (also called the context vector) as the weighted average of the outputs (h_t) of the LSTM cells, improving the predictive performance with respect to base LSTM networks. In our system, we use the so-called word attention introduced by Yang et al. (2016) in the context of text classification.
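A minimal sketch of word attention over the LSTM outputs h_1, ..., h_m, following Yang et al. (2016); the parameter names W, b, u and their shapes are our own illustration:

import numpy as np

def word_attention(H, W, b, u):
    """H: (m, N) LSTM outputs, one row per step; W: (N, k); b: (k,); u: (k,)."""
    U = np.tanh(H @ W + b)            # hidden representation of each step, (m, k)
    scores = U @ u                    # one alignment score per step, (m,)
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()       # softmax: attention weights over the steps
    return alpha @ H                  # context vector: weighted average of the h_t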
Goal Recognition through GRNet
In this section we present GRNet, our approach to goal recognition based on deep learning. GRNet, depicted in Figure 1, consists of two main components. The first component takes as input the observations of the GR instance to solve, and gives as output a score (between 0 and 1) for each proposition in the domain proposition set F. This component, called the Domain Component, is general in the sense that it can be used for every GR instance over F (training is performed once for each domain). The second component, called the Instance Component, takes as input the proposition ranks generated by the domain component for a GR instance, and uses them to select a goal from the candidate goal set G.
The Domain Component of GRNet
Given a sequence of observations, represented on the left
side of Figure 1, each action aicorresponding to an ob-
servation is encoded as a vector eiof real numbers by an
embedding layer (Bengio et al. 2003).2In Figure 1, the ob-
served actions are displayed from top to bottom in the or-
der in which they are executed by the agent. The embedding
layer is initialised with random weights, and trained at the
same time with the rest of the domain component.
The index of each observed action is simply the result of
an arbitrary order of the actions that is computed in the pre-
processing phase, only once for the domain under consider-
ation. Please note that two actions aiand ajconsecutively
observed may not be consecutive actions in the full plan of
the agent (the full plan may contain any number of actions
between aiand aj).
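One straightforward way to realise this preprocessing (our assumption; the paper only requires that the order be arbitrary but fixed) is to sort the action names once and map each label to an integer index:

# Hypothetical example with four ground BLOCKSWORLD actions; index 0 is
# reserved for padding shorter observation traces.
domain_actions = sorted([
    "(Pick-Up Block C)", "(Put-Down Block C)",
    "(Stack Block C Block B)", "(Unstack Block C Block B)",
])
action_index = {name: i + 1 for i, name in enumerate(domain_actions)}

def encode_observations(obs):
    """Map a trace of action labels to the integer sequence fed to the embedding layer."""
    return [action_index[a] for a in obs]

print(encode_observations(["(Pick-Up Block C)", "(Stack Block C Block B)"]))  # [1, 3]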
The sequence of embedding vectors is then fed to an LSTM neural network, and the output of each cell is processed by the attention mechanism. After computing a weight for the contribution of each cell, this layer provides a so-called context vector that summarises the information contained in the plan trace. The context vector is then passed to a feed-forward layer, which has N output neurons with sigmoid activation functions. N is the number of the domain fluents (propositions) that can appear in any goal of G for any GR instance in the domain; for our experiments N was set to the size of the domain fluent set F, i.e., N = |F|. The output of the i-th neuron o_i corresponds to the i-th fluent f_i (fluents are lexically ordered), and the activation value of o_i gives a rank for f_i being true in the agent’s goal (with a rank equal to one meaning that f_i is true in the goal). In other words, our network is trained for a multi-label classification problem, where each domain fluent can be considered as a different binary class. As the loss function, we use standard binary cross-entropy.
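Under these design choices, a hedged Keras sketch of the domain component could look as follows; the layer sizes and domain sizes are placeholders (the actual values are selected by the hyperparameter optimisation described next), and WordAttention is our minimal rendering of the attention layer described above:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

class WordAttention(layers.Layer):
    """Minimal word attention: a learned weighted average of the LSTM outputs."""
    def build(self, input_shape):
        d = int(input_shape[-1])
        self.W = self.add_weight(shape=(d, d), initializer="glorot_uniform", name="W")
        self.b = self.add_weight(shape=(d,), initializer="zeros", name="b")
        self.u = self.add_weight(shape=(d, 1), initializer="glorot_uniform", name="u")

    def call(self, h):                                      # h: (batch, steps, d)
        v = tf.tanh(tf.tensordot(h, self.W, axes=1) + self.b)
        alpha = tf.nn.softmax(tf.tensordot(v, self.u, axes=1), axis=1)
        return tf.reduce_sum(alpha * h, axis=1)             # context vector, (batch, d)

num_actions, num_fluents = 1000, 506    # hypothetical domain sizes (cf. Example 1)

model = keras.Sequential([
    layers.Embedding(num_actions + 1, 64),                  # index 0 reserved for padding
    layers.LSTM(128, return_sequences=True),                # one output h_t per observation
    WordAttention(),
    layers.Dense(num_fluents, activation="sigmoid"),        # one rank per domain fluent
])
model.compile(optimizer="adam", loss="binary_crossentropy")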
As shown in Figure 1, the dimensions of the input and output of our neural network depend only on the selected domain and some basic information, such as the maximum number of possible output facts that we want to consider. The dimension of the embedding vectors, the dimension of the LSTM layer, and the other hyperparameters of the network are selected using the Bayesian optimisation approach provided by the Optuna framework (Akiba et al. 2019), with a validation set formed by 20% of the training set, while the remaining 80% is used for training the network. More details about the hyperparameters are given in the Supplementary Material.
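A hedged sketch of such a search with Optuna is shown below; the search ranges are placeholders, and build_and_train is a hypothetical helper that trains the domain component on the 80% split and returns the accuracy on the 20% validation split:

import optuna

def objective(trial):
    # Placeholder search ranges, not the paper's actual configuration.
    embedding_dim = trial.suggest_int("embedding_dim", 32, 256)
    lstm_units = trial.suggest_int("lstm_units", 64, 512)
    learning_rate = trial.suggest_float("learning_rate", 1e-4, 1e-2, log=True)
    # build_and_train is hypothetical: it fits the network with these
    # hyperparameters and returns the validation accuracy to be maximised.
    return build_and_train(embedding_dim, lstm_units, learning_rate)

study = optuna.create_study(direction="maximize")  # uses the TPE sampler by default
study.optimize(objective, n_trials=50)
print(study.best_params)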
The Instance-Specific Component of GRNet
After the training and optimisation phases of the domain component, the resulting network can be used to solve any goal recognition instance in the domain through the instance-specific component of our system (right part of Figure 1). This component evaluates the candidate goals in G of the GR instance, using the output of the domain component fed with the observations of the GR instance. To choose the most probable goal in G (solving the multi-class classification task associated with the GR instance), we designed a simple score function that indicates how likely it is that G is the correct goal, according to the ranks computed by the domain component.

[2] https://keras.io/api/layers/core_layers/embedding/
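The excerpt ends here; as one plausible reading of the goal selection step (our assumption, not necessarily the paper’s exact score function), each candidate goal can be scored by averaging the network’s ranks over the fluents the goal contains, and the highest-scoring candidate is selected:

def goal_score(goal, fluent_rank):
    """goal: iterable of fluent indices; fluent_rank: network outputs in [0, 1]."""
    goal = list(goal)
    return sum(fluent_rank[f] for f in goal) / len(goal)

def select_goal(candidate_goals, fluent_rank):
    # Solve the multi-class task by picking the highest-scoring candidate goal.
    return max(candidate_goals, key=lambda g: goal_score(g, fluent_rank))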