Active Predictive Coding: A Unified Neural Framework for Learning
Hierarchical World Models for Perception and Planning
Rajesh P. N. Rao Dimitrios C. Gklezakos Vishwas Sathish
Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle
{rao,gklezd,vsathish}@cs.washington.edu
Abstract
Predictive coding has emerged as a prominent model of how the brain learns through predictions, anticipating the importance accorded to predictive learning in recent AI architectures such as transformers. Here we propose a new framework for predictive coding called active predictive coding which can learn hierarchical world models and solve two radically different open problems in AI: (1) how do we learn compositional representations, e.g., part-whole hierarchies, for equivariant vision? and (2) how do we solve large-scale planning problems, which are hard for traditional reinforcement learning, by composing complex action sequences from primitive policies? Our approach exploits hypernetworks, self-supervised learning and reinforcement learning to learn hierarchical world models that combine task-invariant state transition networks and task-dependent policy networks at multiple abstraction levels. We demonstrate the viability of our approach on a variety of vision datasets (MNIST, FashionMNIST, Omniglot) as well as on a scalable hierarchical planning problem. Our results represent, to our knowledge, the first demonstration of a unified solution to the part-whole learning problem posed by Hinton, the nested reference frames problem posed by Hawkins, and the integrated state-action hierarchy learning problem in reinforcement learning.
1 INTRODUCTION
Predictive coding (Rao and Ballard, 1999; Friston and Kiebel, 2009; Keller and Mrsic-Flogel, 2018; Jiang et al., 2021) has received growing attention in recent years as a model of how the brain learns models of the world through prediction and self-supervised learning. In predictive coding, feedback connections from a higher to a lower level of a cortical neural network (e.g., the visual cortex) convey predictions of lower level responses, and the prediction errors are conveyed via feedforward connections to correct the higher level estimates, completing a prediction-error-correction cycle. Such a model has provided explanations for a wide variety of neural and cognitive phenomena (Keller and Mrsic-Flogel, 2018; Jiang and Rao, 2022).

The layered architecture of the cortex is remarkably similar across cortical areas (Mountcastle, 1978), hinting at a common computational principle, with superficial layers (layers 2-4) receiving and processing sensory information and deeper layers (layer 5) conveying outputs to motor centers (Sherman and Guillery, 2013). The traditional predictive coding model focused primarily on learning visual hierarchical representations and did not acknowledge the important role of actions in learning internal world models.
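To make the prediction-error-correction cycle concrete, here is a minimal single-level sketch in NumPy, loosely in the spirit of the Rao–Ballard formulation; the weight matrix `W`, the learning rates, and the dimensions are illustrative assumptions rather than the exact model used in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from the paper).
n_input, n_latent = 64, 16
W = rng.normal(scale=0.1, size=(n_input, n_latent))  # generative (feedback) weights
r = np.zeros(n_latent)                               # higher level estimate
I = rng.normal(size=n_input)                         # sensory input

lr_r, lr_W = 0.1, 0.01
for _ in range(50):
    prediction = W @ r               # feedback: higher level predicts lower level activity
    error = I - prediction           # feedforward: prediction error
    r += lr_r * (W.T @ error)        # correct the higher level estimate using the error
    W += lr_W * np.outer(error, r)   # Hebbian-like learning of the generative weights

print(float(np.mean((I - W @ r) ** 2)))  # reconstruction error should decrease
```

The inner loop is exactly the cycle described above: feedback prediction, feedforward error, and correction of the higher level estimate.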
In this paper, we introduce Active Predictive Coding, a new model of predictive coding that combines state and action networks at different abstraction levels to learn hierarchical internal models. The model provides a unified framework for solving three important AI problems, as discussed below.
2 RELATED WORK
Part-Whole Learning Problem. Hinton and colleagues have posed the problem of how neural networks can learn to parse visual scenes into part-whole hierarchies by dynamically allocating nodes in a parse tree. They have explored networks that use a group of neurons to represent not only the presence of an object but also parameters such as its position and orientation (Sabour et al., 2017; Kosiorek et al., 2019; Hinton et al., 2018; Hinton, 2021), seeking to overcome the inability of deep convolutional neural networks (CNNs) (Krizhevsky et al., 2012) to explain the images they classify in the way humans do, in terms of objects, parts and their locations. A major open question is how neural networks can learn to create parse trees on-the-fly using learned part-whole representations.
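As an illustration of the kind of representation described above, the sketch below pairs a part's presence with explicit pose parameters, in the spirit of capsule-style models; the field names and the simple 2D pose parameterization are hypothetical simplifications, not the representation used by those papers or by APC.

```python
from dataclasses import dataclass

@dataclass
class PartCapsule:
    """A group of units encoding a part's presence and its pose (illustrative)."""
    presence: float      # probability that the part is present
    x: float             # position of the part in the image (pixels)
    y: float
    orientation: float   # rotation of the part (radians)
    scale: float = 1.0

# A toy "parse": a whole and a part placed relative to it.
face = PartCapsule(presence=0.97, x=32.0, y=32.0, orientation=0.0)
left_eye = PartCapsule(presence=0.95, x=face.x - 8.0, y=face.y - 6.0, orientation=face.orientation)
```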
Reference Frames Problem. In a parallel line of research, Hawkins and colleagues (Hawkins, 2021; Lewis et al., 2019) have taken inspiration from the cortex and "grid cells" to propose that the brain uses object-centered reference frames to represent objects, spatial environments and even abstract concepts. The question of how such reference frames can be learned and used in a nested manner for hierarchical recognition and reasoning has remained open.
Integrated State-Action Hierarchy Learning Problem. A considerable literature exists on hierarchical reinforcement learning (see (Hutsebaut-Buysse et al., 2022) for a recent survey), where the goal is to make traditional reinforcement learning (RL) algorithms more efficient through state and/or action abstraction. A particularly popular approach is to use options (Sutton et al., 1999): abstract actions that can be selected in particular states (those in the option's initiation set) and whose execution carries out a sequence of primitive actions prescribed by the option's lower level policy. A major problem that has received recent attention (e.g., Bacon et al., 2016) is learning options from interactions with the environment. The broader problem of simultaneously learning state and action abstraction hierarchies has remained relatively less explored.
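The options framework referred to above can be summarized with a small sketch. The class layout below is an illustrative rendering of the standard (initiation set, policy, termination) triple from Sutton et al. (1999), not code from this paper; the environment interface `env.step` is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Set

State = int      # illustrative discrete state space
Action = int

@dataclass
class Option:
    """An abstract action in the options framework: (I, pi, beta)."""
    initiation_set: Set[State]                  # states where the option may be invoked
    policy: Callable[[State], Action]           # lower level policy over primitive actions
    termination_prob: Callable[[State], float]  # beta(s): probability of terminating in s

def run_option(env, state: State, option: Option, rng) -> State:
    """Execute an option until its termination condition fires (sketch)."""
    assert state in option.initiation_set
    while True:
        action = option.policy(state)
        state = env.step(state, action)   # assumed environment interface
        if rng.random() < option.termination_prob(state):
            return state
```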
3 CONTRIBUTIONS OF THE PAPER
The proposed active predictive coding (APC) model addresses all three problems above in a unified manner, using state/action embeddings and hypernetworks (Ha et al., 2017) to dynamically generate and generalize over state and action networks at multiple hierarchical levels. The APC model contributes to a number of lines of research which have not been connected before:
Perception, Predictive Coding, and Reference Frame Learning: APC extends predictive coding and related neuroscience models of brain function (Rao and Ballard, 1999; Friston and Kiebel, 2009; Jiang et al., 2021) to hierarchical sensory-motor inference and learning, and connects these to learning nested reference frames (Hawkins, 2021) for perception and cognition.
Attention Networks: APC extends previous hard attention approaches such as the Recurrent Attention Model (RAM) (Mnih et al., 2014) and Attend-Infer-Repeat (AIR) (Eslami et al., 2016) by learning structured hierarchical strategies for sampling the visual scene.
Hierarchical Planning and Reinforcement Learning: APC contributes to hierarchical planning/reinforcement learning research (Hutsebaut-Buysse et al., 2022; Botvinick et al., 2009) by proposing a new way of simultaneously learning abstract macro-actions or options (Sutton et al., 1999) and abstract states.
General Applicability in AI: When applied to vision, the APC model learns to hierarchically represent and parse images into parts and locations. When applied to RL problems, the model can exploit hypernets to (a) define a state hierarchy, not merely through state aggregation, but by abstracting transition dynamics at multiple levels, and (b) potentially generalize learned hierarchical states and actions (options) to novel scenarios via interpolation/extrapolation in the input embedding space of the hypernetworks.

Figure 1: Active Predictive Coding Generative Module: (A) Canonical generative module for the APC model. The lower level functions are generated via hypernetworks based on the current higher level state and action embedding vectors. All functions (in boxes) are implemented as recurrent neural networks (RNNs). Arrows with circular terminations denote generation of function parameters (here, neural network weights and biases). (B) Two-level model used in this paper. (C) Generation of states and actions in the 2-level model based on past states and actions.
Our approach brings us closer to a neural solution to an important challenge in both AI and cognitive science (Lake et al., 2017): how can neural networks learn hierarchical compositional representations that allow new concepts to be created, recognized and learned?
4 ACTIVE PREDICTIVE CODING MODEL
The APC model implements a hierarchical version of the traditional Partially Observable Markov Decision Process (POMDP) (Kaelbling et al., 1998; Rao, 2010), with each level employing a state network and an action network. Figure 1A shows the canonical generative module for the APC model. The module consists of (1) a higher level state embedding vector $r^{(i+1)}$ at level $i+1$, which uses a function $H_s^i$ (implemented as a hypernetwork (Ha et al., 2017))[1] to generate a lower level state transition function $f_s^i$ (implemented as an RNN), and (2) a higher level action embedding vector $a^{(i+1)}$, which uses a function (hypernetwork) $H_a^i$ to generate a lower level option/policy function $f_a^i$ (implemented as an RNN). The state and action networks at the lower level are generated independently (by the higher level state/action embedding vectors) but exchange information horizontally within each level as shown in Figure 1C: the state network generates the next state prediction based on the current state and action, while the action network generates the current action based on the current state and previous action. In our current implementation, the lower level RNNs execute for a fixed number of time steps before returning control back to the higher level.[2] For the present paper, we focus on a two-level model (with a top level and bottom level) as shown in Figures 1B-C.

[1] See Supplementary Materials for an alternate neural implementation of the APC model using higher-level embedding-based inputs to lower-level RNNs.

[2] Future implementations will explore the use of termination functions (Sutton et al., 1999; Eslami et al., 2016) to allow a variable number of time steps at each level and for each input.
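As a concrete (hypothetical) illustration of this generative module, the sketch below uses a small hypernetwork to map a higher level embedding to the weights of a lower level recurrent state-transition cell. The layer sizes, the simple tanh RNN cell, and all variable names are assumptions made for illustration, not the exact architecture of the paper.

```python
import torch
import torch.nn as nn

class HyperRNNCell(nn.Module):
    """Lower level RNN cell whose parameters are produced by a hypernetwork H_s
    from a higher level embedding R_t (illustrative sketch)."""

    def __init__(self, embed_dim: int, in_dim: int, hidden_dim: int):
        super().__init__()
        self.in_dim, self.hidden_dim = in_dim, hidden_dim
        n_params = hidden_dim * (in_dim + hidden_dim) + hidden_dim  # W_x, W_h, b
        self.hyper = nn.Sequential(  # H_s: embedding -> parameters theta_s of f_s
            nn.Linear(embed_dim, 256), nn.ReLU(), nn.Linear(256, n_params)
        )

    def forward(self, R_t: torch.Tensor, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        theta = self.hyper(R_t)
        W_x, W_h, b = torch.split(
            theta,
            [self.hidden_dim * self.in_dim, self.hidden_dim * self.hidden_dim, self.hidden_dim],
        )
        W_x = W_x.view(self.hidden_dim, self.in_dim)
        W_h = W_h.view(self.hidden_dim, self.hidden_dim)
        # One step of the dynamically generated state-transition RNN f_s.
        return torch.tanh(x @ W_x.T + h @ W_h.T + b)

# Usage: a new lower level transition function is implied by each higher level state R_t.
cell = HyperRNNCell(embed_dim=32, in_dim=10, hidden_dim=20)
R_t = torch.randn(32)
h = torch.zeros(20)
h = cell(R_t, torch.randn(10), h)
```

The design choice illustrated here is the one the text emphasizes: the higher level does not pass an input to a fixed lower level network, it generates the lower level network itself.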
4.1 Inference in the Active Predictive Coding Model
Inference involves estimating the state and action vectors at multiple levels based on the sequence of inputs produced by interacting with the environment in the context of a particular task or goal. The top level runs for $T_2$ steps (referred to as "macro-steps"). For each macro-step, the bottom level runs for $T_1$ "micro-steps". As shown in Figure 1C, $F_s$ and $F_a$ are the top level state and action networks respectively, and $R_t$, $A_t$ are the recurrent activity vectors of these networks (i.e., the top level state and action embedding vectors) at macro-step $t$. We use the notation $f(\cdot\,;\theta)$ to denote a network parameterized by $\theta = \{W_l, b_l\}_{l=1}^{L}$, the weight matrices and biases for all the layers. The bottom level state and action RNNs are denoted by $f_s(\cdot\,;\theta_s)$ and $f_a(\cdot\,;\theta_a)$, while their activity vectors are denoted by $r_{t,\tau}$ and $a_{t,\tau}$ respectively ($t$ ranges over macro-steps, $\tau$ over micro-steps).
4.1.1 Higher-Level Inference and Reference Frame Generation
At each macro-step $t$, the top level state RNN $F_s$ produces a new state embedding vector $R_t$ based on the previous state and action embedding vectors. This higher level state $R_t$ defines a new "reference frame" for the lower level to operate over, as follows: $R_t$ is fed as input to the state hypernetwork $H_s$ to generate the lower level parameters $\theta_s(t) = H_s(R_t)$, specifying a dynamically generated bottom-level state RNN characterizing the state transition dynamics locally (e.g., local parts and their transformations in vision (see Section 5), navigation dynamics in a local region of a building (see Section 6)). Figure 2(a) illustrates this top-down generation process.
The current state $R_t$ is also input to the action/policy RNN $F_a$, which outputs an action embedding vector $A_t$ (a macro-action/option/sub-goal) appropriate for the current task/goal given the current state $R_t$ (Figure 2(a)). This embedding vector $A_t$ is used as input to a non-linear function, implemented by the hypernetwork $H_a$, to dynamically generate the parameters $\theta_a(t) = H_a(A_t)$ of the lower-level action RNN, which implements a policy to generate primitive actions suitable for achieving the sub-goal associated with $A_t$. Since reinforcement learning or planning requires exploration, the output of an action network at any level can be regarded as the mean value of a Gaussian with fixed variance (Mnih et al., 2014) or as a categorical representation (Hafner et al., 2021) from which to sample an action.

Figure 2: Inference in the Active Predictive Coding Model: (a) Dynamic generation of bottom-level state RNN $f_s$ and action RNN $f_a$ ("sub-programs") from top-level state vector $R_t$ and action vector $A_t$. This diagram elaborates the one in Figure 1C. (b) Update of top-level state $R_t$ and action $A_t$ based on feedback (via networks $\rho_s$ and $\rho_a$) upon bottom-level sub-program termination.
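To illustrate the stochastic action selection described in this subsection, here is a minimal sketch that treats the action network's output either as the mean of a fixed-variance Gaussian or as categorical logits; the variance value, tensor shapes, and example actions are illustrative assumptions.

```python
import torch
from torch.distributions import Normal, Categorical

def sample_continuous_action(mean: torch.Tensor, sigma: float = 0.1) -> torch.Tensor:
    """Treat the action network output as the mean of a Gaussian with fixed variance."""
    return Normal(mean, sigma).sample()

def sample_discrete_action(logits: torch.Tensor) -> torch.Tensor:
    """Treat the action network output as a categorical distribution over primitive actions."""
    return Categorical(logits=logits).sample()

# Usage with dummy outputs from a (hypothetical) lower level action RNN f_a:
a_continuous = sample_continuous_action(torch.tensor([0.3, -0.7]))  # e.g., a 2D glimpse location
a_discrete = sample_discrete_action(torch.zeros(4))                 # e.g., one of 4 movement actions
```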
4.1.2 Lower-Level Inference and Interaction with the Higher Level
At the beginning of each macro-step, the higher-level state $R_t$ is used to initialize the bottom-level state vector via a small feedforward network $\mathrm{Init}_s$ to produce $r_{t,0} = \mathrm{Init}_s(R_t)$. Each micro-step then proceeds in a manner similar to a macro-step. The bottom-level action RNN, which was generated by the higher-level state $R_t$, produces the current action $a_{t,\tau}$ based on the current lower level state and previous action (Figure 2(a), lower right). This action (e.g., an eye movement or body movement) results in a new input being generated by the environment for the bottom (and possibly higher) state network.
To predict this new input, the lower-level state vector $r_{t,\tau}$ is fed to a generic decoder network $D$ to generate the prediction $\hat{I}_{t,\tau}$. This predicted input is compared to the actual input to generate a prediction error $\epsilon_{t,\tau} = I_{t,\tau} - \hat{I}_{t,\tau}$. Following the predictive coding model (Rao and Ballard, 1999), the prediction error is used to update the state vector via the state network: $r_{t,\tau+1} = f_s(r_{t,\tau}, \epsilon_{t,\tau}, a_{t,\tau}; \theta_s(t))$ (Figure 2(a), lower left).
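Putting the pieces of Sections 4.1.1 and 4.1.2 together, the sketch below traces one pass of two-level inference ($T_2$ macro-steps, each with $T_1$ micro-steps). The names `F_s`, `F_a`, `H_s`, `H_a`, `Init_s`, `D`, `f_s`, `f_a`, and `env` mirror the notation above, but their interfaces are assumptions made for illustration; the feedback networks $\rho_s$, $\rho_a$ and all training details are omitted.

```python
def apc_inference(env,                      # environment providing new inputs after each action
                  F_s, F_a,                 # top level state / action networks
                  H_s, H_a,                 # hypernetworks generating lower level parameters
                  Init_s,                   # maps R_t to the initial lower level state r_{t,0}
                  D,                        # decoder predicting the input from r_{t,tau}
                  f_s, f_a,                 # lower level RNN cells taking generated parameters
                  R, A,                     # initial top level state / action embeddings
                  T2: int, T1: int):
    """One pass of two-level APC inference (illustrative sketch; no learning)."""
    for t in range(T2):                     # macro-steps
        R = F_s(R, A)                       # new top level state ("reference frame")
        A = F_a(R, A)                       # macro-action / option / sub-goal
        theta_s, theta_a = H_s(R), H_a(A)   # generate bottom level sub-programs
        r = Init_s(R)                       # initialize bottom level state r_{t,0}
        a = None
        for tau in range(T1):               # micro-steps
            a = f_a(r, a, theta_a)          # primitive action from generated policy
            I = env.step(a)                 # environment returns the next input I_{t,tau}
            err = I - D(r)                  # prediction error epsilon_{t,tau}
            r = f_s(r, err, a, theta_s)     # predictive-coding style state update
        # (feedback to the top level via rho_s, rho_a would occur here)
    return R, A
```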