Augmentative Topology Agents For Open-ended Learning
Muhammad Umair Nasir, Michael Beukman, Steven James and Christopher Cleghorn
University of the Witwatersrand, South Africa
Abstract—In this work, we tackle the problem of open-ended learning by introducing a method that simultaneously evolves agents and increasingly challenging environments. Unlike previous open-ended approaches that optimize agents using a fixed neural network topology, we hypothesize that generalization can be improved by allowing agents' controllers to become more complex as they encounter more difficult environments. Our method, Augmentative Topology EPOET (ATEP), extends the Enhanced Paired Open-Ended Trailblazer (EPOET) algorithm by allowing agents to evolve their own neural network structures over time, adding complexity and capacity as necessary. Empirical results demonstrate that ATEP produces general agents capable of solving more environments than a fixed-topology baseline. We also investigate mechanisms for transferring agents between environments and find that a species-based approach further improves the performance and generalization of agents.
I. INTRODUCTION
Machine learning has successfully been used to solve numerous problems, such as classifying images [1], writing news articles [2, 3] and playing games like Atari [4] or chess [5]. While impressive, these approaches still largely follow a traditional paradigm in which a human specifies a task that is subsequently solved by the agent. In most cases, this is the end of the agent's learning: once it can solve the required task, no further progression takes place.
Open-ended learning is a research field that takes a different view: rather than converging to a specific goal, the aim is to obtain an ever-growing set of diverse and interesting behaviors [6, 7]. One approach is to allow both the agents and the environments to change, evolve and improve over time [8, 9]. This has the potential to discover a large collection of useful and reusable skills [10], as well as interesting and novel environments [11]. Open-ended learning is also a much more promising way to obtain truly general agents than the traditional single-task-oriented paradigm [12].
The concept of open-ended evolution has been a part of artificial life (ALife) research for decades, spawning numerous artificial worlds [13, 14, 15, 16, 17]. These worlds consist of agents with various goals, such as survival, predation, or reproduction. Recently, open-ended algorithms have received renewed interest [7], with Stanley et al. [6] proposing the paradigm as a path towards human-level artificial intelligence.
A major breakthrough in open-ended evolution was NeuroEvolution of Augmenting Topologies (NEAT) [18], which was capable of efficiently solving complex reinforcement learning tasks. Its key idea was to allow the structure of the network to evolve alongside the weights, starting with a simple network and adding complexity as the need arises.
This inspired subsequent research on open-endedly evolving networks indefinitely [17]. In particular, novelty search [19] used novelty, rather than traditional objective-based measures, to drive evolution. This in turn led to the emergence of quality diversity (QD) algorithms [20, 21, 22, 23], which combine novelty with an objective sense of progress, with the goal of obtaining a collection of diverse, high-performing individuals.
While QD has successfully been used in numerous domains, such as robotic locomotion [24, 21, 25], video game playing [22] and procedural content generation [26, 27], it is still not completely open-ended. One reason for this is that the search space of phenotypical behavior characteristics (or behavioral descriptors) remains fixed [21]. A second reason is that, in many cases, the environment remains fixed, which limits the open-endedness of the algorithm [9]. A way to circumvent this is to co-evolve problems and solutions, as is done by Minimal Criterion Coevolution (MCC) [8]. This co-evolutionary pressure allowed more complex mazes to develop and better agents to emerge to solve them, giving rise to an open-ended process.
However, MCC has some limitations: for instance, it only admits new problems if they are solvable by individuals in the current population. This leads to only slight increases in difficulty, with additional complexity arising only by chance. Taking this into account, Paired Open-ended Trailblazer (POET) [9] builds upon MCC but instead allows unsolvable environments to exist, provided it is likely that some individuals could quickly learn to solve them. POET further innovates by transferring agents between different environments, to
increase the likelihood of solving hard problems. While POET obtained state-of-the-art results, the rate at which it produces diverse new environments slows as evolution proceeds. Enhanced POET [28] adds improved algorithmic components to the base POET method, resulting in superior performance and less stagnation. Enhanced POET, however, uses agents with fixed-topology neural network controllers. While this approach works well for simple environments, it places an eventual limit on the complexity of tasks that can be solved: beyond some level of complexity, fixed-topology agents may not have sufficient capacity to solve the environments.
To address this issue, we propose Augmentative Topology Enhanced POET (ATEP), which uses NEAT to evolve agents with variable, and potentially unbounded, network topologies. We argue that fixed-topology agents will eventually fail to solve environments beyond a certain level of complexity, and we empirically show that ATEP outperforms Enhanced POET (EPOET) in a standard benchmark domain. Finally, we find that using NEAT results in improved exploration and better generalization compared to Enhanced POET.
II. RELATED WORK
POET [9] and EPOET [28] are the founding algorithms of the field of open-ended reinforcement learning, building upon prior approaches such as MCC [8]. This has led to an explosion of new use cases, such as PINSKY [29, 30], which extends POET to generate 2D Atari video game levels alongside agents that solve these levels. Quessy and Richardson [10] use unsupervised skill discovery [31, 32, 33] in the context of POET to discover a large repertoire of useful skills. Meier and Mujika [34] also investigate unsupervised skill discovery, through reward functions learned by neural networks. Other uses of POET include the work by Zhou and Vanschoren [35], who obtain diverse skills in a 3D locomotion task. POET has also been shown to aid in evolving robot morphologies [36] and in avoiding the premature convergence that often results from handcrafted curricula. Norstein et al. [37] use MAP-Elites [21] to open-endedly create a structured repertoire of various terrains and virtual creatures.
Adversarial approaches are commonly adopted when developing open-ended algorithms. Dennis et al. [38] propose PAIRED, a learning algorithm in which an adversary produces an environment based on the difference in performance between an antagonist and a protagonist agent. Domain randomization [39], prioritized level replay [40] and Adversarially Compounding Complexity by Editing Levels (ACCEL) [41] adopt a similar adversarial approach, where teacher agents produce environments and student agents solve them.
Several domains and benchmarks have been proposed with the aim of encouraging research into open-ended, general agents. Team et al. [12] introduce the XLand environment, in which a single agent is trained on 700k 3D games, including single- and multi-agent games, resulting in zero-shot generalization on held-out test environments. Barthet et al. [42] introduce an autoencoder [43, 44] and CPPN-NEAT based open-ended evolutionary algorithm to evolve Minecraft [45, 46] buildings, showing how differences in the training of the autoencoders affect the evolution and the generated structures. Fan et al. [47] create a Minecraft-based environment, MineDojo, which provides numerous open-ended tasks. They also introduce MineCLIP, an effective language-conditioned reward function that serves as an automatic metric for generation tasks. Gan et al. [48] introduce the Open-ended Physics Environment (OPEn) to test learned representations, evaluating many RL-based agents. Their results indicate that agents that make use of unsupervised contrastive representation learning and impact-driven learning for exploration achieve the best results.
A related field is that of lifelong, or continual, learning. Here, there is often only one agent in an environment with multiple tasks [49, 50]. The agent can then continuously improve as it experiences new settings. A challenge here is how to transfer knowledge between different tasks while preventing catastrophic forgetting, i.e. where an agent performs worse on previously learned tasks after fine-tuning on new ones [51, 52]. In particular, Rusu et al. [53] introduce Progressive Neural Networks where, for each new task, the existing network is frozen and extra capacity is added that can learn the new task. This allows the model to leverage lateral connections and transfer information from previous tasks, while not diminishing performance on previously learned tasks (a minimal sketch of this idea follows below). Other works attempt to keep the agent's actions unchanged on old tasks. von Oswald et al. [52] do this by learning a model that transforms a task identifier into a neural network; for new tasks, the loss incentivises the model to output similar weights for already-learned task identifiers. Li and Hoiem [54] use a similar technique, where prior outputs from the model should not change significantly when learning a new task.
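To make the Progressive Neural Networks mechanism concrete, the following is a minimal NumPy sketch under our own illustrative assumptions: a single hidden layer per column and random, untrained weights. The class and its member names are hypothetical, not taken from [53].

```python
import numpy as np

class ProgressiveNet:
    """Sketch of the Progressive Neural Networks idea: one column of weights
    per task; columns for solved tasks are frozen, and each new column also
    receives lateral input from the frozen columns' hidden activations."""

    def __init__(self, in_dim, hid, out_dim):
        self.in_dim, self.hid, self.out_dim = in_dim, hid, out_dim
        self.columns = []

    def add_column(self):
        # Called at the start of each new task; earlier columns stay frozen.
        rng = np.random.default_rng()
        self.columns.append({
            "W1": rng.normal(size=(self.hid, self.in_dim)),
            "W2": rng.normal(size=(self.out_dim, self.hid)),
            # One lateral matrix per existing (frozen) column.
            "U": [rng.normal(size=(self.hid, self.hid)) for _ in self.columns],
        })

    def forward(self, x):
        hiddens = []
        for col in self.columns:
            h = col["W1"] @ x
            for U, h_prev in zip(col["U"], hiddens):
                h = h + U @ h_prev          # lateral transfer from old tasks
            hiddens.append(np.tanh(h))
        return self.columns[-1]["W2"] @ hiddens[-1]

net = ProgressiveNet(in_dim=4, hid=8, out_dim=2)
net.add_column()                  # column for the first task
net.add_column()                  # new task: add a column with lateral inputs
y = net.forward(np.ones(4))
```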
III. ENHANCED POET
Since our method is heavily based on EPOET, we briefly describe it here, along with the original POET algorithm. POET focuses on evolving pairs of agents and environments in an attempt to create specialist agents that solve particular environments. POET uses the 2D Bipedal Walker Hardcore environment from OpenAI Gym [55] as a benchmark. The first environment is a flat surface and, as evolution progresses, the environments become harder through the addition of more obstacles. POET also transfers agents across environments, which can prevent stagnation and leverage experience gained on one environment as a step towards solving another. An environment-agent (EA) pair is eligible to reproduce when the agent crosses a preset reward threshold on its environment. The next generation of environments is formed by mutating the current population and selecting only those environments that are neither too easy nor too hard. Finally, environments are ranked by novelty, and only the most novel children pass through to the next generation. More information about POET's hyperparameters is provided in the supplementary material.
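To make this loop concrete, below is a minimal, self-contained Python sketch of POET's outer loop. Environments and agents are reduced to single floats (difficulty and skill) purely for illustration, and all thresholds are stand-ins rather than POET's actual hyperparameter values.

```python
import random

def evaluate(env, agent):
    # Toy reward: higher when the agent's skill exceeds the difficulty.
    return max(0.0, 300.0 - 100.0 * max(0.0, env - agent))

REWARD_THRESHOLD = 230.0       # assumed reproduction-eligibility score
MC_LOW, MC_HIGH = 50.0, 280.0  # assumed minimal-criterion bounds
MAX_CHILDREN = 2               # most-novel children admitted per iteration

def poet_step(pairs):
    """One outer-loop iteration over the active (environment, agent) pairs."""
    # 1. Eligible pairs spawn mutated (here: slightly harder) environments.
    children = [(env + random.uniform(0.0, 0.5), agent)
                for env, agent in pairs
                if evaluate(env, agent) >= REWARD_THRESHOLD]
    # 2. Minimal criterion: keep children that are neither too easy nor too hard.
    children = [(e, a) for e, a in children if MC_LOW <= evaluate(e, a) <= MC_HIGH]
    # 3. Rank children by novelty (here: distance to existing difficulties).
    children.sort(key=lambda c: min(abs(c[0] - e) for e, _ in pairs), reverse=True)
    pairs = pairs + children[:MAX_CHILDREN]
    # 4. Optimize each agent on its paired environment (POET uses evolution
    #    strategies; a random upward nudge of skill stands in for that here).
    pairs = [(env, agent + max(0.0, random.uniform(-0.1, 0.2)))
             for env, agent in pairs]
    # 5. Transfer: an agent is replaced by whichever agent in the population
    #    performs best on its environment.
    return [(env, max((a for _, a in pairs), key=lambda a: evaluate(env, a)))
            for env, _ in pairs]

pairs = [(0.0, 0.0)]  # begin with a flat environment and a simple agent
for _ in range(20):
    pairs = poet_step(pairs)
```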
EPOET improves upon POET with two algorithmic improvements: (1) a general method for evaluating the novelty of challenges and (2) an improved approach for deciding when agents should transfer to new environments. In the original POET, novelty was evaluated by comparing the environment characterizations (ECs) of different environments, which were obtained from fixed, domain-specific static features such as the roughness of the terrain. This inherently limits the exploration of the algorithm, as it is restricted to these preset confines. Enhanced POET introduces an improved EC, the Performance of All Transferred Agents EC (PATA-EC), which is based on the performance of different agents in the environment. Secondly, the original transfer mechanism in POET was generally inefficient: it increased the required computation (as each agent needed to be fine-tuned) and resulted in subpar transfers, as it was too easy to qualify for transfer. Enhanced POET makes this process stricter, only transferring very promising agents.
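As an illustration, the following is a minimal sketch of how PATA-EC-style novelty might be computed. The clipping bounds, the rank normalization to [-0.5, 0.5] and the k-nearest-neighbor novelty score follow our reading of Enhanced POET; `evaluate` is an assumed scoring helper, not a library function.

```python
import numpy as np

def pata_ec(env, agents, evaluate, clip_low=50.0, clip_high=300.0):
    """Characterize an environment by the ranked performance of all agents."""
    scores = np.clip([evaluate(env, a) for a in agents], clip_low, clip_high)
    ranks = np.argsort(np.argsort(scores)).astype(float)  # 0 = worst score
    return ranks / max(len(agents) - 1, 1) - 0.5          # normalize to [-0.5, 0.5]

def novelty(env, archive, agents, evaluate, k=5):
    """Novelty = mean distance to the k nearest PATA-EC vectors in the archive."""
    target = pata_ec(env, agents, evaluate)
    dists = sorted(np.linalg.norm(target - pata_ec(e, agents, evaluate))
                   for e in archive)
    return float(np.mean(dists[:k]))
```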
Enhanced POET also improves upon the environmental encoding used in the original algorithm, which was fixed and could thus represent only a limited number of unique and diverse environments. The solution to this problem is a more expressive encoding in the form of compositional pattern producing networks (CPPNs) [56]. A CPPN is a specific kind of neural network that takes in x, y coordinates and, when evaluated across an entire region, produces a specific pattern. These CPPNs are evolved using the NEAT [18] algorithm, which increases the complexity of the environments as evolution progresses.
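For intuition, here is a minimal hand-built CPPN sketch: a tiny network of nodes with mixed activation functions, queried at each coordinate to produce a terrain height. The specific nodes and weights are fixed, illustrative choices; in Enhanced POET the CPPN's topology and weights are evolved by NEAT rather than written by hand.

```python
import math

def cppn_height(x):
    """Query the (hand-built) CPPN at coordinate x for a terrain height."""
    h1 = math.sin(2.0 * x)              # periodic node -> rolling hills
    h2 = math.exp(-((x - 5.0) ** 2))    # Gaussian node -> a single bump
    return 0.6 * h1 + 1.5 * h2          # output node: weighted sum

# Evaluating the network across an entire region yields the pattern (terrain).
terrain = [cppn_height(0.1 * i) for i in range(100)]
```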
Lastly, the authors introduce the Accumulated Number of Novel Environments Created and Solved (ANNECS), a metric for open-ended learning that, intuitively, describes the amount of interesting new content generated by the algorithm. ANNECS counts the number of environments that satisfy two constraints: (1) the environment must be neither too easy nor too hard, and (2) it must eventually be solved by some agent. Thus, if the ANNECS metric increases as time goes on, the algorithm is continually producing novel and interesting environments.
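In sketch form, the metric reduces to a simple count (assuming each environment record stores, as hypothetical bookkeeping flags, whether it passed the minimal criterion at creation and whether any agent has since solved it):

```python
def annecs(env_records):
    """Accumulated Number of Novel Environments Created and Solved: count
    environments that (1) passed the minimal criterion when created and
    (2) were eventually solved by some agent."""
    return sum(1 for e in env_records if e.passed_mc and e.solved)
```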
IV. OPEN-ENDEDLY EVOLVING THE TOPOLOGY OF AGENTS
Many of the approaches introduced in prior work use a fixed-topology approach in conjunction with optimizers such as evolution strategies (ES) [57], V-MPO [58] (a modified version of maximum a posteriori policy optimization [59] that relies on value functions) and Proximal Policy Optimization [60]. This motivates us to explore NEAT and the benefits it brings to the open-ended learning framework. We first describe the use of NEAT in Section IV-A and then describe our overall approach in Section IV-B.
A. NeuroEvolution of Augmenting Topologies
We leverage NeuroEvolution of Augmenting Topologies (NEAT) to evolve the structure of an agent's controller. NEAT starts with a population of simple neural networks (NNs) in which the input neurons are directly connected to the output neurons, without any hidden layers. Crossover is performed between two parents, and the resulting children are mutated by adding connections and nodes, or by perturbing weights. In this way, the NN is gradually complexified.
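The following is a minimal sketch of NEAT's structural mutation step (the add-connection mutation is omitted for brevity). The genome representation and the mutation rates are illustrative assumptions, not NEAT's canonical settings; the add-node convention, however, follows the NEAT paper: the new incoming connection gets weight 1.0 and the outgoing connection inherits the old weight, minimizing disruption.

```python
import random

def mutate(genome, innovations, next_node_id):
    """Structural mutation on a genome: a list of connection genes (dicts
    with in/out node ids, weight, enabled flag, innovation number).
    `innovations` maps (in, out) pairs to global innovation numbers."""
    def innov(u, v):
        if (u, v) not in innovations:
            innovations[(u, v)] = len(innovations)
        return innovations[(u, v)]

    # Weight perturbation: the most common mutation.
    for gene in genome:
        if random.random() < 0.8:
            gene["weight"] += random.gauss(0.0, 0.5)

    # Add-node mutation: split an enabled connection A->B into A->C->B.
    enabled = [g for g in genome if g["enabled"]]
    if enabled and random.random() < 0.03:
        old = random.choice(enabled)
        old["enabled"] = False
        c = next_node_id
        next_node_id += 1
        genome.append({"in": old["in"], "out": c, "weight": 1.0,
                       "enabled": True, "innov": innov(old["in"], c)})
        genome.append({"in": c, "out": old["out"], "weight": old["weight"],
                       "enabled": True, "innov": innov(c, old["out"])})
    return genome, next_node_id
```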
One of the major problems to overcome is the permutations, or competing conventions, problem [61, 62]. Competing conventions describes the case in which crossover between networks that represent the same solution but are encoded differently (e.g. with a different ordering of neurons) can lead to a loss of information and a significantly worse child. NEAT addresses this by keeping track of the historical origin of each gene using an innovation number. Using this innovation number, identical genes from two parents can be aligned, while genes that occur in only one parent (denoted excess or disjoint genes, depending on their position) are inherited from the fitter parent. Finally, NEAT introduces speciation [63], where individuals with similar topologies are grouped together and share a fitness. This protects innovation and ensures diversity.
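Speciation relies on NEAT's compatibility distance between two genomes, which combines the number of excess genes E, the number of disjoint genes D, and the average weight difference of matching genes, with N the number of genes in the larger genome and c1, c2, c3 tunable coefficients:

```latex
\delta = \frac{c_1 E}{N} + \frac{c_2 D}{N} + c_3\,\overline{W}
```

Genomes whose distance falls below a threshold are assigned to the same species, and fitness sharing within each species prevents any single topology from taking over the population.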