Augmentative Topology Agents For Open-ended Learning
Muhammad Umair Nasir, Michael Beukman, Steven James and Christopher Cleghorn
University of the Witwatersrand, South Africa
Abstract—In this work, we tackle the problem of open-ended learning by introducing a method that simultaneously evolves agents and increasingly challenging environments. Unlike previous open-ended approaches that optimize agents using a fixed neural network topology, we hypothesize that generalization can be improved by allowing agents' controllers to become more complex as they encounter more difficult environments. Our method, Augmentative Topology EPOET (ATEP), extends the Enhanced Paired Open-Ended Trailblazer (EPOET) algorithm by allowing agents to evolve their own neural network structures over time, adding complexity and capacity as necessary. Empirical results demonstrate that ATEP produces general agents capable of solving more environments than a fixed-topology baseline. We also investigate mechanisms for transferring agents between environments and find that a species-based approach further improves the performance and generalization of agents.
I. INTRODUCTION
Machine learning has successfully been used to solve numerous problems, such as classifying images [1], writing news articles [2, 3] and playing games like Atari [4] or chess [5]. While impressive, these approaches still largely follow a traditional paradigm in which a human specifies a task that is subsequently solved by the agent. In most cases, this is the end of the agent's learning: once it can solve the required task, no further progression takes place.
Open-ended learning is a research field that takes a different view: rather than converging to a specific goal, the aim is to obtain an ever-growing set of diverse and interesting behaviors [6, 7]. One approach is to allow both the agents and the environments to change, evolve and improve over time [8, 9]. This has the potential to discover a large collection of useful and reusable skills [10], as well as interesting and novel environments [11]. Open-ended learning is also a much more promising way to obtain truly general agents than the traditional single-task-oriented paradigm [12].
The concept of open-ended evolution has been a part of artificial life (ALife) research for decades, spawning numerous artificial worlds [13, 14, 15, 16, 17]. These worlds consist of agents with various goals, such as survival, predation, or reproduction. Recently, open-ended algorithms have received renewed interest [7], with Stanley et al. [6] proposing the paradigm as a path towards human-level artificial intelligence.
A major breakthrough in open-ended evolution was NeuroEvolution of Augmenting Topologies (NEAT) [18], which was capable of efficiently solving complex reinforcement learning tasks. Its key idea was to allow the structure of the network to evolve alongside the weights, starting with a simple network and adding complexity as the need arises.
This inspired subsequent research on open-endedly evolving networks indefinitely [17]. In particular, novelty search [19] used novelty, rather than traditional objective-based measures, to drive evolution. This in turn led to the emergence of quality diversity (QD) algorithms [20, 21, 22, 23], which combine novelty with an objective sense of progress, with the goal of obtaining a collection of diverse, high-performing individuals.
While QD has successfully been used in numerous domains, such as robotic locomotion [24, 21, 25], video game playing [22] and procedural content generation [26, 27], it is still not completely open-ended. One reason for this is that the search space of phenotypical behavior characteristics (or behavioral descriptors) remains fixed [21]. A second reason is that, in many cases, the environment remains fixed, which limits the open-endedness of the algorithm [9]. A way to circumvent this is to co-evolve problems and solutions, as is done by Minimal Criterion Coevolution (MCC) [8]. This co-evolutionary pressure allowed more complex mazes to develop and better agents to emerge to solve them, giving rise to an open-ended process.
However, MCC has some limitations: for instance, it only admits new problems if they are solvable by individuals in the current population. This leads to only slight increases in difficulty, with additional complexity arising only by chance. Taking this into account, Paired Open-ended Trailblazer (POET) [9] builds upon MCC but instead allows unsolvable environments to exist, provided it is likely that some individuals could quickly learn to solve them. POET further innovates by transferring agents between different environments, to
increase the likelihood of solving hard problems. While POET obtained state-of-the-art results, the rate at which it produces diverse new environments slows as evolution proceeds. Enhanced POET [28] adds improved algorithmic components to the base POET method, resulting in superior performance and less stagnation. Enhanced POET, however, uses agents with fixed-topology neural network controllers. While this approach works well for simple environments, it places an eventual limit on the complexity of tasks that can be solved: beyond some level of complexity, fixed-topology agents may not have sufficient capacity to solve the environments.
To address this issue, we propose Augmentative Topology Enhanced POET (ATEP), which uses NEAT to evolve agents with variable, and potentially unbounded, network topologies. We argue that fixed-topology agents will eventually fail to solve environments beyond a certain level of complexity, and we empirically show that ATEP outperforms Enhanced POET (EPOET) in a standard benchmark domain. Finally, we find that using NEAT results in improved exploration and better generalization compared to Enhanced POET.
II. RELATED WORK
POET [9] and EPOET [28] are the founding algorithms of the field of open-ended reinforcement learning, building upon prior approaches such as MCC [8]. This has led to an explosion of new use cases, such as PINSKY [29, 30], which extends POET to generate 2D Atari video game levels alongside agents that solve these levels. Quessy and Richardson [10] use unsupervised skill discovery [31, 32, 33] in the context of POET to discover a large repertoire of useful skills. Meier and Mujika [34] also investigate unsupervised skill discovery, through reward functions learned by neural networks. Other uses of POET include the work by Zhou and Vanschoren [35], who obtain diverse skills in a 3D locomotion task. POET has also been shown to aid in evolving robot morphologies [36] and in avoiding the premature convergence that often results from handcrafted curricula. Norstein et al. [37] use MAP-Elites [21] to open-endedly create a structured repertoire of various terrains and virtual creatures.
Adversarial approaches are commonly adopted when developing open-ended algorithms. Dennis et al. [38] propose PAIRED, a learning algorithm in which an adversary produces an environment based on the difference in performance between an antagonist and a protagonist agent. Domain randomization [39], prioritized level replay [40] and Adversarially Compounding Complexity by Editing Levels (ACCEL) [41] adopt a similar adversarial approach, where teacher agents produce environments and student agents solve them.
Several domains and benchmarks have been proposed with the aim of encouraging research into open-ended, general agents. Team et al. [12] introduce the XLand environment, in which a single agent is trained on 700k 3D games, including single- and multi-agent games, resulting in zero-shot generalization on held-out test environments. Barthet et al. [42] introduce an autoencoder [43, 44] and CPPN-NEAT based open-ended evolutionary algorithm to evolve Minecraft [45, 46] buildings, showing how differences in the training of the autoencoders affect the evolution and the generated structures. Fan et al. [47] create a Minecraft-based environment, MineDojo, which provides numerous open-ended tasks. They also introduce MineCLIP, an effective language-conditioned reward function that serves as an automatic metric for generation tasks. Gan et al. [48] introduce the Open-ended Physics Environment (OPEn) to test learned representations, evaluating many RL-based agents. Their results indicate that agents that make use of unsupervised contrastive representation learning and impact-driven learning for exploration achieve the best results.
A related field is that of lifelong, or continual, learning. Here, there is often only one agent in an environment with multiple tasks [49, 50]. The agent can then continuously improve as it experiences new settings. A challenge here is how to transfer knowledge between different tasks while preventing catastrophic forgetting, i.e. where an agent performs worse on previously learned tasks after fine-tuning on new ones [51, 52]. In particular, Rusu et al. [53] introduce Progressive Neural Networks where, for each new task, the existing network is frozen and extra capacity is added that can learn the new task. This allows the model to leverage lateral connections and transfer information from previous tasks, while not diminishing performance on previously learned tasks (a minimal sketch of this idea follows below). Other works attempt to keep the agent's actions unchanged on old tasks. von Oswald et al. [52] do this by learning a model that transforms a task identifier into a neural network; for new tasks, the loss incentivises the model to output similar weights for already-learned task identifiers. Li and Hoiem [54] use a similar technique, where prior outputs from the model should not change significantly when learning a new task.
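To make the Progressive Neural Networks mechanism concrete, the following is a minimal NumPy sketch under our own illustrative assumptions: a single hidden layer per column and random, untrained weights. The class and its member names are hypothetical, not taken from [53].

```python
import numpy as np

class ProgressiveNet:
    """Sketch of the Progressive Neural Networks idea: one column of weights
    per task; columns for solved tasks are frozen, and each new column also
    receives lateral input from the frozen columns' hidden activations."""

    def __init__(self, in_dim, hid, out_dim):
        self.in_dim, self.hid, self.out_dim = in_dim, hid, out_dim
        self.columns = []

    def add_column(self):
        # Called at the start of each new task; earlier columns stay frozen.
        rng = np.random.default_rng()
        self.columns.append({
            "W1": rng.normal(size=(self.hid, self.in_dim)),
            "W2": rng.normal(size=(self.out_dim, self.hid)),
            # One lateral matrix per existing (frozen) column.
            "U": [rng.normal(size=(self.hid, self.hid)) for _ in self.columns],
        })

    def forward(self, x):
        hiddens = []
        for col in self.columns:
            h = col["W1"] @ x
            for U, h_prev in zip(col["U"], hiddens):
                h = h + U @ h_prev          # lateral transfer from old tasks
            hiddens.append(np.tanh(h))
        return self.columns[-1]["W2"] @ hiddens[-1]

net = ProgressiveNet(in_dim=4, hid=8, out_dim=2)
net.add_column()                  # column for the first task
net.add_column()                  # new task: add a column with lateral inputs
y = net.forward(np.ones(4))
```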
III. ENHANCED POET
Since our method is heavily based on EPOET, we briefly describe it here, along with the original POET algorithm. POET focuses on evolving pairs of agents and environments in an attempt to create specialist agents that solve particular environments. POET uses the 2D Bipedal Walker Hardcore environment from OpenAI Gym [55] as a benchmark. The first environment is a flat surface and, as evolution progresses, the environments become harder through the addition of more obstacles. POET also transfers agents across environments, which can prevent stagnation and leverage experience gained on one environment as a step towards solving another. An environment-agent (EA) pair is eligible to reproduce when the agent crosses a preset reward threshold on its environment. The next generation of environments is formed by mutating the current population and selecting only those environments that are neither too easy nor too hard. Finally, environments are ranked by novelty, and only the most novel children pass through to the next generation. More information about POET's hyperparameters is provided in the supplementary material.
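To make this loop concrete, below is a minimal, self-contained Python sketch of POET's outer loop. Environments and agents are reduced to single floats (difficulty and skill) purely for illustration, and all thresholds are stand-ins rather than POET's actual hyperparameter values.

```python
import random

def evaluate(env, agent):
    # Toy reward: higher when the agent's skill exceeds the difficulty.
    return max(0.0, 300.0 - 100.0 * max(0.0, env - agent))

REWARD_THRESHOLD = 230.0       # assumed reproduction-eligibility score
MC_LOW, MC_HIGH = 50.0, 280.0  # assumed minimal-criterion bounds
MAX_CHILDREN = 2               # most-novel children admitted per iteration

def poet_step(pairs):
    """One outer-loop iteration over the active (environment, agent) pairs."""
    # 1. Eligible pairs spawn mutated (here: slightly harder) environments.
    children = [(env + random.uniform(0.0, 0.5), agent)
                for env, agent in pairs
                if evaluate(env, agent) >= REWARD_THRESHOLD]
    # 2. Minimal criterion: keep children that are neither too easy nor too hard.
    children = [(e, a) for e, a in children if MC_LOW <= evaluate(e, a) <= MC_HIGH]
    # 3. Rank children by novelty (here: distance to existing difficulties).
    children.sort(key=lambda c: min(abs(c[0] - e) for e, _ in pairs), reverse=True)
    pairs = pairs + children[:MAX_CHILDREN]
    # 4. Optimize each agent on its paired environment (POET uses evolution
    #    strategies; a random upward nudge of skill stands in for that here).
    pairs = [(env, agent + max(0.0, random.uniform(-0.1, 0.2)))
             for env, agent in pairs]
    # 5. Transfer: an agent is replaced by whichever agent in the population
    #    performs best on its environment.
    return [(env, max((a for _, a in pairs), key=lambda a: evaluate(env, a)))
            for env, _ in pairs]

pairs = [(0.0, 0.0)]  # begin with a flat environment and a simple agent
for _ in range(20):
    pairs = poet_step(pairs)
```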
EPOET improves upon POET with two algorithmic improvements: (1) a general method for evaluating the novelty of challenges and (2) an improved approach for deciding when agents should transfer to new environments. In the original POET, novelty was evaluated by comparing the environment characterizations (ECs) of different environments, which were obtained from fixed, domain-specific static features such as the roughness of the terrain. This inherently limits the exploration of the algorithm, as it is restricted to these preset confines. Enhanced POET introduces an improved EC, the Performance of All Transferred Agents EC (PATA-EC), which is based on the performance of different agents in the environment. Secondly, the original transfer mechanism in POET was generally inefficient: it increased the required computation (as each agent needed to be fine-tuned) and resulted in subpar transfers, as it was too easy to qualify for transfer. Enhanced POET makes this process stricter, only transferring very promising agents.
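As an illustration, the following is a minimal sketch of how PATA-EC-style novelty might be computed. The clipping bounds, the rank normalization to [-0.5, 0.5] and the k-nearest-neighbor novelty score follow our reading of Enhanced POET; `evaluate` is an assumed scoring helper, not a library function.

```python
import numpy as np

def pata_ec(env, agents, evaluate, clip_low=50.0, clip_high=300.0):
    """Characterize an environment by the ranked performance of all agents."""
    scores = np.clip([evaluate(env, a) for a in agents], clip_low, clip_high)
    ranks = np.argsort(np.argsort(scores)).astype(float)  # 0 = worst score
    return ranks / max(len(agents) - 1, 1) - 0.5          # normalize to [-0.5, 0.5]

def novelty(env, archive, agents, evaluate, k=5):
    """Novelty = mean distance to the k nearest PATA-EC vectors in the archive."""
    target = pata_ec(env, agents, evaluate)
    dists = sorted(np.linalg.norm(target - pata_ec(e, agents, evaluate))
                   for e in archive)
    return float(np.mean(dists[:k]))
```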
Enhanced POET also improves upon the environmental encoding used in the original algorithm, which was fixed and could thus represent only a limited number of unique and diverse environments. The solution to this problem is a more expressive encoding in the form of compositional pattern producing networks (CPPNs) [56]. A CPPN is a specific kind of neural network that takes in x, y coordinates and, when evaluated across an entire region, produces a specific pattern. These CPPNs are evolved using the NEAT [18] algorithm, which increases the complexity of the environments as evolution progresses.
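For intuition, here is a minimal hand-built CPPN sketch: a tiny network of nodes with mixed activation functions, queried at each coordinate to produce a terrain height. The specific nodes and weights are fixed, illustrative choices; in Enhanced POET the CPPN's topology and weights are evolved by NEAT rather than written by hand.

```python
import math

def cppn_height(x):
    """Query the (hand-built) CPPN at coordinate x for a terrain height."""
    h1 = math.sin(2.0 * x)              # periodic node -> rolling hills
    h2 = math.exp(-((x - 5.0) ** 2))    # Gaussian node -> a single bump
    return 0.6 * h1 + 1.5 * h2          # output node: weighted sum

# Evaluating the network across an entire region yields the pattern (terrain).
terrain = [cppn_height(0.1 * i) for i in range(100)]
```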
Lastly, the authors introduce the Accumulated Number of Novel Environments Created and Solved (ANNECS), a metric for open-ended learning that, intuitively, describes the amount of interesting new content generated by the algorithm. ANNECS counts the number of environments that satisfy two constraints: (1) the environment must be neither too easy nor too hard, and (2) it must eventually be solved by some agent. Thus, if the ANNECS metric increases as time goes on, the algorithm is continually producing novel and interesting environments.
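In sketch form, the metric reduces to a simple count (assuming each environment record stores, as hypothetical bookkeeping flags, whether it passed the minimal criterion at creation and whether any agent has since solved it):

```python
def annecs(env_records):
    """Accumulated Number of Novel Environments Created and Solved: count
    environments that (1) passed the minimal criterion when created and
    (2) were eventually solved by some agent."""
    return sum(1 for e in env_records if e.passed_mc and e.solved)
```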
IV. OPEN-ENDEDLY EVOLVING THE TOPOLOGY OF AGENTS
Many of the approaches introduced in prior work use a fixed-topology approach in conjunction with optimizers such as evolution strategies (ES) [57], V-MPO [58] (a modified version of maximum a posteriori policy optimization [59] that relies on value functions) and Proximal Policy Optimization [60]. This motivates us to explore NEAT and the benefits it brings to the open-ended learning framework. We first describe the use of NEAT in Section IV-A and then describe our overall approach in Section IV-B.
A. NeuroEvolution of Augmenting Topologies
We leverage NeuroEvolution of Augmenting Topologies (NEAT) to evolve the structure of an agent's controller. NEAT starts with a population of simple neural networks (NNs) in which the input neurons are directly connected to the output neurons, without any hidden layers. Crossover is performed between two parents, and the resulting children are mutated by adding connections and nodes, or by perturbing weights. In this way, the NN is gradually complexified.
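The following is a minimal sketch of NEAT's structural mutation step (the add-connection mutation is omitted for brevity). The genome representation and the mutation rates are illustrative assumptions, not NEAT's canonical settings; the add-node convention, however, follows the NEAT paper: the new incoming connection gets weight 1.0 and the outgoing connection inherits the old weight, minimizing disruption.

```python
import random

def mutate(genome, innovations, next_node_id):
    """Structural mutation on a genome: a list of connection genes (dicts
    with in/out node ids, weight, enabled flag, innovation number).
    `innovations` maps (in, out) pairs to global innovation numbers."""
    def innov(u, v):
        if (u, v) not in innovations:
            innovations[(u, v)] = len(innovations)
        return innovations[(u, v)]

    # Weight perturbation: the most common mutation.
    for gene in genome:
        if random.random() < 0.8:
            gene["weight"] += random.gauss(0.0, 0.5)

    # Add-node mutation: split an enabled connection A->B into A->C->B.
    enabled = [g for g in genome if g["enabled"]]
    if enabled and random.random() < 0.03:
        old = random.choice(enabled)
        old["enabled"] = False
        c = next_node_id
        next_node_id += 1
        genome.append({"in": old["in"], "out": c, "weight": 1.0,
                       "enabled": True, "innov": innov(old["in"], c)})
        genome.append({"in": c, "out": old["out"], "weight": old["weight"],
                       "enabled": True, "innov": innov(c, old["out"])})
    return genome, next_node_id
```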
One of the major problems to overcome is the permutations, or competing conventions, problem [61, 62]. Competing conventions describes the case in which crossover between networks that represent the same solution but are encoded differently (e.g. with a different ordering of neurons) can lead to a loss of information and a significantly worse child. NEAT addresses this by keeping track of the historical origin of each gene using an innovation number. Using this innovation number, identical genes from two parents can be aligned, while genes that occur in only one parent (denoted excess or disjoint genes, depending on their position) are inherited from the fitter parent. Finally, NEAT introduces speciation [63], where individuals with similar topologies are grouped together and share a fitness. This protects innovation and ensures diversity.
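Speciation relies on NEAT's compatibility distance between two genomes, which combines the number of excess genes E, the number of disjoint genes D, and the average weight difference of matching genes, with N the number of genes in the larger genome and c1, c2, c3 tunable coefficients:

```latex
\delta = \frac{c_1 E}{N} + \frac{c_2 D}{N} + c_3\,\overline{W}
```

Genomes whose distance falls below a threshold are assigned to the same species, and fitness sharing within each species prevents any single topology from taking over the population.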