increase the likelihood of solving hard problems. While
POET obtained state-of-the-art results, the diversity of its generated environments grows more slowly the longer it evolves. Enhanced POET [28] adds improved algorithmic components to the base POET method, resulting in superior performance and less stagnation. Enhanced POET, however, uses agents with fixed-topology neural network controllers. While this approach works well for simple environments, it places an eventual limit on the complexity of tasks that can be solved: beyond a certain level of complexity, fixed-topology agents may not have sufficient capacity to solve the environments.
To address this issue, we propose Augmentative Topol-
ogy Enhanced POET (ATEP), which uses NEAT to
evolve agents with variable, and potentially unbounded,
network topologies. We argue that fixed-topology agents
will cease to solve environments after a certain level of
complexity and empirically show that ATEP outperforms
Enhanced POET (EPOET) in a standard benchmark
domain. Finally, we find that using NEAT results in im-
proved exploration and better generalization compared
to EPOET.
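To make the core idea concrete, the following is a minimal, illustrative sketch of evolving variable-topology controllers with the neat-python library. It is not the ATEP implementation; the configuration file name, observation size, and episode routine are placeholders.

```python
# Minimal sketch of topology evolution with neat-python (illustrative only).
import neat

def run_episode(net):
    # Placeholder rollout: a real setup would step an environment and feed
    # observations through the network to obtain actions and a return.
    obs = [0.0] * 24                 # e.g. a BipedalWalker-style observation
    action = net.activate(obs)
    return float(sum(action))        # stand-in for an actual episode return

def eval_genomes(genomes, config):
    for genome_id, genome in genomes:
        # Each genome encodes its own topology; mutation can add nodes and
        # connections, so network capacity can grow with task complexity.
        net = neat.nn.FeedForwardNetwork.create(genome, config)
        genome.fitness = run_episode(net)

config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction,
                     neat.DefaultSpeciesSet, neat.DefaultStagnation,
                     "neat_config.ini")          # hypothetical config file
population = neat.Population(config)
best_genome = population.run(eval_genomes, 50)   # evolve for 50 generations
```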
II. RELATED WORK
POET [9] and EPOET [28] are the founding algo-
rithms of the field of open-ended reinforcement learning,
building upon prior approaches such as MCC [8]. This
has led to an explosion of new use cases, such as PINSKY [29, 30], which extends POET to generate 2D Atari video game levels alongside the agents that solve them.
Quessy and Richardson [10] use unsupervised skill
discovery [31, 32, 33] in the context of POET to discover
a large repertoire of useful skills. Meier and Mujika [34]
also investigate unsupervised skill discovery through
reward functions learned by neural networks. Other uses
of POET include the work by Zhou and Vanschoren
[35], who obtain diverse skills in a 3D locomotion task.
POET has also been shown to aid in evolving robot
morphologies [36] and avoiding the premature convergence that often results from handcrafted curricula. Norstein et al. [37] use MAP-Elites [21] to open-
endedly create a structured repertoire of various terrains
and virtual creatures.
Adversarial approaches are commonly adopted when
developing open-ended algorithms. Dennis et al. [38]
propose PAIRED, a learning algorithm in which an adversary generates environments based on the difference in performance between an antagonist and a protagonist agent. Domain randomization [39], priori-
tized level replay [40] and Adversarially Compounding
Complexity by Editing Levels (ACCEL) [41] adopt
a similar adversarial approach, where teacher agents
produce environments and student agents solve them.
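As an illustration of this adversarial signal, the sketch below computes a PAIRED-style regret as the gap between antagonist and protagonist returns. It is a toy, self-contained example, not the implementation from [38]; the rollout helper and stand-in policies are hypothetical.

```python
# Toy sketch of the regret signal behind adversarial environment design.
import random

def mean_return(env_params, policy, episodes=5):
    # Placeholder rollout: a real implementation would build an environment
    # from env_params, run the policy, and average the episode returns.
    return sum(policy(env_params) for _ in range(episodes)) / episodes

def paired_regret(env_params, protagonist, antagonist):
    # The environment designer is rewarded when the antagonist solves an
    # environment that the protagonist cannot, i.e. when this gap is large.
    return (mean_return(env_params, antagonist)
            - mean_return(env_params, protagonist))

# Stand-in policies in place of trained agents.
protagonist = lambda params: random.uniform(0.0, 0.5)
antagonist = lambda params: random.uniform(0.0, 1.0)
print(paired_regret({"gap_width": 0.8}, protagonist, antagonist))
```

The adversary proposes environments that maximize this regret, while the protagonist is trained to minimize it, yielding a curriculum of environments that are solvable but not yet solved.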
Several domains and benchmarks have been proposed
with the aim of encouraging research into open-ended,
general agents. Team et al. [12] introduce the XLand en-
vironment, where a single agent is trained on 700k 3D
games, including single- and multi-agent games, resulting
in zero-shot generalization on holdout test environments.
Barthet et al. [42] introduce an open-ended evolutionary algorithm based on autoencoders [43, 44] and CPPN-NEAT to evolve Minecraft [45, 46] buildings. They show how differences in the training of the autoencoders can affect the evolution and the generated structures.
Fan et al. [47] create a Minecraft-based environment,
MineDojo, which has numerous open-ended tasks. They
also introduce MineCLIP as an effective language-
conditioned reward function that plays the role of an
automatic metric for generation tasks. Gan et al. [48]
introduce the Open-ended Physics Environment (OPEn)
to test learning representations, and evaluate many RL-based agents on it. Their results indicate that agents that combine unsupervised contrastive representation learning with impact-driven exploration achieve the best results.
A related field is that of lifelong, or continual, learning. Here, there is often a single agent in an environment with multiple tasks [49, 50]; the agent can then continuously improve as it experiences new settings. A key challenge is how to transfer knowledge between different tasks while preventing catastrophic forgetting,
i.e. where an agent performs worse on previously learned
tasks after fine-tuning on new ones [51, 52]. In par-
ticular, Rusu et al. [53] introduce Progressive Neural
Networks, where, for each new task, the existing network is frozen and extra capacity is added that can be trained on the new task (see the sketch at the end of this section). Lateral connections allow the model to transfer information from previous tasks without diminishing performance on them. Other works attempt to keep
the agent's actions unchanged on these old tasks. von
Oswald et al. [52] do this by learning a model that
transforms a task identifier into a neural network. For
new tasks, the loss incentivizes the model to output
similar weights for already learned task identifiers. Li
and Hoiem [54] use a similar technique, where prior
outputs from the model should not change significantly
when learning a new task.
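To make the mechanism of Progressive Neural Networks more concrete, the sketch below shows a two-column version in PyTorch, with the first column frozen and a lateral connection feeding its hidden features into the new column. Layer sizes and names are illustrative and not taken from [53].

```python
# Hedged two-column sketch of the Progressive Neural Networks idea.
import torch
import torch.nn as nn

class ProgressiveTwoColumn(nn.Module):
    def __init__(self, in_dim=8, hidden=32, out_dim=4):
        super().__init__()
        # Column 1: trained on the first task, then frozen.
        self.col1_hidden = nn.Linear(in_dim, hidden)
        self.col1_out = nn.Linear(hidden, out_dim)
        for p in list(self.col1_hidden.parameters()) + list(self.col1_out.parameters()):
            p.requires_grad = False
        # Column 2: fresh capacity added for the new task.
        self.col2_hidden = nn.Linear(in_dim, hidden)
        # Lateral connection: reuses column 1's (frozen) hidden features.
        self.lateral = nn.Linear(hidden, hidden)
        self.col2_out = nn.Linear(hidden, out_dim)

    def forward(self, x):
        h1 = torch.relu(self.col1_hidden(x))                 # frozen task-1 features
        h2 = torch.relu(self.col2_hidden(x) + self.lateral(h1))
        return self.col2_out(h2)                             # task-2 output head

model = ProgressiveTwoColumn()
out = model(torch.randn(1, 8))   # only column 2 and the lateral weights train
```

Training only the unfrozen parameters leaves the task-1 column intact, so performance on the old task cannot degrade, at the cost of growing the network with each new task.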
III. ENHANCED POET
Since our method is heavily based on EPOET, we
briefly describe this method, as well as the original
POET algorithm. POET focuses on evolving pairs of