Efficient Learning of Locomotion Skills through the Discovery of
Diverse Environmental Trajectory Generator Priors
Shikha Surana*1, Bryan Lim*1, Antoine Cully1
Abstract— Data-driven, learning-based methods have recently been particularly successful at learning robust locomotion controllers for a variety of unstructured terrains. Prior work has shown that incorporating good locomotion priors in the form of trajectory generators (TGs) enables complex locomotion skills to be learnt efficiently. However, as tasks and environments become more complex, defining a single TG that remains a good prior is challenging: it requires extensive tuning and risks reducing the effectiveness of the prior. In this paper, we present Evolved Environmental Trajectory Generators (EETG), a method that uses Quality-Diversity algorithms to learn a diverse set of specialized locomotion priors while maintaining a single policy within the Policies Modulating Trajectory Generators (PMTG) architecture. The results demonstrate that EETG enables a quadruped robot to successfully traverse a wide range of environments, such as slopes, stairs, rough terrain, and balance beams. Our experiments show that learning a diverse set of specialized TG priors is significantly (5 times) more efficient than using a single, fixed prior when dealing with a wide range of environments.
I. INTRODUCTION
Legged robots [1], [2], [3] have tremendous potential for societal impact, as they can be used in applications involving a wide range of environments such as rough, cluttered, and unstructured terrain. From search and rescue and inspection work [4] to carrying heavy payloads, legged robots have the potential to take on many of the dangerous and unhealthy physical activities that humans and animals currently perform.
However, legged robots are also underactuated, high-dimensional systems with many constraints, which makes them challenging to control. Recently, reinforcement learning (RL) approaches [5], [6], [7], [8], [9], [10] have become competitive with more conventional model-based optimization methods [11], [12], [13], [14], demonstrating state-of-the-art locomotion abilities both in simulation and in the real world [5], [6]. These learnt controllers are especially robust when evaluated across many different environments and perturbations. Despite these significant advances, learning-based approaches in robotics are notoriously sample inefficient and usually require large amounts of data [15], [16]. Researchers have tried to address this problem in a number of ways, for example, by improving the sample efficiency of the underlying RL algorithm [17] or by using fast, highly parallel simulators [18], [8]. Another effective way is to incorporate useful priors into the learning process.
*Equal Contribution
1Imperial College London, United Kingdom. {ss5721, bwl116, a.cully}@ic.ac.uk

Policies Modulating Trajectory Generators (PMTG) [19] is one such method, which incorporates a parameterized Trajectory Generator (TG) as a prior, separate from the learnt policy. PMTG makes complex locomotion tasks easier to learn and demonstrates that a good locomotion prior can significantly improve the efficiency of RL methods [19]. Lee et al. [6] also used the PMTG framework when demonstrating state-of-the-art locomotion across a wide range of environments in the real world, further demonstrating the effectiveness of the TG prior for locomotion.
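To make this architecture concrete, the following is a minimal Python sketch of a PMTG-style control loop. The sinusoidal leg-swing TG, its parameters, and the policy interface are illustrative assumptions for this sketch, not the exact components used in [19] or in our method.

import numpy as np

# Minimal sketch of a PMTG-style control loop (illustrative assumptions,
# not the exact TG or policy used in PMTG [19] or in EETG).

class TrajectoryGenerator:
    def __init__(self, amplitude=0.3, base_frequency=1.5):
        self.amplitude = amplitude              # swing amplitude (rad)
        self.base_frequency = base_frequency    # nominal cycle frequency (Hz)
        self.phase = 0.0

    def step(self, frequency_offset, dt=0.02):
        # The policy modulates the TG frequency; the TG returns open-loop
        # target angles for the four legs from its internal phase.
        self.phase += 2.0 * np.pi * (self.base_frequency + frequency_offset) * dt
        leg_offsets = np.array([0.0, np.pi, np.pi, 0.0])  # trot-like phasing
        return self.amplitude * np.sin(self.phase + leg_offsets)

def pmtg_step(policy, tg, robot_obs):
    # The policy observes the robot state together with the TG phase and
    # outputs a frequency modulation plus per-leg residual corrections.
    obs = np.concatenate([robot_obs, [np.sin(tg.phase), np.cos(tg.phase)]])
    frequency_offset, residuals = policy(obs)
    # Final action = TG prior output + learnt residuals.
    return tg.step(frequency_offset) + residuals

In this view, EETG keeps the shared modulating policy but replaces the single hand-tuned TG with a collection of evolved TG parameter sets.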
However, some questions remain surrounding the PMTG method. How are the parameters of the TG defined? What parameters make a good prior? The parameters of the TGs used in prior work are usually defined manually by engineers, based on intuition about the locomotion task of interest. For instance, a forward-swinging TG motion is useful when learning to walk forward [19]. On the other hand, for more complex tasks such as learning to walk across a diversity of difficult environments, a more generic and unbiased TG motion of stepping up and down in place had to be used [6]. While this choice proved effective, such a generic motion could reduce how much the prior helps learning and could indicate that the policy still has to do the bulk of the work of adapting to the different environments. For example, a good TG prior for ascending steps would differ from one for descending steps. In this paper, we address this by learning good priors for tasks instead of manually crafting them. More importantly, we learn a diverse set of specialized priors using Quality-Diversity (QD) algorithms rather than a single prior.
The main contribution of our work is a novel framework, Evolved Environmental Trajectory Generators (EETG), for discovering a diverse set of specialized Trajectory Generators (TGs) which act as priors for more efficient learning. We demonstrate in our experiments that our method enables a simulated A1 quadruped robot to learn dynamic locomotion behaviors over diverse environment types such as slopes, uneven terrain, and steps. Our experiments show that EETG performs as well as learning individual TGs and policies across all environments, while being significantly more efficient. Our work demonstrates that learning a diverse set of TG priors is more effective than using a single, fixed TG, especially when dealing with many tasks and environments.
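As an illustration of the kind of Quality-Diversity procedure that can build such a repertoire of TG priors, the following is a generic MAP-Elites-style sketch in Python. The TG parameter vector, the placeholder fitness, and the behaviour descriptor returned by evaluate are assumptions made for illustration, not the exact choices made by EETG.

import numpy as np

# Generic MAP-Elites-style loop over TG parameters (illustrative only;
# the fitness and descriptor below are placeholders, not EETG's choices).

def evaluate(tg_params):
    # Hypothetical evaluation: roll out the shared policy with this TG in
    # a sampled environment and return (fitness, behaviour descriptor).
    fitness = -np.sum((tg_params - 0.5) ** 2)        # placeholder reward
    descriptor = tuple(np.round(tg_params[:2], 1))   # placeholder descriptor
    return fitness, descriptor

archive = {}  # descriptor cell -> (fitness, tg_params)
rng = np.random.default_rng(0)

for _ in range(1000):
    if archive and rng.random() < 0.9:
        # Select a random elite from the archive and mutate its parameters.
        keys = list(archive.keys())
        _, parent = archive[keys[rng.integers(len(keys))]]
        candidate = np.clip(parent + rng.normal(0.0, 0.05, size=parent.shape), 0.0, 1.0)
    else:
        candidate = rng.random(4)  # random TG parameters in [0, 1]
    fitness, descriptor = evaluate(candidate)
    # Keep the candidate only if its cell is empty or it improves on the elite.
    if descriptor not in archive or fitness > archive[descriptor][0]:
        archive[descriptor] = (fitness, candidate)

Each cell of the archive keeps only the best TG parameters found for its descriptor, so the loop gradually accumulates a diverse collection of specialized priors.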
II. RELATED WORK
Legged Locomotion. Locomotion controllers have tra-
ditionally been designed using a modular control frame-
work. This framework breaks down the difficult control
problem into smaller sub-problems. Each sub-problem makes