obstacles or people, etc.) and be robust to variations in the environment, its dynamics, and unseen
situations that can emerge in the real world.
In this article, we quantitatively study and report on the performance of a set of state-of-the-art
reinforcement learning approaches in the context of continuous control. We systematically evaluate
RL agents (or “controllers”) on their performance (i.e., the ability to accomplish the task specified
by the environment’s reward signal) as well as their robustness [35, 7, 14, 16, 18], which entails
a bounded form of generalisability. To do so, we use an open-source RL safety benchmarking
suite [34]. First, we empirically compare the control policies produced by both traditional and robust
RL agents at baseline and then when a variety of disturbances are injected into the environment.
We observe that both the traditional and robust RL agents are more robust to disturbances injected
through the agent’s actions, while disturbances injected at the level of the agent’s observations and
dynamics cause much more rapid destabilisation. We also note that traditional “vanilla”
agents show similar performance to the robust RL agents even when disturbances are injected, despite
not being explicitly designed with robustness in mind. By leveraging open-source simulations and
implementations, we hope that this work and our insights can provide a basis for further research into
safe and robust RL, especially for robot control.
2 Background
In RL, an agent (in our case, a robot) performs an action, receives feedback (a reward) from the
environment indicating how well it is doing at the environment’s task, perceives the updated state of the
environment resulting from that action, and repeats the process, learning over time to improve the
actions it takes to maximise reward collection (and thus to correctly perform the task). The resulting
behaviour is called the agent’s policy and maps the environment’s state to actions [28]. While early
RL research was demonstrated in the context of grid worlds and games, in recent years, we have seen
a growing interest in physics-based simulation for robot learning [8, 11, 19, 6]. For simplicity and
reproducibility reasons, however, many of these simulators are still fully deterministic (and prone to
be exploited by the agents).
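For concreteness, this interaction loop can be sketched in a few lines of Python using the generic Gymnasium reset/step interface; the environment name and the random action selection below are illustrative placeholders, not the benchmark tasks or trained agents evaluated in this article.

```python
import gymnasium as gym

# Sketch of the agent-environment interaction loop described above.
# "CartPole-v1" and the random action choice are placeholders only.
env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)

total_reward = 0.0
for _ in range(500):
    action = env.action_space.sample()  # a trained policy would map obs -> action
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward              # the agent learns to maximise this signal
    if terminated or truncated:         # episode ended: reset and continue
        obs, info = env.reset()
env.close()
```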
In this study, we deliberately inject disturbances at different points of the RL learning and control
interaction loop to emulate the conditions an agent might encounter in the real world. For the sake
of brevity, the results reported in Section 4 pertain to the classical cart-pole stabilisation task. In
the Supplementary Material we include results for the more complex tasks of quadrotor trajectory
tracking and stabilisation.
2.1 Injecting Disturbances in Robotic Environments
We systematically inject each of the disturbances in Figure 2 at one of three possible sites: the observations,
actions, and dynamics of the environment that the RL agent interacts with.
Observation/State Disturbances
Observation/state disturbances occur when the robot’s sensors
cannot perceive the exact state of the robot. This is a very common problem in robotics and is tackled
with state estimation methods [1]. In the case of the cart-pole, this disturbance is four-dimensional—
as is the state—and is measured in metres in the first dimension, radians in the second, metres per
second in the third, and radians per second in the fourth. This disturbance is implemented by directly
modifying the state observed by the system. For the quadrotor task in the Supplementary Material,
the observation disturbance is similarly added to the drone’s six-dimensional true state.
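As a minimal sketch, such an observation disturbance can be implemented as an additive perturbation of the four-dimensional state before it is passed to the policy; the noise magnitudes below are illustrative assumptions, not the disturbance ranges used in our experiments.

```python
import numpy as np

def disturb_observation(true_state, rng, scale):
    """Additively perturb the cart-pole state [x (m), theta (rad), x_dot (m/s), theta_dot (rad/s)].

    `scale` gives the per-dimension disturbance magnitude; the values used below
    are illustrative assumptions, not the settings of the reported experiments.
    """
    noise = rng.normal(0.0, scale, size=np.shape(true_state))
    return np.asarray(true_state) + noise  # the policy only ever sees this perturbed state

rng = np.random.default_rng(0)
true_state = np.array([0.05, 0.02, -0.10, 0.01])
observed_state = disturb_observation(true_state, rng, scale=[0.01, 0.005, 0.02, 0.01])
```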
Action Disturbances
Action disturbances occur when the actuation of the robot’s motors is not
exactly as the control output specifies, resulting in a difference between the actual and expected action.
For example, action delays are often neglected or coarsely modelled in simple simulations. In the case
of the cart-pole, this disturbance is a one-dimensional force (in Newtons) in the 𝑥-direction directly
applied to the slider-to-cart joint. For the quadrotor task, action disturbances are similarly added to
the UAV’s commanded individual motor thrusts.
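A corresponding sketch for the cart-pole action disturbance perturbs the commanded force before it reaches the slider-to-cart joint; the uniform offset and its bound in Newtons are illustrative assumptions, not the disturbance distributions used in our evaluation.

```python
import numpy as np

def disturb_action(commanded_force, rng, max_offset_newtons=0.5):
    """Add a bounded random offset (in Newtons) to the commanded cart force.

    The offset bound is an illustrative assumption; the returned value is what
    would actually be applied at the slider-to-cart joint.
    """
    offset = rng.uniform(-max_offset_newtons, max_offset_newtons)
    return commanded_force + offset

rng = np.random.default_rng(0)
applied_force = disturb_action(commanded_force=2.0, rng=rng)
# For the quadrotor, the same idea would perturb each commanded motor thrust.
```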
External Dynamics Disturbances
External dynamics disturbances are applied directly to the robot and can be thought of as environmental factors such as wind or other external forces.
In the case of the cart-pole, this disturbance is two-dimensional and implemented as a tapping force