strategy. This technique is additionally well suited for use
with photorealistic, data-driven simulation engines such as
FlightMare [17] and FlightGoggles [18], or SLAM and state
estimation pipelines that directly produce 3D dense/mesh
reconstructions of the environment, such as Kimera [19].
Contributions.
In summary, our work presents the following contributions: a) We introduce a new data augmentation strategy, an extension of our previous work [6], that enables efficient and robust learning of a sensorimotor policy capable of generating control actions directly from raw sensor measurements (e.g., images) instead of state estimates. Our approach is grounded in the theory of the output feedback RTMPC framework, unlike previous methods that rely on handcrafted heuristics, and leverages a 3D mesh of the environment to generate augmented data. b) We demonstrate our methodology in the context of visuomotor policy learning for an aerial robot, showing that it can track a trajectory from raw images with high robustness (>90% success rate) after a single demonstration, despite sensory noise and disturbances. c) We open-source our framework, available at https://github.com/andretag.
II. RELATED WORKS
Sensorimotor policy learning by imitating model-based
experts.
Learning a sensorimotor policy by imitating the
output of a model-based expert algorithm bypasses the
computational cost of planning [8], control [6] and state
estimation [10], with the potential of achieving increased
robustness [7], [20] and reduced latency with respect to
conventional autonomy pipelines. The sensory input typically
employed is based on raw images [3], [12], [21] or pre-
processed representations, such as feature tracks [7], depth
maps [8], or intermediate layers of a CNN [22]. Vision is
often complemented with proprioceptive information, such as that provided by IMUs [7] or wheel encoders [21].
These approaches showcase the advantages of sensorimotor
policy learning but do not leverage any data augmentation
strategy. Consequently, they query the expert many times during the data collection phase, increasing the time, the number of demonstrations, and the computational effort required to obtain the policy.
Data augmentation for visuomotor and sensorimotor
learning.
Traditional data augmentation strategies for visuomotor policy learning have focused on increasing a
policy’s generalization ability by applying perturbations [23]
or noise [24] directly in image space, without modifying
the corresponding action. These methods do not directly
address covariate shift issues caused by process uncertainties.
The self-driving literature has developed a second class of
visuomotor data augmentation strategies [15], [16], [25]
capable of compensating for covariate shift. This class instead relies on first generating different views from synthetic
cameras [15], [16] or data-driven simulators [25] and then
computing a corresponding action via a handcrafted controller.
These methods, however, rely on heuristics to establish
relevant augmented views and the corresponding control
action. Our work provides a more general methodology for
data augmentation and demonstrates it on a quadrotor system
(higher-dimensional than the planar self-driving car models).
Output feedback RTMPC.
Model predictive control [26]
leverages a model of the system dynamics to generate actions
that take into account state and actuation constraints. This
is achieved by solving a constrained optimization problem
along a predefined temporal horizon, using the model to
predict the effects of future actions. Robust variants of MPC,
such as RTMPC, usually assume that the system is subject
to additive, bounded process uncertainty (e.g., disturbances,
model errors). As a consequence, they modify nominal plans
by either a) assuming a worst-case disturbance [27], [28],
or b) employing an auxiliary (ancillary) controller. This
controller maintains the system within some distance (“cross-
section” of a tube) from the nominal plan regardless of the
realization of the disturbances [29], [30]. Output feedback
RTMPC
[31]–[33] also accounts for the effects of sensing
uncertainty (e.g., sensing noise, imperfect state estimation).
Our work relies on an output feedback RTMPC [32], [33] to generate demonstrations and to derive a data augmentation strategy. However, thanks to the proposed imitation learning strategy, our approach does not require solving the optimization problem online, reducing the onboard computational cost.
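For intuition, the ancillary controller of a state feedback tube MPC is often written in the form (stated here only as a sketch; the exact output feedback formulation, gains, and tube construction follow [29]–[33])
$$u_t = \bar{u}_t + K\,(\hat{x}_t - \bar{x}_t),$$
where $\bar{x}_t$ and $\bar{u}_t$ denote the nominal state and input from the optimized plan, $\hat{x}_t$ is the current state estimate, and $K$ is a fixed feedback gain that keeps the system within the tube around the nominal trajectory for any admissible disturbance realization.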
III. PROBLEM STATEMENT
Our goal is to generate a deep neural network sensorimotor policy $\pi_\theta$, with parameters $\theta$, to control a mobile robot (e.g., a multirotor). The policy needs to be capable of tracking a desired reference trajectory given high-dimensional, noisy sensor measurements, and has the form
$$u_t = \pi_\theta(o_t, X^{\text{des}}_t), \qquad (1)$$
where $t$ denotes the discrete time index, $u_t$ represents the deterministic control actions, and $o_t = (I_t, o_{\text{other},t})$ the high-dimensional, noisy sensor measurements, comprised of an image $I_t$ captured by an onboard camera and other noisy measurements $o_{\text{other},t}$ (e.g., attitude, velocity). The $N+1$ steps of the reference trajectory are written as $X^{\text{des}}_t = \{x^{\text{des}}_{0|t}, \ldots, x^{\text{des}}_{N|t}\}$, where $x^{\text{des}}_{i|t}$ indicates the desired state at the future time $t+i$, as given at the current time $t$, and $N > 0$ represents the total number of given future desired states. Our objective is to efficiently learn the policy parameters $\hat{\theta}^*$ by leveraging IL and demonstrations provided by a model-based controller (the expert).
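To make the interface in (1) concrete, the following minimal sketch shows one way such a policy can be structured; the module and all dimensions (size of $o_{\text{other},t}$, per-waypoint state size, horizon $N$, number of control inputs, image resolution) are illustrative assumptions and not our released implementation.

```python
import torch
import torch.nn as nn


class SensorimotorPolicy(nn.Module):
    """Minimal sketch of pi_theta in (1): (o_t, X_des_t) -> u_t."""

    def __init__(self, n_other, n_des, n_u, img_channels=1):
        super().__init__()
        # Small CNN encoder for the onboard image I_t.
        self.encoder = nn.Sequential(
            nn.Conv2d(img_channels, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # MLP head combining image features, o_other_t, and the
        # flattened N+1 desired states of X_des_t.
        self.head = nn.Sequential(
            nn.Linear(32 + n_other + n_des, 128), nn.ReLU(),
            nn.Linear(128, n_u),
        )

    def forward(self, image, o_other, x_des):
        features = self.encoder(image)
        return self.head(torch.cat([features, o_other, x_des], dim=-1))


# Illustrative dimensions: 9 auxiliary measurements, 12 states per
# desired waypoint with a horizon of N = 10, and 4 control inputs.
policy = SensorimotorPolicy(n_other=9, n_des=(10 + 1) * 12, n_u=4)
u_t = policy(torch.zeros(1, 1, 64, 64),   # image I_t
             torch.zeros(1, 9),            # o_other_t
             torch.zeros(1, 132))          # flattened X_des_t
```

Note that the policy consumes the raw image and the reference trajectory jointly, so no explicit state estimate is required at inference time.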
System model.
We assume that a model of the robot dynamics is available, described by a set of linear (e.g., obtained via linearization), discrete-time, time-invariant equations:
$$x_{t+1} = A x_t + B u_t + w_t, \qquad (2)$$
where the matrices $A \in \mathbb{R}^{n_x \times n_x}$ and $B \in \mathbb{R}^{n_x \times n_u}$ represent the system dynamics, $x_t \in \mathbb{X} \subset \mathbb{R}^{n_x}$ represents the state of the system, and $u_t \in \mathbb{U} \subset \mathbb{R}^{n_u}$ represents the control inputs. The system is subject to the state and input constraints $\mathbb{X}$ and $\mathbb{U}$, assumed to be convex polytopes containing the origin [31]. The quantity $w_t \in \mathbb{W} \subset \mathbb{R}^{n_x}$ in (2) captures time-varying additive process uncertainties. This includes disturbances and model errors that the system may encounter