Output Feedback Tube MPC-Guided Data Augmentation for
Robust, Efficient Sensorimotor Policy Learning
Andrea Tagliabue, Jonathan P. How
Abstract: Imitation learning (IL) can generate computationally efficient sensorimotor policies from demonstrations provided by computationally expensive model-based sensing and control algorithms. However, commonly employed IL methods are often data-inefficient, requiring the collection of a large number of demonstrations and producing policies with limited robustness to uncertainties. In this work, we combine IL with an output feedback robust tube model predictive controller (RTMPC) to co-generate demonstrations and a data augmentation strategy to efficiently learn neural network-based sensorimotor policies. Thanks to the augmented data, we reduce the computation time and the number of demonstrations needed by IL, while providing robustness to sensing and process uncertainty. We tailor our approach to the task of learning a trajectory-tracking visuomotor policy for an aerial robot, leveraging a 3D mesh of the environment as part of the data augmentation process. We numerically demonstrate that our method can learn a robust visuomotor policy from a single demonstration, a two-order-of-magnitude improvement in demonstration efficiency compared to existing IL methods.
I. INTRODUCTION
Imitation learning [1]–[3] is increasingly employed to generate computationally efficient policies from computationally expensive model-based sensing [4], [5] and control [6]–[8] algorithms for onboard deployment. The key to this method is to leverage the inference speed of deep neural networks, which are trained to imitate a set of task-relevant expert demonstrations collected from the model-based algorithms. This approach has been used to generate efficient sensorimotor policies [7], [9]–[12] capable of producing control commands from raw sensory data, bypassing the computational cost of control and state estimation, while providing benefits in terms of latency and robustness. Such sensorimotor policies have demonstrated impressive performance on a variety of tasks, including agile flight [7] and driving [13].
However, one of the fundamental limitations of existing IL methods employed to produce sensorimotor policies (e.g., Behavior Cloning (BC) [1], DAgger [2]) is the overall number of demonstrations that must be collected from the model-based algorithm. This inefficiency hinders the possibility of generating policies by directly collecting data from the real robot, requiring the user to rely on accurate simulators. Furthermore, the need to repeatedly query a computationally expensive expert introduces high computational costs during training, requiring expensive training equipment. The fundamental cause of these inefficiencies is the need to achieve robustness to sensing noise, disturbances, and modeling errors encountered during deployment, which can cause deviations of the policy's state distribution from its training distribution, an issue known as covariate shift.

All the authors are with the MIT Department of Aeronautics and Astronautics. {atagliab, jhow}@mit.edu

Fig. 1: Proposed visuomotor policy learning approach. We collect demonstrations from an output feedback robust tube MPC, which accounts for the effects of process and sensing uncertainty via its tube section Z. We use the tube to obtain a data augmentation strategy, employing a 3D mesh of the environment created from images I_t captured by an onboard camera during the demonstration collection phase. The augmented synthetic images I_t^+ correspond to states sampled inside the tube, and the corresponding actions are obtained via the ancillary controller, part of the tube MPC framework.
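To make the augmentation procedure summarized in Fig. 1 concrete, the following is a minimal Python sketch of tube-guided data augmentation. It is an illustration only, not the released implementation: the helpers sample_state_in_tube-style sampling, render_from_mesh, the ancillary gain K, and the box approximation of the tube cross-section are all hypothetical placeholders.

```python
import numpy as np

def augment_demonstration(nominal_states, nominal_actions, K, tube_halfwidth,
                          render_from_mesh, n_samples_per_step=10):
    """Tube-guided data augmentation (illustrative sketch).

    nominal_states/nominal_actions: trajectory from one RTMPC demonstration.
    K: ancillary feedback gain (hypothetical); tube_halfwidth: box approximation
    of the tube cross-section; render_from_mesh: observation model built from the
    reconstructed 3D mesh, mapping a state to a synthetic image.
    """
    augmented = []
    for x_bar, u_bar in zip(nominal_states, nominal_actions):
        for _ in range(n_samples_per_step):
            # Sample a state inside the tube around the nominal state.
            x_plus = x_bar + np.random.uniform(-tube_halfwidth, tube_halfwidth)
            # Ancillary controller: action that steers the sampled state back to the plan.
            u_plus = u_bar + K @ (x_plus - x_bar)
            # Synthetic observation rendered from the mesh at the sampled state.
            img_plus = render_from_mesh(x_plus)
            augmented.append((img_plus, u_plus))
    return augmented
```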
Strategies based on matching the disturbances between the training and deployment domains (e.g., Domain Randomization (DR) [14]) improve the robustness of the learned policy, but do not address the demonstration and computational efficiency challenges. Data augmentation approaches based on generating synthetic sensory measurements and corresponding stabilizing actions have shown promising results [15], [16]. However, they rely on handcrafted heuristics for the augmentation procedure and have been mainly studied on lower-dimensional tasks (e.g., steering a 2D Dubins car).
In this work, we propose a new framework that leverages a model-based robust controller, an output feedback RTMPC, to provide robust demonstrations and a corresponding data augmentation strategy for efficient sensorimotor policy learning. Specifically, we extend our previous work, where we showed that RTMPC can be leveraged to generate augmented data that enables efficient learning of a motor control policy [6]. This new work, summarized in Figure 1, employs an output feedback variant of RTMPC and a model of the sensory observations to design a data augmentation strategy that also accounts for the uncertainty caused by sensing imperfections during the demonstration collection phase. This approach enables efficient, robust sensorimotor learning from a few demonstrations and relaxes our previous assumption [6] that state information is available at training and deployment time.
We tailor our method to the context of learning a visuomotor policy, capable of robustly tracking given trajectories while using images from an onboard camera to control the position of an aerial robot. Our approach leverages a 3D mesh of the environment reconstructed from images to generate the observation model needed for the proposed augmentation strategy. This technique is additionally well-suited to be used with photorealistic, data-driven simulation engines such as FlightMare [17] and FlightGoggles [18], or SLAM and state estimation pipelines that directly produce 3D dense/mesh reconstructions of the environment, such as Kimera [19].
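As a rough stand-in for the rendering engines cited above, the observation model can be approximated by rendering the reconstructed mesh from a camera pose derived from a sampled state. The sketch below uses the trimesh and pyrender libraries purely for illustration; the choice of libraries, intrinsics, and the single-mesh assumption are ours, not the authors' toolchain.

```python
import numpy as np
import trimesh
import pyrender

def build_renderer(mesh_path, width=640, height=480, fx=400.0, fy=400.0):
    """Load a reconstructed environment mesh and return a render(pose) function."""
    scene = pyrender.Scene(ambient_light=np.ones(3))
    mesh = pyrender.Mesh.from_trimesh(trimesh.load(mesh_path, force='mesh'))
    scene.add(mesh)
    camera = pyrender.IntrinsicsCamera(fx=fx, fy=fy, cx=width / 2, cy=height / 2)
    cam_node = scene.add(camera, pose=np.eye(4))
    renderer = pyrender.OffscreenRenderer(width, height)

    def render(cam_pose_world):
        # cam_pose_world: 4x4 camera-to-world transform (OpenGL convention),
        # obtained from the robot state sampled inside the tube.
        scene.set_pose(cam_node, pose=cam_pose_world)
        color, _depth = renderer.render(scene)
        return color  # synthetic image for the sampled state

    return render
```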
Contributions. In summary, our work presents the following contributions: a) We introduce a new data augmentation strategy, an extension of our previous work [6], enabling efficient and robust learning of a sensorimotor policy capable of generating control actions directly from raw sensor measurements (e.g., images) instead of state estimates. Our approach is grounded in the output feedback RTMPC framework theory, unlike previous methods that rely on handcrafted heuristics, and leverages a 3D mesh of the environment to generate augmented data. b) We demonstrate our methodology in the context of visuomotor policy learning for an aerial robot, showing that it can track a trajectory from raw images with high robustness (>90% success rate) after a single demonstration, despite sensory noise and disturbances. c) We open-source our framework, available at https://github.com/andretag.
II. RELATED WORKS
Sensorimotor policy learning by imitating model-based experts. Learning a sensorimotor policy by imitating the output of a model-based expert algorithm bypasses the computational cost of planning [8], control [6] and state estimation [10], with the potential of achieving increased robustness [7], [20] and reduced latency with respect to conventional autonomy pipelines. The sensory input typically employed is based on raw images [3], [12], [21] or pre-processed representations, such as feature tracks [7], depth maps [8], or intermediate layers of a CNN [22]. Vision is often complemented with proprioceptive information, such as that provided by IMUs [7] or wheel encoders [21]. These approaches showcase the advantages of sensorimotor policy learning but do not leverage any data augmentation strategy. Consequently, they query the expert many times during the data collection phase, increasing the time, number of demonstrations, and computational effort needed to obtain the policy.
Data augmentation for visuomotor and sensorimotor learning. Traditional data augmentation strategies for visuomotor policy learning have focused on increasing a policy's generalization ability by applying perturbations [23] or noise [24] directly in image space, without modifying the corresponding action. These methods do not directly address covariate shift issues caused by process uncertainties. The self-driving literature has developed a second class of visuomotor data augmentation strategies [15], [16], [25] capable of compensating for covariate-shift issues. This class relies instead on first generating different views from synthetic cameras [15], [16] or data-driven simulators [25] and then computing a corresponding action via a handcrafted controller. These methods, however, rely on heuristics to establish relevant augmented views and the corresponding control action. Our work provides a more general methodology for data augmentation and demonstrates it on a quadrotor system (higher-dimensional than the planar self-driving car models).
Output feedback RTMPC. Model predictive control [26] leverages a model of the system dynamics to generate actions that take into account state and actuation constraints. This is achieved by solving a constrained optimization problem along a predefined temporal horizon, using the model to predict the effects of future actions. Robust variants of MPC, such as RTMPC, usually assume that the system is subject to additive, bounded process uncertainty (e.g., disturbances, model errors). As a consequence, they modify nominal plans by either a) assuming a worst-case disturbance [27], [28], or b) employing an auxiliary (ancillary) controller. This controller maintains the system within some distance (the "cross-section" of a tube) from the nominal plan regardless of the realization of the disturbances [29], [30]. Output feedback RTMPC [31]–[33] also accounts for the effects of sensing uncertainty (e.g., sensing noise, imperfect state estimation). Our work relies on an output feedback RTMPC [32], [33] to generate demonstrations and a data augmentation strategy. However, thanks to the proposed imitation learning strategy, our approach does not require solving the optimization problem online, reducing the onboard computational cost.
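For readers unfamiliar with this machinery, the sketch below shows, under strong simplifications, how one step of a tube MPC could be computed: a nominal plan is optimized under constraints tightened by the tube, and an ancillary feedback term corrects for the deviation of the state estimate from that plan. The use of cvxpy, the box approximation of the tube, the gain K, and the quadratic cost weights are illustrative assumptions, not the formulation used in the paper (terminal ingredients and proper invariant-set computations are omitted).

```python
import cvxpy as cp
import numpy as np

def rtmpc_step(A, B, K, x_hat, x_des, N, u_max, x_max, tube):
    """One step of a heavily simplified tube MPC (illustrative sketch).

    x_hat: current state estimate; x_des: (n_x, N) desired states over the horizon;
    tube: box half-width standing in for the tube cross-section."""
    nx, nu = B.shape
    x = cp.Variable((nx, N + 1))   # nominal states (center of the tube)
    u = cp.Variable((nu, N))       # nominal inputs
    cost = 0
    constraints = [cp.abs(x_hat - x[:, 0]) <= tube]   # estimate must lie inside the tube
    for i in range(N):
        cost += cp.sum_squares(x[:, i] - x_des[:, i]) + 0.1 * cp.sum_squares(u[:, i])
        constraints += [x[:, i + 1] == A @ x[:, i] + B @ u[:, i],
                        cp.abs(x[:, i + 1]) <= x_max - tube,   # tightened state constraints
                        cp.abs(u[:, i]) <= u_max]
    cp.Problem(cp.Minimize(cost), constraints).solve()
    # Ancillary controller: correct the nominal input based on deviation from the plan.
    return u.value[:, 0] + K @ (x_hat - x.value[:, 0])
```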
III. PROBLEM STATEMENT
Our goal is to generate a deep neural network sensorimotor policy $\pi_\theta$, with parameters $\theta$, to control a mobile robot (e.g., a multirotor). The policy needs to be capable of tracking a desired reference trajectory given high-dimensional, noisy sensor measurements, and has the form

$\mathbf{u}_t = \pi_\theta(\mathbf{o}_t, \mathbf{X}^{\text{des}}_t),$  (1)

where $t$ denotes the discrete time index, $\mathbf{u}_t$ represents the deterministic control actions, and $\mathbf{o}_t = (\mathbf{I}_t, \mathbf{o}_{\text{other},t})$ the high-dimensional, noisy sensor measurements, comprised of an image $\mathbf{I}_t$ captured by an onboard camera and other noisy measurements $\mathbf{o}_{\text{other},t}$ (e.g., attitude, velocity). The $N+1$ steps of the reference trajectory are written as $\mathbf{X}^{\text{des}}_t = \{\mathbf{x}^{\text{des}}_{0|t}, \dots, \mathbf{x}^{\text{des}}_{N|t}\}$, where $\mathbf{x}^{\text{des}}_{i|t}$ indicates the desired state at the future time $t+i$, as given at the current time $t$, and $N > 0$ represents the total number of given future desired states. Our objective is to efficiently learn the policy parameters $\hat{\theta}$ by leveraging IL and demonstrations provided by a model-based controller (expert).
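As a concrete reading of Eq. (1), a network with this interface might look as follows in PyTorch. The architecture (convolutional encoder sizes, MLP widths) is a hypothetical placeholder chosen for illustration, not the network used in the paper; only the input/output signature follows Eq. (1).

```python
import torch
import torch.nn as nn

class SensorimotorPolicy(nn.Module):
    """Illustrative policy with the interface of Eq. (1): u_t = pi_theta(o_t, X_des_t)."""
    def __init__(self, n_other, n_ref, n_u):
        super().__init__()
        self.encoder = nn.Sequential(            # image I_t -> feature vector
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Sequential(                # features + o_other + flattened X_des -> u_t
            nn.Linear(32 + n_other + n_ref, 128), nn.ReLU(),
            nn.Linear(128, n_u))

    def forward(self, image, o_other, x_des_flat):
        feat = self.encoder(image)
        return self.head(torch.cat([feat, o_other, x_des_flat], dim=-1))
```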
System model. We assume that a model of the robot's dynamics is available, described by a set of linear (e.g., obtained via linearization), discrete-time, time-invariant equations:

$\mathbf{x}_{t+1} = A\mathbf{x}_t + B\mathbf{u}_t + \mathbf{w}_t,$  (2)

where the matrices $A \in \mathbb{R}^{n_x \times n_x}$ and $B \in \mathbb{R}^{n_x \times n_u}$ represent the system dynamics, $\mathbf{x}_t \in \mathbb{X} \subset \mathbb{R}^{n_x}$ represents the state of the system, and $\mathbf{u}_t \in \mathbb{U} \subset \mathbb{R}^{n_u}$ represents the control inputs. The system is subject to state and input constraints $\mathbb{X}$ and $\mathbb{U}$, assumed to be convex polytopes containing the origin [31]. The quantity $\mathbf{w}_t \in \mathbb{W} \subset \mathbb{R}^{n_x}$ in (2) captures time-varying additive process uncertainties. This includes disturbances and model errors that the system may encounter
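To illustrate the model in Eq. (2), the sketch below simulates a rollout of the uncertain linear system under a given policy, with the polytopic sets approximated as boxes for simplicity. The function names, box approximations, and constraint checks are illustrative assumptions, not part of the paper's formulation.

```python
import numpy as np

def rollout(A, B, policy, x0, T, w_bound, x_bound, u_bound):
    """Simulate x_{t+1} = A x_t + B u_t + w_t with w_t drawn from the box |w| <= w_bound.

    X and U are approximated here as boxes {|x| <= x_bound} and {|u| <= u_bound}."""
    x = np.array(x0, dtype=float)
    traj = [x.copy()]
    for t in range(T):
        u = np.clip(policy(x, t), -u_bound, u_bound)      # enforce u_t in U
        w = np.random.uniform(-w_bound, w_bound)           # bounded process uncertainty w_t in W
        x = A @ x + B @ u + w
        if np.any(np.abs(x) > x_bound):
            print(f"state constraint violated at step {t + 1}")
        traj.append(x.copy())
    return np.array(traj)
```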