strategy. This technique is additionally well suited for use
with photorealistic, data-driven simulation engines such as
FlightMare [17] and FlightGoggles [18], or SLAM and state
estimation pipelines that directly produce 3D dense/mesh
reconstructions of the environment, such as Kimera [19].
Contributions.
In summary, our work presents the following contributions: a) We introduce a new data augmentation strategy, an extension of our previous work [6], that enables efficient and robust learning of a sensorimotor policy capable of generating control actions directly from raw sensor measurements (e.g., images) instead of state estimates. Our approach is grounded in the theory of the output feedback RTMPC framework, unlike previous methods that rely on handcrafted heuristics, and leverages a 3D mesh of the environment to generate augmented data. b) We demonstrate our methodology in the context of visuomotor policy learning for an aerial robot, showing that it can track a trajectory from raw images with high robustness (>90% success rate) after a single demonstration, despite sensory noise and disturbances. c) We open-source our framework, available at https://github.com/andretag.
II. RELATED WORKS
Sensorimotor policy learning by imitating model-based
experts.
Learning a sensorimotor policy by imitating the
output of a model-based expert algorithm bypasses the
computational cost of planning [8], control [6] and state
estimation [10], with the potential of achieving increased
robustness [7], [20] and reduced latency with respect to
conventional autonomy pipelines. The sensory input typically
employed is based on raw images [3], [12], [21] or pre-
processed representations, such as feature tracks [7], depth
maps [8], or intermediate layers of a CNN [22]. Vision is
often complemented with proprioceptive information, such as that provided by IMUs [7] or wheel encoders [21].
These approaches showcase the advantages of sensorimotor
policy learning but do not leverage any data augmentation
strategy. Consequently, they query the expert many times during the data collection phase, increasing the time, the number of demonstrations, and the computational effort required to obtain the policy.
Data augmentation for visuomotor and sensorimotor
learning.
Traditional data augmentation strategies for visuomotor policy learning have focused on increasing a
policy’s generalization ability by applying perturbations [23]
or noise [24] directly in image space, without modifying
the corresponding action. These methods do not directly
address covariate shift issues caused by process uncertainties.
The self-driving literature has developed a second class of
visuomotor data augmentation strategies [15], [16], [25]
capable of compensating for covariate shift. This class instead relies on first generating different views from synthetic
cameras [15], [16] or data-driven simulators [25] and then
computing a corresponding action via a handcrafted controller.
These methods, however, rely on heuristics to establish
relevant augmented views and the corresponding control
action. Our work provides a more general methodology for
data augmentation and demonstrates it on a quadrotor system
(higher-dimensional than the planar self-driving car models).
Output feedback RTMPC.
Model predictive control [26]
leverages a model of the system dynamics to generate actions
that take into account state and actuation constraints. This
is achieved by solving a constrained optimization problem
along a predefined temporal horizon, using the model to
predict the effects of future actions. Robust variants of MPC,
such as RTMPC, usually assume that the system is subject
to additive, bounded process uncertainty (e.g., disturbances,
model errors). As a consequence, they modify nominal plans
by either a) assuming a worst-case disturbance [27], [28],
or b) employing an auxiliary (ancillary) controller. This
controller maintains the system within some distance (“cross-
section” of a tube) from the nominal plan regardless of the
realization of the disturbances [29], [30]. Output feedback
RTMPC
[31]–[33] also accounts for the effects of sensing
uncertainty (e.g., sensing noise, imperfect state estimation).
Our work relies on an output feedback RTMPC [32], [33] to generate demonstrations and to derive a data augmentation strategy. However, thanks to the proposed imitation learning strategy, our approach does not require solving the optimization problem online, reducing the onboard computational cost.
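For intuition, the ancillary controller of a state feedback tube MPC is often written in the form (stated here only as a sketch; the exact output feedback formulation, gains, and tube construction follow [29]–[33])
$$u_t = \bar{u}_t + K\,(\hat{x}_t - \bar{x}_t),$$
where $\bar{x}_t$ and $\bar{u}_t$ denote the nominal state and input from the optimized plan, $\hat{x}_t$ is the current state estimate, and $K$ is a fixed feedback gain that keeps the system within the tube around the nominal trajectory for any admissible disturbance realization.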
III. PROBLEM STATEMENT
Our goal is to generate a deep neural network sensorimotor policy $\pi_\theta$, with parameters $\theta$, to control a mobile robot (e.g., a multirotor). The policy needs to be capable of tracking a desired reference trajectory given high-dimensional, noisy sensor measurements, and has the form
$$u_t = \pi_\theta(o_t, X^{\text{des}}_t), \qquad (1)$$
where $t$ denotes the discrete time index, $u_t$ represents the deterministic control actions, and $o_t = (I_t, o_{\text{other},t})$ the high-dimensional, noisy sensor measurements, comprised of an image $I_t$ captured by an onboard camera and other noisy measurements $o_{\text{other},t}$ (e.g., attitude, velocity). The $N+1$ steps of the reference trajectory are written as $X^{\text{des}}_t = \{x^{\text{des}}_{0|t}, \ldots, x^{\text{des}}_{N|t}\}$, where $x^{\text{des}}_{i|t}$ indicates the desired state at the future time $t+i$, as given at the current time $t$, and $N > 0$ represents the total number of given future desired states. Our objective is to efficiently learn the policy parameters $\hat{\theta}^*$ by leveraging IL and demonstrations provided by a model-based controller (the expert).
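To make the interface in (1) concrete, the following minimal sketch shows one way such a policy can be structured; the module and all dimensions (size of $o_{\text{other},t}$, per-waypoint state size, horizon $N$, number of control inputs, image resolution) are illustrative assumptions and not our released implementation.

```python
import torch
import torch.nn as nn


class SensorimotorPolicy(nn.Module):
    """Minimal sketch of pi_theta in (1): (o_t, X_des_t) -> u_t."""

    def __init__(self, n_other, n_des, n_u, img_channels=1):
        super().__init__()
        # Small CNN encoder for the onboard image I_t.
        self.encoder = nn.Sequential(
            nn.Conv2d(img_channels, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # MLP head combining image features, o_other_t, and the
        # flattened N+1 desired states of X_des_t.
        self.head = nn.Sequential(
            nn.Linear(32 + n_other + n_des, 128), nn.ReLU(),
            nn.Linear(128, n_u),
        )

    def forward(self, image, o_other, x_des):
        features = self.encoder(image)
        return self.head(torch.cat([features, o_other, x_des], dim=-1))


# Illustrative dimensions: 9 auxiliary measurements, 12 states per
# desired waypoint with a horizon of N = 10, and 4 control inputs.
policy = SensorimotorPolicy(n_other=9, n_des=(10 + 1) * 12, n_u=4)
u_t = policy(torch.zeros(1, 1, 64, 64),   # image I_t
             torch.zeros(1, 9),            # o_other_t
             torch.zeros(1, 132))          # flattened X_des_t
```

Note that the policy consumes the raw image and the reference trajectory jointly, so no explicit state estimate is required at inference time.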
System model.
We assume that a model of the robot dynamics is available, described by a set of linear (e.g., obtained via linearization), discrete-time, time-invariant equations:
$$x_{t+1} = A x_t + B u_t + w_t, \qquad (2)$$
where the matrices $A \in \mathbb{R}^{n_x \times n_x}$ and $B \in \mathbb{R}^{n_x \times n_u}$ represent the system dynamics, $x_t \in \mathbb{X} \subset \mathbb{R}^{n_x}$ represents the state of the system, and $u_t \in \mathbb{U} \subset \mathbb{R}^{n_u}$ represents the control inputs. The system is subject to the state and input constraints $\mathbb{X}$ and $\mathbb{U}$, assumed to be convex polytopes containing the origin [31]. The quantity $w_t \in \mathbb{W} \subset \mathbb{R}^{n_x}$ in (2) captures time-varying additive process uncertainties. This includes disturbances and model errors that the system may encounter