Learning the Dynamics of Compliant Tool-Environment Interaction for Visuo-Tactile Contact Servoing

Mark Van der Merwe, Dmitry Berenson, Nima Fazeli

Department of Robotics

University of Michigan

{markvdm, dmitryb, nfz}@umich.edu

https://www.mmintlab.com/extrinsic-contact-servoing/

Abstract: Many manipulation tasks require the robot to control the contact between a grasped compliant tool and the environment, e.g., scraping a frying pan with a spatula. However, modeling tool-environment interaction is difficult, especially when the tool is compliant, and the robot cannot be expected to have the full geometry and physical properties (e.g., mass, stiffness, and friction) of all the tools it must use. We propose a framework that learns to predict the effects of a robot's actions on the contact between the tool and the environment given visuo-tactile perception. Key to our framework is a novel contact feature representation that consists of a binary contact value, the line of contact, and an end-effector wrench. We propose a method to learn the dynamics of these contact features from real-world data that does not require predicting the geometry of the compliant tool. We then propose a controller that uses this dynamics model for visuo-tactile contact servoing and show that it is effective at performing scraping tasks with a spatula, even in scenarios where precise contact needs to be made to avoid obstacles.

Keywords: Contact-Rich Manipulation, Multi-Modal Dynamics Learning

1 Introduction

Figure 1: We present a method for extrinsic contact servoing, i.e., controlling contact between a compliant tool and the environment. Our method is able to complete the requested contact trajectory, avoiding contact with surface obstacles, and successfully scrape the target object. Note that to do this, the spatula must be tilted so that only a corner of it is in contact.

Many manipulation tasks require the robot to control the contact between a grasped tool and the environment. The ability to reason over and control this extrinsic contact is crucial to enabling helpful robots that can scrape a frying pan with a spatula, erase or wipe a surface [1], screw a bottle cap onto a bottle [2], perform peg-in-hole assemblies [3,4], and perform many other tasks.

6th Conference on Robot Learning (CoRL 2022), Auckland, New Zealand.

arXiv:2210.03836v1 [cs.RO] 7 Oct 2022

In this work, we seek to address the problem of controlling the extrinsic contact between a grasped compliant tool (e.g., a spatula) and the environment. In general, the robot cannot expect to have the full geometry and physical properties (e.g., mass, friction, stiffness) of all the tools it must use or the geometries of the environments it must manipulate in. Instead, the robot must utilize multimodal sensory observations, such as pointclouds and tactile feedback, to act on the environment.

In recent years, learning-based methods have become increasingly popular for addressing the complexities of robotic manipulation, including contact-rich tasks [5]. These methods can be loosely grouped into model-free methods, which directly learn a policy [3,2,6], and model-based methods, which learn system dynamics [7,8,9]. By focusing on modeling system dynamics, model-based methods can plan to reach new goals without retraining and are often more data-efficient [9]. Therefore, we propose learning the dynamics of our system to solve the extrinsic contact servoing task.

It is not obvious which representation to use for these dynamics. Fully recovering tool and environment geometries from visual data [10,11] and tactile feedback [12] has been widely explored, with recent extensions to compliant geometries [13]; however, even if the system can be fully identified, the contact models used to resolve interactions can have limited fidelity [14]. On the other hand, learned dynamics representations can be difficult to interpret and require demonstrations or observations of the desired state to specify goals [7,15]. Instead, we propose a novel contact feature representation for our learning method that focuses on tool-environment interaction and bypasses explicitly modeling the whole system. We represent the contact configuration as 1) a binary contact mode (indicating whether the system is in contact); 2) a contact geometry (as a line in 3D space); and 3) an end-effector wrench.

We propose a learning architecture to model the dynamics of the proposed contact representation from raw sensory observations over candidate action trajectories. We propose structuring the model as a latent space dynamics model with a decoder that recovers the contact state. We also propose an action offset term in the dynamics that allows us to accurately propagate robot poses despite controller errors (e.g., from robot impedance). To provide labels to our model, we collect self-supervised data on a 7-DoF Franka Emika Panda, using sensor data to automatically label the contact state.

We validate our proposed method by completing various desired contact trajectories on the real robot system. We first show that our method can track diverse desired contact trajectories in the absence of obstacles. Next, we demonstrate that we can utilize extrinsic contact servoing to scrape a target object from the table while handling occlusions and avoiding contact with obstacles (Fig. 1).

2 Related Work

Existing research has investigated the task of recovering contact locations. Manuelli et al. [16] localize point contacts on a rigid robot with known geometry by employing a particle filtering approach to update a set of candidate contact locations based on force-torque sensing. Kim et al. [4] and Ma et al. [17] model contact between a grasped rigid object and the environment by assuming stationary line contacts and modeling the deformation of a GelSlim gripper. The estimated line contact is then used in a Reinforcement Learning (RL) policy. Neither of these methods extends to compliant tools, and neither models the dynamics of the contact configuration.

Other works explore tactile servoing methods, where contact at the sensor is driven to a desired configuration. Li et al. [18] use a large tactile pad and define contact configuration features of objects pressed against the sensor. They manually construct a feedback controller based on these features and use it to drive contacts to desired configurations. Sutanto et al. [19] use a smaller-profile tactile sensor and learn dynamics in a learned latent space. They then employ a Model Predictive Control (MPC) scheme to drive contacts to desired configurations on the sensor. Both of these works assume that contact occurs at the sensing location. We, on the other hand, seek to servo extrinsic contacts, where we do not get direct sensing at the point of contact.

Other work focuses on maintaining contact between a tool and the environment. Sakaino [20] uses imitation learning to learn a controller that maintains contact between a mop and a tabletop. In contrast, we wish not only to maintain contact but also to control the extrinsic contact geometry.

3 Problem Formulation

We parameterize our contact feature as a binary contact indicator $c^b \in \{0, 1\}$, used to indicate whether the tool is in contact; a contact line $c^l \in \mathbb{R}^{2 \times 3}$ representing the contact geometry between the tool and the environment; and an end effector wrench $c^w \in \mathbb{R}^6$. The geometry $c^l$ is only active when the tool is in contact ($c^b = 1$). The contact representation allows extrinsic contact goals to be expressed as desired contact trajectories $G = [g_1, g_2, \ldots, g_L]$, where each $g_i \in \mathbb{R}^{2 \times 3}$ is a desired contact line to reach. We assume that contact should be maintained throughout the task.
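The contact representation maps naturally onto a small data structure. The sketch below is illustrative only: `ContactFeature` and `make_goal_trajectory` are hypothetical names, and it assumes the wrench is stored as force followed by torque and that a goal trajectory $G$ is held as an array of shape $(L, 2, 3)$.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class ContactFeature:
    """Hypothetical container for one contact state (c^b, c^l, c^w)."""

    in_contact: bool              # c^b: True when the tool touches the environment
    line: Optional[np.ndarray]    # c^l: (2, 3) array of two 3-D endpoints, or None
    wrench: np.ndarray            # c^w: (6,) end-effector wrench, assumed [force, torque]

    def __post_init__(self) -> None:
        self.wrench = np.asarray(self.wrench, dtype=float)
        if self.wrench.shape != (6,):
            raise ValueError("wrench must have shape (6,)")
        if not self.in_contact:
            # The contact geometry is only active when c^b = 1, so discard it otherwise.
            self.line = None
            return
        if self.line is None:
            raise ValueError("a contact line is required when in contact")
        self.line = np.asarray(self.line, dtype=float)
        if self.line.shape != (2, 3):
            raise ValueError("contact line must have shape (2, 3)")


def make_goal_trajectory(lines) -> np.ndarray:
    """Stack desired contact lines g_1, ..., g_L into an (L, 2, 3) array."""
    goals = np.asarray(lines, dtype=float)
    if goals.ndim != 3 or goals.shape[1:] != (2, 3):
        raise ValueError("goal trajectory must have shape (L, 2, 3)")
    return goals
```

Dropping the line whenever $c^b = 0$ mirrors the statement that the geometry is only active in contact, so downstream code never reads a stale line.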

We formulate extrinsic contact servoing as a model predictive planning problem, given observations $o_0$ of the current state of the system. For a given horizon $T$, we select the next $T$ desired contact lines $[g_{i+1}, \ldots, g_{i+T}] \subseteq G$ to be our current contact goal sequence. The planning problem is:

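$$
\begin{aligned}
\min_{a_{0:T-1}} \quad & \sum_{t=1}^{T} d\left(c^l_t,\, g_{i+t}\right) \\
\text{s.t.} \quad & c^b_t = 1, \quad \forall t \in [1, T] \\
& \{c^b_{0:T},\, c^l_{0:T},\, c^w_{0:T}\} = g(o_0, a_{0:T-1})
\end{aligned}
\tag{1}
$$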

Here, $g$ is a model describing the contact feature dynamics. The binary constraint ensures that the tool remains in contact, while the cost function $d$ measures the distance between two contact lines as the average Euclidean distance between the line endpoints. Finally, if $d(c^l_1, g_{i+1})$ falls below a threshold $\epsilon$, we increment $i$, thus moving to the next sequence of desired contact lines for the next round of planning.
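The cost, the goal window and the index update can be sketched as follows. This is a hedged illustration rather than the authors' planner: the model interface, the probability threshold used for the binary constraint, the padding of the goal window near the end of $G$, and the random-shooting optimizer are all assumptions, and the endpoint distance assumes the two lines' endpoints are already listed in corresponding order.

```python
from typing import Callable, Optional, Tuple

import numpy as np

# Hypothetical interface standing in for g: given observations and actions of shape (T, 6),
# return contact probabilities (T+1,), lines (T+1, 2, 3) and wrenches (T+1, 6) for t = 0..T.
ContactModel = Callable[[object, np.ndarray], Tuple[np.ndarray, np.ndarray, np.ndarray]]


def line_distance(line, goal) -> float:
    """d: average Euclidean distance between corresponding endpoints of two (2, 3) lines."""
    return float(np.mean(np.linalg.norm(np.asarray(line) - np.asarray(goal), axis=-1)))


def goal_window(goals: np.ndarray, i: int, horizon: int) -> np.ndarray:
    """[g_{i+1}, ..., g_{i+T}] with g_k stored at goals[k - 1]; pads with the last goal (assumed)."""
    last = goals.shape[0]
    return goals[[min(i + t, last) - 1 for t in range(1, horizon + 1)]]


def plan_cost(model: ContactModel, obs, actions: np.ndarray, window: np.ndarray,
              contact_threshold: float = 0.5) -> float:
    """Objective of Eq. 1; returns inf when the model predicts lost contact."""
    contact_prob, lines, _ = model(obs, actions)
    horizon = actions.shape[0]
    # Constraint c^b_t = 1 for t in [1, T], checked against an assumed probability threshold.
    if np.any(contact_prob[1:horizon + 1] < contact_threshold):
        return float("inf")
    return sum(line_distance(lines[t], window[t - 1]) for t in range(1, horizon + 1))


def random_shooting(model: ContactModel, obs, window: np.ndarray, num_samples: int = 256,
                    action_scale: float = 0.01,
                    seed: int = 0) -> Tuple[Optional[np.ndarray], float]:
    """Keep the cheapest of several random action sequences (a stand-in optimizer)."""
    rng = np.random.default_rng(seed)
    best_actions, best_cost = None, float("inf")
    for _ in range(num_samples):
        actions = rng.normal(scale=action_scale, size=(window.shape[0], 6))
        cost = plan_cost(model, obs, actions, window)
        if cost < best_cost:
            best_actions, best_cost = actions, cost
    return best_actions, best_cost


def advance_goal_index(line_t1, goals: np.ndarray, i: int, eps: float) -> int:
    """Increment i once d(c^l_1, g_{i+1}) < eps; g_{i+1} is stored at goals[i]."""
    if i < goals.shape[0] and line_distance(line_t1, goals[i]) < eps:
        return i + 1
    return i
```

Random shooting is used here only because it makes the "min over action sequences" in Eq. 1 concrete in a few lines; any optimizer that respects the contact constraint could take its place.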

4 Method

4.1 Contact Feature Dynamics Model

To solve our constrained optimization in Eq. 1, we require a model $g$ that can map from raw observations $o_0$ and a proposed action trajectory $a_{0:T-1}$ to the resulting contact states $\{c^b_{0:T}, c^l_{0:T}, c^w_{0:T}\}$.

We propose modeling the contact feature dynamics as a deep neural network. Our actions are changes in end effector pose.

We assume access to a pointcloud $v_0$ and an input wrench $h_0$ measured at the robot's wrist as our observations, $o_0 = (v_0, h_0)$. Note that the end effector wrench is both an input to our method and part of the contact state; predicting future wrenches aids representation learning and provides expected wrenches for planning.

We perform all learning in the local end effector frame. We transform the pointcloud to the end effector frame, ${}^{EE_0}v_0$, and clip it to a 0.5 m$^3$ bounding box region around the end effector that contains the contact event. We similarly predict our contact lines in the current end effector frame, ${}^{EE_t}c^l_t, \forall t \in [1, T]$. Learning in the end effector frame makes the visual input invariant to translations and rotations of the end effector and removes distractors that do not contribute to the contact state, such as the rest of the robot arm or the scene background.
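The frame change and crop can be sketched with a homogeneous transform. This is a minimal sketch under stated assumptions: `crop_in_ee_frame` is a hypothetical helper, and the axis-aligned box centred on the end effector, described by per-axis half extents, is an assumed shape, since only the region's volume is given here.

```python
import numpy as np


def crop_in_ee_frame(points_world: np.ndarray, T_w_ee: np.ndarray,
                     half_extents: np.ndarray) -> np.ndarray:
    """Express a world-frame pointcloud in the end-effector frame and keep points in a box.

    points_world: (N, 3) points in the world frame W.
    T_w_ee:       (4, 4) homogeneous end-effector pose ^W T_EE0.
    half_extents: (3,) half side lengths of an axis-aligned box centred on the
                  end effector (hypothetical box shape and placement).
    """
    points_world = np.asarray(points_world, dtype=float)
    R, p = T_w_ee[:3, :3], T_w_ee[:3, 3]
    # Inverse rigid transform x_ee = R^T (x_w - p); for row vectors this is (x_w - p) @ R.
    points_ee = (points_world - p) @ R
    inside = np.all(np.abs(points_ee) <= np.asarray(half_extents, dtype=float), axis=1)
    return points_ee[inside]
```

Because the crop happens after the frame change, the same box always covers the same region relative to the gripper, which is what provides the invariance described above.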

Our contact feature dynamics model (Figure 2) has three components: an encoder $e$, which maps from raw observations to a learned latent space; a decoder $d$, which maps from the latent space to the contact state; and a dynamics model $f$, which captures dynamics in the latent space. We parameterize the models by a set of learned weights $\theta$.
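The three-part structure can be illustrated with a deliberately tiny stand-in. The sketch below is not the authors' network: `ToyContactDynamics`, its linear layers, tanh activations, pooled pointcloud statistics and all dimensions are hypothetical choices made only to show how $e$, $f$ and $d$ compose when a candidate action sequence is unrolled.

```python
import numpy as np

LATENT_DIM, ACTION_DIM, WRENCH_DIM = 32, 6, 6
POINT_FEAT_DIM = 12  # mean, std, min and max of the xyz coordinates


class ToyContactDynamics:
    """Hypothetical linear stand-ins for the encoder e, latent dynamics f and decoder d."""

    def __init__(self, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(scale=0.1, size=(LATENT_DIM, POINT_FEAT_DIM + WRENCH_DIM))
        self.W_z = rng.normal(scale=0.1, size=(LATENT_DIM, LATENT_DIM))
        self.W_a = rng.normal(scale=0.1, size=(LATENT_DIM, ACTION_DIM))
        # Decoder heads: 1 contact logit, 6 line coordinates, 6 wrench components.
        self.W_dec = rng.normal(scale=0.1, size=(1 + 6 + WRENCH_DIM, LATENT_DIM))

    def encode(self, points: np.ndarray, wrench: np.ndarray) -> np.ndarray:
        """e(v_0, h_0): order-invariant pooling of the pointcloud plus the wrench."""
        pooled = np.concatenate([points.mean(0), points.std(0), points.min(0), points.max(0)])
        return np.tanh(self.W_enc @ np.concatenate([pooled, wrench]))

    def step(self, z: np.ndarray, action: np.ndarray) -> np.ndarray:
        """f(z_t, a_t) -> z_{t+1}; the pose-offset output is omitted in this sketch."""
        return np.tanh(self.W_z @ z + self.W_a @ action)

    def decode(self, z: np.ndarray):
        """d(z_t) -> (contact probability, (2, 3) contact line, (6,) wrench)."""
        out = self.W_dec @ z
        prob = 1.0 / (1.0 + np.exp(-out[0]))
        return prob, out[1:7].reshape(2, 3), out[7:]

    def rollout(self, points, wrench, actions):
        """Encode once, unroll every action in latent space, and decode each step."""
        z = self.encode(np.asarray(points, float), np.asarray(wrench, float))
        decoded = [self.decode(z)]
        for a in np.asarray(actions, float):
            z = self.step(z, a)
            decoded.append(self.decode(z))
        probs, lines, wrenches = zip(*decoded)
        return np.array(probs), np.stack(lines), np.stack(wrenches)
```

The observations are encoded only once; every later step is produced purely in latent space, which is the property that lets a planner score many candidate action sequences from a single observation.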

We start by embedding the current observations into the latent space with our encoder, $\hat{z}_0 = e(v_0, h_0)$. We unroll actions in the latent space because the contact state alone has insufficient contextual information (e.g., end-effector pose and local geometry information) to predict the next contact state. Because we predict the $t$-th contact state in the current end effector frame $EE_t$, an important consideration when designing our dynamics model is being able to accurately recover this frame. Controller error, e.g., from the impedance of the robot, means the commanded action is not perfectly executed. To account for this, our dynamics model predicts an additional term $\Delta\hat{a}_{t+1}$, an SE(3) transformation that predicts the offset between the commanded and realized next end effector pose. Thus, our dynamics model predicts $\hat{z}_{t+1}, \Delta\hat{a}_{t+1} = f(\hat{z}_t, a_t)$.
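Chaining each commanded action with its predicted offset yields estimates of the end effector frames in which the contact lines are predicted. The sketch below assumes that actions and offsets are 6-vectors (translation plus axis-angle rotation) expressed in the current end effector frame, which is why each transform is right-multiplied; `delta_to_matrix`, `propagate_ee_poses` and `line_to_world` are hypothetical helpers.

```python
import numpy as np


def delta_to_matrix(delta) -> np.ndarray:
    """Map an assumed pose change [tx, ty, tz, rx, ry, rz] to a 4x4 transform.

    The translation is used directly; the axis-angle rotation goes through Rodrigues' formula.
    """
    delta = np.asarray(delta, dtype=float)
    w = delta[3:]
    theta = np.linalg.norm(w)
    K = np.array([[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]])
    if theta < 1e-12:
        R = np.eye(3) + K  # first-order approximation for a negligible rotation
    else:
        K = K / theta
        R = np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = delta[:3]
    return T


def propagate_ee_poses(T_w_ee0, actions, offsets) -> np.ndarray:
    """^W T_EE(t+1) = ^W T_EE(t) @ T(a_t) @ T(offset_{t+1}) for every step."""
    poses = [np.asarray(T_w_ee0, dtype=float)]
    for a, da in zip(actions, offsets):
        poses.append(poses[-1] @ delta_to_matrix(a) @ delta_to_matrix(da))
    return np.stack(poses)


def line_to_world(T_w_ee: np.ndarray, line_ee) -> np.ndarray:
    """Map a (2, 3) contact line predicted in frame EE_t into the world frame W."""
    return np.asarray(line_ee, dtype=float) @ T_w_ee[:3, :3].T + T_w_ee[:3, 3]
```

Without the offset term, the estimated frames would drift whenever the robot does not realize the commanded motion, and every contact line decoded in those frames would inherit that error.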

The predicted offset allows us to construct the following recursive estimate of our end effector frame:

$$
{}^{W}\hat{T}_{EE_{t+1}} = {}^{W}\hat{T}_{EE_t}\, T(a_t)\, T(\Delta\hat{a}_{t+1}),
$$

where ${}^{W}\hat{T}_{EE_t}$ is the SE(3) transformation describing the pose of the end effector at time $t$. We know the initial transform ${}^{W}\hat{T}_{EE_0}$ from our robot proprioception, $T(a_t)$ pro-
