Learning the Dynamics of Compliant Tool-Environment Interaction for Visuo-Tactile Contact Servoing

Mark Van der Merwe Dmitry Berenson Nima Fazeli
Department of Robotics
University of Michigan
{markvdm, dmitryb, nfz}@umich.edu
https://www.mmintlab.com/extrinsic-contact-servoing/
Abstract: Many manipulation tasks require the robot to control the contact between a grasped
compliant tool and the environment, e.g. scraping a frying pan with a spatula. However, modeling
tool-environment interaction is difficult, especially when the tool is compliant, and the robot
cannot be expected to have the full geometry and physical properties (e.g., mass, stiffness, and
friction) of all the tools it must use. We propose a framework that learns to predict the effects
of a robot’s actions on the contact between the tool and the environment
given visuo-tactile perception. Key to our framework is a novel contact feature
representation that consists of a binary contact value, the line of contact, and an
end-effector wrench. We propose a method to learn the dynamics of these contact
features from real world data that does not require predicting the geometry of the
compliant tool. We then propose a controller that uses this dynamics model for
visuo-tactile contact servoing and show that it is effective at performing scraping
tasks with a spatula, even in scenarios where precise contact needs to be made to
avoid obstacles.
Keywords: Contact-Rich Manipulation, Multi-Modal Dynamics Learning
1 Introduction
Figure 1: We present a method for extrinsic contact servoing, i.e., controlling contact between a
compliant tool and the environment. Our method is able to complete the requested contact trajectory,
avoiding contact with surface obstacles, and successfully scrape the target object. Note that to do
this the spatula must be tilted so that only a corner of it is in contact.
Many manipulation tasks require the robot to control the contact between a grasped tool and the
environment. The ability to reason over and control this extrinsic contact is crucial to enabling
helpful robots that can scrape a frying pan with a spatula, erase or wipe a surface [1], screw a bottle
cap onto a bottle [2], perform peg-in-hole assemblies [3,4], and perform many other tasks.
6th Conference on Robot Learning (CoRL 2022), Auckland, New Zealand.
arXiv:2210.03836v1 [cs.RO] 7 Oct 2022
In this work, we seek to address the problem of controlling the extrinsic contact between a grasped
compliant tool (e.g. a spatula) and the environment. In general, the robot cannot expect to have the
full geometry and physical properties (e.g., mass, friction, stiffness) of all the tools it must use or
the geometries of the environments it must manipulate in. Instead, the robot must utilize multimodal
sensory observations, such as pointclouds and tactile feedback, to act on the environment.
In recent years, learning-based methods have become increasingly popular to address the complexities
of robotic manipulation, including for contact-rich tasks [5]. These methods can be loosely
grouped into model-free methods, which directly learn a policy [3,2,6], and model-based methods,
which learn system dynamics [7,8,9]. By focusing on modeling system dynamics, model-based methods
can plan to reach new goals without retraining, and are often more data-efficient [9]. Therefore,
we propose learning the dynamics of our system to solve the extrinsic contact servoing task.
It is not obvious which representation to use for these dynamics. Fully recovering tool and
environment geometries from visual data [10,11] and tactile feedback [12] has been widely explored, with
recent extensions to compliant geometries [13]; however, even if the system can be fully identified,
contact models to resolve interactions can have limited fidelity [14]. On the other hand, learned
dynamics representations can be difficult to interpret and require demonstrations or observations from
the desired state to specify goals [7,15]. Instead, we propose a novel contact feature representation
for our learning method that focuses on tool-environment interaction and bypasses explicitly
modeling the whole system. We represent the contact configuration as 1) a binary contact mode
(indicating if the system is in contact); 2) a contact geometry (as a line in 3D space); and 3) an
end-effector wrench.
We propose a learning architecture to model the dynamics of the proposed contact representation
from raw sensory observations over candidate action trajectories. We propose structuring the model
as a latent space dynamics model with a decoder that recovers the contact state. We also propose
an action offset term in the dynamics that allows us to accurately propagate robot poses, despite
controller errors (e.g. from robot impedance). To provide labels to our model, we collect self-
supervised data on a 7DoF Franka Emika Panda, using sensor data to automatically label contact
state.
We validate our proposed method by completing various desired contact trajectories on the real robot
system. We first show that our method can track diverse desired contact trajectories in the absence
of obstacles. Next, we demonstrate that we can utilize extrinsic contact servoing to scrape a target
object from the table, while handling occlusions and avoiding contact with obstacles (Fig. 1).
2 Related Work
Existing research has investigated the task of recovering contact locations. Manuelli et al. [16]
localize point contacts on a rigid robot with known geometry by employing a particle filtering approach
to update a set of candidate contact locations based on force torque sensing. Kim et al. [4] and Ma
et al. [17] model contact between a grasped rigid object and the environment by assuming stationary
line contacts and modeling the deformation of a GelSlim gripper. The estimated line contact is then
used in a Reinforcement Learning (RL) policy. Neither of these methods extends to compliant tools
and neither models the dynamics of the contact configuration.
Other works explore tactile servoing methods, where contact at the sensor is driven to a desired
configuration. Li et al. [18] use a large tactile pad and define contact configuration features of
objects pressed against the sensor. They manually construct a feedback controller based on these
features and use it to drive contacts to desired configurations. Sutanto et al. [19] use a smaller profile
tactile sensor and learn the dynamics of a learned latent space. They then employ a Model Predictive
Control (MPC) scheme to drive contacts to desired configurations on the sensor. Both of these works
assume contact is happening at the sensing location. We, on the other hand, seek to servo extrinsic
contacts, where we do not get direct sensing at the point of contact.
Other work focuses on maintaining contact between a tool and the environment. Sakaino [20] uses
imitation learning to learn a controller able to maintain contact between a mop and a tabletop. In
contrast, we wish to not only maintain contact but control the extrinsic contact geometry.
3 Problem Formulation
We parameterize our contact feature as a binary contact indicator $c^b \in \{0, 1\}$, used to indicate
whether the tool is in contact, a contact line $\mathbf{c}^l \in \mathbb{R}^{2 \times 3}$ representing the
contact geometry between the tool and the environment, and an end effector wrench
$\mathbf{c}^w \in \mathbb{R}^6$. The geometry $\mathbf{c}^l$ is only active when the tool is in contact,
$c^b = 1$. The contact representation allows extrinsic contact goals to be expressed as desired contact
trajectories $G = [\mathbf{g}_1, \mathbf{g}_2, \ldots, \mathbf{g}_L]$, where each
$\mathbf{g}_i \in \mathbb{R}^{2 \times 3}$ is a desired contact line to reach. We assume that contact
should be maintained throughout the task.
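
As an illustration of the representation above, the contact feature and a goal trajectory can be held in a small container. This is a minimal sketch and not the authors' code: the names `ContactFeature` and `straight_line_goals` are hypothetical, the force-then-torque ordering of the wrench is an assumption, and linear interpolation is only one arbitrary way to build a goal sequence $G$.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ContactFeature:
    """Hypothetical container for the contact feature (c^b, c^l, c^w)."""

    in_contact: bool     # c^b: binary contact indicator
    line: np.ndarray     # c^l: shape (2, 3), the two 3D endpoints of the contact line
    wrench: np.ndarray   # c^w: shape (6,), assumed ordered as force (3) then torque (3)

    def __post_init__(self):
        self.line = np.asarray(self.line, dtype=float)
        self.wrench = np.asarray(self.wrench, dtype=float)
        if self.line.shape != (2, 3):
            raise ValueError("contact line must have shape (2, 3)")
        if self.wrench.shape != (6,):
            raise ValueError("wrench must have shape (6,)")


def straight_line_goals(start_line, end_line, num_goals):
    """Build G = [g_1, ..., g_L] by linearly interpolating between two contact lines.

    Illustrative assumption only: any sequence of (2, 3) lines is a valid G.
    """
    start = np.asarray(start_line, dtype=float)
    end = np.asarray(end_line, dtype=float)
    alphas = np.linspace(0.0, 1.0, num_goals)
    return [(1.0 - a) * start + a * end for a in alphas]
```

Because $\mathbf{c}^l$ is only meaningful when $c^b = 1$, downstream code would be expected to check `in_contact` before using `line`.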
We formulate extrinsic contact servoing as a model predictive planning problem, given observations
of the current state of the system $\mathbf{o}_0$. For a given horizon $T$, we select the next $T$ desired
contact lines $[\mathbf{g}_{i+1}, \ldots, \mathbf{g}_{i+T}] \subseteq G$ to be our current contact goal
sequence. The planning problem is:
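$$
\begin{aligned}
\min_{\mathbf{a}_{0:T-1}} \quad & \sum_{t=1}^{T} d(\mathbf{c}^l_t, \mathbf{g}_{i+t}) \\
\text{s.t.} \quad & c^b_t = 1, \quad \forall t \in [1, T] \\
& \{c^b_{0:T}, \mathbf{c}^l_{0:T}, \mathbf{c}^w_{0:T}\} = g(\mathbf{o}_0, \mathbf{a}_{0:T-1})
\end{aligned}
\tag{1}
$$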
Here $g$ is a model describing the contact feature dynamics. The binary constraint ensures that the
tool remains in contact while the cost function $d$ measures the distance between the two contact
lines, as the average Euclidean distance between the line endpoints. Finally, if
$d(\mathbf{c}^l_1, \mathbf{g}_{i+1}) < \epsilon$ for a threshold $\epsilon$, we increment $i$, thus
moving to the next sequence of desired contact lines for the next round of planning.
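
The cost, the contact constraint, and the goal-window bookkeeping can be sketched as follows. This is a hedged illustration under stated assumptions, not the paper's implementation: the function names are hypothetical, line endpoints are assumed to be stored in a consistent order so they can be compared pairwise, the hard constraint is approximated by an infinite cost, and padding the window with the final goal near the end of the trajectory is an added convention that the text does not specify.

```python
import numpy as np


def line_distance(line_a, line_b):
    """d: average Euclidean distance between corresponding endpoints of two (2, 3) lines."""
    a = np.asarray(line_a, dtype=float)
    b = np.asarray(line_b, dtype=float)
    return float(np.linalg.norm(a - b, axis=1).mean())


def trajectory_cost(pred_lines, pred_contact, goal_lines):
    """Sum of d(c^l_t, g_{i+t}) for t = 1..T.

    Assumption: the constraint c^b_t = 1 is handled by returning infinite cost
    whenever any predicted step loses contact.
    """
    if not all(pred_contact):
        return float("inf")
    return sum(line_distance(c, g) for c, g in zip(pred_lines, goal_lines))


def goal_window(goals, i, horizon):
    """Return [g_{i+1}, ..., g_{i+T}] from a 0-indexed Python list (goals[0] is g_1).

    Assumption: if fewer than `horizon` goals remain, repeat the final goal.
    """
    window = list(goals[i:i + horizon])
    while len(window) < horizon:
        window.append(goals[-1])
    return window


def advance_goal_index(i, first_line, goals, eps):
    """Increment i when d(c^l_1, g_{i+1}) < eps, staying within the goal list."""
    if i < len(goals) - 1 and line_distance(first_line, goals[i]) < eps:
        return i + 1
    return i
```

Any optimizer over action sequences could call `trajectory_cost` on the model's predictions; the choice of optimizer is left open here.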
4 Method
4.1 Contact Feature Dynamics Model
To solve our constrained optimization Eq. 1, we require a model $g$ which can map from raw observations
$\mathbf{o}_0$ and a proposed action trajectory $\mathbf{a}_{0:T-1}$ to the resulting contact states
$\{c^b_{0:T}, \mathbf{c}^l_{0:T}, \mathbf{c}^w_{0:T}\}$.
We propose modeling the contact feature dynamics as a deep neural network. Our actions are
changes in end effector pose.
We assume access to a pointcloud $\mathbf{v}_0$ and input wrench $\mathbf{h}_0$ measured at the robot’s
wrist as our observations, $\mathbf{o}_0 = (\mathbf{v}_0, \mathbf{h}_0)$. Note that end effector wrench
is both an input to our method and part
of the contact state; predicting future wrench aids the representation learning and provides expected
wrenches for planning.
We perform all learning in the local end effector frame. We transform the pointcloud to the end
effector frame, ${}^{EE_0}\mathbf{v}_0$, and clip it to a $0.5\,\mathrm{m}^3$ bounding box region around
the end effector that contains the contact event. We similarly predict our contact lines in the current
end effector frame, ${}^{EE_t}\mathbf{c}^l_t, \forall t \in [1, T]$. Learning in the end effector frame
provides invariance in the visual domain to
translations and rotations of the end effector and removes distractors that do not contribute to the
contact state, such as the rest of the robot arm or the scene background.
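
The frame change and clipping described above amount to one rigid transform followed by a box filter. The sketch below assumes the end effector pose is available as a 4x4 homogeneous matrix and that the box is axis-aligned and centered on the end effector; the function names and the `half_extent` parameter are hypothetical, and the exact box dimensions used in the paper are not assumed here.

```python
import numpy as np


def to_end_effector_frame(points_world, world_T_ee):
    """Express (N, 3) world-frame points in the end effector frame.

    world_T_ee is the 4x4 homogeneous pose of the end effector in the world frame.
    """
    points_world = np.asarray(points_world, dtype=float)
    ee_T_world = np.linalg.inv(world_T_ee)
    homogeneous = np.hstack([points_world, np.ones((len(points_world), 1))])
    return (ee_T_world @ homogeneous.T).T[:, :3]


def crop_box(points_ee, half_extent):
    """Keep points inside an axis-aligned box centered at the end effector origin.

    half_extent may be a scalar or a length-3 array of per-axis half widths (meters).
    """
    mask = np.all(np.abs(points_ee) <= half_extent, axis=1)
    return points_ee[mask]
```

For example, `crop_box(to_end_effector_frame(cloud, world_T_ee), 0.25)` would keep a cube with 0.5 m sides, where 0.25 is an illustrative value only.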
Our contact feature dynamics model (Figure 2) has three components: an encoder $e$ which maps
from raw observations to a learned latent space, a decoder $d$ which maps from the latent space to the
contact state, and a dynamics model $f$ which captures dynamics in the latent space. We parameterize
the models by a set of learned weights θ.
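
One way the three components could compose during a rollout is sketched below, including the pose-offset output of the dynamics model that the rest of this subsection introduces. This is an assumption-laden sketch rather than the authors' architecture: the learned networks are passed in as plain callables, the offset is represented as a 4x4 homogeneous matrix, and `rollout` and `translation_transform` are hypothetical names.

```python
import numpy as np


def translation_transform(action):
    """Hypothetical T(a): treat a 3-vector action as a pure translation."""
    transform = np.eye(4)
    transform[:3, 3] = np.asarray(action, dtype=float)[:3]
    return transform


def rollout(encode, dynamics, decode, action_to_transform,
            pointcloud, wrench, actions, world_T_ee0):
    """Unroll actions in latent space and track the end effector frame.

    encode(pointcloud, wrench) -> z_0
    dynamics(z_t, a_t)         -> (z_{t+1}, offset), offset a 4x4 SE(3) matrix
    decode(z_t)                -> contact state at step t
    Returns contact states for t = 0..T and the estimated poses for t = 0..T.
    """
    z = encode(pointcloud, wrench)
    world_T_ee = np.asarray(world_T_ee0, dtype=float)
    contact_states = [decode(z)]
    poses = [world_T_ee]
    for action in actions:
        z, offset = dynamics(z, action)
        # Recursive frame estimate: previous pose, commanded action, predicted offset.
        world_T_ee = world_T_ee @ action_to_transform(action) @ offset
        contact_states.append(decode(z))
        poses.append(world_T_ee)
    return contact_states, poses
```

Passing the networks as callables keeps the sketch independent of any particular deep learning library.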
We start by embedding the current observations into the latent space with our encoder
$\hat{\mathbf{z}}_0 = e(\mathbf{v}_0, \mathbf{h}_0)$. We unroll actions in the latent space as the contact
state alone has insufficient contextual information (e.g. end-effector pose and local geometry
information) to predict the next contact state. Because we predict the $t$th contact state in the
current end effector frame $EE_t$, an important consideration when designing our dynamics model is being
able to accurately recover this frame. Controller error, e.g., from the impedance of the robot, means
the commanded action is not perfectly executed. To account for this, we predict an additional term from
our dynamics model, $\Delta\hat{\mathbf{a}}_{t+1}$, which is an SE(3) transformation that predicts the
offset between the commanded and realized next end effector pose. Thus, our dynamics model predicts
$\hat{\mathbf{z}}_{t+1}, \Delta\hat{\mathbf{a}}_{t+1} = f(\hat{\mathbf{z}}_t, \mathbf{a}_t)$.
This allows us to construct the following recursive estimate of our end effector frame:
${}^{W}\hat{T}_{EE_{t+1}} = {}^{W}\hat{T}_{EE_t} \, T(\mathbf{a}_t) \, T(\Delta\hat{\mathbf{a}}_{t+1})$,
where ${}^{W}\hat{T}_{EE_t}$ is the SE(3) transformation describing the pose of the end effector at
time $t$. We know the initial transform ${}^{W}\hat{T}_{EE_0}$ from our robot proprioception,
$T(\mathbf{a}_t)$ pro-