
Force/Torque Sensing for Soft Grippers using an External Camera
Jeremy A. Collins1, Patrick Grady1, Charles C. Kemp1
[Fig. 1 graphic: data capture setup with an eye-in-hand camera and a force/torque sensor; an input image is fed to VFTS-Net, which outputs a 6-axis estimate (FX, FY, FZ, TX, TY, TZ).]
Fig. 1. We modify a soft robotic gripper by adding an eye-in-hand camera and a force/torque sensor. Data is collected by teleoperating the robot in a
variety of home and office settings. We train a network, VFTS-Net, to take images from the camera as input and output 3-axis forces and 3-axis torques.
Estimates from VFTS-Net are visualized as lightly shaded arrows, and ground truth measurements from the force/torque sensor are darkly shaded arrows.
Abstract— Robotic manipulation can benefit from wrist-mounted force/torque (F/T) sensors, but conventional F/T sensors can be expensive, difficult to install, and damaged by high loads. We present Visual Force/Torque Sensing (VFTS), a method that visually estimates the 6-axis F/T measurement that would be reported by a conventional F/T sensor. In contrast to approaches that sense loads using internal cameras placed behind soft exterior surfaces, our approach uses an external camera with a fisheye lens that observes a soft gripper. VFTS includes a deep learning model that takes a single RGB image as input and outputs a 6-axis F/T estimate. We trained the model with sensor data collected while teleoperating a robot (Stretch RE1 from Hello Robot Inc.) to perform manipulation tasks. VFTS outperformed F/T estimates based on motor currents, generalized to a novel home environment, and supported three autonomous tasks relevant to healthcare: grasping a blanket, pulling a blanket over a manikin, and cleaning a manikin's limbs. VFTS also performed well with a manually operated pneumatic gripper. Overall, our results suggest that an external camera observing a soft gripper can perform useful visual force/torque sensing for a variety of manipulation tasks.
I. INTRODUCTION
During robotic manipulation, grippers often apply forces and torques to the environment. Sensing the force and torque applied by the gripper has been useful for autonomous manipulation, but sensors that provide this information have limitations. Notably, conventional F/T sensors can be expensive, difficult to mount, and damaged by high loads.
For example, a common approach to measuring the load applied to a gripper is to mount an F/T sensor between the gripper and the robot's wrist. F/T sensors often use strain gauges to sense tiny deformations in an elastic element of the sensor. This approach requires that the strain gauges be resilient to the external load applied to the gripper as well as gravitational and inertial forces from the gripper itself. For many applications, the strain gauges need to be both stiff and sensitive, and protective coverings could reduce performance by interfering with the load on the strain gauges. Together, these design objectives are difficult to achieve.

1Jeremy A. Collins, Patrick Grady, and Charles C. Kemp are with the Institute for Robotics and Intelligent Machines at the Georgia Institute of Technology (GT). This work was supported by NSF Award #2024444. Code, data, and models are available at https://github.com/Healthcare-Robotics/visual-force-torque. Charles C. Kemp is an associate professor at GT. He also owns equity in and works part-time for Hello Robot Inc., which sells the Stretch RE1. He receives royalties from GT for sales of the Stretch RE1.
We present an alternative to conventional F/T sensors. Instead of relying on the deformation of internal components, VFTS directly observes the deformation of a soft gripper using an external camera. The high compliance of soft grippers results in deformations that can be visually observed using a commodity camera. By observing this load-dependent phenomenon, the causative forces and torques can be estimated. We rigidly mount a camera with a fisheye lens to the gripper (i.e., an eye-in-hand camera). We then train a convolutional neural network, VFTS-Net, to estimate the applied force and torque based on a single RGB image from this camera (Figure 1).
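To make the input/output contract concrete, the following is a minimal PyTorch sketch of an image-to-wrench regressor in the spirit of VFTS-Net. The architecture here (two small convolutional layers and a linear head) is hypothetical and chosen only for brevity; the paper does not specify VFTS-Net's layers in this excerpt. What matters is the interface: one RGB image in, a 6-dimensional vector of forces and torques out.

```python
# Hypothetical sketch: single RGB image -> 6-axis force/torque estimate.
# Layer sizes are illustrative, not the actual VFTS-Net architecture.
import torch
import torch.nn as nn

class WrenchRegressor(nn.Module):
    """Maps one RGB frame from the eye-in-hand camera to (Fx, Fy, Fz, Tx, Ty, Tz)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pool to a 32-dim feature
        )
        self.head = nn.Linear(32, 6)  # 3-axis force + 3-axis torque

    def forward(self, x):
        z = self.features(x).flatten(1)
        return self.head(z)

model = WrenchRegressor()
img = torch.zeros(1, 3, 224, 224)  # batch of one RGB image
wrench = model(img)
print(tuple(wrench.shape))  # (1, 6)
```

In training, such a network would be fit by regressing its 6-axis output against synchronized readings from the wrist-mounted F/T sensor, e.g. with a mean-squared-error loss.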
In contrast with conventional F/T sensors, our approach
relies on a low-cost USB camera (∼$60). Our method eases
installation by allowing the camera to be mounted to the
exterior of the gripper rather than between the gripper and
the wrist. Since the camera visually senses the loads from a
distance, it is also less likely to be damaged by high loads.
Researchers have investigated related methods that involve
placing a camera inside a gripper behind a compliant surface.
Applied loads can then be estimated by observing deformation of that surface. In contrast, our approach uses an external camera
to observe a soft gripper. Our approach does not require
modification of the gripper’s contact surfaces or interior. The
global view of the external camera facilitates estimation of
arXiv:2210.00051v3 [cs.RO] 8 May 2023