
Learning to Estimate 3-D States of Deformable Linear Objects from
Single-Frame Occluded Point Clouds
Kangchen Lv, Mingrui Yu, Yifan Pu, Xin Jiang, Gao Huang, and Xiang Li
Abstract— Accurately and robustly estimating the state of
deformable linear objects (DLOs), such as ropes and wires, is
crucial for DLO manipulation and other applications. However,
it remains a challenging open problem due to the high dimensionality
of the state space, frequent occlusions, and noise.
This paper focuses on learning to robustly estimate the states
of DLOs from single-frame point clouds in the presence of
occlusions using a data-driven method. We propose a novel
two-branch network architecture to exploit the global and local
information of the input point cloud, respectively, and design a fusion
module to effectively leverage the advantages of both branches.
Simulation and real-world experimental results demonstrate
that our method can generate globally smooth and locally
precise DLO state estimation results even with heavily occluded
point clouds, which can be directly applied to real-world robotic
manipulation of DLOs in 3-D space.
I. INTRODUCTION
Robotic manipulation of deformable linear objects
(DLOs), such as ropes and wires, has a wide variety of
applications in industrial, service, and health-care sectors
[1], [2]. An accurate and robust state estimator for DLOs
is a prerequisite for subsequent manipulation.
Compared to rigid objects, the infinite-dimensional DLO
state space makes it very challenging to perceive deformations.
Besides, occlusions and noise occur frequently in
unstructured environments, which places higher demands on
the robustness of DLO state estimation.
Commonly used representations to describe DLO states
include Fourier-based parameterization [3], implicit latent
descriptors learned by neural networks [4], [5], a chain
of uniformly distributed nodes [6]–[9], etc. Among these
representations, a chain of 3-D nodes (see
Fig. 1) is general across various manipulation tasks and is
adopted in this work.
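To make the node-chain representation concrete, the following is a minimal sketch of how an ordered 3-D curve can be converted into a fixed number of nodes equally spaced by arc length. The function name `resample_uniform` and its parameters are hypothetical and used only for illustration; this is not the estimation method proposed in this paper, which must also recover the ordering from an unordered, occluded point cloud.

```python
import numpy as np


def resample_uniform(polyline, num_nodes):
    """Resample an ordered 3-D polyline into nodes equally spaced by arc length.

    polyline:  (K, 3) array of ordered points along the DLO centerline.
    num_nodes: number of output nodes (an illustrative parameter).
    Returns a (num_nodes, 3) array of ordered nodes.
    """
    polyline = np.asarray(polyline, dtype=float)
    # Length of each segment between consecutive input points.
    seg_len = np.linalg.norm(np.diff(polyline, axis=0), axis=1)
    # Cumulative arc length at every input point, starting at 0.
    arc = np.concatenate(([0.0], np.cumsum(seg_len)))
    # Target arc-length positions of the uniformly distributed nodes.
    targets = np.linspace(0.0, arc[-1], num_nodes)
    # Linearly interpolate each coordinate along the arc length.
    return np.stack(
        [np.interp(targets, arc, polyline[:, d]) for d in range(3)], axis=1
    )
```

With this representation, the DLO state is simply a `(num_nodes, 3)` array whose row order follows the object from one end to the other.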
A complete processing pipeline to estimate the DLO state
can be roughly divided into three stages: segmentation
(i.e., segmenting the DLO from the environment), detection (i.e.,
estimating the DLO state in a single frame), and tracking
(i.e., tracking the deformation across several frames). As
the sensors are RGB or RGB-D cameras in most cases, segmenting
the DLO region in image space is the essential
first step for subsequent processing. The works in [10]–[13] focus on
how to obtain high-quality pixel-level DLO masks using
K. Lv, M. Yu, Y. Pu, G. Huang, and X. Li are with the Department of Au-
tomation, Tsinghua University, China. X. Jiang is with the Beijing Academy
of Artificial Intelligence, Beijing, China. This work was supported in part by
the National Key R&D Program of China under Grant 2020AAA0105200,
in part by the Institute for Guo Qiang, Tsinghua University, and in part by
the National Natural Science Foundation of China under Grants U21A20517
and 52075290. Corresponding author: Xiang Li (xiangli@tsinghua.edu.cn)
[Fig. 1 panels: input unordered point cloud of the DLO; output estimated ordered nodes.]
Fig. 1. Illustration of our task: 3-D occlusion-robust DLO state estimation
from a single-frame point cloud. Red points are the unordered incomplete
point cloud of the occluded rope and blue connected dots represent our
estimated ordered node sequence as its current state.
traditional image processing or data-driven methods. As for
detection, this step aims at estimating the positions of nodes
along the DLO in one frame with the cleaned sensory data
as input. For example, [14], [15] use neural networks to
encode the DLO into several sequential key-points in the 2-D
image space; [16] estimates a skeleton line and 3-D joint
positions on it from the point cloud to represent the DLO, but
it is not robust against occlusions or different DLO types. As
for tracking, various works have been proposed to track
the correspondence of point clouds across video frames in
the presence of occlusions and self-intersections [17]–[23].
These works model the DLO tracking task as a non-rigid
point registration problem based on Gaussian mixture models (GMMs) with some geometric
constraints. However, these pure tracking-based methods rely
on an accurate initial state, which requires manual setting or
specific initial conditions. Besides, there are few effective
ways to rectify accumulated drift errors or to re-initialize
after a tracking failure. Therefore, it is necessary to develop an
accurate and robust 3-D state estimation method for DLOs
from a single frame, which can be independently applied
to estimate the DLO state in each frame or combined with
the tracking methods above to utilize temporal information.
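As a rough illustration of the GMM-based registration view used by the tracking methods discussed above, the sketch below computes the soft correspondences between observed points and the current nodes, treating each node as an isotropic Gaussian centroid. The function name, the shared variance `sigma2`, and the uniform outlier term with weight `outlier_weight` are assumptions chosen for illustration (the outlier term follows one common formulation); the cited works add further geometric constraints and a node-update step that are not shown here.

```python
import numpy as np


def gmm_responsibilities(points, nodes, sigma2, outlier_weight=0.1):
    """Soft assignment of observed points to DLO nodes (a GMM E-step).

    points: (M, 3) observed point cloud of the current frame.
    nodes:  (N, 3) node positions, used as Gaussian centroids.
    sigma2: shared isotropic variance (illustrative assumption).
    outlier_weight: weight in [0, 1) of a uniform outlier component
        (illustrative assumption).
    Returns an (M, N) matrix; entry (m, n) is the posterior probability
    that point m was generated by node n.
    """
    points = np.asarray(points, dtype=float)
    nodes = np.asarray(nodes, dtype=float)
    num_points, dim = points.shape
    num_nodes = nodes.shape[0]
    # Squared distance between every point and every node, shape (M, N).
    sq_dist = np.sum((points[:, None, :] - nodes[None, :, :]) ** 2, axis=2)
    gauss = np.exp(-sq_dist / (2.0 * sigma2))
    # Constant contributed by the uniform outlier component.
    outlier = (
        (2.0 * np.pi * sigma2) ** (dim / 2.0)
        * outlier_weight / (1.0 - outlier_weight)
        * num_nodes / num_points
    )
    return gauss / (gauss.sum(axis=1, keepdims=True) + outlier)
```

Because these correspondences are computed around the current node positions, the result depends on how good those positions are, which is why such trackers need an accurate initial state.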
In this paper, we focus on estimating, in an occlusion-robust
manner, a sequence of ordered and uniformly distributed nodes
from a single-frame point cloud to represent the state of a DLO, as
shown in Fig. 1. Note that we use only the point cloud as
input, without any auxiliary physical simulation or robot
configuration. The challenges of this task are as follows:
1) there are few distinguishable features in the point cloud
of DLOs; 2) occlusions and noises are common in the
environment; 3) generalization ability for different DLOs
is required. To deal with the challenges above, we propose a
novel two-branch network architecture to leverage both
global geometric information, for guaranteeing a smooth and
occlusion-robust shape, and local geometric information, for
arXiv:2210.01433v2 [cs.RO] 2 May 2023