
1) Learning-based Bayesian particle filter: the estimator is
formulated as a Bayesian particle filter whose pose
likelihood is modeled with a Mixture Density Network
(MDN) [4], which resolves the multi-modal issue and
provides estimation uncertainties. It estimates relatively
large pose errors (in both position and orientation) of
complex shapes from the contact wrench.
2) Self-supervised and RL-based controller: combining
transformer [5]-based supervised learning with on-policy
RL to predict and optimize the low-level controller for
tightening increases reliability and data efficiency
compared to end-to-end RL. The controller completes the
task by adapting to the residual errors in real time.
3) Real-world implementation of nut tightening is a key
contribution of this work. To the best of our knowledge,
this is the first real-world execution of such a
contact-intensive, tight-tolerance task (nut tightening)
over large position and orientation errors. Its robustness
and effectiveness were validated through real-world
experiments.
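To make the first contribution concrete, the following is a minimal sketch of a particle-filter measurement update in which the likelihood is a Gaussian mixture, as an MDN would output. It is an illustration under stated assumptions, not the implementation used in this work: the pose error is reduced to a single scalar, the mixture parameters are hand-picked placeholders rather than the output of a trained network, and all function names are hypothetical.

```python
import math
import random


def mixture_pdf(x, weights, means, stds):
    """Density of a 1-D Gaussian mixture evaluated at x."""
    total = 0.0
    for w, mu, sigma in zip(weights, means, stds):
        z = (x - mu) / sigma
        total += w * math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return total


def update_weights(particles, prior_weights, mixture):
    """Bayesian measurement update: scale each weight by the mixture likelihood."""
    raw = [w * mixture_pdf(x, *mixture) for x, w in zip(particles, prior_weights)]
    norm = sum(raw)
    if norm == 0.0:
        # Degenerate case: keep the prior instead of dividing by zero.
        return list(prior_weights)
    return [r / norm for r in raw]


def systematic_resample(particles, weights, rng):
    """Draw len(particles) samples with probability proportional to weights."""
    n = len(particles)
    start = rng.random() / n
    out = []
    i = 0
    cumulative = weights[0]
    for k in range(n):
        u = start + k / n
        while u > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        out.append(particles[i])
    return out


def estimate(particles, weights):
    """Weighted mean and standard deviation (the estimation uncertainty)."""
    mean = sum(w * x for x, w in zip(particles, weights))
    var = sum(w * (x - mean) ** 2 for x, w in zip(particles, weights))
    return mean, math.sqrt(var)


# Assumed setup: scalar pose error on a uniform grid of particles.
rng = random.Random(0)
particles = [-10.0 + 20.0 * i / 200 for i in range(201)]
weights = [1.0 / len(particles)] * len(particles)

# Placeholder for MDN output given one contact wrench: two equally likely modes.
mixture = ([0.5, 0.5], [-4.0, 4.0], [1.0, 1.0])

weights = update_weights(particles, weights, mixture)
mean, std = estimate(particles, weights)
resampled = systematic_resample(particles, weights, rng)
```

With a bimodal likelihood such as this placeholder, the posterior mass concentrates around both modes, so the weighted mean alone is uninformative while the standard deviation stays large. This is the kind of ambiguity that a multi-modal likelihood can represent and a single Gaussian cannot.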
II. RELATED WORKS
A. Contact information based pose estimation
In an earlier study [6], pose uncertainty in SE(2) was
estimated by matching the contact configuration space
(C-space) with a pre-acquired C-space, but computing the
likelihood with this method is computationally demanding
for objects with complex shapes. More recent studies, such
as the memory unscented particle filter proposed in [7], aim
to localize more complex-shaped objects in SE(3), but they
require multiple tactile sensors, which is costly. While F/T
sensors have been used instead of tactile sensors in [8, 9],
these works focus on objects with simple shapes or require
object-specific motions, limiting generalization to complex
shapes.
To overcome these issues, data-driven methods have been
proposed to address complex contacts that are difficult to
model while maintaining low computation costs. In [10], a
model is trained on the contact pattern generated by a
tilt-then-rotate motion to classify the misalignment
direction. However, this method classifies only discretized
misalignment directions and considers only position
uncertainties. Recently, [11, 12, 13] update the estimation
filter for complex shapes through several discontinuous
poking or touching motions. These methods require full
geometry information about the shape, involve high
computational overhead, and are only applicable to objects
with distinguishable keypoints in their shape.
B. RL-based assembly tasks
Reinforcement learning (RL) has been widely employed
to address contact-intensive tasks to handle complex contact
behaviors. The most popular approach is end-to-end residual
learning of a control input to the position-based nominal
trajectory (e.g., learning a model-free residual policy [14, 15]
and optimizing the force-control parameters as the residual
control input [16]). A fixed nominal trajectory limits the
range of adaptable uncertainty. In [17], an RL controller is
trained to compute the desired force and orientation of a
hybrid position/force low-level controller for the peg-in-hole
task. Another approach proposes a distributed RL agent, RD2,
which employs a long short-term memory (LSTM) structure to
use only the force/torque as input [18]. The common limitation
of these methods is that they only address relatively simple
insertion problems for objects with simple shapes. The work in
[19] develops RL-based nut fastening with complex shapes using
their simulator [20], but it has not been verified in
real-world environments. Our recent work [21] proposes a
high-level RL-based controller on top of a low-level linear
quadratic tracking (LQT) controller for the bolting task, and
in this paper we extend the uncertainty range with a novel
approach. Furthermore, a limitation of all existing studies is
that the policy is trained with end-to-end RL, which has low
reliability and data efficiency.
Fig. 2: The experimental setup consists of a Franka Emika Panda
robotic manipulator, an ATI Gamma FT sensor, a HEBI X-series
actuator, a universal vice, a nut, and a bolt.
III. PRELIMINARIES
A. System Description
In this subsection, we describe the system setup of the
task, on which our proposed framework is implemented.
We construct the simulation and experimental setup with a
robotic manipulator (Franka Emika Panda), an FT sensor
(ATI Gamma SI-65-6) to measure the 6-DOF contact wrench,
and a HEBI X-series gripper capable of infinite rotation for
rotational assembly tasks, as shown in Fig. 2. A manipulating
object (e.g., nut) with the position $p_t \in \mathbb{R}^3$ and orientation
$R_t \in SO(3)$ is rigidly attached to the HEBI gripper, and a
fixed target object (e.g., bolt) with the position $p^{\mathrm{tar}}_t \in \mathbb{R}^3$
and orientation $R^{\mathrm{tar}}_t \in SO(3)$ is installed in the environment,
where $\star_t$ represents a variable at time $t$. Motion planning and
low-level control of the manipulating object are implemented
in the 6-DOF Cartesian space. The low-level controller is an
admittance controller with the reference manipulating object
dynamics given as
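$$M_t \ddot{e}_t + B_t \dot{e}_t + K_t e_t = F^{c}_t \qquad (1)$$
where $e_t = [e^{p}_t, e^{R}_t]^{T} \in \mathbb{R}^6$ is the error vector, with the linear
position error $e^{p}_t = p^{\mathrm{ref}}_t - p_t \in \mathbb{R}^3$ and the orientation error as
geometric error $e^{R}_t = \frac{1}{2}\left(R_t^{T} R^{\mathrm{ref}}_t - (R^{\mathrm{ref}}_t)^{T} R_t\right)^{\vee}$. Here, $p^{\mathrm{ref}}_t \in \mathbb{R}^3$
is the reference position, $R^{\mathrm{ref}}_t \in SO(3)$ is the reference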