
An action dimensionality extension (ADE) method is proposed in [37], which draws on the ideas of latent space and hierarchical RL but works differently. The ADE method first constructs a low-dimensional action space according to the similarity between action components and trains a primitive agent effectively. Then the agent is extended into the high-dimensional original action space and continues to be trained to obtain better performance on the task. ADE combines the advantages of the higher data efficiency of a low-dimensional action space and the better performance of a high-dimensional action space.
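The minimal Python sketch below illustrates the general low-to-high extension idea only; the grouping of action components and the expansion map are assumptions for illustration, not the exact construction of [37].

```python
import numpy as np

# Illustrative sketch of a two-stage dimensionality-extension idea.
# The grouping and the expansion rule are hypothetical assumptions.

groups = [[0, 1, 2], [3, 4, 5]]   # indices of similar action components (assumed)
n_low, n_high = len(groups), 6

def expand(a_low):
    """Map a low-dimensional action to the original action space by
    copying each group's shared value to all of its components."""
    a_high = np.zeros(n_high)
    for value, idx in zip(a_low, groups):
        a_high[idx] = value
    return a_high

# Stage 1: train a primitive agent that outputs a_low (2-D) and acts via expand().
# Stage 2: initialize a 6-D policy from the primitive agent and fine-tune it
#          directly in the original action space.
a_low = np.array([0.3, -0.7])
print(expand(a_low))              # -> [ 0.3  0.3  0.3 -0.7 -0.7 -0.7]
```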
All the methods mentioned above can accelerate RL by optimizing the dimensionality of the action and state spaces. However, the resulting policies are still globally connected; that is, the policy for each action component is determined by all state components. Irrelevant state components degrade the performance of the policies and also decrease data efficiency and convergence speed. If the dependence of the action components on the state components were known, the state space could be divided into several sub-state spaces, and a local connection policy could be constructed for each action component, which would improve data efficiency and convergence speed.
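As a hypothetical illustration of this local connection idea (not the paper's implementation), the sketch below assumes the sub-state space of each action component is already known and evaluates one small policy per component; the indices and placeholder policies are illustrative only.

```python
import numpy as np

# Globally connected: every action component sees all m state components.
# Locally connected: each action component sees only its own sub-state space.

m, n = 6, 2                                 # state / action dimensionalities (assumed)
state = np.random.randn(m)

# Hypothetical dependence of each action component on state components.
sub_states = {0: [0, 1, 2],                 # a_0 depends on s_0, s_1, s_2
              1: [3, 4]}                    # a_1 depends on s_3, s_4 (s_5 irrelevant)

def local_policy(state, policies, sub_states):
    """Evaluate one small policy per action component on its own sub-state."""
    return np.array([policies[i](state[idx]) for i, idx in sub_states.items()])

# Placeholder per-component policies; in practice these would be small networks.
policies = {0: lambda x: np.tanh(x.sum()),
            1: lambda x: np.tanh(x.mean())}

action = local_policy(state, policies, sub_states)
```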
1.3 Motivation and Contribution
Motivated by constructing controllers more efficiently through RL, this paper first defines CS3 to judge the effect of an action component on a state component in complex tasks. A connection graph is then constructed based on CS3 to clarify the correlation of action components with state components and to define the inputs of the policies. The LCRL method, based on the connection graph, is proposed to eliminate the influence of irrelevant state components on the policies for the action components, thus accelerating the training process.
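A hedged sketch of this pipeline is given below; the CS3 values and the threshold are hypothetical placeholders, since the formal definition of CS3 follows in Section 2.

```python
import numpy as np

# Hypothetical sketch: thresholding a CS3 matrix to obtain a connection graph
# that defines which state components feed each action component's policy.

m, n = 4, 2                                 # state / action dimensionalities (assumed)
cs3 = np.array([[0.9, 0.7, 0.1, 0.0],       # effect of a_0 on each state component
                [0.0, 0.2, 0.8, 0.6]])      # effect of a_1 on each state component

threshold = 0.5                             # illustrative cutoff
connection_graph = cs3 > threshold          # n x m boolean adjacency matrix

# Each entry lists the state components used as inputs of that component's policy.
policy_inputs = [np.flatnonzero(row) for row in connection_graph]
print(policy_inputs)                        # -> [array([0, 1]), array([2, 3])]
```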
The main contributions of this paper are as follows. First, we propose the LCRL method to accelerate convergence and increase the final reward of RL algorithms. The LCRL method is based on the definitions of CS3 and the connection graph, which reveal the dependence of action components on state components. Second, the LCRL method is implemented to construct a learning-based compliance controller for the peg-in-hole assembly task, which yields lower force/moment and guarantees a more stable control process.
The rest of the paper is organized as follows. Section 2 introduces the LCRL method. Section 3 develops the control method for robotic peg-in-hole assembly using the LCRL method. Sections 4 and 5 provide simulation and experimental verification, respectively. Section 6 summarizes the research work of this paper.
2. Local Connection Reinforcement Learning
2.1 Reinforcement Learning in Continuous Space
RL abstracts arbitrary control problems into a Markov decision process (MDP) ⟨S, A, P, R, γ⟩, where S is the state space, A is the action space, P: S × A × S → [0, 1] is the state transition function, R: S × A → ℝ is the reward function, and γ∈[0,1) is the discount rate. Here we record the state space S as an m-dimensional space and the action space A as an n-dimensional space. A state s∈S is an m-dimensional vector s = [s₁, s₂, …, s_m]ᵀ. An action a∈A is an n-dimensional vector a = [a₁, a₂, …, a_n]ᵀ. The state transition function is recorded as P(s′|s, a), which gives the probability of the next state s′ once the current state s and action a are determined. The state transition function is usually implicit in complex tasks.
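The toy Python sketch below instantiates these MDP elements; the linear-Gaussian transition and the reward are hypothetical, used only to make the notation concrete.

```python
import numpy as np

# Toy instantiation of the MDP elements above. The transition and reward
# are assumptions for illustration; in the assembly task P(s'|s, a) is implicit.

m, n = 6, 3                                 # dims of state space S and action space A
rng = np.random.default_rng(0)

def step(s, a):
    """Sample s' ~ P(.|s, a) and return the reward R(s, a)."""
    s_next = s + 0.1 * np.pad(a, (0, m - n)) + 0.01 * rng.standard_normal(m)
    reward = -float(np.linalg.norm(s_next)) # e.g. penalize distance to a target at the origin
    return s_next, reward

s = np.zeros(m)                             # s = [s_1, ..., s_m]^T
a = np.ones(n)                              # a = [a_1, ..., a_n]^T
s_next, r = step(s, a)
```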
The targets of