the CNS activates muscles in groups to decrease the complexity required to control each individual
muscle [d’Avella et al., 2003, Ting and McKay, 2007]. According to muscle synergy theory, the
CNS produces a small number of signals. The combinations of these signals are distributed to the
muscles [Wojtara et al., 2014]. Muscle synergy is the term for the coordination of muscles that
activate at the same time [Ferrante et al., 2016]. A synergy can include multiple muscles, and a
muscle can belong to multiple synergies. Synergies produce complicated activation patterns for a set
of muscles during the performance of a task, which are commonly measured using electromyography
(EMG) [Tresch et al., 2002, Singh et al., 2018]. EMG signals are typically arranged as a matrix in
which each row holds the activation of one muscle over time and each column holds the activations
of all muscles at one time instant [Rabbi et al., 2020]. Factorisation methods are applied to this
matrix to extract muscle synergies from the muscle activation patterns. The four most commonly
used factorisation methods are non-negative matrix factorisation [Steele
et al., 2015, Schwartz et al., 2016, Lee and Seung, 1999, Rozumalski et al., 2017, Shuman et al.,
2016, Saito et al., 2018], principal component analysis [Ting and Macpherson, 2005, Ting et al.,
2015, Danion and Latash, 2010, Falaki et al., 2017], independent component analysis [Hyvärinen and
Oja, 2000, Hart and Giszter, 2013], and factor analysis [Kieliba et al., 2018, Saito et al., 2015].
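As an illustration of the factorisation step, the following is a minimal sketch of non-negative matrix factorisation with the multiplicative updates of Lee and Seung, applied to a synthetic muscles-by-time matrix. The function name `extract_synergies`, the matrix sizes, the iteration count, and the synthetic data are assumptions made for this example; they are not taken from the cited works.

```python
import numpy as np

def extract_synergies(emg, n_synergies, n_iter=500, eps=1e-9, seed=0):
    """Factorise a non-negative matrix emg (muscles x time) as emg ~= W @ H.

    W (muscles x synergies) holds the muscle weights of each synergy.
    H (synergies x time) holds the activation signal of each synergy.
    """
    rng = np.random.default_rng(seed)
    n_muscles, n_samples = emg.shape
    W = rng.random((n_muscles, n_synergies)) + eps
    H = rng.random((n_synergies, n_samples)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry of W and H non-negative.
        H *= (W.T @ emg) / (W.T @ W @ H + eps)
        W *= (emg @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic data (hypothetical): 8 muscles driven by 2 underlying synergies.
rng = np.random.default_rng(1)
true_W = rng.random((8, 2))
true_H = rng.random((2, 200))
emg = true_W @ true_H

W, H = extract_synergies(emg, n_synergies=2)
rel_error = np.linalg.norm(emg - W @ H) / np.linalg.norm(emg)
```

Each column of `W` can be read as one synergy (a weighting over muscles), and each row of `H` as the low-dimensional signal that drives it. The number of synergies is a free parameter here and would normally be chosen from the reconstruction error.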
In the field of robot control, only a few works [Palli et al., 2014, Wimböck et al., 2011, Ficuciello
et al., 2016] have exploited the idea of muscle synergy for dimensionality reduction to simplify the
control. However, these works usually first use a human motion dataset to obtain the synergy
space and then learn to control in this synergy space. In contrast, our work learns the synergy space
simultaneously with the control policy in the synergy space.
Affinity propagation [Frey and Dueck, 2007] is a clustering algorithm based on multi-round message
passing between input data points. It does not require the number of clusters to be specified in
advance and proceeds by finding an exemplar for each instance. Data points that choose the same
exemplar belong to the same cluster.
Suppose $\{x_i\}_{i=1}^{n}$ is a set of data points. Define $S \in \mathbb{R}^{n \times n}$ as a
similarity matrix. When $i \neq j$, the element $s_{i,j}$ at the $i$th row and $j$th column is the
similarity between $x_i$ and $x_j$, which can be measured as, for example, the negative squared
distance of two data points. When $i = j$, the element $s_{i,j}$ represents how likely the
corresponding instance is to become an exemplar. The vector of diagonal elements,
$(s_{11}, s_{22}, \dots, s_{nn})$, is called preference. Non-diagonal elements in $S$ constitute the
affinity matrix.
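The construction of the similarity matrix can be sketched as follows, using the negative squared distance for the off-diagonal entries. Setting the preference to the median of the off-diagonal similarities is an assumption of this sketch (a common default, not something stated in the text above), and the helper name `similarity_matrix` and the toy points are hypothetical.

```python
import numpy as np

def similarity_matrix(points, preference=None):
    """Build S with s_ij = -||x_i - x_j||^2 off the diagonal and the preference on it."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    S = -np.sum(diff ** 2, axis=-1)
    if preference is None:
        # Assumed default: median of the off-diagonal similarities.
        off_diag = ~np.eye(len(points), dtype=bool)
        preference = np.median(S[off_diag])
    S[np.diag_indices_from(S)] = preference
    return S

# Three toy 2-D points (hypothetical data).
points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
S = similarity_matrix(points)
```

A larger (less negative) preference makes more points likely to become exemplars and therefore tends to produce more clusters.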
The algorithm takes $S$ as input and proceeds by updating two matrices: the responsibility matrix
$R$, whose values $r_{i,j}$ represent how well-suited $x_j$ is to be the exemplar for $x_i$; and the
availability matrix $A$, whose values $a_{i,j}$ quantify the appropriateness for $x_i$ picking $x_j$
as its exemplar [Frey and Dueck, 2007]. These two matrices are initialized to zero and can be
regarded as log-probability tables. The algorithm then alternates between two message-passing steps.
First, the responsibility matrix is updated:
$$r_{i,j} \leftarrow s_{i,j} - \max_{j' \neq j} \left( a_{i,j'} + s_{i,j'} \right). \tag{1}$$
Then, the availability matrix is updated:
$$a_{i,j} \leftarrow \min\Big( 0,\; r_{j,j} + \sum_{i' \notin \{i,j\}} \max(0, r_{i',j}) \Big) \ \text{for } i \neq j; \qquad a_{j,j} \leftarrow \sum_{i' \neq j} \max(0, r_{i',j}). \tag{2}$$
Messages are passed until the clusters stabilize or the pre-determined number of iterations is reached.
Then the exemplar of $i$ is $\arg\max_j \left( r_{i,j} + a_{i,j} \right)$.
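The two message-passing steps can be sketched in NumPy as below. This is a minimal illustration rather than a reference implementation: the damping factor, the fixed iteration count, the median preference, and the toy data are assumptions added for numerical stability and for the example, and they do not appear in the update rules above.

```python
import numpy as np

def affinity_propagation(S, n_iter=200, damping=0.5):
    """Return, for each point i, the index of its exemplar argmax_j (r_ij + a_ij)."""
    n = S.shape[0]
    R = np.zeros((n, n))
    A = np.zeros((n, n))
    rows = np.arange(n)
    for _ in range(n_iter):
        # Responsibility update: r_ij <- s_ij - max_{j' != j} (a_ij' + s_ij').
        AS = A + S
        best = np.argmax(AS, axis=1)
        first_max = AS[rows, best]
        AS[rows, best] = -np.inf
        second_max = AS.max(axis=1)
        max_excl = np.repeat(first_max[:, None], n, axis=1)
        max_excl[rows, best] = second_max  # exclude j itself where j is the argmax
        R = damping * R + (1 - damping) * (S - max_excl)

        # Availability update: clip responsibilities at 0, except r_jj.
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, R.diagonal())
        col_sum = Rp.sum(axis=0)            # r_jj + sum_{i' != j} max(0, r_i'j)
        A_new = col_sum[None, :] - Rp       # drop the i' = i term
        diag = A_new.diagonal().copy()      # a_jj = sum_{i' != j} max(0, r_i'j)
        A_new = np.minimum(A_new, 0)        # min(0, .) applies only for i != j
        np.fill_diagonal(A_new, diag)
        A = damping * A + (1 - damping) * A_new
    return np.argmax(A + R, axis=1)

# Toy data (hypothetical): two well-separated groups of three 2-D points.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
S = -np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
off_diag = ~np.eye(len(X), dtype=bool)
S[np.diag_indices_from(S)] = np.median(S[off_diag])  # assumed preference

exemplars = affinity_propagation(S)
```

Points that share an entry in `exemplars` belong to the same cluster, so the number of clusters is the number of distinct exemplar indices and never has to be supplied by the caller.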
3 Method
In this section, we present our Synergy-Oriented LeARning (SOLAR) scheme that incorporates the
muscle synergy mechanism into modular reinforcement learning to reduce its learning complexity.
Our method has two major components. The first one is an unsupervised learning module that utilizes
the morphological structure and value information to discover the synergy hierarchy. The second
is a novel attention-based policy architecture that supports synergy-aware learning. Both of the
components are specially designed to enable the control of robots with different morphologies. We
first introduce the problem settings and then describe the details of the two components.
Problem settings. We consider $N$ robots, each with a unique morphology. Agent $n$ contains $K_n$
limb actuators that are connected together to constitute its overall morphological structure. Examples
of such robots that are studied in this paper include Humanoid++ and UNIMALs. At each discrete