
Exact first moments of the RV coefficient by invariant orthogonal integration
After recalling the above preliminaries, somewhat lengthy but necessary, the heart of this contribution can be
uncovered: invariant orthogonal integration consists in computing the expected null moments of the RV coefficient
by averaging, along the invariant Haar orthogonal measure in the non-trivial eigenspace, the orientations of
one configuration with respect to the other, by orthogonal transformation of, say, the first eigenspace (section
3.2). It constitutes a distinct alternative, with different outcomes, to the traditional permutation approach, whose
exchangeability assumption breaks down for weighted objects: typically, the profile dispersion is expected to be
larger for lighter objects (Bavaud, 2013) and the $n$ object scores cannot follow the same distribution. The present
approach also yields a novel significance test for the RV coefficient (equation 16), taking into account skewness
and kurtosis corrections to the usual normal approximation.
Computing the moments of the RV coefficient requires evaluating the orthogonal coefficients (23) constituted by
Haar expectations of orthogonal monomials. Low-order moments can be computed, with increasing difficulty, by
elementary means (section 3.3), but the fourth-order moment requires a more systematic approach (section 3.6),
provided by the Weingarten calculus developed by workers in random matrix theory and free probability. Both
procedures yield the same results for low-order moments (section 3.7), which is both expected and reassuring.
The first RV moment (11) coincides with all known proposals. The second centered RV moment (12) is simpler
than its permutation analog, and underlines the effective dimensionality of a configuration. The third centered RV
moment (13) is particularly enlightening: the RV skewness is simply proportional to the product of the spectral
skewness of both configurations, thus elucidating the often noticed positive skewness of the RV coefficient. The
expression for the fourth centered RV moment (9), (14) is also simple to express and to compute, yet more difficult
to interpret.
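The averaging over orientations described above can be illustrated numerically. The following is a minimal Monte Carlo sketch, not the exact computation of this paper. It assumes the usual definition RV = Tr(K_X K_Y) / sqrt(Tr(K_X^2) Tr(K_Y^2)), and it assumes that, in the eigenbases of the two kernels, the relative orientation reduces to a single orthogonal matrix T of order n-1 acting on the non-trivial eigenspace. The function names and the two spectra are hypothetical. The reference value uses only the elementary fact that E[t_ab^2] = 1/m for an m x m Haar orthogonal matrix.

```python
import numpy as np

def haar_orthogonal(m, rng):
    """Draw an m x m orthogonal matrix from the Haar measure (QR of a Gaussian matrix)."""
    g = rng.standard_normal((m, m))
    q, r = np.linalg.qr(g)
    # Rescale the columns of Q by the signs of diag(R) to remove the QR sign ambiguity.
    return q * np.sign(np.diag(r))

def rv_rotated(lam, mu, t):
    """RV between T diag(lam) T^T and diag(mu), under the assumed RV definition."""
    num = mu @ (t ** 2) @ lam  # equals Tr(T diag(lam) T^T diag(mu))
    return num / np.sqrt(np.sum(lam ** 2) * np.sum(mu ** 2))

def expected_rv(lam, mu):
    """Haar expectation of rv_rotated, from E[t_ab^2] = 1/m."""
    m = len(lam)
    return lam.sum() * mu.sum() / (m * np.sqrt(np.sum(lam ** 2) * np.sum(mu ** 2)))

rng = np.random.default_rng(0)
lam = np.array([4.0, 2.0, 1.0, 0.5, 0.5])  # hypothetical non-trivial spectrum, first configuration
mu = np.array([3.0, 3.0, 1.0, 1.0, 0.0])   # hypothetical non-trivial spectrum, second configuration
draws = [rv_rotated(lam, mu, haar_orthogonal(len(lam), rng)) for _ in range(20000)]
print(np.mean(draws), expected_rv(lam, mu))
```

The empirical mean and the closed-form value should agree up to Monte Carlo error. Higher moments can be estimated from the same draws, although with slower convergence.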
2 Euclidean configurations in a weighted setting: a concise reminder
2.1 Weighted multidimensional scaling and standard kernels
Consider $n$ objects endowed with positive weights $f_i > 0$ with $\sum_{i=1}^{n} f_i = 1$, as well as with
dissimilarities $D = (D_{ij})$ between pairs of objects. The $n \times n$ matrix $D$ is assumed to be squared
Euclidean, that is of the form $D_{ij} = \|x_i - x_j\|^2$ for $x_i, x_j \in \mathbb{R}^r$, with $r \le n-1$. The pair
$(f, D)$ constitutes a weighted configuration, with $f_i = 1/n$ for unweighted configurations.
Weighted multidimensional scaling aims at determining object coordinates $X = (x_{i\alpha}) \in \mathbb{R}^{n \times r}$
reproducing the dissimilarities $D$ while expressing a maximum amount of dispersion or inertia $\Delta$ (3) in low
dimensions. It is performed by the following weighted generalization of the well-known Torgerson–Gower scaling
procedure (see e.g. Borg and Groenen, 2005): first, define $\Pi = \mathrm{diag}(f)$, as well as the weighted
centering matrix $H = I_n - 1_n f^\top$, which obeys $H^2 = H$. However, $H^\top \neq H$, unless $f$ is uniform.
Second, compute the matrix $B$ of scalar products by double centering: $B = -\frac{1}{2} H D H^\top$. Third, define
the $n \times n$ kernel $K$ as the matrix of weighted scalar products:
$$K = \sqrt{\Pi}\, B\, \sqrt{\Pi}, \qquad \text{that is} \qquad K_{ij} = \sqrt{f_i f_j}\, B_{ij}.$$
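Fourth, perform the spectral decomposition with $\hat{U}$ orthogonal and $\hat{\Lambda}$ diagonal:
$$K = \hat{U} \hat{\Lambda} \hat{U}^\top, \qquad \hat{U} \hat{U}^\top = \hat{U}^\top \hat{U} = I_n, \qquad \hat{\Lambda} = \mathrm{diag}(\lambda). \tag{1}$$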
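The four steps above can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the random points, the weights and all variable names are hypothetical, and NumPy's `eigh` returns the eigenvalues in increasing order, whereas the text orders them decreasingly.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 6, 3
coords = rng.standard_normal((n, r))   # hypothetical points x_i in R^r
f = rng.random(n) + 0.1
f /= f.sum()                           # positive weights summing to 1 (non-uniform)
# Squared Euclidean dissimilarities D_ij = ||x_i - x_j||^2.
D = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)

# First: weighted centering matrix H = I_n - 1_n f^T.
H = np.eye(n) - np.outer(np.ones(n), f)
# Second: scalar products by double centering, B = -1/2 H D H^T.
B = -0.5 * H @ D @ H.T
# Third: kernel K_ij = sqrt(f_i f_j) B_ij.
K = np.sqrt(np.outer(f, f)) * B
K = (K + K.T) / 2                      # remove rounding asymmetry before calling eigh
# Fourth: spectral decomposition K = U_hat diag(lam_hat) U_hat^T.
lam_hat, U_hat = np.linalg.eigh(K)

print("H idempotent:", np.allclose(H @ H, H))
print("H symmetric:", np.allclose(H.T, H))
print("decomposition reproduces K:", np.allclose(U_hat @ np.diag(lam_hat) @ U_hat.T, K))
```

With non-uniform weights, the second check illustrates that $H$ is an oblique rather than an orthogonal projector.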
By construction, $K$ possesses one trivial eigenvalue $\lambda_0 = 0$ associated with the eigenvector $\sqrt{f}$ and
$n-1$ non-negative eigenvalues decreasingly ordered as $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_{n-1} \ge 0$,
among which $r = \mathrm{rg}(K)$, the rank of $K$, are strictly positive.
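Both properties can be verified in a few lines. In the sketch below, $\tilde{X} \in \mathbb{R}^{n \times r}$ denotes the matrix whose rows are the points $x_i$ generating $D$, and $a_i = \|x_i\|^2$; both notations are introduced here for illustration only. Since $\sqrt{\Pi}\sqrt{f} = f$ and $1_n^\top f = 1$,
$$\begin{aligned}
H^\top f &= (I_n - f 1_n^\top) f = f - f = 0
\quad\Longrightarrow\quad
K \sqrt{f} = \sqrt{\Pi}\, B\, f = -\tfrac{1}{2} \sqrt{\Pi}\, H D H^\top f = 0, \\
H 1_n &= 1_n - 1_n (f^\top 1_n) = 0, \qquad D = a 1_n^\top + 1_n a^\top - 2 \tilde{X} \tilde{X}^\top
\quad\Longrightarrow\quad
B = (H \tilde{X}) (H \tilde{X})^\top .
\end{aligned}$$
Hence $K = (\sqrt{\Pi} H \tilde{X}) (\sqrt{\Pi} H \tilde{X})^\top$ is positive semi-definite, with rank at most $r$.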
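From now on the trivial eigenspace will be discarded: set $\hat{U} = (\sqrt{f} \,|\, U)$, where
$U \in \mathbb{R}^{n \times (n-1)}$ and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_{n-1})$. Direct substitution
from (1) yields
$$K = U \Lambda U^\top, \qquad U U^\top = I_n - \sqrt{f} \sqrt{f}^\top, \qquad U^\top U = I_{n-1}, \qquad U^\top \sqrt{f} = 0_{n-1}. \tag{2}$$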
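The substitution can be spelled out as follows, writing $\hat{\Lambda} = \mathrm{diag}(0, \Lambda)$ in block form (a notation assumed here, consistent with $\lambda_0 = 0$):
$$\begin{aligned}
K &= \hat{U} \hat{\Lambda} \hat{U}^\top = 0 \cdot \sqrt{f} \sqrt{f}^\top + U \Lambda U^\top = U \Lambda U^\top, \\
I_n &= \hat{U} \hat{U}^\top = \sqrt{f} \sqrt{f}^\top + U U^\top, \\
I_n &= \hat{U}^\top \hat{U} =
\begin{pmatrix} \sqrt{f}^\top \sqrt{f} & \sqrt{f}^\top U \\ U^\top \sqrt{f} & U^\top U \end{pmatrix}.
\end{aligned}$$
The second line gives the expression of $U U^\top$. In the third line, the upper-left entry equals $\sum_i f_i = 1$ as it should, the lower-right block gives the orthonormality of the columns of $U$, and the off-diagonal blocks show that $U^\top \sqrt{f}$ is a vector of $n-1$ zeros.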
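Finally, the sought coordinates are obtained as $X = \Pi^{-\frac{1}{2}} U \Lambda^{\frac{1}{2}}$, that is
$x_{i\alpha} = u_{i\alpha} \sqrt{\lambda_\alpha} / \sqrt{f_i}$. One verifies easily that
$$D_{ij} = \sum_{\alpha=1}^{n-1} (x_{i\alpha} - x_{j\alpha})^2, \qquad \Delta = \frac{1}{2} \sum_{i,j=1}^{n} f_i f_j D_{ij} = \mathrm{Tr}(K) = \sum_{\alpha=1}^{n-1} \lambda_\alpha. \tag{3}$$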
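Identities (2) and (3) can be checked numerically. The sketch below is an illustration under stated assumptions: the data and variable names are hypothetical, and the shift of $K$ by $\tau \sqrt{f}\sqrt{f}^\top$ is a numerical convenience that is not part of the procedure described in the text. It isolates the trivial eigenvector $\sqrt{f}$ even when $K$ has further zero eigenvalues, so that the remaining eigenvectors are guaranteed to be orthogonal to $\sqrt{f}$.

```python
import numpy as np

def sq_dists(Z):
    """Matrix of squared Euclidean distances between the rows of Z."""
    return ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)

rng = np.random.default_rng(2)
n, r = 7, 2
f = rng.random(n) + 0.1
f /= f.sum()                                   # hypothetical positive weights summing to 1
D = sq_dists(rng.standard_normal((n, r)))      # hypothetical squared Euclidean dissimilarities

H = np.eye(n) - np.outer(np.ones(n), f)
K = np.sqrt(np.outer(f, f)) * (-0.5 * H @ D @ H.T)
K = (K + K.T) / 2

s = np.sqrt(f)
tau = np.trace(K) + 1.0                        # strictly larger than every eigenvalue of K
w, V = np.linalg.eigh(K + tau * np.outer(s, s))
# eigh sorts increasingly: the last pair is (tau, +/- sqrt(f)); drop it and reverse the rest.
lam = np.clip(w[:-1][::-1], 0.0, None)         # lambda_1 >= ... >= lambda_{n-1} >= 0
U = V[:, :-1][:, ::-1]                         # n x (n-1), columns orthogonal to sqrt(f)

X = U * np.sqrt(lam) / s[:, None]              # x_{i alpha} = u_{i alpha} sqrt(lambda_alpha) / sqrt(f_i)
inertia = 0.5 * f @ D @ f                      # Delta in (3)

print("K = U Lambda U^T:", np.allclose(U @ np.diag(lam) @ U.T, K))
print("U U^T = I - sqrt(f) sqrt(f)^T:", np.allclose(U @ U.T, np.eye(n) - np.outer(s, s)))
print("distances reproduced:", np.allclose(sq_dists(X), D))
print("inertia, trace, sum of eigenvalues:", inertia, np.trace(K), lam.sum())
```

Since the hypothetical points live in $r = 2$ dimensions, only the first two columns of `X` are non-zero up to rounding error.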