EXACT FIRST MOMENTS OF THE RV COEFFICIENT BY INVARIANT ORTHOGONAL INTEGRATION

François Bavaud

University of Lausanne, Switzerland

fbavaud@unil.ch

October 4, 2022

ABSTRACT

The RV coefficient measures the similarity between two multivariate configurations, and its significance testing has attracted various proposals in recent decades. We present a new approach, invariant orthogonal integration, which yields the exact first four moments of the RV coefficient under the null hypothesis. It consists in averaging, with respect to the Haar measure, over the respective orientations of the two configurations, and can be applied to any multivariate setting endowed with Euclidean distances between the observations. Our proposal also covers the weighted setting of observations of unequal importance, where the exchangeability assumption justifying the usual permutation tests breaks down.

The proposed RV moments are simple functions of the kernel eigenvalues occurring in the weighted multidimensional scaling of the two configurations. The expressions for the third and fourth moments appear to be new. The first three moments can be obtained by elementary means, but computing the fourth moment requires a more sophisticated apparatus, the Weingarten calculus for orthogonal groups. The central role of standard kernels and their spectral moments is emphasized.

Keywords

RV coefficient · weighted multidimensional scaling · spectral moments · invariant orthogonal integration · Weingarten calculus

1 Introduction

The RV coefficient is a well-known measure of similarity between two datasets, each consisting of multivariate profiles measured on the same $n$ observations or objects. This contribution proposes a new approach, invariant orthogonal integration, which yields the exact first four moments of the RV coefficient under the null hypothesis of absence of relation between the two datasets. The main results, Theorem 1 and Corollary 1, are presented in section 3.1. The approach is fully nonparametric, and allows the handling of weighted objects, typically made of aggregates such as regions, documents or species, which abound in multivariate analysis.

In the present distance-based data-analytic approach, datasets consist of weighted configurations specified by the object weights together with their pairwise dissimilarities, assumed to be squared Euclidean. Factorial coordinates, which reproduce the dissimilarities and permit a maximum compression of the configuration inertia, are obtained by weighted multidimensional scaling. The latter, seldom presented in the literature and hence briefly recalled in section 2.1, is a direct generalization of classical scaling. The central step is the spectral decomposition of the matrix of weighted centered scalar products, or kernel. It decomposes the eigenspace into a trivial one-dimensional part, determined by the object weights and common to both configurations, and a non-trivial part of dimension $n-1$, orthogonal to the square root of the weights. The weighted RV coefficient is obtained as the normalized scalar product between the kernels of the two configurations (section 2.2), and turns out to be equivalent to its original definition expressed by cross-covariances (Escoufier, 1973; Robert and Escoufier, 1976).

arXiv:2210.00639v1 [math.ST] 2 Oct 2022

After recalling the above preliminaries, somewhat lengthy but necessary, the heart of this contribution can be stated: invariant orthogonal integration consists in computing the expected null moments of the RV coefficient by averaging, with respect to the invariant (Haar) orthogonal measure on the non-trivial eigenspace, over the orientations of one configuration relative to the other, by orthogonal transformation of, say, the first eigenspace (section 3.2). It constitutes a distinct alternative, with different outcomes, to the traditional permutation approach, whose exchangeability assumption breaks down for weighted objects: typically, the profile dispersion is expected to be larger for lighter objects (Bavaud, 2013) and the $n$ object scores cannot follow the same distribution. The present approach also yields a novel significance test for the RV coefficient (equation 16), which takes into account skewness and kurtosis corrections to the usual normal approximation.

Computing the moments of the RV coefficient requires evaluating the orthogonal coefficients (23), which are Haar expectations of orthogonal monomials. Low-order moments can be computed, with increasing difficulty, by elementary means (section 3.3), but the fourth-order moment requires a more systematic approach (section 3.6), provided by the Weingarten calculus developed in random matrix theory and free probability. Both procedures yield the same results for low-order moments (section 3.7), which is both expected and reassuring.

The first RV moment (11) coincides with all known proposals. The second centered RV moment (12) is simpler than its permutation analog, and highlights the effective dimensionality of a configuration. The third centered RV moment (13) is particularly enlightening: the RV skewness is simply proportional to the product of the spectral skewnesses of the two configurations, which explains the often-noticed positive skewness of the RV coefficient. The expression for the fourth centered RV moment (9), (14) is also simple to state and to compute, yet more difficult to interpret.

2 Euclidean configurations in a weighted setting: a concise reminder

2.1 Weighted multidimensional scaling and standard kernels

Consider $n$ objects endowed with positive weights $f_i > 0$ with $\sum_{i=1}^{n} f_i = 1$, as well as with pairwise dissimilarities $D = (D_{ij})$ between pairs of objects. The $n \times n$ matrix $D$ is assumed to be squared Euclidean, that is of the form $D_{ij} = \|x_i - x_j\|^2$ for $x_i, x_j \in \mathbb{R}^r$, with $r \le n-1$. The pair $(f, D)$ constitutes a weighted configuration, with $f_i = 1/n$ for unweighted configurations.

Weighted multidimensional scaling aims at determining object coordinates $X = (x_{i\alpha}) \in \mathbb{R}^{n \times r}$ reproducing the dissimilarities $D$ while expressing a maximum amount of dispersion or inertia $\Delta$ (3) in low dimensions. It is performed by the following weighted generalization of the well-known Torgerson–Gower scaling procedure (see e.g. Borg and Groenen, 2005): first, define $\Pi = \mathrm{diag}(f)$, as well as the weighted centering matrix $H = I_n - \mathbf{1}_n f^\top$, which obeys $H^2 = H$. However, $H^\top \neq H$ unless $f$ is uniform.

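Second, compute the matrix $B$ of scalar products by double centering: $B = -\tfrac{1}{2} H D H^\top$. Third, define the $n \times n$ kernel $K$ as the matrix of weighted scalar products:

$$K = \sqrt{\Pi}\, B\, \sqrt{\Pi}, \qquad \text{that is} \qquad K_{ij} = \sqrt{f_i f_j}\, B_{ij}.$$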

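Fourth, perform the spectral decomposition, with $\hat{U}$ orthogonal and $\hat{\Lambda}$ diagonal:

$$K = \hat{U} \hat{\Lambda} \hat{U}^\top, \qquad \hat{U} \hat{U}^\top = \hat{U}^\top \hat{U} = I_n, \qquad \hat{\Lambda} = \mathrm{diag}(\lambda). \tag{1}$$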

By construction, $K$ possesses one trivial eigenvalue $\lambda_0 = 0$ associated to the eigenvector $\sqrt{f}$, and $n-1$ non-negative eigenvalues decreasingly ordered as $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_{n-1} \ge 0$, among which $r = \mathrm{rg}(K)$ are strictly positive.

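From now on the trivial eigenspace will be discarded: set $\hat{U} = (\sqrt{f} \,|\, U)$, where $U \in \mathbb{R}^{n \times (n-1)}$ and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_{n-1})$. Direct substitution from (1) yields

$$K = U \Lambda U^\top, \qquad U U^\top = I_n - \sqrt{f}\sqrt{f}^\top, \qquad U^\top U = I_{n-1}, \qquad U^\top \sqrt{f} = \mathbf{0}_{n-1}. \tag{2}$$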

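Finally, the sought coordinates are obtained as $X = \Pi^{-\frac{1}{2}} U \Lambda^{\frac{1}{2}}$, that is $x_{i\alpha} = u_{i\alpha} \sqrt{\lambda_\alpha} / \sqrt{f_i}$. One verifies easily that

$$D_{ij} = \sum_{\alpha=1}^{n-1} (x_{i\alpha} - x_{j\alpha})^2, \qquad \Delta = \frac{1}{2} \sum_{i,j=1}^{n} f_i f_j D_{ij} = \mathrm{Tr}(K) = \sum_{\alpha=1}^{n-1} \lambda_\alpha. \tag{3}$$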
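The four steps above can be sketched numerically as follows. This is a minimal NumPy illustration, not the author's implementation: the function name `weighted_mds`, the example weights and the example points are assumptions made for the sketch. One implementation convenience is also an assumption: the trivial eigenvector $\sqrt{f}$ is isolated by shifting its eigenvalue above all others before calling the eigensolver, which keeps the remaining eigenvectors orthogonal to $\sqrt{f}$ even when $K$ has several zero eigenvalues.

```python
import numpy as np

def weighted_mds(f, D):
    """Weighted MDS of a configuration (f, D). Returns (K, lam, U, X)."""
    f = np.asarray(f, dtype=float)
    D = np.asarray(D, dtype=float)
    n = f.size
    s = np.sqrt(f)
    H = np.eye(n) - np.outer(np.ones(n), f)     # H = I_n - 1_n f^T
    B = -0.5 * H @ D @ H.T                      # scalar products (double centering)
    K = np.outer(s, s) * B                      # K_ij = sqrt(f_i f_j) B_ij
    K = (K + K.T) / 2                           # remove round-off asymmetry
    # Shift the trivial eigenvalue from 0 to c > lambda_1 so that sqrt(f)
    # becomes the LAST eigenvector returned by eigh (ascending order).
    c = np.trace(K) + 1.0
    lam_hat, U_hat = np.linalg.eigh(K + c * np.outer(s, s))
    lam = np.clip(lam_hat[:-1][::-1], 0.0, None)  # lambda_1 >= ... >= lambda_{n-1} >= 0
    U = U_hat[:, :-1][:, ::-1]                    # matching non-trivial eigenvectors
    X = (U * np.sqrt(lam)) / s[:, None]           # x_ia = u_ia sqrt(lam_a) / sqrt(f_i)
    return K, lam, U, X

# Hypothetical example: 5 weighted objects located in R^2.
rng = np.random.default_rng(0)
P = rng.normal(size=(5, 2))
f = np.array([0.1, 0.2, 0.3, 0.25, 0.15])
D = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)   # squared Euclidean dissimilarities

K, lam, U, X = weighted_mds(f, D)
D_rec = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # dissimilarities rebuilt from X
inertia = 0.5 * f @ D @ f                               # Delta in (3)
```

In this sketch `D_rec` should reproduce `D`, and `inertia` should match both `np.trace(K)` and `lam.sum()` up to round-off, as stated in (3).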

Figure 1: Two weighted configurations $(f, D_X)$ (left) and $(f, D_Y)$ (right) embedded in $\mathbb{R}^{n-1}$.

The kernels considered here are positive semi-definite and obey in addition $K \sqrt{f} = \mathbf{0}_n$. We call them standard kernels. They can be related to the weighted version of the centered kernels of machine learning (see e.g. Cortes et al., 2012). To each weighted configuration $(f, D)$ corresponds a unique standard kernel $K$, and conversely.

The matrix $K_0 = I_n - \sqrt{f}\sqrt{f}^\top$ appearing in (2) constitutes a standard kernel, referred to as the neutral kernel in view of the property $K_0 K = K K_0 = K$ for any standard kernel $K$. The corresponding dissimilarities are the weighted discrete distances

$$D^0_{ij} = \begin{cases} \dfrac{1}{f_i} + \dfrac{1}{f_j} & \text{for } i \neq j, \\ 0 & \text{otherwise.} \end{cases}$$
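Both properties of the neutral kernel are easy to check numerically. The short sketch below uses hypothetical weights and a hypothetical standard kernel built from random centered coordinates. It recovers dissimilarities from a kernel through $B = \Pi^{-1/2} K \Pi^{-1/2}$ and $D_{ij} = B_{ii} + B_{jj} - 2B_{ij}$, which is the usual inverse of the double centering of section 2.1.

```python
import numpy as np

f = np.array([0.1, 0.2, 0.3, 0.4])          # hypothetical weights summing to 1
s = np.sqrt(f)
n = f.size
K0 = np.eye(n) - np.outer(s, s)             # neutral kernel K_0

# Dissimilarities associated with K_0.
B0 = K0 / np.outer(s, s)                    # B = Pi^{-1/2} K Pi^{-1/2}
D0 = np.diag(B0)[:, None] + np.diag(B0)[None, :] - 2 * B0

# Closed form: 1/f_i + 1/f_j off the diagonal, 0 on the diagonal.
D0_expected = (1 / f)[:, None] + (1 / f)[None, :]
np.fill_diagonal(D0_expected, 0.0)

# A hypothetical standard kernel from random coordinates, centered with weights f.
rng = np.random.default_rng(1)
Z = rng.normal(size=(n, 3))
Zc = Z - f @ Z                              # subtract the weighted mean profile
K = np.outer(s, s) * (Zc @ Zc.T)            # K = sqrt(Pi) Zc Zc^T sqrt(Pi)
```

Here `D0` should coincide with `D0_expected`, and `K0 @ K` and `K @ K0` should both return `K`.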

2.2 The RV coefficient

Consider two weighted configurations $(f, D_X)$ and $(f, D_Y)$ endowed with the same weights $f$, or equivalently two standard kernels $K_X$ and $K_Y$ (Figure 1). Their similarity can be measured by the weighted RV coefficient, defined as

$$\mathrm{RV} = \mathrm{RV}_{XY} = \frac{\mathrm{Tr}(K_X K_Y)}{\sqrt{\mathrm{Tr}(K_X^2)\,\mathrm{Tr}(K_Y^2)}} \tag{4}$$

which constitutes the cosine similarity between the vectorized matrices $K_X$ and $K_Y$. As a consequence, $\mathrm{RV}_{XY} \ge 0$ (since $K_X$ and $K_Y$ are positive semi-definite), $\mathrm{RV}_{XY} \le 1$ (by the Cauchy-Schwarz inequality) and $\mathrm{RV}_{XX} = 1$.

Quantity (4) is a straightforward weighted generalization of the RV coefficient (Escoufier, 1973; Robert and Escoufier, 1976): consider multivariate features $X \in \mathbb{R}^{n \times p}$ and $Y \in \mathbb{R}^{n \times q}$, directly entering into the definition of the dissimilarities $D_X$ and $D_Y$ as coordinates, or equivalently as $K_X = \sqrt{\Pi}\, X_c X_c^\top \sqrt{\Pi}$ and $K_Y = \sqrt{\Pi}\, Y_c Y_c^\top \sqrt{\Pi}$, where $X_c = HX$ and $Y_c = HY$ are the centered scores.
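As an illustration of definition (4), the following NumPy sketch builds the two kernels from feature matrices and evaluates the weighted RV coefficient. The function names `standard_kernel` and `rv_coefficient` and the random example data are assumptions made for the sketch.

```python
import numpy as np

def standard_kernel(f, X):
    """Standard kernel sqrt(Pi) Xc Xc^T sqrt(Pi) from weights f and features X."""
    s = np.sqrt(f)
    Xc = X - f @ X                       # weighted centering, same as H X
    return np.outer(s, s) * (Xc @ Xc.T)

def rv_coefficient(KX, KY):
    """Weighted RV coefficient (4): cosine between the vectorized kernels."""
    num = np.trace(KX @ KY)
    den = np.sqrt(np.trace(KX @ KX) * np.trace(KY @ KY))
    return num / den

# Hypothetical data: 5 weighted objects, p = 3 and q = 2 features.
rng = np.random.default_rng(0)
f = np.array([0.1, 0.2, 0.3, 0.25, 0.15])
X = rng.normal(size=(5, 3))
Y = rng.normal(size=(5, 2))

KX, KY = standard_kernel(f, X), standard_kernel(f, Y)
rv_xy = rv_coefficient(KX, KY)           # lies in [0, 1]
rv_xx = rv_coefficient(KX, KX)           # equals 1 up to round-off
```

Working with the $n \times n$ kernels rather than with covariance matrices keeps the function usable when only dissimilarities, and no explicit features, are available.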

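The weighted covariances are $\Sigma_{XX} = X_c^\top \Pi X_c$ and $\Sigma_{YY} = Y_c^\top \Pi Y_c$. The cross-covariances are $\Sigma_{XY} = X_c^\top \Pi Y_c$ and $\Sigma_{YX} = Y_c^\top \Pi X_c = \Sigma_{XY}^\top$. The original RV coefficient is defined in the feature space as

$$\mathrm{RV}_{XY} = \frac{\mathrm{Tr}(\Sigma_{XY} \Sigma_{YX})}{\sqrt{\mathrm{Tr}(\Sigma_{XX}^2)\,\mathrm{Tr}(\Sigma_{YY}^2)}}. \tag{5}$$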

Proving the identity of (4) and (5) is easy.
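For completeness, a short derivation can be written out. It uses only the expressions $K_X = \sqrt{\Pi}\, X_c X_c^\top \sqrt{\Pi}$ and $K_Y = \sqrt{\Pi}\, Y_c Y_c^\top \sqrt{\Pi}$, the identity $\sqrt{\Pi}\sqrt{\Pi} = \Pi$, and the cyclic invariance of the trace:

```latex
\begin{align*}
\mathrm{Tr}(K_X K_Y)
  &= \mathrm{Tr}\bigl(\sqrt{\Pi}\, X_c X_c^\top \sqrt{\Pi}\,\sqrt{\Pi}\, Y_c Y_c^\top \sqrt{\Pi}\bigr) \\
  &= \mathrm{Tr}\bigl(X_c^\top \Pi\, Y_c \; Y_c^\top \Pi\, X_c\bigr)
   = \mathrm{Tr}(\Sigma_{XY}\,\Sigma_{YX}), \\[4pt]
\mathrm{Tr}(K_X^2)
  &= \mathrm{Tr}\bigl(X_c^\top \Pi\, X_c \; X_c^\top \Pi\, X_c\bigr)
   = \mathrm{Tr}(\Sigma_{XX}^2).
\end{align*}
```

The same computation with $Y$ in place of $X$ gives $\mathrm{Tr}(K_Y^2) = \mathrm{Tr}(\Sigma_{YY}^2)$, so the numerators and denominators of (4) and (5) coincide term by term.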

3 Computing the moments of the RV coefficient by invariant orthogonal integration

3.1 Main result and significance testing

Define the CV coefficient as the quantity $\mathrm{CV} = \mathrm{Tr}(K_X K_Y)$.
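The idea of averaging CV over orientations, described in the introduction, can be mimicked by simulation. The sketch below is a Monte Carlo approximation and not the exact calculation of the paper. It assumes that the first configuration is rotated as $K_X(T) = U_X T \Lambda_X T^\top U_X^\top$ with $T$ drawn from the Haar measure on the orthogonal group of dimension $n-1$. This is one reading of "orthogonal transformation of the first eigenspace", since the precise construction of section 3.2 is not part of this excerpt. The reference value $\mathrm{Tr}(K_X)\mathrm{Tr}(K_Y)/(n-1)$ is derived here from $\mathbb{E}[T \Lambda_X T^\top] = \mathrm{Tr}(\Lambda_X)\, I_{n-1}/(n-1)$ and $K_0 K_Y = K_Y$, not quoted from the paper. Function names and data are hypothetical.

```python
import numpy as np

def haar_orthogonal(m, rng):
    """Draw an m x m Haar-distributed orthogonal matrix (QR with sign fix)."""
    Q, R = np.linalg.qr(rng.normal(size=(m, m)))
    return Q * np.sign(np.diag(R))          # rescale columns by the signs of diag(R)

def nontrivial_eigen(K, s):
    """Eigenpairs of a standard kernel K on the subspace orthogonal to s = sqrt(f)."""
    c = np.trace(K) + 1.0                   # push the trivial eigenvalue to the top
    lam_hat, U_hat = np.linalg.eigh(K + c * np.outer(s, s))
    return lam_hat[:-1], U_hat[:, :-1]

rng = np.random.default_rng(0)
f = np.array([0.10, 0.15, 0.20, 0.25, 0.05, 0.25])   # hypothetical weights
s = np.sqrt(f)
n = f.size

def kernel(Z):
    Zc = Z - f @ Z                          # weighted centering
    return np.outer(s, s) * (Zc @ Zc.T)

KX = kernel(rng.normal(size=(n, 3)))        # hypothetical configurations
KY = kernel(rng.normal(size=(n, 2)))

lam, U = nontrivial_eigen(KX, s)
M = U.T @ KY @ U                            # K_Y expressed in the eigenbasis of K_X
cv_samples = np.empty(5000)
for k in range(cv_samples.size):
    T = haar_orthogonal(n - 1, rng)
    cv_samples[k] = np.trace(T @ np.diag(lam) @ T.T @ M)   # CV after rotation T

cv_mean_mc = cv_samples.mean()
cv_mean_ref = np.trace(KX) * np.trace(KY) / (n - 1)
```

Since $\mathrm{Tr}(K_X(T)^2) = \mathrm{Tr}(\Lambda_X^2)$ does not depend on $T$, the denominator of (4) is unchanged by the rotation under this assumption, so the moments of RV follow from those of CV by a constant rescaling.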
