a version including information about QCD to see if we
can improve QHD performance. We also test the hypothesis
that we can improve QHD test performance by including
the pre-change time-evolution in correlation, with two goals:
to tackle the state-of-the-art limitation of uncorrelated pre-change
variables, and to see whether time-evolution in correlation
can improve QHD and QHD+QCD performance.
II. RELATED LITERATURE
In S. Aminikhanghahi and D. Cook [8], we can see there
are multiple methods and approaches for time series CPD.
The authors mention supervised methods, such as Decision Trees,
Naive Bayes, Bayesian Networks, SVMs, Nearest Neighbors, Hidden
Markov Models, and Gaussian Mixture Models. Another group
includes unsupervised methods, such as Likelihood Ratio (LR)
methods, Subspace Model methods, Probabilistic methods,
Kernel-Based methods, Graph-Based methods, and Clustering
methods. Supervised, Probabilistic, and Clustering methods do
not require extra data beyond the fitting window, whereas
LR, Subspace Model, and Kernel-Based methods require
post-change data [8]. All these methods focus on univariate
time series, but our focus is on CPD in correlation
structures.
From the CPD literature, sequential analysis solutions are
the best suited for our problem. QCD tries to identify the
change point as quickly as possible in a sequential setting.
For detection to be quick, we need to handle high-dimensional
settings with p >> n, where n is the number of timestamps and
p the number of variables. A. G. Tartakovsky [9] gives a
detailed description of sequential multi-decision procedures
for CPD and QCD. Changes in the statistical properties of
previously identical population distributions are monitored
to detect a change, subject to constraints on false alarms and
detection delays. The methodologies rely on two standard
procedures and their variants, the Page cumulative sum
(CUSUM) and the Shiryaev-Roberts procedure, both described
in detail in [9]. The problem with this setup, as noted in [4],
is that independence of the streams is assumed, whereas we
want to detect changes in the level of the streams' dependence,
and the setting is not high-dimensional. Alternatives in the
sequential analysis literature that can tackle dependence are
based on sub-optimal tests: a performance analysis of the test
is derived and then used to design the test by choosing
thresholds, and the efficiency of these tests is verified by
simulations [4]. However, these solutions are non-optimal, not
high-dimensional, and not focused on correlation structures.
To address these three aspects, we focus on the work developed
by T. Banerjee and A. Hero [5], T. Banerjee et al. [4], and
A. Hero and B. Rajaratnam [6], [7].
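To make the sequential mechanics concrete, here is a minimal sketch of Page's CUSUM stopping rule, the parametric procedure referenced above. It assumes the pre- and post-change densities are known, which is exactly the assumption our setting violates; the Gaussian mean-shift stream, the threshold value, and the helper name cusum_stopping_time are illustrative choices, not details from [9].

```python
import numpy as np

def cusum_stopping_time(stream, log_lr, threshold):
    """Page's CUSUM: raise an alarm the first time the statistic
    W_t = max(0, W_{t-1} + log f1(x_t)/f0(x_t)) crosses a threshold.
    Assumes known pre-/post-change densities via log_lr."""
    w = 0.0
    for t, x in enumerate(stream, start=1):
        w = max(0.0, w + log_lr(x))
        if w >= threshold:
            return t  # first alarm time (declared change)
    return None  # no alarm raised on this stream

# Illustrative stream: Gaussian mean shift from 0 to 1 at t = 200.
rng = np.random.default_rng(0)
stream = np.concatenate([rng.normal(0, 1, 200), rng.normal(1, 1, 200)])
log_lr = lambda x: x - 0.5  # log N(x; 1, 1) / N(x; 0, 1)
print(cusum_stopping_time(stream, log_lr, threshold=10.0))
```

The Shiryaev-Roberts procedure replaces the max(0, ·) recursion with R_t = (1 + R_{t-1}) · L(x_t), where L is the likelihood ratio, but it shares the same known-density requirement, which is why neither applies directly to our nonparametric, dependence-focused problem.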
As a preamble, V. Veeravalli and T. Banerjee [10] focused
first on CPD in a parametric setting, on Bayesian CPD, trying
to minimize the Average Detection Delay (ADD) subject to a
constraint on the Probability of False Alarm (PFA). They
also focused on a second solution, minimax CPD, in which
Lorden's test from G. Lorden [11] is applied. Lorden
developed the first minimax theory for delays in CPD, "in
which he proposed a measure of detection delay obtained
by taking the supremum (over all possible change points)
of a worst-case delay over all possible realizations of the
observations, conditioned on the change point" [10]. Lorden's
test is important for understanding their later work.
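Written out, the delay criterion described in that quote is commonly formalized as follows (our transcription of the standard minimax formulation, with stopping time τ and change point ν, not a formula reproduced verbatim from [10] or [11]):

```latex
\operatorname{WADD}(\tau) \;=\; \sup_{\nu \ge 1}\;
\operatorname{ess\,sup}\;
\mathbb{E}_{\nu}\!\left[(\tau - \nu + 1)^{+} \,\middle|\, X_1,\dots,X_{\nu-1}\right],
\qquad \text{subject to } \mathbb{E}_{\infty}[\tau] \ge \gamma .
```

The supremum over ν and the essential supremum over the pre-change observations give the worst-case delay, while the constraint lower-bounds the mean time between false alarms.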
In [6], a discovery is a correlation above a given threshold,
and the authors derive an asymptotic expression for the mean
number of discoveries as a function of the number of samples.
It is shown that the mean number of discoveries depends on the
population covariance matrix through the Bhattacharyya measure
of the average pairwise dependency of the p multivariate
U-scores defined on the (n−2)-dimensional hypersphere [6].
Under weak dependency assumptions, the number of discoveries
is asymptotically represented by a Poisson distribution. For
auto-correlation and cross-correlation discoveries, this
Poisson distribution is characterized by the number of positive
vertex degrees in the associated sample correlation graph [6].
In [7], the authors focus on hub discovery in partial
correlation graphs, with an extension to variables with a
specific degree of connectivity. A hub is defined broadly as
any variable that is correlated with at least δ other variables
with a correlation magnitude exceeding ρ. Their setup is the
first high-dimensional one of its kind. They show that the
count N_{δ,ρ_p} of the number of groups of δ mutually
coincident edges in the correlation graph (and its
partial-correlation counterpart) with correlation threshold
ρ_p converges to a Poisson variable:
P(N_{δ,ρ_p} > 0) → 1 − exp(−Λ/ϕ(δ))    (1)
This result had implications for the subsequent work developed in [4], [5].
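As a concrete illustration of the counting statistic behind Eq. (1), the sketch below thresholds a sample correlation matrix and counts vertices with degree at least δ. This is a simplification of [6], [7], whose theory is built on U-scores and also covers partial correlations; the function name hub_count, the toy dimensions, and the parameter values are our assumptions.

```python
import numpy as np

def hub_count(X, rho, delta):
    """Count vertices of the sample correlation graph with degree
    >= delta, where edges join variable pairs whose absolute sample
    correlation exceeds rho. Simplified analogue of N_{delta, rho}."""
    R = np.corrcoef(X, rowvar=False)   # p x p sample correlations
    np.fill_diagonal(R, 0.0)           # ignore self-correlation
    adjacency = np.abs(R) > rho        # edges of the correlation graph
    degrees = adjacency.sum(axis=1)
    return int(np.sum(degrees >= delta))

# High-dimensional toy data with p >> n (n = 20 samples, p = 500 variables).
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 500))
print(hub_count(X, rho=0.8, delta=2))
```

Under the weak-dependence regime of [6], counts of this kind are asymptotically Poisson, which is what yields the closed-form limit in Eq. (1).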
In [5], the authors introduce a nonparametric QCD test for
large-scale random matrices, based on a global summary
statistic derived from asymptotic properties of random matrix
theory (RMT). It is assumed that the pre- and post-change
distributions of the i.i.d. random matrix rows are unknown or
belong to an elliptically contoured family [5]. If the pre-
and post-change densities f_X^0 and f_X^1 are known, and the
mean µ_m is constant before and after the change, algorithms
such as CUSUM or Shiryaev-Roberts (SR) can be used
efficiently, as both have optimality properties with respect
to Lorden's formulation. In this case, the problem is a
parametric CPD problem, solved asymptotically optimally by
Generalized Likelihood Ratio (GLR) tests. In [5], however,
the pre- and post-change densities are unknown, and the
authors present a nonparametric solution that is
asymptotically optimal for minimax CPD in the random matrix
setup using large-scale RMT. The framework in [5] is suited
to high-dimensional settings. In summary, we focus on
nonparametric methods with RMT, such as [4], [5], because:
• Pre- and post-change densities f_X^0 and f_X^1 are NOT
known, and µ_m is NOT constant before and after the
change.
• We need to handle high-dimensional settings with p >> n
for QCD and QHD in correlation structures.
• Other CPD/QHD methods are also rejected because they
cannot tackle dependence.