A Clustering Algorithm for Correlation Quickest Hub Discovery Mixing Time Evolution and Random Matrix Theory

arXiv:2210.03988v1 [q-fin.ST] 8 Oct 2022
1st Alejandro Rodríguez Domínguez
Miralta Finance Bank S.A., Spain
arodriguez@miraltabank.com
2nd David Stynes
Munster Technological University, Ireland
david.stynes@mtu.ie
Abstract—We present a geometric version of Quickest Change
Detection (QCD) and Quickest Hub Discovery (QHD) tests in
correlation structures that allows us to include and combine
new information with distance metrics. The topic falls within
the scope of sequential, nonparametric, high-dimensional QCD
and QHD, in which state-of-the-art settings develop global
and local summary statistics from asymptotic Random Matrix
Theory (RMT) to detect changes in the law of the random
matrix. These settings work only for uncorrelated pre-change variables. With
our geometric version of the tests via clustering, we can test
the hypothesis that we can improve state-of-the-art settings for
QHD, by combining QCD and QHD simultaneously, as well as
including information about pre-change time-evolution in corre-
lations. We can work with correlated pre-change variables and
test if the time-evolution of correlation improves performance.
We prove test consistency and design test hypotheses based on
clustering performance. We apply this solution to financial time
series correlations. Future developments on this topic are highly
relevant in finance for Risk Management, Portfolio Management,
and Market Shocks Forecasting which can save billions of
dollars for the global economy. We introduce the Diversification
Measure Distribution (DMD) for modeling the time-evolution of
correlations as a function of individual variables which consists
of a Dirichlet-Multinomial distribution from a distance matrix
of rolling correlations with a threshold. Finally, we are able to
verify all these hypotheses.
Index Terms—Clustering, Correlation, Distribution functions,
Financial, Graphs and networks, Quickest change detection,
Quickest hub discovery, Risk management, Sequential analysis
I. INTRODUCTION
Portfolios of financial assets are combinations of individual
assets (stocks, bonds, commodities, etc.) that can benefit
from risk diversification, and correlation structures
play a crucial role in this, as seen in H. Markowitz
[1]. Therefore, knowing in advance the future behavior of
correlation matrices is key for the risk management of port-
folios. We focus on Quickest Change Detection (QCD) in
correlation structures and on Quickest Hub Discovery (QHD),
i.e., detecting variables that change their correlations.
QCD and QHD allow measuring changes in the behavior of
correlations in order to anticipate it. They also serve as
risk-management contingency measures: portfolios can be
reallocated for better diversification and expected
risk-adjusted returns when correlations change,
thereby improving the portfolio allocation process.
Also, changes in many elements of the correlation matrix all
at once are related to financial market shocks as can be seen in
L. S. Junior and I. D. P. Franca [2]; therefore, QCD and QHD
in correlations are important too for economic and financial
shock detection, which helps to preserve wealth in financial
markets and the economy. As can be seen in A. G. Tartakovsky
[3], the existing literature in Change Point Detection (CPD)
and QCD focuses on a setup where:
where:
- The pre- and post-change means and covariance matrices are
  unknown.
- It does not allow for high-dimensional settings where
  p >> n, with p variables and n timestamps.
- It assumes variables are i.i.d., a critical problem for
  detecting changes in correlations, which presuppose dependence.
As mentioned in T. Banerjee and A. Hero [4], to tackle
this intractability, the sequential analysis literature
focuses on sub-optimal statistical tests with thresholds
that improve performance and help with the test design. From
the existing literature, the only optimal test designed for corre-
lations and that can tackle most of the issues described, comes
from work developed over the last decade by T. Banerjee et
al. [5], T. Banerjee and A. Hero [4], and A. Hero and B.
Rajaratnam [6], [7]. The objective is to detect a change in the
distribution of the data matrices as quickly as possible subject
to a constraint in the false alarm rate. In this line, the authors
obtain an asymptotic distribution for a global and a family of
local summary statistics that measure a change in the law of
the Random Matrix as a way to detect the change points and
isolated hubs [4].
The key limitation of [4], [5] is that they assume variables
are uncorrelated before the change and correlated after. We
approach the QCD/QHD test from a geometric point of view
via clustering techniques. We design the geometric version
of the test and prove consistency. This allows us to combine
and incorporate new information into the test by defining
different distance metrics. We design the test hypothesis for
our geometric version via clustering performance. This allows
us to compare state-of-the-art QHD test performance with
a version including information about QCD to see if we
can improve QHD performance. We also test the hypothesis
that we can improve QHD test performance by including
the pre-change time-evolution in correlation with two goals:
to tackle the state-of-the-art limitation of uncorrelated pre-
change variables and to see if time-evolution in correlation
can improve QHD and QHD+QCD performances.
II. RELATED LITERATURE
In S. Aminikhanghahi and D. Cook [8], we can see there
are multiple methods and approaches for Time Series CPD.
The authors mention supervised methods, such as Decision Trees,
Naive Bayes, Bayesian Networks, SVMs, Nearest Neighbors, Hidden
Markov Models, and Gaussian Mixture Models. Another group
includes unsupervised methods, like Likelihood Ratio (LR)
methods, subspace-model methods, probabilistic methods,
kernel-based methods, graph-based methods, and clustering
methods. Supervised, probabilistic, and clustering methods do
not require extra data apart from the fitting window, whereas
LR, subspace-model, and kernel-based methods require
post-change data [8]. All these methods focus on univariate
time series, but our focus is on CPD in correlation
structures.
From CPD literature, sequential analysis solutions are the
most suited for our problem. QCD tries to identify the closest
change point in a sequential setting. For detection to be quick,
we need high-dimensional settings with p >> n, with n
timestamps and p variables. In A. G. Tartakovsky [9] we find a
detailed description of sequential multi-decision for CPD and
QCD. Changes in statistical properties of distributions from
previously identical populations distributions are monitored
to detect CPD, subject to lower levels of false alarms and
delays. Methodologies rely on two standardized methods and
their variants, the Page Cumulative SUM (CUSUM), and
the Shiryaev-Roberts. Both had been introduced by the same
author in A. G. Tartakovsky [9]. The problem with this setup is
that [4], stream independence is assumed but we want to detect
changes in the level of streams’ dependence, and the setting is
not high-dimensional. Alternatives that can tackle dependence
in the sequential analysis literature are based on sub-optimal
tests, whose performance analysis
is then used to design the test by choosing thresholds. The
efficiency of these tests is verified by simulations [4]. However,
these solutions are nonoptimal, not high-dimensional, and not
focused on correlation structures. To cope with these three
aspects, we need to focus on work developed by T. Banerjee
and A. Hero [5], T. Banerjee et al. [4], and A. Hero and B.
Rajaratnam [6], [7].
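For context, the CUSUM recursion referred to above can be sketched in a few lines. This is a hypothetical, minimal version for a known Gaussian mean shift in a single stream, not the correlation-structure tests this paper targets; the parameters mu0, mu1, sigma, and threshold are illustrative choices:

```python
import numpy as np

def cusum(x, mu0=0.0, mu1=1.0, sigma=1.0, threshold=5.0):
    """One-sided CUSUM test for a shift in mean from mu0 to mu1.

    Returns the first index at which the statistic crosses the
    threshold (a change is declared), or None otherwise.
    """
    w = 0.0
    for t, xt in enumerate(x):
        # Log-likelihood ratio of one Gaussian observation under a mean shift
        llr = (mu1 - mu0) / sigma**2 * (xt - (mu0 + mu1) / 2.0)
        w = max(0.0, w + llr)  # CUSUM recursion: statistic is reset at zero
        if w >= threshold:
            return t
    return None

# A stream with mean 0 for 100 steps, then mean 1 afterwards
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(1, 1, 100)])
print(cusum(x))  # detection index, expected shortly after the change
```

The reset at zero is what makes the statistic recursive and memory-efficient, which is why CUSUM-type procedures scale to sequential settings.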
As a preamble, V. Veeravalli and T. Banerjee [10] focused
first on CPD in a parametric setting, on Bayesian CPD trying
to minimize Average Detection Delay (ADD) subject to a
constraint in the Probability of False Alarm (PFA). They
also focused on a second solution, Minimax CPD, in which
Lorden's test from G. Lorden et al. [11] is applied. Lorden
developed the first minimax theory for delays in CPD, “in
which he proposed a measure of detection delay obtained
by taking the supremum (over all possible change points)
of a worst-case delay over all possible realizations of the
observations, conditioned on the change point” [10]. Lorden’s
test is important to understand their posterior work.
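For reference, Lorden's worst-case average detection delay (the criterion quoted above) is commonly written as follows; this is the standard textbook form, and the notation (stopping time T, change point m, (x)^+ = max(x, 0)) may differ slightly from [11]:

```latex
\mathrm{WADD}(T) = \sup_{m \ge 1} \operatorname*{ess\,sup}
\mathbb{E}_m\!\left[ (T - m + 1)^{+} \,\middle|\, X_1, \dots, X_{m-1} \right]
```

The outer supremum ranges over all possible change points, and the essential supremum over all realizations of the pre-change observations, which is what makes the criterion "worst-case".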
In [6], a discovery is defined as a correlation above a
threshold, and the authors derive an asymptotic expression for
the mean number of discoveries as a function of the number of
samples. It is shown that the mean number of discoveries is
influenced by the population covariance matrix through the
Bhattacharyya measure of the average pairwise dependency of
the p-multivariate U-scores defined on the (n-2)-dimensional
hypersphere [6]. Under weak dependency assumptions, the number of discoveries
is asymptotically represented by a Poisson distribution. For
auto-correlation and cross-correlation discoveries this Poisson
distribution is measured by the number of positive vertex
degrees in the associated sample correlation graph [6]. In [7],
they focus on hub discovery in partial correlation graphs, and
on an extension for variables with a specific degree of
connectivity. A hub is defined broadly as any variable that is
correlated with at least δ other variables with a correlation
magnitude exceeding ρ. Their setup is the first
high-dimensional one of its kind. They show that the count
N_{δ,ρ} of the number of groups of δ mutually coincident edges
in the correlation graph (and its partial-correlation
counterpart) with correlation threshold ρ converges to a
Poisson variable, so that

P(N_{δ,ρ} > 0) → 1 − exp(−Λ(δ))   (1)
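As a loose illustration of (1), the following sketch counts hubs on a plain sample correlation matrix and evaluates the Poisson-type approximation P(N_{δ,ρ} > 0) ≈ 1 − exp(−Λ). The threshold ρ, degree δ, the data, and the use of ordinary (rather than partial or U-score) correlations are simplifying assumptions, not the exact construction of [7]:

```python
import numpy as np

def hub_count(X, rho=0.8, delta=2):
    """Count hub variables in the thresholded sample correlation graph.

    A variable is a hub if its correlation magnitude exceeds `rho`
    with at least `delta` other variables.
    """
    R = np.corrcoef(X, rowvar=False)   # p x p sample correlation matrix
    A = (np.abs(R) > rho).astype(int)  # adjacency of the correlation graph
    np.fill_diagonal(A, 0)             # ignore self-correlations
    degrees = A.sum(axis=1)            # vertex degrees
    return int(np.sum(degrees >= delta))

def prob_hub(lam):
    """Poisson-type approximation P(N_{delta,rho} > 0) ~ 1 - exp(-Lambda)."""
    return 1.0 - np.exp(-lam)

# Three nearly collinear columns form a triangle of discoveries;
# the fourth column is independent noise with no discoveries.
rng = np.random.default_rng(1)
z = rng.normal(size=200)
X = np.column_stack(
    [z + 0.01 * rng.normal(size=200) for _ in range(3)]
    + [rng.normal(size=200)]
)
print(hub_count(X, rho=0.8, delta=2))  # prints 3: each collinear column has degree 2
```

Thresholding the correlation matrix into a graph and reading off vertex degrees is the geometric picture that the clustering version of the test in this paper builds on.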
This had implications for future work developed in [4], [5].
In [5], authors introduce a nonparametric QCD test for large-
scale random matrices based on a global summary statistic
from asymptotic properties of RMT. It is assumed pre- and
post- change distributions of the i.i.d random matrices rows
are unknown or belong to an elliptically contoured family [5].
If pre- and post-change densities f_X^0 and f_X^1 are known,
and the mean µ_m is constant before and after the change,
algorithms such as the Cumulative Sum (CUSUM) or Shiryaev-
Roberts (SR) can be efficiently used, as both have optimal
properties with respect to Lorden's formulation. In this case,
it is a parametric CPD problem, asymptotically optimally
solved by Generalized Likelihood Ratio (GLR) tests. In [5],
the pre- and post-change densities are unknown, and the
authors present an optimal nonparametric solution,
asymptotically optimal for minimax CPD in the random matrix
setup using large-scale RMT. The framework in [5] is suited
for high-dimensional settings. Therefore, to summarize why we
focus on nonparametric methods with RMT like [4], [5]:
- Pre- and post-change densities f_X^0 and f_X^1 are NOT
  known, and µ_m is NOT constant before and after the change.
- We need to focus on high-dimensional settings p >> n for
  QCD and QHD in correlation structures.
- Other CPD/QHD methods are rejected too because they cannot
  tackle dependence.