A Simplied Algorithm for Identifying Abnormal Changes in Dynamic Networks Bouchaib Azamir1a Driss Bennis1band Bertrand Michel2c

2025-04-30 0 0 2.95MB 27 页 10玖币
侵权投诉
A Simplified Algorithm for Identifying Abnormal
Changes in Dynamic Networks
Bouchaib Azamir1,a, Driss Bennis1,b and Bertrand Michel2,c
1Faculty of Sciences, Mohammed V University in Rabat, Morocco.
abouchaib azamir@um5.ac.ma
bdriss.bennis@um5.ac.ma; driss bennis@hotmail.com
2Nantes Universit´e, Ecole Centrale Nantes, Laboratoire de math´ematiques Jean Leray
UMR 6629.
1 Rue de La Noe, 44300 Nantes, France.
cbertrand.michel@ec-nantes.fr
Abstract. Topological data analysis has recently been applied to the study of dynamic
networks. In this context, an algorithm was introduced and helps, among other things,
to detect early warning signals of abnormal changes in the dynamic network under study.
However, the complexity of this algorithm increases significantly once the database studied
grows. In this paper, we propose a simplification of the algorithm without affecting its
performance. We give various applications and simulations of the new algorithm on some
weighted networks. The obtained results show clearly the efficiency of the introduced ap-
proach. Moreover, in some cases, the proposed algorithm makes it possible to highlight local
information and sometimes early warning signals of local abnormal changes.
2010 Mathematics Subject Classification: 55N35; 55U99; 91B84
Key Words. Persistent homology; closeness centrality in a network; central subnetwork; time
series.
1 Introduction
A network is an efficient representation of a given set of entities and the existing relationships
between them. Actually a network is a graph which entities are the vertices and the relations
between them are presented as edges (see for instance [14] and [17]). Usually, these relations are
quantified by (mostly positive) real numbers and in this case we say that the graph (the network) is
weighted. In most real-life situations, (weighted) networks evolve over time resulting in a family of
(weighted) networks; we then speak of dynamic (weighted) networks (see the beginning of Section
3). In general, there are two kinds of dynamic networks :
There are dynamic networks such that both the set of vertices and the set of edges are
changing over time. For instance, chat groups in social networks where the members of these
groups are undergoing changes from moment to moment.
1
arXiv:2210.08891v1 [math.AT] 17 Oct 2022
A Simplified Algorithm for Identifying Abnormal Changes in Dynamic Networks 2
In some situations, the set of vertices remains unchanged over time as in the case of financial
networks such as the weighted networks representing stock market transactions. So, the only
changes occur at the level of the edges and at the level of the weights.
Dynamic networks have been subject of several studies. See for instance the papers and the
references within [20, 21, 22, 23, 24] where dynamic networks have been studied following different
approaches. In this paper, we are interested in studying the behaviour of dynamic weighted net-
works using topological data analysis (TDA). In [12], Gidea used TDA to detect the early signs of
a financial market crisis. His method is based mainly on the fact that we can associate with any
financial network of stocks a metric space whose distance is defined by the correlation coefficients
of stock market returns. This allows to compute the so-called persistent homology of this metric
space (see the preliminaries in Section 2) and, thanks to the concept of time series, the first signs
of a critical transition in the financial network are detected. It is clear that Gidea’s method can
be applied on any other dynamic network. However, for large networks, the execution time of
the algorithms used in Gidea’s method will be very high. Thus, one can ask whether the fact of
working on particular subgraphs rather than the whole network permits to achieve good results in
a reasonable amount of time. Indeed, the abundance of real-world data and its velocity usually call
for data summarization. Namely, graph summarization is a useful tool to help identify the struc-
ture and meaning of data. It has various advantages including reducing data volume and storage,
speeding up algorithms and graph queries, eliminating noise, among others. There are different
types of graph summaries depending on whether the data is homogeneous or heterogeneous and
also whether the network is static or dynamic. In the literature, graph summarization methods
use a set of basic techniques, here we are interested in those that are based on simplification or
parsimony: these methods rationalize an input graph by removing nodes or edges less “important”,
resulting in a sparse graph see [25]. Here, the summary graph consists of a subset of the original
nodes and/or edges. A representative work on node simplification-based summarization techniques
is OntoVis [30], a visual analysis tool for exploring and understanding large heterogeneous social
networks. In [6], the authors propose a four-step unsupervised algorithm for egocentric information
abstraction of heterogeneous social networks using edge, instead of node, filtering. In [31], a new
problem, to simplify weighted graphs by removing less important edges, has been proposed. Simpli-
fied graphs can be used to improve the visualization of a network, to extract its main structure, or
as a preprocessing step for other data mining algorithms. Our approach is part of the simplification
of graphs and aims to identify a subnetwork that contains a large part of the information embodied
by the original network.
Our aim in this paper is to show that there are some subnetworks that can be good candidates
for this study. Those subnetworks are extracted from the whole graph by eliminating some edges
at certain threshold which is determined by the use of the closeness centrality concept (see Section
4 for the definition of closeness centrality). The idea is based on the fact that a node having the
highest closeness centrality is able to spread information efficiently through the whole graph (see
[28] for more details). Then, based on this fact, we propose that the desired subnetwork would be
the one which contains this node called henceforth central node. After that, we need to set the
threshold where edges with higher weights will be eliminated. Also, following the closeness central-
ity notion, the threshold will be chosen from the list of weights of the incident edges to the central
node. However, it is not clear how to determine the optimal threshold. This is why we propose to
focus our study on the three quartiles, the minimum and the maximum of the above proposed list
of weights. The associated subnetworks will be called central subnetworks (see Section 4 for more
details about centrality). Our aim is to determine what threshold levels provides good results. We
will see throught several examples that when the threshold is greater than the third quartiles, the
adopted approach reduces considerably the execution time of Gidea’s method (to almost half in
A Simplified Algorithm for Identifying Abnormal Changes in Dynamic Networks 3
some cases) without compromising quality and results. Moreover, in some cases we get good results
even with thresholds greater than the median.
This paper is organized as follows:
Section 2 presents some preliminaries on persistent homology. We start this section with a part
(Subsection 2.1) in which we present a short mathematical background that we need to understand
the persistent homology associated to weighted graphs (given in Subsection 2.2), and we present also
how to implement data to get persistence diagrams associated to weighted graphs (see Subsection
2.3).
In Section 3, we present how to associate persistent homology to dynamic networks (see Subsection
3.1) and show how it was used by Gidea to detect abnormal changes in financial networks (see
Subsection 3.2).
Section 4 represents our approach to reduce the computation cost of Gidea’s method (see Figure
5 for the flowchart of the proposed method). It is mainly based on a simplification of the whole
studied network by considering what we called central subnetworks determined using the notion
of closeness centrality (see Algorithm 1). We also describe how to get the persistence diagrams
associated to these central subnetworks (see Algorithm 2).
Section 5 concerns the application phase. Namely, we present some numerical experiments in
which we compare our method with existing ones. In each of the first three subsections, we simulate a
dynamic network. Each subsection deals with a kind of a weighted network whose vertex point cloud
is simulated with a given probability distribution. Then the corresponding central subnetworks are
derived. A visual comparison between the time series associated to both the whole network and
the central subnetwork shows that are very similar (see Figures in this section). This observation is
also confirmed by the calculated R-squared coefficients (see Tables in this section). In parallel, we
show that using our method the execution time is reduced considerably (see Tables in this section).
We end Section 5 by a comparison a performance comparison between our approach and the edge
collapse method (Subsection 5.4). It is known that the edge collapse method provides theoretical
guarantees and allows to obtain, in output, the same starting persistence diagram (see for instance
[16]). From Table 4, we deduce that our method is more practical for large and dense networks.
We end this article with Section 6 which includes an application of the method on real weighted
networks. In Subsection 6.1, we apply on the financial network used by Gidea [12], and in Subsection
7 we consider a cryptocurrency network. As in Section 5, we deduce an improvement of the execution
time using the new method.
2 Persistent homology computed on graphs
Persistent homology is an emergent method for detecting geometrical and topological properties of
a space endowed with a topological structure. The concept of persistence was first introduced by
Edelsbrunner, Letscher and Zomorodian in [8] and then refined by Carlsson and Zomorodian in [3].
Since then, persistent homology has become an essential tool in topological Data Analysis (TDA)
and has been applied in various scientific fields such as biology, image processing, sensor networks
etc. Nowadays, there are many references (surveys, books and notes) on TDA; for a recent one we
refer to [5].
In this section, we give a short presentation on some basic notions concerning particular simplicial
complexes as well as its corresponding persistent homology. We will focus, following our context, on
simplicial complexes associated to graphs investigated in [2]. For a general background on persistent
A Simplified Algorithm for Identifying Abnormal Changes in Dynamic Networks 4
homology one can see [3, 5, 7, 11].
2.1 Simplicial complex and simplicial homology groups
We start this subsection with a recall of a simplicial complex and then we give the examples of
interest.
Given a finite set V, a simplicial complex with the vertex set Vis a set e
Kof finite subsets of V
such that the elements of Vbelong to e
Kand for any σe
K, any subset of σbelongs to e
K. The
elements of e
Kare called the faces or the simplices of e
K. The dimension of a simplex of e
Kis just
its cardinality minus 1 and the dimension of the simplicial complex e
Kis the largest dimension of
its simplices [5].
There are a lot of examples of simplicial complexes arising from various contexts. Here, we use
the following ones:
Example 2.1 (Simplicial clique complex, [2]) Let G= (V, E)be a graph, where Vis the set
of vertices and Ethe set of edges. For a positive integer k, a k-clique of Gis a set of kvertices
whose the induced subgraph is complete. Denote by Cl(G)the set of all the cliques of G. Notice
that the set Cl(G)satisfies the two conditions :
Cl(G)contains all singletons {v}with vV.
Cl(G)is closed under subsets: if τσand σCl(G), then τCl(G).
Then, Cl(G)is a simplicial complex which is called a simplicial clique complex of G.
The following (geometric) classical example of simplicial complexes can be found in any intro-
ductory books on TDA.
Example 2.2 (Vietoris-Rips complex) Given a finite set of points Xin a metric space (M, d)
and a real number α0. The Vietoris-Rips complex Ripsα(X)is a simplicial complex whose
simplices are sets {x0, . . . , xk}such that d(xi, xj)αfor all 0i, j k.
Now we are ready to recall what are simplicial homology groups of a simplicial complex. To this
end, we define some vector spaces and linear maps. Here, we only deal with vector spaces on the
field F2:= Z/2Z.
Given a simplicial complex K, the vector space generated by the p-simplices of Kis denoted by
Cp(K). It consists of all finite formal sums of p-simplices called p-chains; i.e., an element cbelongs
to Cp(K) if it can be written as follows: c=Pjγjσjfor some scalers γjF2and a family (σj)j
of p-simplices.
For a positive integer p, consider the linear map p:Cp(K)Cp1(K), called boundary map,
defined on p-simplices as follows: For every p-simplex σ,(σ) is the formal sum of the (p1)-
dimensional faces (i.e., subsets of σof cardinal p). An element of the image of pis called a
boundary. Thus, the boundary p(c) of a chain c=Pjγjσjis obtained by extending plinearly;
i.e., p(c) = Pjγjp(σj). The p-chains that have boundary 0 are called p-cycles. They form a
subspace Zpof Cp. The p-chains which are the boundary of (p+ 1)-chains are called p-boundaries
and form a subspace Bpof Cp. It is important but not difficult to show that pp+1 = 0 which is
equivalent to BpZp. Then, if we consider the quotient vector space Zp/Bp, that means we have
annihilated the p-boundaries which are not p-cycles. For instance, if we take Kto be Cl(G), the
simplicial clique complex of a graph G, the dimension of the vector space B1/Z1will be exactly the
A Simplified Algorithm for Identifying Abnormal Changes in Dynamic Networks 5
number of “holes” in Gand the dimension of the vector space B0/Z0is the number of the connected
components of Cl(G) (i.e., the connected components of the graph (network) G). For this reason
and of course other important ones, the vector space Bp/Zpis considered as one of the principal
notions in algebraic topology that helps in describing topological features. It is denoted by Hp(K)
and usually called the pth simplicial homology group of K. Its elements are called homology classes.
Example 2.3 The simplicial clique complex Cl(G)associated to the graph Gin Figure 1 has three
connected components, and so the dimension of H0(Cl(G)) is three. But it has only one hole because
A9A10+A10A11+A11A12 +A12A9is a cycle but not a boundary. Notice that A5A6+A6A7+A7A8+
A8A5is a boundary of the 3-chain A5A6A8+A6A7A8.
Figure 1: A graph Gwith three connected components.
2.2 Persistence diagrams
When a simplicial complex Khas a history in the sense that it is an union of a chain of simplicial
complexes, we can capture more information on K; especially, the behaviour of homology groups
of these complexes can be used as a signature of the complex Kthat can be used to distinguish it
from other complexes. This is one of many important things that persistent homology is performing.
Consider the following chain of simplicial subcomplexes of a complex K, called a filtration of K:
∅ ⊆ K0K1 · · · Kp=K
This filtration provides more information on K in the following sense: for every couple i, j such that
0ijp, the injection KiKjinduces a homomorphism on the nth simplicial homology
groups for each dimension:
fi,j
n:Hn(Ki)Hn(Kj).
The nth persistent Betti number βi,j
nis the rank of the vector space Im(fi,j
n); that is, βi,j
n=
dim(Im(fi,j
n)). Persistent Betti numbers count how many homology classes of dimension nthat
survive during the passage from Kito Kj. We say that a homology class αHn(Ki) is born
entering in Ki, if αdoes not come from a previous subcomplex; that is, α /Im(fi1,i
n). Similarly,
if αis born in Ki, it dies entering Kjif the image of the map induced by Ki1Kj1does
not contain the image of αbut the image of the map induced by Ki1Kjdoes. In this case
the persistence of αis ji. Finally, all the collected data of births and deaths are presented in
摘要:

ASimpli edAlgorithmforIdentifyingAbnormalChangesinDynamicNetworksBouchaibAzamir1;a,DrissBennis1;bandBertrandMichel2;c1FacultyofSciences,MohammedVUniversityinRabat,Morocco.abouchaibazamir@um5.ac.mabdriss.bennis@um5.ac.ma;drissbennis@hotmail.com2NantesUniversite,EcoleCentraleNantes,Laboratoiredemath...

展开>> 收起<<
A Simplied Algorithm for Identifying Abnormal Changes in Dynamic Networks Bouchaib Azamir1a Driss Bennis1band Bertrand Michel2c.pdf

共27页,预览5页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:27 页 大小:2.95MB 格式:PDF 时间:2025-04-30

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 27
客服
关注