A Simplified Algorithm for Identifying Abnormal Changes in Dynamic Networks 2
•In some situations, the set of vertices remains unchanged over time as in the case of financial
networks such as the weighted networks representing stock market transactions. So, the only
changes occur at the level of the edges and at the level of the weights.
Dynamic networks have been subject of several studies. See for instance the papers and the
references within [20, 21, 22, 23, 24] where dynamic networks have been studied following different
approaches. In this paper, we are interested in studying the behaviour of dynamic weighted net-
works using topological data analysis (TDA). In [12], Gidea used TDA to detect the early signs of
a financial market crisis. His method is based mainly on the fact that we can associate with any
financial network of stocks a metric space whose distance is defined by the correlation coefficients
of stock market returns. This allows to compute the so-called persistent homology of this metric
space (see the preliminaries in Section 2) and, thanks to the concept of time series, the first signs
of a critical transition in the financial network are detected. It is clear that Gidea’s method can
be applied on any other dynamic network. However, for large networks, the execution time of
the algorithms used in Gidea’s method will be very high. Thus, one can ask whether the fact of
working on particular subgraphs rather than the whole network permits to achieve good results in
a reasonable amount of time. Indeed, the abundance of real-world data and its velocity usually call
for data summarization. Namely, graph summarization is a useful tool to help identify the struc-
ture and meaning of data. It has various advantages including reducing data volume and storage,
speeding up algorithms and graph queries, eliminating noise, among others. There are different
types of graph summaries depending on whether the data is homogeneous or heterogeneous and
also whether the network is static or dynamic. In the literature, graph summarization methods
use a set of basic techniques, here we are interested in those that are based on simplification or
parsimony: these methods rationalize an input graph by removing nodes or edges less “important”,
resulting in a sparse graph see [25]. Here, the summary graph consists of a subset of the original
nodes and/or edges. A representative work on node simplification-based summarization techniques
is OntoVis [30], a visual analysis tool for exploring and understanding large heterogeneous social
networks. In [6], the authors propose a four-step unsupervised algorithm for egocentric information
abstraction of heterogeneous social networks using edge, instead of node, filtering. In [31], a new
problem, to simplify weighted graphs by removing less important edges, has been proposed. Simpli-
fied graphs can be used to improve the visualization of a network, to extract its main structure, or
as a preprocessing step for other data mining algorithms. Our approach is part of the simplification
of graphs and aims to identify a subnetwork that contains a large part of the information embodied
by the original network.
Our aim in this paper is to show that there are some subnetworks that can be good candidates
for this study. Those subnetworks are extracted from the whole graph by eliminating some edges
at certain threshold which is determined by the use of the closeness centrality concept (see Section
4 for the definition of closeness centrality). The idea is based on the fact that a node having the
highest closeness centrality is able to spread information efficiently through the whole graph (see
[28] for more details). Then, based on this fact, we propose that the desired subnetwork would be
the one which contains this node called henceforth central node. After that, we need to set the
threshold where edges with higher weights will be eliminated. Also, following the closeness central-
ity notion, the threshold will be chosen from the list of weights of the incident edges to the central
node. However, it is not clear how to determine the optimal threshold. This is why we propose to
focus our study on the three quartiles, the minimum and the maximum of the above proposed list
of weights. The associated subnetworks will be called central subnetworks (see Section 4 for more
details about centrality). Our aim is to determine what threshold levels provides good results. We
will see throught several examples that when the threshold is greater than the third quartiles, the
adopted approach reduces considerably the execution time of Gidea’s method (to almost half in