Parallel Order-Based Core Maintenance in Dynamic Graphs

2025-05-02 0 0 619.36KB 25 页 10玖币
侵权投诉
Parallel Order-Based Core Maintenance in Dynamic Graphs
Bin Guoa,, Emil Sekerinskib
aDepartment of Computing &Information Systems, Trent University, 1600 West Bank Drive, Peterborough, K9L 0G2, ON, Canada
bDepartment of Computing and Software, McMaster University, 1280 Main Street West, Hamilton, L8S 4L8, ON, Canada
Abstract
The core number of vertices in a graph is one of the most well-studied cohesive subgraph models due to linear
running-time algorithms. In practice, many data graphs are dynamic graphs that continuously change by inserting
and removing edges. The core numbers are updated in dynamic graphs with edge insertions and deletions, called
core maintenance. When a burst of a large number of inserted or removed edges comes in, these must be handled on
time to keep up with the data stream. There are two main sequential algorithms for core maintenance, Traversal and
Order. Experiments show that the Order algorithm significantly outperforms the Traversal algorithm over various
real graphs.
To the best of our knowledge, all existing parallel approaches are based on the Traversal algorithm. These
algorithms exploit parallelism only for vertices with dierent core numbers; they reduce to sequential algorithms
when all vertices have the same core numbers. This paper proposes a new parallel core maintenance algorithm based
on the Order algorithm. A distinguishing properly is that the algorithm allows parallelism even for graphs where all
vertices have the same core number. Extensive experiments are conducted over real-world, temporal, and synthetic
graphs on a multicore machine. The results show that for inserting and removing a batch of edges using 16 workers,
the proposed method achieves speeds of up to 289 times and 10 times compared to the most ecient existing method.
Keywords:
dynamic graphs, parallel, k-core maintenance, multicore, shared memory.
1. Introduction
Graphs are widely used to model complex networks, such as social networks, hyperlink networks, and knowledge
networks. As one of the most well-studied cohesive subgraph models, the k-core is the maximal subgraph such that
all vertices have degrees at least k. The core number of a vertex is the maximum value of ksuch that this vertex is
contained in the subgraph of k-core vertices [1, 2]. The core numbers can be computed in linear time O(m) by the BZ
algorithm [1], where mis the number of edges in a graph. Due to such computational eciency, the core number of a
vertex is a parameter of density extensively used in numerous applications [2], such as knowledge discovery [3], gene
expression [4], social networks [5], ecology [6], and finance [6].
In [7], Malliaros et al. summarize the main research on k-core decomposition from 1968 to 2019. Many papers
focus on computing the core in static graphs [1, 8, 9, 10, 11]. In practice, many data graphs are large and continuously
changing. It is of practical importance to identify the dense range as fast as possible after a change, e.g., when multiple
edges are inserted or removed. For example, it is necessary to quickly initiate a response to rapidly spreading false
information about vaccines or to urgently address new pandemic super-spreading events [12, 13, 14]. This is the
problem of maintaining the core number in dynamic graphs. In [15], Zhang et al. summarize the research on core
maintenance and applications.
Many sequential algorithms are devised for core maintenance in dynamic graphs [16, 17, 18, 19, 20, 21]. The
main idea for core maintenance is that first, a set of vertices whose core numbers need to be updated (denoted as
Corresponding author
Email addresses: binguo@trentu.ca (Bin Guo), emil@mcmaster.ca (Emil Sekerinski)
Preprint submitted to Journal of Parallel and Distributed Computing October 14, 2024
arXiv:2210.14290v3 [cs.DC] 9 Oct 2024
V) is identified by traversing a possibly larger scope of vertices (denoted as V+) such that VV+. There are
two main algorithms for maintaining core numbers over dynamic graphs, Traversal [18] and Order [17]. Given an
inserted edge, the Traversal algorithm searches Vby performing a depth-first graph traversal within a subcore, a
connected region of vertices with the same core numbers. For the Order algorithm, the size of V+is significantly
reduced, so the ratio |V+|/|V|is typically much smaller and has less variation compared to the Traversal algorithm.
Thus, the computational time is significantly improved. The experiments in [17] show that for edge insertion, Order
significantly outperforms Traversal over all tested graphs with up to 2,083 times speedups; for the edge removal,
Order outperforms Traversal over most of the tested graphs with up to 11 times speedups. Furthermore, based on
the Order algorithm, a Simplified-Order algorithm is proposed in [16], which is easier to implement and to argue for
correctness; also, Simplified-Order has improved time complexities by adopting the Order Maintenance (OM) data
structure to maintain the order of all vertices.
All the above methods are sequential for maintaining core over dynamic graphs, meaning they handle only one
edge insertion or removal at a time. The problem is that when a burst of many inserted or removed edges comes
in, these edges may not be handled on time to keep up with the data stream [14]. The prevalence of multi-core
machines suggests parallelizing the core maintenance algorithms. Many multi-core parallel batch algorithms for core
maintenance have been proposed in [22, 23, 24]. These algorithms are based on similar ideas: 1) they use an available
structure, e.g. Join Edge Set [22], to preprocess a batch of inserted or removed edges, avoiding repeated computations,
and 2) each worker performs the Traversal algorithm. However, there are three drawbacks to these approaches. First,
they are based on the sequential Traversal algorithm [20, 21], which is proved to be less ecient than the Order
algorithm [22, 16]. Second, parallelism can be exploited only for aected vertices with dierent core numbers; the
algorithm is reduced to running sequentially when all aected vertices have the same core numbers. Third, the time
complexities are not analyzed by the work-depth model [25], where the work Wis its sequential running time, and
the depth Dis its running time on an infinite number of processors.
To overcome the above drawbacks, inspired by the Simplified-Order algorithm [16], we propose a new parallel
batch algorithm, called Parallel-Order, to maintain core numbers after batches of edges insertions or removals in
dynamic graphs. Based on maintaining the k-order with the parallel OM data structure, we can parallelize the Order
algorithm. 1) For edge insertion, the idea is straightforward: each worker handles one inserted edge at a time and
propagates the aected vertices in k-order, such that only the traversed vertices in V+are locked. When traversing
vV+, not all neighbors uv.adj need to be locked if u<V+. When another worker tries to access these vertices, it
must wait until they are unlocked. Since all aected vertices are locked in korder, this method does not have blocking
cycles that lead to a deadlock. 2) Edge removal is more challenging than edge insertion. The idea is that each worker
handles one removed edge at a time and propagates the aected vertices, such that only the traversed vertices in V
are locked. When traversing vV, not all neighbors uv.adj need to be locked if u<V. The problem is that this
method may cause blocking cycles that lead to a deadlock as all aected vertices are traversed without any order. We
propose a novel mechanism to avoid such deadlocks. The main contributions of this work are summarized below:
We investigate the drawbacks of the state-of-the-art parallel core maintenance algorithms [22, 23, 24].
Based on the parallel OM data structure[26] and the Simplified-Order core maintenance algorithm[16], we
propose a Parallel-Order edge insertion algorithm for core maintenance. Only the traversed vertices in V+
are locked for synchronization. We also implement the priority queue Qcombined with the parallel OM data
structure to obtain a vertex in Qwith a minimal k-order.
We propose a Parallel-Order edge removal algorithm for core maintenance and a novel mechanism to avoid
blocking cycles that lead to a deadlock. Only the traversed vertices in Vare locked for synchronization.
We prove the correctness of our Parallel-Order insertion and removal core maintenance algorithms; we also
prove the time complexities with the work-depth model.
We conduct extensive experiments on a 64-core machine over various graphs to evaluate the Parallel-Order
algorithms for edge insertion and removal.
We analyze our parallel algorithms in the standard work-depth model [27]. The work, denoted as W, is the total
number of operations that the algorithm uses. The depth, denoted as D, is the longest chain of sequential operations.
2
Table 1: The work and depth complexities of our parallel core maintenance
Worst-case (O) Best-case (O)
Parallel W D W D
Insert m|E+|log |E+|m|E+|log |E+|m|E+|log |E+| |E+|log |E+|+m|V|
Remove m|E|m|E|m|E| |E|+m|V|
Table 1 shows the work and depth of the Parallel-Order algorithm for inserting and removing medges in parallel.
For both edge insertion and removal, one issue is that the depth Dis equal to the work Win the worst case; that
is, all workers execute in one blocking chain such that only one worker is active. However, with a high probability,
such a worst-case does not happen as the number of locked vertices is always small for each insertion and removal.
For our method, all vertices in V+or Vare locked together for each insertion or removal. In Fig. 1, on 16 tested
graphs (in Table 2 of our experiment section), we summarize the number of dierent sizes of V+when randomly
inserting and removing 100,000 edges. We observe that almost all V+and Vhave really small sizes for insertion or
removal, respectively. Specifically, more than 97% of edge insertions and removals for all tested graphs have |V+|and
|V|between 0 and 10. That means, with a high probability that less than 11 vertices are locked when inserting or
removing one edge, which leads to a low probability that all workers execute in one long blocking chain.
Figure 1: The number of dierent sizes of V+for inserting and removing 100,000 edges by using our parallel core maintenance algorithms,
respectively. The x-axis is the size of V+and the y-axis is the number of such size of V.
The rest of this paper is organized as follows. The related work is discussed in Section 2. The preliminaries are
given in Section 3. Our new parallelized Order-Based core maintenance algorithms are proposed in Section 4. We
discuss the implementation of priority queues and buer queues in Section 5. We conduct extensive performance
studies and show the results in Section 6, and conclude this work in Section 7.
2. Related Work
2.1. Core Decomposition
The BZ algorithm [1] has linear running time O(m) by using bucket structures, where mis the number of edges. In
[8], an external memory algorithm is proposed, so-called EMcore, which runs in a top-down manner such that the
whole graph does not have to be loaded into memory. In [11], Wen et al. provide a semi-external algorithm, which
requires O(n) memory to maintain the information of vertices, where nis the number of vertices. In [9], Khaouid et
al. investigate the core decomposition in a single PC over large graphs by using GraphChi and WebGraph models. In
[10], Montresoret et al. consider the core decomposition in a distributed system. In addition, the parallel computation
of core decomposition in multi-core processors is first investigated in [28], where the ParK algorithm was proposed.
Based on the main idea of ParK, a more scalable PKC algorithm has been reported in [29].
3
2.2. Core Maintenance
In [20, 21], an algorithm similar to the Traversal algorithm is given, but with a quadratic time complexity. In [11],
a semi-external algorithm for core maintenance is proposed to reduce the I/O cost, but this method takes more CPU
time than the Traversal algorithm. In [30], Sun et al. design algorithms to maintain approximate cores in dynamic
hypergraphs in which a hyperedge may contain one or more participating vertices compared with exactly two in
graphs. In [14], Gabert et al. propose parallel core maintenance algorithms for maintaining cores over hypergraphs.
There exists some research based on the above core maintenance. In [31], the authors study computing all k-cores
in the graph snapshot over the time window. In [32], the authors explore the hierarchy core maintenance. In [33],
distributed core maintenance is explored. In [34], a parallel approximate k-core decomposition and maintenance
approach is proposed, where bounded approximate core numbers for vertices can be maintained with high probability.
2.3. Weighted Graphs
All the above work focuses on unweighted graphs, but graphs are weighted in many applications. For an edge-
weighted graph, the degree of a vertex is the sum of the weights of all its incident edges. However, weighted graphs
have a large search range for maintaining the core numbers after a change by using the traditional core maintenance
algorithms directly, as the degree of a related vertex may change widely. In [35], Zhou et al. extend the coreness to
weighted graphs and devise weighted core decomposition algorithms; also, they devise weighted core maintenance
based on the k-order [17, 16]. In [36], Liu et al. improve the core decomposition and incremental maintenance
algorithm to suit edge-weighted graphs.
3. Preliminaries
Let G=(V,E) be an undirected and unweighted graph; V(G) denotes the set of vertices, and E(G) represents the
set of edges in G. When the context is clear, we will use Vand Einstead of V(G) and E(G), respectively. As G
is an undirected graph, an edge (u,v)E(G) is equivalent to (v,u)E(G). We denote the number of vertices
and edges of Gby nand m, respectively. We define the set of neighbors of a vertex uVas u.adj, formally
u.adj ={vV: (u,v)E}. We denote the degree of uin Gas u.deg =|u.adj|.
Definition 3.1 (k-Core).Given an undirected graph G=(V,E) and an integer k, a subgraph Gkof Gis called a k-core
if it satisfies the following conditions: (1) for uV(Gk), u.deg k; (2) Gkis maximal. Moreover, Gk+1Gk, for all
k0, and G0is just G.
Definition 3.2 (Core Number).Given an undirected graph G=(V,E), the core number of a vertex uG(V), denoted
as u.core, is defined as u.core =max{k:uV(Gk)}. That means u.core is the largest ksuch that there exists a k-core
containing u.
Definition 3.3 (k-Subcore).Given an undirected graph G=(V,E), a maximal set of vertices SVis called a k-
subcore if (1) uS,u.core =k; (2) the induced subgraph G(S) is connected. The subcore that contain vertex uis
denoted as sc(u).
3.1. Core Decomposition
Given a graph G, the problem of computing the core number for each uV(G) is called core decomposition. In [1],
Batagelj et al. propose a linear time O(m+n) algorithm, the so-called BZ algorithm, shown in Algorithm 1. The
general idea is peeling: to compute the k-core Gkof G, repeatedly vertices (and their adjacent edges) whose degrees
are less than kare removed. When there are no more vertices to be removed, the resulting graphs are the k-core of
G. The core number of uis determined in line 5. The min-priority queue Qcan be eciently implemented by bucket
sorting [1], leading to a linear running time of O(m+n).
4
Algorithm 1: BZ algorithm for core decomposition
1for uVdo u.d← |u.adj|;u.core =
2Qa min-priority queue by u.dfor all uV
3while Q,do
4uQ.dequeue()
5u.core u.d; remove ufrom G
6for vu.adj do
7if u.d<v.dthen v.dv.d1
8update Q
3.2. Core Maintenance
The problem of maintaining the core numbers for dynamic graphs Gwhen edges are inserted into and removed from
Gcontinuously is called core maintenance. The insertion and removal of vertices can be simulated as a sequence of
edge insertions and removals.
Definition 3.4 (Candidate Set Vand Searching Set V+).Given a graph G=(V,E), when an edge is inserted or
removed, a candidate set of vertices, denoted V, needs to be identified, and the core numbers of vertices in Vmust
be updated. To identify V, a minimal set of vertices, denoted V+, is traversed by accessing their adjacent edges.
Clearly, we have VV+. An ecient core maintenance algorithm should have a small ratio of |V+|/|V|. It is
shown that the Order [22] insertion algorithm has a significantly lower ratio compared to the Traversal [20] insertion
algorithm. This is why we try to parallelize the Order algorithm in this paper.
In [21, 20], it is proved that after inserting or removing one edge, the core number of vertices in Vincreases or
decreases by at most one, respectively; Vis only located in the k-subcore, where kis the lower core number of two
vertices that the inserted or removed edge connect.
3.3. The Order-Based Core Maintenance
The state-of-the-art core maintenance solution is the Order algorithm [17, 16]. For edge insertion, it is based on three
notions: k-order, candidate degree, and remaining degree. For edge removal, it uses the notion of the max-core degree
[20].
3.3.1. Edge Insertion
Definition 3.5 (k-Order ).[17] Given a graph G, the k-order is defined for any pairs of vertices uand vover the
graph Gas follows: (1) when u.core <v.core, we have uv; (2) when u.core =v.core, we have uvif us core
number is determined before vs by the peeling steps of the BZ algorithm.
Ak-order is an instance of all the possible vertex sequences produced by the BZ algorithm. When generating the
k-order, there may be multiple vertices vQthat have the same value of u.dand can be dequeued from Qat the same
time together (Algorithm 1, line 4). When dealing with these vertices with the same value of d, dierent sequences
generate dierent instances of correct k-order for all vertices. There are three heuristic strategies, “small degree first”,
“large degree first”, and “random”. The experiments in [17] show that the “small degree first” consistently has the
best performance over all tested real graphs, and thus we choose this strategy for implementation and experiments.
For the k-order, transitivity holds, that is, uvif uwwv. For each edge insertion and removal, the k-order
will be maintained. Here, Okdenotes the sequence of vertices in k-order whose core numbers are k. A sequence
O=O0O1O2· · · over V(G) can be obtained, where OiOjif i<j. It is clear that is defined over the sequence of
O=O0O1O2· · · . In other words, for all vertices in the graph, the sequence Oindicates the k-order .
Given an undirected graph G=(V,E) with Oin k-order, each edge (u,v)E(G) can be assigned a direction such
that uvand the directed edge is denoted as u7→ v. By doing this, a direct acyclic graph (DAG)
G=(V,
E) can be
constructed where each edge u7→ v
E(
G) satisfies uv. Of course, the k-order of Gis a topological order of
G.
Here, the set of successors of vis defined as u(
G).post ={v|u7→ v
E(
G)}; the set of predecessors of vis defined as
5
摘要:

ParallelOrder-BasedCoreMaintenanceinDynamicGraphsBinGuoa,∗,EmilSekerinskibaDepartmentofComputing&InformationSystems,TrentUniversity,1600WestBankDrive,Peterborough,K9L0G2,ON,CanadabDepartmentofComputingandSoftware,McMasterUniversity,1280MainStreetWest,Hamilton,L8S4L8,ON,CanadaAbstractThecorenumberofv...

展开>> 收起<<
Parallel Order-Based Core Maintenance in Dynamic Graphs.pdf

共25页,预览5页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:25 页 大小:619.36KB 格式:PDF 时间:2025-05-02

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 25
客服
关注