Parallel Order-Based Core Maintenance in Dynamic Graphs

2025-05-02 0 0 619.36KB 25 页 10玖币

Bin Guoa,∗, Emil Sekerinskib

aDepartment of Computing &Information Systems, Trent University, 1600 West Bank Drive, Peterborough, K9L 0G2, ON, Canada

bDepartment of Computing and Software, McMaster University, 1280 Main Street West, Hamilton, L8S 4L8, ON, Canada

Abstract

The core number of vertices in a graph is one of the most well-studied cohesive subgraph models due to linear

running-time algorithms. In practice, many data graphs are dynamic graphs that continuously change by inserting

and removing edges. The core numbers are updated in dynamic graphs with edge insertions and deletions, called

core maintenance. When a burst of a large number of inserted or removed edges comes in, these must be handled on

time to keep up with the data stream. There are two main sequential algorithms for core maintenance, Traversal and

Order. Experiments show that the Order algorithm signiﬁcantly outperforms the Traversal algorithm over various

real graphs.

To the best of our knowledge, all existing parallel approaches are based on the Traversal algorithm. These

algorithms exploit parallelism only for vertices with diﬀerent core numbers; they reduce to sequential algorithms

when all vertices have the same core numbers. This paper proposes a new parallel core maintenance algorithm based

on the Order algorithm. A distinguishing properly is that the algorithm allows parallelism even for graphs where all

vertices have the same core number. Extensive experiments are conducted over real-world, temporal, and synthetic

graphs on a multicore machine. The results show that for inserting and removing a batch of edges using 16 workers,

the proposed method achieves speeds of up to 289 times and 10 times compared to the most eﬃcient existing method.

Keywords:

dynamic graphs, parallel, k-core maintenance, multicore, shared memory.

1. Introduction

Graphs are widely used to model complex networks, such as social networks, hyperlink networks, and knowledge

networks. As one of the most well-studied cohesive subgraph models, the k-core is the maximal subgraph such that

all vertices have degrees at least k. The core number of a vertex is the maximum value of ksuch that this vertex is

contained in the subgraph of k-core vertices [1, 2]. The core numbers can be computed in linear time O(m) by the BZ

algorithm [1], where mis the number of edges in a graph. Due to such computational eﬃciency, the core number of a

vertex is a parameter of density extensively used in numerous applications [2], such as knowledge discovery [3], gene

expression [4], social networks [5], ecology [6], and ﬁnance [6].

In [7], Malliaros et al. summarize the main research on k-core decomposition from 1968 to 2019. Many papers

focus on computing the core in static graphs [1, 8, 9, 10, 11]. In practice, many data graphs are large and continuously

changing. It is of practical importance to identify the dense range as fast as possible after a change, e.g., when multiple

edges are inserted or removed. For example, it is necessary to quickly initiate a response to rapidly spreading false

information about vaccines or to urgently address new pandemic super-spreading events [12, 13, 14]. This is the

problem of maintaining the core number in dynamic graphs. In [15], Zhang et al. summarize the research on core

maintenance and applications.

Many sequential algorithms are devised for core maintenance in dynamic graphs [16, 17, 18, 19, 20, 21]. The

main idea for core maintenance is that ﬁrst, a set of vertices whose core numbers need to be updated (denoted as

∗Corresponding author

Email addresses: binguo@trentu.ca (Bin Guo), emil@mcmaster.ca (Emil Sekerinski)

Preprint submitted to Journal of Parallel and Distributed Computing October 14, 2024

arXiv:2210.14290v3 [cs.DC] 9 Oct 2024

V∗) is identiﬁed by traversing a possibly larger scope of vertices (denoted as V+) such that V∗⊆V+. There are

two main algorithms for maintaining core numbers over dynamic graphs, Traversal [18] and Order [17]. Given an

inserted edge, the Traversal algorithm searches V∗by performing a depth-ﬁrst graph traversal within a subcore, a

connected region of vertices with the same core numbers. For the Order algorithm, the size of V+is signiﬁcantly

reduced, so the ratio |V+|/|V∗|is typically much smaller and has less variation compared to the Traversal algorithm.

Thus, the computational time is signiﬁcantly improved. The experiments in [17] show that for edge insertion, Order

signiﬁcantly outperforms Traversal over all tested graphs with up to 2,083 times speedups; for the edge removal,

Order outperforms Traversal over most of the tested graphs with up to 11 times speedups. Furthermore, based on

the Order algorithm, a Simplified-Order algorithm is proposed in [16], which is easier to implement and to argue for

correctness; also, Simplified-Order has improved time complexities by adopting the Order Maintenance (OM) data

structure to maintain the order of all vertices.

All the above methods are sequential for maintaining core over dynamic graphs, meaning they handle only one

edge insertion or removal at a time. The problem is that when a burst of many inserted or removed edges comes

in, these edges may not be handled on time to keep up with the data stream [14]. The prevalence of multi-core

machines suggests parallelizing the core maintenance algorithms. Many multi-core parallel batch algorithms for core

maintenance have been proposed in [22, 23, 24]. These algorithms are based on similar ideas: 1) they use an available

structure, e.g. Join Edge Set [22], to preprocess a batch of inserted or removed edges, avoiding repeated computations,

and 2) each worker performs the Traversal algorithm. However, there are three drawbacks to these approaches. First,

they are based on the sequential Traversal algorithm [20, 21], which is proved to be less eﬃcient than the Order

algorithm [22, 16]. Second, parallelism can be exploited only for aﬀected vertices with diﬀerent core numbers; the

algorithm is reduced to running sequentially when all aﬀected vertices have the same core numbers. Third, the time

complexities are not analyzed by the work-depth model [25], where the work Wis its sequential running time, and

the depth Dis its running time on an inﬁnite number of processors.

To overcome the above drawbacks, inspired by the Simplified-Order algorithm [16], we propose a new parallel

batch algorithm, called Parallel-Order, to maintain core numbers after batches of edges insertions or removals in

dynamic graphs. Based on maintaining the k-order with the parallel OM data structure, we can parallelize the Order

algorithm. 1) For edge insertion, the idea is straightforward: each worker handles one inserted edge at a time and

propagates the aﬀected vertices in k-order, such that only the traversed vertices in V+are locked. When traversing

v∈V+, not all neighbors u∈v.adj need to be locked if u<V+. When another worker tries to access these vertices, it

must wait until they are unlocked. Since all aﬀected vertices are locked in korder, this method does not have blocking

cycles that lead to a deadlock. 2) Edge removal is more challenging than edge insertion. The idea is that each worker

handles one removed edge at a time and propagates the aﬀected vertices, such that only the traversed vertices in V∗

are locked. When traversing v∈V∗, not all neighbors u∈v.adj need to be locked if u<V∗. The problem is that this

method may cause blocking cycles that lead to a deadlock as all aﬀected vertices are traversed without any order. We

propose a novel mechanism to avoid such deadlocks. The main contributions of this work are summarized below:

•We investigate the drawbacks of the state-of-the-art parallel core maintenance algorithms [22, 23, 24].

•Based on the parallel OM data structure[26] and the Simplified-Order core maintenance algorithm[16], we

propose a Parallel-Order edge insertion algorithm for core maintenance. Only the traversed vertices in V+

are locked for synchronization. We also implement the priority queue Qcombined with the parallel OM data

structure to obtain a vertex in Qwith a minimal k-order.

•We propose a Parallel-Order edge removal algorithm for core maintenance and a novel mechanism to avoid

blocking cycles that lead to a deadlock. Only the traversed vertices in V∗are locked for synchronization.

•We prove the correctness of our Parallel-Order insertion and removal core maintenance algorithms; we also

prove the time complexities with the work-depth model.

•We conduct extensive experiments on a 64-core machine over various graphs to evaluate the Parallel-Order

algorithms for edge insertion and removal.

We analyze our parallel algorithms in the standard work-depth model [27]. The work, denoted as W, is the total

number of operations that the algorithm uses. The depth, denoted as D, is the longest chain of sequential operations.

Table 1: The work and depth complexities of our parallel core maintenance

Worst-case (O) Best-case (O)

Parallel W D W D

Insert m′|E+|log |E+|m′|E+|log |E+|m′|E+|log |E+| |E+|log |E+|+m′|V∗|

Remove m′|E∗|m′|E∗|m′|E∗| |E∗|+m′|V∗|

Table 1 shows the work and depth of the Parallel-Order algorithm for inserting and removing m′edges in parallel.

For both edge insertion and removal, one issue is that the depth Dis equal to the work Win the worst case; that

is, all workers execute in one blocking chain such that only one worker is active. However, with a high probability,

such a worst-case does not happen as the number of locked vertices is always small for each insertion and removal.

For our method, all vertices in V+or V∗are locked together for each insertion or removal. In Fig. 1, on 16 tested

graphs (in Table 2 of our experiment section), we summarize the number of diﬀerent sizes of V+when randomly

inserting and removing 100,000 edges. We observe that almost all V+and V∗have really small sizes for insertion or

removal, respectively. Speciﬁcally, more than 97% of edge insertions and removals for all tested graphs have |V+|and

|V∗|between 0 and 10. That means, with a high probability that less than 11 vertices are locked when inserting or

removing one edge, which leads to a low probability that all workers execute in one long blocking chain.

Figure 1: The number of diﬀerent sizes of V+for inserting and removing 100,000 edges by using our parallel core maintenance algorithms,

respectively. The x-axis is the size of V+and the y-axis is the number of such size of V∗.

The rest of this paper is organized as follows. The related work is discussed in Section 2. The preliminaries are

given in Section 3. Our new parallelized Order-Based core maintenance algorithms are proposed in Section 4. We

discuss the implementation of priority queues and buﬀer queues in Section 5. We conduct extensive performance

studies and show the results in Section 6, and conclude this work in Section 7.

2. Related Work

2.1. Core Decomposition

The BZ algorithm [1] has linear running time O(m) by using bucket structures, where mis the number of edges. In

[8], an external memory algorithm is proposed, so-called EMcore, which runs in a top-down manner such that the

whole graph does not have to be loaded into memory. In [11], Wen et al. provide a semi-external algorithm, which

requires O(n) memory to maintain the information of vertices, where nis the number of vertices. In [9], Khaouid et

al. investigate the core decomposition in a single PC over large graphs by using GraphChi and WebGraph models. In

[10], Montresoret et al. consider the core decomposition in a distributed system. In addition, the parallel computation

of core decomposition in multi-core processors is ﬁrst investigated in [28], where the ParK algorithm was proposed.

Based on the main idea of ParK, a more scalable PKC algorithm has been reported in [29].

2.2. Core Maintenance

In [20, 21], an algorithm similar to the Traversal algorithm is given, but with a quadratic time complexity. In [11],

a semi-external algorithm for core maintenance is proposed to reduce the I/O cost, but this method takes more CPU

time than the Traversal algorithm. In [30], Sun et al. design algorithms to maintain approximate cores in dynamic

hypergraphs in which a hyperedge may contain one or more participating vertices compared with exactly two in

graphs. In [14], Gabert et al. propose parallel core maintenance algorithms for maintaining cores over hypergraphs.

There exists some research based on the above core maintenance. In [31], the authors study computing all k-cores

in the graph snapshot over the time window. In [32], the authors explore the hierarchy core maintenance. In [33],

distributed core maintenance is explored. In [34], a parallel approximate k-core decomposition and maintenance

approach is proposed, where bounded approximate core numbers for vertices can be maintained with high probability.

2.3. Weighted Graphs

All the above work focuses on unweighted graphs, but graphs are weighted in many applications. For an edge-

weighted graph, the degree of a vertex is the sum of the weights of all its incident edges. However, weighted graphs

have a large search range for maintaining the core numbers after a change by using the traditional core maintenance

algorithms directly, as the degree of a related vertex may change widely. In [35], Zhou et al. extend the coreness to

weighted graphs and devise weighted core decomposition algorithms; also, they devise weighted core maintenance

based on the k-order [17, 16]. In [36], Liu et al. improve the core decomposition and incremental maintenance

algorithm to suit edge-weighted graphs.

3. Preliminaries

Let G=(V,E) be an undirected and unweighted graph; V(G) denotes the set of vertices, and E(G) represents the

set of edges in G. When the context is clear, we will use Vand Einstead of V(G) and E(G), respectively. As G

is an undirected graph, an edge (u,v)∈E(G) is equivalent to (v,u)∈E(G). We denote the number of vertices

and edges of Gby nand m, respectively. We deﬁne the set of neighbors of a vertex u∈Vas u.adj, formally

u.adj ={v∈V: (u,v)∈E}. We denote the degree of uin Gas u.deg =|u.adj|.

Deﬁnition 3.1 (k-Core).Given an undirected graph G=(V,E) and an integer k, a subgraph Gkof Gis called a k-core

if it satisﬁes the following conditions: (1) for ∀u∈V(Gk), u.deg ≥k; (2) Gkis maximal. Moreover, Gk+1⊆Gk, for all

k≥0, and G0is just G.

Deﬁnition 3.2 (Core Number).Given an undirected graph G=(V,E), the core number of a vertex u∈G(V), denoted

as u.core, is deﬁned as u.core =max{k:u∈V(Gk)}. That means u.core is the largest ksuch that there exists a k-core

containing u.

Deﬁnition 3.3 (k-Subcore).Given an undirected graph G=(V,E), a maximal set of vertices S⊆Vis called a k-

subcore if (1) ∀u∈S,u.core =k; (2) the induced subgraph G(S) is connected. The subcore that contain vertex uis

denoted as sc(u).

3.1. Core Decomposition

Given a graph G, the problem of computing the core number for each u∈V(G) is called core decomposition. In [1],

Batagelj et al. propose a linear time O(m+n) algorithm, the so-called BZ algorithm, shown in Algorithm 1. The

general idea is peeling: to compute the k-core Gkof G, repeatedly vertices (and their adjacent edges) whose degrees

are less than kare removed. When there are no more vertices to be removed, the resulting graphs are the k-core of

G. The core number of uis determined in line 5. The min-priority queue Qcan be eﬃciently implemented by bucket

sorting [1], leading to a linear running time of O(m+n).

Algorithm 1: BZ algorithm for core decomposition

1for u∈Vdo u.d← |u.adj|;u.core =∅

2Q←a min-priority queue by u.dfor all u∈V

3while Q,∅do

4u←Q.dequeue()

5u.core ←u.d; remove ufrom G

6for v∈u.adj do

7if u.d<v.dthen v.d←v.d−1

8update Q

3.2. Core Maintenance

The problem of maintaining the core numbers for dynamic graphs Gwhen edges are inserted into and removed from

Gcontinuously is called core maintenance. The insertion and removal of vertices can be simulated as a sequence of

edge insertions and removals.

Deﬁnition 3.4 (Candidate Set V∗and Searching Set V+).Given a graph G=(V,E), when an edge is inserted or

removed, a candidate set of vertices, denoted V∗, needs to be identiﬁed, and the core numbers of vertices in V∗must

be updated. To identify V∗, a minimal set of vertices, denoted V+, is traversed by accessing their adjacent edges.

Clearly, we have V∗⊆V+. An eﬃcient core maintenance algorithm should have a small ratio of |V+|/|V∗|. It is

shown that the Order [22] insertion algorithm has a signiﬁcantly lower ratio compared to the Traversal [20] insertion

algorithm. This is why we try to parallelize the Order algorithm in this paper.

In [21, 20], it is proved that after inserting or removing one edge, the core number of vertices in V∗increases or

decreases by at most one, respectively; V∗is only located in the k-subcore, where kis the lower core number of two

vertices that the inserted or removed edge connect.

3.3. The Order-Based Core Maintenance

The state-of-the-art core maintenance solution is the Order algorithm [17, 16]. For edge insertion, it is based on three

notions: k-order, candidate degree, and remaining degree. For edge removal, it uses the notion of the max-core degree

[20].

3.3.1. Edge Insertion

Deﬁnition 3.5 (k-Order ≺).[17] Given a graph G, the k-order ≺is deﬁned for any pairs of vertices uand vover the

graph Gas follows: (1) when u.core <v.core, we have u≺v; (2) when u.core =v.core, we have u≺vif u’s core

number is determined before v’s by the peeling steps of the BZ algorithm.

Ak-order ≺is an instance of all the possible vertex sequences produced by the BZ algorithm. When generating the

k-order, there may be multiple vertices v∈Qthat have the same value of u.dand can be dequeued from Qat the same

time together (Algorithm 1, line 4). When dealing with these vertices with the same value of d, diﬀerent sequences

generate diﬀerent instances of correct k-order for all vertices. There are three heuristic strategies, “small degree ﬁrst”,

“large degree ﬁrst”, and “random”. The experiments in [17] show that the “small degree ﬁrst” consistently has the

best performance over all tested real graphs, and thus we choose this strategy for implementation and experiments.

For the k-order, transitivity holds, that is, u≺vif u≺w∧w≺v. For each edge insertion and removal, the k-order

will be maintained. Here, Okdenotes the sequence of vertices in k-order whose core numbers are k. A sequence

O=O0O1O2· · · over V(G) can be obtained, where Oi≺Ojif i<j. It is clear that ≺is deﬁned over the sequence of

O=O0O1O2· · · . In other words, for all vertices in the graph, the sequence Oindicates the k-order ≺.

Given an undirected graph G=(V,E) with Oin k-order, each edge (u,v)∈E(G) can be assigned a direction such

that u≺vand the directed edge is denoted as u7→ v. By doing this, a direct acyclic graph (DAG) ⃗

G=(V,⃗

E) can be

constructed where each edge u7→ v∈⃗

E(⃗

G) satisﬁes u≺v. Of course, the k-order of Gis a topological order of ⃗

Here, the set of successors of vis deﬁned as u(⃗

G).post ={v|u7→ v∈⃗

E(⃗

G)}; the set of predecessors of vis deﬁned as

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

ParallelOrder-BasedCoreMaintenanceinDynamicGraphsBinGuoa,∗,EmilSekerinskibaDepartmentofComputing&InformationSystems,TrentUniversity,1600WestBankDrive,Peterborough,K9L0G2,ON,CanadabDepartmentofComputingandSoftware,McMasterUniversity,1280MainStreetWest,Hamilton,L8S4L8,ON,CanadaAbstractThecorenumberofv...

展开>> 收起<<

Parallel Order-Based Core Maintenance in Dynamic Graphs.pdf

共25页,预览5页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

Parallel Order-Based Core Maintenance in Dynamic Graphs

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: