Overlapping Community Detection using Dynamic Dilated Aggregation in Deep Residual GCN Md Nurul Muttakin10000000261146260 Md. Iqbal

2025-04-29 0 0 2.24MB 35 页 10玖币

侵权投诉

Overlapping Community Detection using Dynamic

Dilated Aggregation in Deep Residual GCN

Md Nurul Muttakin1[0000−0002−6114−6260], Md. Iqbal

Hossain2[0000−0001−6212−7638], and Md. Saidur Rahman1[0000−0003−0112−0242]

1Bangladesh University of Engineering and Technology, Dhaka 1000, Bangladesh

2University of Arizona, USA

Abstract.

Overlapping community detection is a critical problem in

unsupervised machine learning on network-structured data. Some research

has considered applying graph convolutional networks (GCN) to tackle

the problem. However, the considered GCN is shallow and can not capture

communities with larger diameters. Moreover, it is still open to incorporate

dynamic dilated aggregation in deep GCNs for irregular graphs. In this

study, we design a deep dynamic residual GCN (DynaResGCN) based

on our dynamic dilated aggregation algorithm and a uniﬁed end-to-end

encoder-decoder-based framework to detect overlapping communities. The

deep DynaResGCN model is used as the encoder, whereas we incorporate

the Bernoulli-Poisson (BP) model as the decoder. Consequently, we

apply our overlapping community detection framework in a research

topics dataset without having ground truth, a set of networks from

Facebook having reliable (hand-labeled) ground truth, and in a set of

very large co-authorship networks having empirical (not hand-labeled)

ground truth. Our experimentation on these datasets shows signiﬁcantly

superior performance over many state-of-the-art methods for detecting

overlapping communities in networks.

Keywords: Overlapping Community Detection ·Deep GCN ·Unsuper-

vised Learning

1 Introduction

Network structured data are ubiquitous. Almost every domain of science and

engineering has to deal with networks. The nodes of real-world networks often

form special groups where link density is high, and the density of edges is low

among these groups. This property of networks is called community structure

[

]. Human society is composed of communities that are in many cases virtual

due to the wide use of online social platforms. Indeed, social communities are

of great importance to social scientists and have been studied thoroughly for

a long time [

]. Communities in the global ﬁnancial network play

a key role in inﬂuencing the transitions of the global economy and ﬁnancial

system [

]. In protein-protein interaction networks, protein communities act as

functional modules to perform speciﬁc tasks in cells [

]. As a consequence,

arXiv:2210.11174v3 [cs.AI] 28 Sep 2024

2 N. Muttakin et al.

community detection is an essential tool to analyze diﬀerent kinds of networks

such as social networks, research topics networks, protein-protein interaction

networks, ecological networks, ﬁnancial networks, etc., and to understand their

structure.

The communities in most real-world networks, especially social networks,

overlap [

]. The same participant participates in diﬀerent communities at the

same time. Sometimes, a participant in a community might simultaneously be

a diﬀerent community member. A person in a social network is in his or her

family’s community while he/she is also in the community of his or her coworkers.

In reality, a node may belong to an unlimited number of communities in a social

network. This overlapping node is critical for various reasons, as it might be

used positively and negatively. For instance, in a social network context, a node

belonging to two disputing communities might be potentially used to resolve the

dispute. On the contrary, this node may disclose sensitive information about a

community to other communities violating privacy ethics. Indeed, the overlap

is an inherent characteristic of many real-world social networks [

]. Moreover,

disjoint community detection induces a partition in the network that eventually

breaks the communities [

]. Therefore, detecting overlapping communities in

diﬀerent networks is of utmost importance, and thus, it has attracted signiﬁcant

attention in the research community [66].

Many works in the literature are related to non-overlapping community

detection [

]. In the case of non-overlapping community detection, node

embedding approaches perform well [

]. However, most of these approaches

are not well suited for overlapping community detection due to the lack of scalable

and reliable clustering methods for high-dimensional vectors. A viable approach

to resolve this issue is to generate the node embedding as community aﬃliation,

which can be done eﬀectively and eﬃciently using GCNs [50].

Among the classical approaches for overlapping community detection, label

propagation algorithms are prominent [

]. For instance, SLPA is an overlapping

community detection algorithm based on the idea of label propagation [

]. DE-

MON [

], a node-centric bottom-up overlapping community detection algorithm,

also leverages ego network structure with overlapping label propagation. Other

algorithmic approaches include seed-set expansion [

], evolutionary algorithms

[

], and heuristics-based methods [

]. Some heuristics-based works

considered game-theoretic concepts [62].

Many studies consider some kinds of matrix factorization along with proba-

bilistic inference [

]. One of the earliest works in this regime

is BigClam [

]. BigClam was designed for overlapping community detection

in large networks [

]. BigClam uses gradient ascent to create an embedding,

which is later used to determine nodes’ community aﬃliation. MNMF [

] uses

modularity-based regularization with non-negative matrix factorization. DANMF

uses a telescopic non-negative matrix factorization approach based on a deep

auto-encoder to learn membership distribution over nodes [

]. Some works [

]

performed factorization of modularity matrices using neural nets, while other

works considered adversarial learning [

] or deep belief networks [

] to learn

Title Suppressed Due to Excessive Length 3

community aﬃliation. However, they did not use graphs in their neural network

architecture, which is crucial to achieving superior performance [

]. Finally,

NOCD [

] used a graph neural networks (GNN) based encoder-decoder model

to detect overlapping communities. Unlike NMF-based methods, it optimizes the

weights of a GNN to generate a better community aﬃliation matrix.

Graph neural networks (GNN) are a class of specialized neural networks that

can operate on graph-structured data [

]. The authors of NOCD [

] ﬁrst

developed a graph neural network-based approach for overlapping community

detection. In that approach, they used a two-layer shallow graph convolutional

network as an encoder to generate community embedding and a decoder based

on the Bernoulli-Poisson model [

] to reconstruct the original graph from

the embedding. However, a shallow GCN has essential limitations to capturing

communities with a larger community diameter [

] (

2) as it can aggregate

information only up to two hops from a node. One solution to the issue is to

consider a deep graph convolutional networks (DeepGCN) [

] instead

of a shallow GCN. However, Li et al. [

] proposed the DeepGCN in the case

of point cloud learning, where the considered graph is regular while the real-

world networks are mostly irregular graphs (having variable-sized neighborhoods).

The DeepGCN proposed by [

] is also speciﬁc to regular graphs where image

pixels are considered nodes, while [

] completely ignores dilated aggregation.

Moreover, [

] proposes a diﬀerent approach that requires QR factorization with

backpropagation, making it intractable for practical implementation. Thus, it is

essential to design a deep GCN that can handle the irregular graphs to detect

overlapping communities with a larger diameter.

Training deeper GCNs is challenging because of over-smoothing and vanishing

gradient problems [

]. Previous works [

] overcame this challenge

with the idea of residual connection and dilated aggregation borrowed from ResNet

[

]. Residual connection means having direct connectivity from a layer’s input

to that layer’s output, bypassing the non-linearity in the activation function. In

GCNs, each node aggregates information from the neighboring nodes. It is called

dilated aggregation if this aggregation mechanism aggregates information from

some of the distant neighbors while skipping some of the nearest neighbors. In

dilated aggregation, we may incorporate some randomization while considering

neighbors, which introduces the concept of edge dynamicity. Dilated aggregation

with edge dynamicity is called dynamic dilated aggregation. It is still challenging

to introduce dynamic dilated aggregation in irregular graphs to train deep GCNs.

Indeed, none of the existing works on deep GCN considered dilated aggregation

in irregular graphs.

In this study, we design dynamic dilated aggregation mechanisms in irregular

graphs. We incorporate our dynamic dilated aggregation schemes along with

residual connection into the GCN to obtain a deep GCN that can learn from

any graphs. Eventually, we apply our designed deep GCN to detect overlapping

communities in diﬀerent kinds of networks with diﬀerent sizes. For overlapping

community detection, we adopt an encoder-decoder-based approach similar to

NOCD. Our encoder is a deep GCN model called DynaResGCN, which generates

4 N. Muttakin et al.

the community embedding, and the decoder attempts to reconstruct the original

graph.

Our objective is summarized as follows:

–

Detecting overlapping communities with higher community diameters eﬀec-

tively

–Designing a DeepGCN model

In Table 1, we provide a conceptual comparison of our method with the relevant

literature.

Table 1: Conceptual comparison of our method to the relevant literature.

Algorithm/

Model

Overlapping

community

detection

Large

community

diameter

Residual

connection

Edge

dynamicity

Dil.Agg.

in irregular

graphs1

Deep

GCN

model

Reference

DynaResGCN+BP ✔ ✔ ✔ ✔ ✔ ✔ Ours

NOCD ✔ ✕ ✕ ✕ ✕ ✕ [50]

BigClam ✔ ✔ ✕ ✕ ✕ ✕ [69]

SLPA ✔ ✔ ✕ ✕ ✕ ✕ [67]

DEMON ✔ ✔ ✕ ✕ ✕ ✕ [15]

DANMF ✔ ✔ ✕ ✕ ✕ ✕ [73]

CDE ✔ ✔ ✕ ✕ ✕ ✕ [33]

EPM ✔ ✔ ✕ ✕ ✕ ✕ [74]

CESNA ✔ ✔ ✕ ✕ ✕ ✕ [71]

SNMF ✔ ✔ ✕ ✕ ✕ ✕ [59]

SNetOC ✔ ✔ ✕ ✕ ✕ ✕ [54]

DeepWalk with K-means ✔ ✔ ✕ ✕ ✕ ✔ [42]

Graph2Gauss with K-means ✔ ✔ ✕ ✕ ✕ ✕ [5]

DeepGCN ✕N/A ✔ ✔ ✕ ✔ [32]

GCDN ✕N/A ✔ ✔ ✕ ✔ [57]

GCNII ✕N/A ✔ ✕ ✕ ✔ [12]

Krylov GCN ✕N/A N/A N/A N/A ✔[35]

(1) Dilated aggregation in the graphs having variable-sized neighborhoods.

For evaluation, we consider a large research topics network [

], a set of

Facebook datasets [

] having reliable ground truth information, and a set of very

large co-authorship networks (nodes

) [

] as an additional benchmark. In

the case of the topics network, ground truth information related to overlapping

communities does not exist. Therefore, we have to rely on unsupervised metrics

called ﬁtness functions or quality measures such as conductance, clustering

coeﬃcient, density, and coverage [

] to determine the quality of the detected

overlapping communities. In addition to the evaluation metrics, we generate

heatmaps or colormaps to illustrate community overlap. We compare our results

in topics network with existing overlapping community detection approaches,

such as NOCD, BigClam, SLPA, DANMF, and DEMON. Our methods show

superior performance over most of the considered existing methods in terms of

both quality metrics and visualization. In the case of the datasets having ground

truth information, we compare with exactly the same baselines as in NOCD

considering both node features and without node features in terms of normalized

mutual information (NMI) [

]. Our method, DynaResGCN, achieves a clear

Title Suppressed Due to Excessive Length 5

superior performance over all the baselines in almost every case in terms of NMI.

In summary, our contributions are as follows:

–

Development of a deep residual GCN (DynaResGCN) model based on dy-

namic dilated aggregation in irregular graphs

–

Design of an eﬀective and scalable overlapping community detection frame-

work based on DynaResGCN and Bernoulli-Poisson model

–

Detection of overlapping communities in real-world networks outperforming

many state-of-the-art methods

The remainder of the paper is organized as follows. In Section 2, we introduce

preliminary terminologies and some of the prominent related works. We describe

our methodology in Section 3. Section 4 deals with our experimentation. In

section 5, we present our results and discuss our insights with overall ﬁndings.

Finally, Section 6 provides our conclusions.

2 Preliminaries

In this section, we introduce the necessary notations and terminologies that will

be used throughout the paper and discuss prominent related works brieﬂy.

2.1 Notations and Terminologies

We consider an undirected graph as

= (

V, E

), where

is the set of vertices

and

is the set of edges. We represent the adjacency matrix as

A∈ {

}n×n

the number of vertices as

, the set of vertices as

{

, . . . , n}

, and the

set of edges as

{

(

u, v

)

∈V×V

Auv

= 1

}

. For each node, we may

consider a

-dimensional feature vector. We denote the feature matrix as

X∈

Rn×d

. If no feature is associated with a node,

is an identity matrix. Let

us assume

is the set of communities and the cardinality of

. In the

case of overlapping community detection, we assign each node to some of the

communities in

. However, for each node, there is a strength associated with

each diﬀerent community. We deﬁne this strength as the community aﬃliation of

the node. For each node, there is a

-dimensional community aﬃliation vector.

We assign the community aﬃliation as a non-negative real number. Therefore, we

can obtain a community aﬃliation matrix as

F∈Rn×k

≥0

. For a node

, we denote

the membership strength of community

Fuc

. We use a threshold to obtain a

binary community aﬃliation matrix as

F∈ {

}n×k

. Therefore, each node can

be assigned to multiple communities. It is also possible that some nodes have no

communities at all. The following text describes the basic concepts related to

our paper.

Artiﬁcial neural networks are a class of machine learning models. The smallest

computing unit of a neural network is a neuron. The input of a neuron is a

vector. A neuron is a parameterized function whose output varies with diﬀerent

parameters for the same input. These neurons connected in a computational

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

OverlappingCommunityDetectionusingDynamicDilatedAggregationinDeepResidualGCNMdNurulMuttakin1[0000−0002−6114−6260],Md.IqbalHossain2[0000−0001−6212−7638],andMd.SaidurRahman1[0000−0003−0112−0242]1BangladeshUniversityofEngineeringandTechnology,Dhaka1000,Bangladesh2UniversityofArizona,USAAbstract.Overlap...

展开>> 收起<<

Overlapping Community Detection using Dynamic Dilated Aggregation in Deep Residual GCN Md Nurul Muttakin10000000261146260 Md. Iqbal.pdf

共35页,预览5页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

Overlapping Community Detection using Dynamic Dilated Aggregation in Deep Residual GCN Md Nurul Muttakin10000000261146260 Md. Iqbal

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: