Overlapping Community Detection using Dynamic Dilated Aggregation in Deep Residual GCN Md Nurul Muttakin10000000261146260 Md. Iqbal

2025-04-29 0 0 2.24MB 35 页 10玖币
侵权投诉
Overlapping Community Detection using Dynamic
Dilated Aggregation in Deep Residual GCN
Md Nurul Muttakin1[0000000261146260], Md. Iqbal
Hossain2[0000000162127638], and Md. Saidur Rahman1[0000000301120242]
1Bangladesh University of Engineering and Technology, Dhaka 1000, Bangladesh
2University of Arizona, USA
Abstract.
Overlapping community detection is a critical problem in
unsupervised machine learning on network-structured data. Some research
has considered applying graph convolutional networks (GCN) to tackle
the problem. However, the considered GCN is shallow and can not capture
communities with larger diameters. Moreover, it is still open to incorporate
dynamic dilated aggregation in deep GCNs for irregular graphs. In this
study, we design a deep dynamic residual GCN (DynaResGCN) based
on our dynamic dilated aggregation algorithm and a unified end-to-end
encoder-decoder-based framework to detect overlapping communities. The
deep DynaResGCN model is used as the encoder, whereas we incorporate
the Bernoulli-Poisson (BP) model as the decoder. Consequently, we
apply our overlapping community detection framework in a research
topics dataset without having ground truth, a set of networks from
Facebook having reliable (hand-labeled) ground truth, and in a set of
very large co-authorship networks having empirical (not hand-labeled)
ground truth. Our experimentation on these datasets shows significantly
superior performance over many state-of-the-art methods for detecting
overlapping communities in networks.
Keywords: Overlapping Community Detection ·Deep GCN ·Unsuper-
vised Learning
1 Introduction
Network structured data are ubiquitous. Almost every domain of science and
engineering has to deal with networks. The nodes of real-world networks often
form special groups where link density is high, and the density of edges is low
among these groups. This property of networks is called community structure
[
20
]. Human society is composed of communities that are in many cases virtual
due to the wide use of online social platforms. Indeed, social communities are
of great importance to social scientists and have been studied thoroughly for
a long time [
14
,
64
,
25
,
41
]. Communities in the global financial network play
a key role in influencing the transitions of the global economy and financial
system [
9
]. In protein-protein interaction networks, protein communities act as
functional modules to perform specific tasks in cells [
45
,
52
,
10
]. As a consequence,
arXiv:2210.11174v3 [cs.AI] 28 Sep 2024
2 N. Muttakin et al.
community detection is an essential tool to analyze different kinds of networks
such as social networks, research topics networks, protein-protein interaction
networks, ecological networks, financial networks, etc., and to understand their
structure.
The communities in most real-world networks, especially social networks,
overlap [
70
]. The same participant participates in different communities at the
same time. Sometimes, a participant in a community might simultaneously be
a different community member. A person in a social network is in his or her
family’s community while he/she is also in the community of his or her coworkers.
In reality, a node may belong to an unlimited number of communities in a social
network. This overlapping node is critical for various reasons, as it might be
used positively and negatively. For instance, in a social network context, a node
belonging to two disputing communities might be potentially used to resolve the
dispute. On the contrary, this node may disclose sensitive information about a
community to other communities violating privacy ethics. Indeed, the overlap
is an inherent characteristic of many real-world social networks [
29
]. Moreover,
disjoint community detection induces a partition in the network that eventually
breaks the communities [
44
]. Therefore, detecting overlapping communities in
different networks is of utmost importance, and thus, it has attracted significant
attention in the research community [66].
Many works in the literature are related to non-overlapping community
detection [
1
,
58
]. In the case of non-overlapping community detection, node
embedding approaches perform well [
55
,
8
]. However, most of these approaches
are not well suited for overlapping community detection due to the lack of scalable
and reliable clustering methods for high-dimensional vectors. A viable approach
to resolve this issue is to generate the node embedding as community affiliation,
which can be done effectively and efficiently using GCNs [50].
Among the classical approaches for overlapping community detection, label
propagation algorithms are prominent [
37
]. For instance, SLPA is an overlapping
community detection algorithm based on the idea of label propagation [
67
]. DE-
MON [
15
], a node-centric bottom-up overlapping community detection algorithm,
also leverages ego network structure with overlapping label propagation. Other
algorithmic approaches include seed-set expansion [
3
,
18
], evolutionary algorithms
[
36
], and heuristics-based methods [
47
,
17
,
21
,
34
]. Some heuristics-based works
considered game-theoretic concepts [62].
Many studies consider some kinds of matrix factorization along with proba-
bilistic inference [
69
,
74
,
54
,
31
,
30
,
33
,
59
]. One of the earliest works in this regime
is BigClam [
69
]. BigClam was designed for overlapping community detection
in large networks [
69
]. BigClam uses gradient ascent to create an embedding,
which is later used to determine nodes’ community affiliation. MNMF [
60
] uses
modularity-based regularization with non-negative matrix factorization. DANMF
uses a telescopic non-negative matrix factorization approach based on a deep
auto-encoder to learn membership distribution over nodes [
73
]. Some works [
7
,
72
]
performed factorization of modularity matrices using neural nets, while other
works considered adversarial learning [
28
,
11
] or deep belief networks [
26
] to learn
Title Suppressed Due to Excessive Length 3
community affiliation. However, they did not use graphs in their neural network
architecture, which is crucial to achieving superior performance [
50
]. Finally,
NOCD [
50
] used a graph neural networks (GNN) based encoder-decoder model
to detect overlapping communities. Unlike NMF-based methods, it optimizes the
weights of a GNN to generate a better community affiliation matrix.
Graph neural networks (GNN) are a class of specialized neural networks that
can operate on graph-structured data [
22
,
63
,
68
]. The authors of NOCD [
50
] first
developed a graph neural network-based approach for overlapping community
detection. In that approach, they used a two-layer shallow graph convolutional
network as an encoder to generate community embedding and a decoder based
on the Bernoulli-Poisson model [
69
,
74
,
54
] to reconstruct the original graph from
the embedding. However, a shallow GCN has essential limitations to capturing
communities with a larger community diameter [
19
] (
>
2) as it can aggregate
information only up to two hops from a node. One solution to the issue is to
consider a deep graph convolutional networks (DeepGCN) [
32
,
12
,
57
,
35
] instead
of a shallow GCN. However, Li et al. [
32
] proposed the DeepGCN in the case
of point cloud learning, where the considered graph is regular while the real-
world networks are mostly irregular graphs (having variable-sized neighborhoods).
The DeepGCN proposed by [
57
] is also specific to regular graphs where image
pixels are considered nodes, while [
12
] completely ignores dilated aggregation.
Moreover, [
35
] proposes a different approach that requires QR factorization with
backpropagation, making it intractable for practical implementation. Thus, it is
essential to design a deep GCN that can handle the irregular graphs to detect
overlapping communities with a larger diameter.
Training deeper GCNs is challenging because of over-smoothing and vanishing
gradient problems [
32
,
12
,
57
,
35
]. Previous works [
32
,
57
,
12
] overcame this challenge
with the idea of residual connection and dilated aggregation borrowed from ResNet
[
24
]. Residual connection means having direct connectivity from a layer’s input
to that layer’s output, bypassing the non-linearity in the activation function. In
GCNs, each node aggregates information from the neighboring nodes. It is called
dilated aggregation if this aggregation mechanism aggregates information from
some of the distant neighbors while skipping some of the nearest neighbors. In
dilated aggregation, we may incorporate some randomization while considering
neighbors, which introduces the concept of edge dynamicity. Dilated aggregation
with edge dynamicity is called dynamic dilated aggregation. It is still challenging
to introduce dynamic dilated aggregation in irregular graphs to train deep GCNs.
Indeed, none of the existing works on deep GCN considered dilated aggregation
in irregular graphs.
In this study, we design dynamic dilated aggregation mechanisms in irregular
graphs. We incorporate our dynamic dilated aggregation schemes along with
residual connection into the GCN to obtain a deep GCN that can learn from
any graphs. Eventually, we apply our designed deep GCN to detect overlapping
communities in different kinds of networks with different sizes. For overlapping
community detection, we adopt an encoder-decoder-based approach similar to
NOCD. Our encoder is a deep GCN model called DynaResGCN, which generates
4 N. Muttakin et al.
the community embedding, and the decoder attempts to reconstruct the original
graph.
Our objective is summarized as follows:
Detecting overlapping communities with higher community diameters effec-
tively
Designing a DeepGCN model
In Table 1, we provide a conceptual comparison of our method with the relevant
literature.
Table 1: Conceptual comparison of our method to the relevant literature.
Algorithm/
Model
Overlapping
community
detection
Large
community
diameter
Residual
connection
Edge
dynamicity
Dil.Agg.
in irregular
graphs1
Deep
GCN
model
Reference
DynaResGCN+BP ✔ ✔ ✔ Ours
NOCD ✕ ✕ ✕ [50]
BigClam [69]
SLPA [67]
DEMON [15]
DANMF [73]
CDE [33]
EPM [74]
CESNA [71]
SNMF [59]
SNetOC [54]
DeepWalk with K-means [42]
Graph2Gauss with K-means [5]
DeepGCN N/A [32]
GCDN N/A [57]
GCNII N/A [12]
Krylov GCN N/A N/A N/A N/A [35]
(1) Dilated aggregation in the graphs having variable-sized neighborhoods.
For evaluation, we consider a large research topics network [
6
], a set of
Facebook datasets [
39
] having reliable ground truth information, and a set of very
large co-authorship networks (nodes
>
10
K
) [
50
] as an additional benchmark. In
the case of the topics network, ground truth information related to overlapping
communities does not exist. Therefore, we have to rely on unsupervised metrics
called fitness functions or quality measures such as conductance, clustering
coefficient, density, and coverage [
38
,
69
] to determine the quality of the detected
overlapping communities. In addition to the evaluation metrics, we generate
heatmaps or colormaps to illustrate community overlap. We compare our results
in topics network with existing overlapping community detection approaches,
such as NOCD, BigClam, SLPA, DANMF, and DEMON. Our methods show
superior performance over most of the considered existing methods in terms of
both quality metrics and visualization. In the case of the datasets having ground
truth information, we compare with exactly the same baselines as in NOCD
considering both node features and without node features in terms of normalized
mutual information (NMI) [
40
]. Our method, DynaResGCN, achieves a clear
Title Suppressed Due to Excessive Length 5
superior performance over all the baselines in almost every case in terms of NMI.
In summary, our contributions are as follows:
Development of a deep residual GCN (DynaResGCN) model based on dy-
namic dilated aggregation in irregular graphs
Design of an effective and scalable overlapping community detection frame-
work based on DynaResGCN and Bernoulli-Poisson model
Detection of overlapping communities in real-world networks outperforming
many state-of-the-art methods
The remainder of the paper is organized as follows. In Section 2, we introduce
preliminary terminologies and some of the prominent related works. We describe
our methodology in Section 3. Section 4 deals with our experimentation. In
section 5, we present our results and discuss our insights with overall findings.
Finally, Section 6 provides our conclusions.
2 Preliminaries
In this section, we introduce the necessary notations and terminologies that will
be used throughout the paper and discuss prominent related works briefly.
2.1 Notations and Terminologies
We consider an undirected graph as
G
= (
V, E
), where
V
is the set of vertices
and
E
is the set of edges. We represent the adjacency matrix as
A∈ {
0
,
1
}n×n
,
the number of vertices as
n
, the set of vertices as
V
=
{
1
, . . . , n}
, and the
set of edges as
E
=
{
(
u, v
)
V×V
:
Auv
= 1
}
. For each node, we may
consider a
d
-dimensional feature vector. We denote the feature matrix as
X
Rn×d
. If no feature is associated with a node,
X
is an identity matrix. Let
us assume
C
is the set of communities and the cardinality of
C
is
k
. In the
case of overlapping community detection, we assign each node to some of the
communities in
C
. However, for each node, there is a strength associated with
each different community. We define this strength as the community affiliation of
the node. For each node, there is a
k
-dimensional community affiliation vector.
We assign the community affiliation as a non-negative real number. Therefore, we
can obtain a community affiliation matrix as
FRn×k
0
. For a node
u
, we denote
the membership strength of community
c
as
Fuc
. We use a threshold to obtain a
binary community affiliation matrix as
F∈ {
0
,
1
}n×k
. Therefore, each node can
be assigned to multiple communities. It is also possible that some nodes have no
communities at all. The following text describes the basic concepts related to
our paper.
Artificial neural networks are a class of machine learning models. The smallest
computing unit of a neural network is a neuron. The input of a neuron is a
vector. A neuron is a parameterized function whose output varies with different
parameters for the same input. These neurons connected in a computational
摘要:

OverlappingCommunityDetectionusingDynamicDilatedAggregationinDeepResidualGCNMdNurulMuttakin1[0000−0002−6114−6260],Md.IqbalHossain2[0000−0001−6212−7638],andMd.SaidurRahman1[0000−0003−0112−0242]1BangladeshUniversityofEngineeringandTechnology,Dhaka1000,Bangladesh2UniversityofArizona,USAAbstract.Overlap...

展开>> 收起<<
Overlapping Community Detection using Dynamic Dilated Aggregation in Deep Residual GCN Md Nurul Muttakin10000000261146260 Md. Iqbal.pdf

共35页,预览5页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:35 页 大小:2.24MB 格式:PDF 时间:2025-04-29

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 35
客服
关注