Fast Community Detection in Dynamic and Heterogeneous Networks Maoyu Zhang1 Jingfei Zhang2and Wenlin Dai1

2025-05-06 0 0 1.61MB 42 页 10玖币
侵权投诉
Fast Community Detection in Dynamic and
Heterogeneous Networks
Maoyu Zhang1, Jingfei Zhang2and Wenlin Dai1
1Institute of Statistics and Big Data, Renmin University of China
2Department of Management Science, University of Miami
Abstract
Dynamic heterogeneous networks describe the temporal evolution of interactions
among nodes and edges of different types. While there is a rich literature on finding
communities in dynamic networks, the application of these methods to dynamic het-
erogeneous networks can be inappropriate, due to the involvement of different types
of nodes and edges and the need to treat them differently.In this paper, we propose
a statistical framework for detecting common communities in dynamic and hetero-
geneous networks. Under this framework, we develop a fast community detection
method called DHNet that can efficiently estimate the community label as well as the
number of communities. An attractive feature of DHNet is that it does not require the
number of communities to be known a priori, a common assumption in community
detection methods. While DHNet does not require any parametric assumptions on
the underlying network model, we show that the identified label is consistent under a
time-varying heterogeneous stochastic block model with a temporal correlation struc-
ture and edge sparsity. We further illustrate the utility of DHNet through simulations
and an application to review data from Yelp, where DHNet shows improvements both
in terms of accuracy and interpretability over existing solutions.
Keywords: dynamic heterogeneous network, modularity, community detection, null model,
consistency, Yelp reviews.
#Maoyu Zhang and Jingfei Zhang are joint first authors.
1
arXiv:2210.13596v2 [stat.CO] 29 Oct 2022
1 Introduction
One of the fundamental problems in network data analysis is community detection that
aims to divide the network into non-overlapping groups of nodes such that nodes within
the same community are densely connected and nodes from different communities are rel-
atively sparsely connected. Community detection can provide valuable insights on the
organization of a network and greatly facilitate the analysis of network characteristics. As
such, community detection methods have been applied to numerous scientific fields such as
social science (Moody and White,2003), biology (Sørlie et al.,2001) and business (Linden
et al.,2003). Over the past few decades, the problem of community detection has been
approached from methodological, algorithmic and theoretical perspectives with substantial
developments. We refer to Fortunato (2010) and Abbe (2017) for comprehensive reviews
on this topic.
While the majority of existing community detection methods are developed for a homo-
geneous network or a dynamic network, networks that are dynamic and heterogeneous are
fast emerging in recent years. For example, in a dynamic healthcare network, nodes can be
patients,diseases,doctors and hospitals and edges can be in the type of patient-disease (pa-
tient treated for disease) and patient-doctor (patient treated by doctor) and doctor-hospital
(doctor works at hospital). These edges are expected to evolve with time as patients may
develop new diseases that are treated by different doctors at possibly different hospitals.
Figure 1provides an illustration of a dynamic heterogeneous Yelp review network, which
is analyzed in Section 6. In this figure, there are three types of nodes including users,
businesses and categories and three types of edges including user-user (user is friend with
user), user-business (business is reviewed by user) and business-category (business is la-
beled with category). As users review different businesses over time, this network is both
heterogeneous and dynamic.
2
𝑡"𝑡#
Figure 1: The dynamic heterogeneous Yelp network with two communities, including three
types of nodes (user, business, category) and edges (user is friend with user, business is
reviewed by user, business is labeled with category).
Due to the rich information embedded in a dynamic heterogeneous network, many meth-
ods have been developed recently for its analysis, such as network embedding (Wang et al.,
2022;Zhang et al.,2022), representation learning (Yin et al.,2019) and link prediction
(Xue et al.,2020;Jiang et al.,2021). However, community detection in dynamic hetero-
geneous networks is less studied. One relevant work is Sun et al. (2010), which provides
a mixture model-based generative model for estimating the community structure, which is
assumed to be time-varying. Other works on this topic include Sengupta and Chen (2015)
and Zhang and Cao (2017), though they only focus on a single heterogeneous network.
In our work, we focus on detecting common communities in a dynamic heterogeneous
network, that is, the community assignment does not vary with time but the interactions
within and between communities do. Finding common communities are useful in many
applications. For example, in genetic studies and brain connectivity studies, the common
communities represent functional groups of genes or brain regions that are coordinated in
biological processes, and identifying them is of keen scientific interests (Zhang and Cao,
2017;Zhang et al.,2020). In the Yelp review network, it is plausible that businesses,
3
categories and the majority of users have a common community structure over time, as
the service offered by a business and the interests of users (e.g., pets, parks, fine dining)
are often stable over a period of time. One notable advantage of considering a common
community structure is that the networks observed at different time points are allowed
to be highly sparse if S, the number of time points, increases. For example, we show in
Theorem 1that consistent community detection is achievable as long as λS → ∞, where λ
is the average degree, while the single network case requires λ→ ∞ to achieve community
detection consistency. Moreover, our approach allows the community strength to be highly
variable over time. For example, a community needs to be active for only a very short
period of time for it to be consistently identified; see more discussions after Theorem 1.
In this paper, we propose a statistical framework for modularity-based common com-
munity detection in the dynamic heterogeneous network, where no parametric assumptions
are made on the model underlying the observed networks. Under this framework, we de-
velop a fast community detection method called DHNet that can efficiently estimate the
community label as well as the number of communities. An attractive feature of DHNet
is that it does not require the number of communities to be known a priori, a common
assumption in community detection methods. Although DHNet does not rely on parametric
assumptions on the underlying network model, we propose a new dynamic heterogeneous
stochastic block model with a temporal correlation structure and edge sparsity, and show
that DHNet can consistently estimate the community label under this model. This provides
theoretical justifications of the proposed method and also sheds lights on how different
network properties (e.g., sparsity, size, community strength) affect its performance. The
consistency property of our method when applied to dynamic bi-partite or multi-partite
networks follows as special cases.
The remainder of the article is organized as follows. Section 2describes a community
4
detection framework and proposes a modularity function for finding common communities
in a dynamic heterogeneous network. Section 3describes a fast community detection
method called DHNet that can efficiently estimate the community label as well as the number
of communities. Section 4shows the consistency property of DHNet under a dynamic
heterogeneous stochastic block model. Section 5demonstrates the efficacy of DHNet through
simulation studies and Section 6applies the proposed method to review data from Yelp.
The paper is concluded with a short discussion section.
2 Community Detection with Modularity
2.1 Notation
We write [m] = {1, . . . , m}for an integer m > 0. To ease notation, we start the introduction
with a single heterogeneous networks with Ltypes of nodes. Let V[l]= (v[l]
1, . . . , v[l]
nl)be
the set containing the l-th type of nodes for l[L], where nlis the number of l-th type
nodes. Denote the heterogeneous network as G= (L
l=1V[l],E ∪ E+), where set Econtains
edges between nodes of the same type and set E+contains edges between nodes of different
types. When E=,Gforms a multi-partite network, i.e., edges are only established
between different types of nodes. Let G[l]denote the homogeneous network formed within
node set V[l]with an nl×nladjacency matrix A[l], and G[l1l2]= (V[l1]V[l2], E[l1l2])denote
the bi-partite network formed between node sets V[l1]and V[l2]with an nl1×nl2bi-adjacency
matrix A[l1l2],l1, l2[L]. See Figure 2for an example of a heterogeneous network with
L= 2.
Consider a dynamic heterogeneous network {G(t), t ∈ T } with Ltypes of nodes, where
G(t)=(SL
l=1 V[l],E(t)∪ E+(t)) is a heterogeneous network at time tdefined as above. The
5
摘要:

FastCommunityDetectioninDynamicandHeterogeneousNetworksMaoyuZhang1,JingfeiZhang2andWenlinDai11InstituteofStatisticsandBigData,RenminUniversityofChina2DepartmentofManagementScience,UniversityofMiamiAbstractDynamicheterogeneousnetworksdescribethetemporalevolutionofinteractionsamongnodesandedgesofdier...

展开>> 收起<<
Fast Community Detection in Dynamic and Heterogeneous Networks Maoyu Zhang1 Jingfei Zhang2and Wenlin Dai1.pdf

共42页,预览5页

还剩页未读, 继续阅读

声明:本站为文档C2C交易模式,即用户上传的文档直接被用户下载,本站只是中间服务平台,本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私,请立即通知玖贝云文库,我们立即给予删除!
分类:图书资源 价格:10玖币 属性:42 页 大小:1.61MB 格式:PDF 时间:2025-05-06

开通VIP享超值会员特权

  • 多端同步记录
  • 高速下载文档
  • 免费文档工具
  • 分享文档赚钱
  • 每日登录抽奖
  • 优质衍生服务
/ 42
客服
关注