Towards Real-Time Temporal Graph Learning
Deniz Gurevin, Mohsin Shan, Tong Geng†, Weiwen Jiang‡, Caiwen Ding and Omer Khan
University of Connecticut, Storrs, CT, USA
†University of Rochester, Rochester, NY, USA
‡George Mason University, Fairfax, VA, USA
{deniz.gurevin, mohsin.shan, caiwen.ding, khan}@uconn.edu, tgeng@ur.rochester.edu, wjiang8@gmu.edu
Abstract—In recent years, graph representation learning has
gained significant popularity, which aims to generate node
embeddings that capture features of graphs. One method to
achieve this employs a technique called random walks, which
captures node sequences in a graph, and then learns embeddings
for each node using a natural language processing technique
called Word2Vec. These embeddings are then used for deep
learning on graph data for classification tasks, such as link
prediction or node classification. Prior work operates on pre-
collected temporal graph data and is not designed to handle
real-time updates to a graph. Real-world graphs change
dynamically, and their entire temporal updates are not available
upfront. In this paper, we propose an end-to-end graph learning
pipeline that performs temporal graph construction, creates low-
dimensional node embeddings, and trains multi-layer neural
network models in an online setting. The training of the neural
network models is identified as the main performance bottleneck
as it performs repeated matrix operations on many sequentially
connected low-dimensional kernels. We propose to unlock fine-
grain parallelism in these low-dimensional kernels to boost
the performance of model training.
Index Terms—graph learning algorithm, random walks, dy-
namic graphs, performance characterization
I. INTRODUCTION
Graph representation learning (GRL) utilizes artificial intel-
ligence methods to learn the representation of graph structured
data [1]–[6]. It has gained significant popularity in various
application areas from social networks [7], [8], to biology
and chemistry [9]–[11]. Although prior works have analyzed
the performance of GRL workloads [12]–[14], they have
mostly focused on static input graphs using graph representa-
tion learning techniques such as Graph Convolution Network
(GCN) [15] and others [2], [16].
To learn graph dynamics on temporally changing graphs,
random walk based GRL algorithms have been proposed
[17]. Unlike static graphs, temporal graphs are dynamically
changing graphs with time data associated with each inter-
action between nodes. The front-end of the pipeline takes a
temporal graph as input and maps the underlying graph
structure into a low-dimensional embedding space by feeding
temporal random walks [18] into a word2vec model, which is a
common technique from Natural Language Processing (NLP)
[19]. This GRL process outputs node embeddings that capture
the underlying features of the nodes in the graph. These node
embeddings are then used to train a Feed-forward Neural
Network (FNN) for link prediction or node classification tasks.
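As a concrete illustration of this front-end, the following is a minimal sketch of the embedding step, assuming gensim's Word2Vec as the NLP component; the function name and hyperparameters are illustrative, not the paper's implementation.

```python
# Minimal sketch: temporal random walks are treated as "sentences" and fed
# to Word2Vec to produce d-dimensional node embeddings (gensim >= 4.0).
from gensim.models import Word2Vec

def learn_node_embeddings(walks, d=128):
    # Each walk is a list of node ids; skip-gram (sg=1) learns one
    # d-dimensional vector per node from co-occurrences within walks.
    sentences = [[str(v) for v in walk] for walk in walks]
    model = Word2Vec(sentences=sentences, vector_size=d,
                     window=5, min_count=1, sg=1)
    return {v: model.wv[v] for v in model.wv.index_to_key}
```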
A shortcoming of prior work [17] is that the input graph
to the pipeline is not temporally updated. The workload takes
graph data that contains the entire temporal information and
performs the GRL algorithm to train an FNN on the pre-
collected graph data. Therefore, the pipeline is not designed
to apply temporal updates to the GRL algorithm and train
FNNs on the fly with streaming snapshots of dynamic graphs.
However, real-world graphs change dynamically, and their
temporal updates are not available all at once but are collected
sequentially as graph snapshots associated with timestamps
[20]–[22]. At each timestamp, GRL needs to be performed
only on the updated portion of the graph. More importantly,
node embeddings that capture the entire history of graph
dynamics are not available to the FNN prior to the training in
a real-world setting. As the graph evolves, the embeddings also
change temporally, and the FNN training at timestamp $t$ is
performed with node embeddings that contain information
about only the current and previous timestamps $\{0, 1, \ldots, t\}$,
not future timestamps $\{t+1, t+2, \ldots, T\}$. For these
reasons, it is computationally infeasible to perform the FNN
training on the entire temporal graph data at once. The training
must be performed on discrete data batches that are collected
in sequential order. This type of training approach has not
been evaluated in prior work.
The main objective of this paper is to implement and
characterize a temporally updating graph learning pipeline and
perform the FNN training using an online setting where the
training data becomes available as the graph evolves. Our
pipeline takes temporal graph snapshots at each timestamp
$t$ and starts with an R-tree based graph construction step
[23] that keeps track of new updates to the graph since
the last timestamp $t-1$. These updates are then streamed
into the GRL step that performs random walks and word2vec
on the graph to obtain node embeddings. We utilize EvoNRL
[24], which continuously learns low-dimensional representations
from the temporally updated graph while avoiding redundant
updates between graph snapshots. The
graph updates are then forwarded to the final training step
for link prediction or node classification. Here, we adopt
an online learning strategy, where the model is trained on a
single temporal graph batch for consecutive iterations at each
timestamp $t$. This step concludes the temporal graph learning pipeline
for a timestamp. These steps are repeated for the subsequent
timestamps, $\{t+1, t+2, \ldots, N\}$.
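A minimal sketch of this per-timestamp loop is given below; the component names (graph_index, temporal_random_walks, update_embeddings, make_batch, fnn) are hypothetical stand-ins for the stages described above, not the released implementation.

```python
# Sketch of the online loop, assuming the pipeline stages are supplied as
# callables/objects; all names are illustrative.
def run_pipeline(snapshots, graph_index, temporal_random_walks,
                 update_embeddings, make_batch, fnn, num_iters=10):
    embeddings = {}
    for t, snapshot in enumerate(snapshots):
        # Step 1: R-tree based construction tracks updates since t-1.
        delta = graph_index.update(snapshot)
        # Step 2: GRL (random walks + word2vec) on the updated region only,
        # avoiding redundant updates between snapshots (EvoNRL-style).
        walks = temporal_random_walks(delta)
        embeddings = update_embeddings(embeddings, walks)
        # Step 3: online training, one batch per timestamp, trained for a
        # fixed number of consecutive iterations.
        batch = make_batch(delta, embeddings)
        for _ in range(num_iters):
            fnn.train_step(batch)
    return embeddings
```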
Our evaluation of the temporal GRL pipeline shows that
the execution time is dominated by the FNN training phase.
Although several parallelization methods are proposed for
random walks and word2vec in [17], the FNN training im-
plementation is left out for future performance enhancements.
In this paper, we explore fine-grain parallelization of FNN
training for performance acceleration. Our strategy considers
parallelization opportunities in the individual matrix kernels in
the forward and backward propagation paths for each iteration
of the FNN model. We implement four state-of-the-art matrix
multiplication (MM) algorithms (i.e., inner, outer, row-wise
and column-wise product) for each low-dimensional kernel in
the FNN model. The evaluation on a large core-count, shared-memory
multicore shows that using the right parallelization
strategy yields significant potential for performance scaling.
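To make the four dataflows concrete, here is a minimal NumPy sketch of the loop structures (the paper's kernels are parallel C++ implementations; this Python version only illustrates which loop each strategy exposes for parallelization):

```python
import numpy as np

def mm_inner(A, B):
    # Inner-product dataflow: C[i,j] = dot(A[i,:], B[:,j]).
    # The (i, j) iterations are independent and can be parallelized.
    m, k = A.shape
    _, n = B.shape
    C = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            C[i, j] = A[i, :] @ B[:, j]
    return C

def mm_outer(A, B):
    # Outer-product dataflow: C accumulates k rank-1 updates A[:,p] * B[p,:].
    # Parallelizing over p requires a reduction on C.
    m, k = A.shape
    _, n = B.shape
    C = np.zeros((m, n))
    for p in range(k):
        C += np.outer(A[:, p], B[p, :])
    return C

def mm_row(A, B):
    # Row-wise dataflow: row C[i,:] is a sum of rows of B scaled by A[i,p].
    # Rows of C are independent: one parallel task per row of A.
    m, k = A.shape
    _, n = B.shape
    C = np.zeros((m, n))
    for i in range(m):
        for p in range(k):
            C[i, :] += A[i, p] * B[p, :]
    return C

def mm_col(A, B):
    # Column-wise dataflow: column C[:,j] is a sum of columns of A scaled
    # by B[p,j]. Columns of C are independent: one task per column of B.
    m, k = A.shape
    _, n = B.shape
    C = np.zeros((m, n))
    for j in range(n):
        for p in range(k):
            C[:, j] += B[p, j] * A[:, p]
    return C
```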
In summary, we make the following contributions:
• We propose an end-to-end temporal random-walk-based graph
learning algorithm for online processing of temporal graphs.
The proposed real-time GRL algorithm is implemented in
Python and released as an open-source benchmark at
https://github.com/grvndnz/ST-Graph-Learning. The performance
analysis identifies the FNN training as the main performance
bottleneck.
• We provide a parallel C++ implementation of the N-layer
Feed-forward Neural Network (FNN) pipeline. The individual
matrix kernels in the forward and backward propagation steps
of FNN training are evaluated for in-depth performance
analysis. We implement four state-of-the-art parallelization
strategies for the low-dimensional matrix multiplications in
the FNN pipeline to evaluate the performance scaling potential
on multicore processors. The FNN pipeline implementations are
released as an open-source benchmark at
https://github.com/grvndnz/Parallel-FNN.
II. RELATED WORK
In recent years, there has been a surge in research in
the area of graph representation learning (GRL), which aims
to encode graph structure into a low-dimensional embedding
space [25]. The main goal of GRL research is to optimize this
encoding in a way that the representations in the learned space
reflect the original graph structure. Some of the early works
in GRL include DeepWalk [4] and Node2Vec [26], which
leverage node proximity in graphs using the idea of word
proximity in NLP [19]. Later, other works have incorporated this
idea to learn graph structural properties such as similarity in
degree sequences [6] and the behavioral roles of the nodes
[5]. These embedding algorithms train node embeddings for
individual nodes, and therefore, require additional training via
stochastic gradient descent to make predictions on new nodes.
There are works for learning inductive node embeddings that
combine external node features into graph structures. These
works include graph convolutional networks (GCN) [15],
GNNs [1], GraphSAGE [2], and Graph Attention Networks
(GAT) [3].
Most research for deep learning on graphs assumes that the
underlying graph is static. However, the idea of temporally
changing graphs is more realistic when it comes to most real-
life systems. While static graph learning models can be applied
to temporal graphs by ignoring the temporal evolution [27],
temporal graph structure contains significant insights about
the system. There have been works that explore processing
temporal graphs as a sequence of snapshots [28]–[31] that can
capture the evolution of temporal graph dynamics. Similarly,
streaming graph approaches process temporally changing data
at the finest time granularity, which is computationally much
more expensive than processing snapshots [32].
For graph representation learning on temporal graphs, Talati
et al. [17] previously proposed an implementation using a
temporal random-walk based algorithm. This work performed
detailed algorithm- and hardware-level performance
characterization of the pipeline and identified several execution
bottlenecks. However, the analyzed GRL and FNN training
algorithms operate on pre-collected graph data rather than
temporally evolving snapshots of the graph.
Consequently, the FNN model is trained with node embed-
dings that capture the entire history of the graph structure
(past and future node interactions). In this work, we propose
an implementation that evaluates prediction tasks in an online
setting on temporally changing graphs.
III. RUN-TIME TEMPORAL GRAPH LEARNING
In this section, we give the details of the proposed run-time
GRL algorithm for temporal graphs. We begin with the
following definitions.
Definition 1: A graph $G$ is defined as a tuple $G = (V, E)$,
where $V = \{v_0, v_1, \ldots, v_n\}$ is the set of $n$ nodes and
$E = \{e_0, e_1, \ldots, e_m\}$ is the set of $m$ edges, where an
edge connects two nodes.
In temporal graphs, nodes are continuously added or removed
and edges change dynamically with time; to maintain these
temporal updates, the graph is processed into a sequence of
discrete graph snapshots. Therefore, a temporal network is
traditionally represented as a sequence of static graphs
$(G_1, G_2, \ldots, G_T)$ for $T$ timestamps. Similarly, an edge
between node $v_a$ and node $v_b$ at a timestamp
$t \in \{0, 1, \ldots, T-1\}$ is represented as $(v_a, v_b, t)$.
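As a toy illustration of this snapshot representation (not the paper's R-tree based construction), a temporal graph can be stored as $(v_a, v_b, t)$ triples and materialized into per-timestamp static graphs:

```python
from collections import defaultdict

class TemporalGraph:
    # Toy container: edges are stored as (v_a, v_b, t) triples, grouped by t.
    def __init__(self):
        self.edges_by_time = defaultdict(list)

    def add_edge(self, va, vb, t):
        self.edges_by_time[t].append((va, vb))

    def snapshot(self, t):
        # Static graph G_t: undirected adjacency over interactions at time t.
        adj = defaultdict(set)
        for va, vb in self.edges_by_time[t]:
            adj[va].add(vb)
            adj[vb].add(va)
        return adj
```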
Definition 2: In a graph $G = (V, E)$, a random walk from
node $v_a$ to node $v_b$ is defined as a sequence of connected
edges $w = \{(v_a, v_1), (v_1, v_2), \ldots, (v_k, v_b)\}$, where
$k+1$ is the length of $w$.
The main concept of random walks is that they capture the
structure of the graph and node properties by randomly visiting
neighboring nodes and sampling the graph. To represent these
node properties mathematically, a GRL algorithm maps the
nodes of a graph into a low-dimensional space.
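For illustration, a minimal uniform random-walk sampler over such an adjacency structure (the walks used in the pipeline are temporal, i.e., they additionally respect edge timestamps [18]):

```python
import random

def random_walk(adj, start, length):
    # Uniform random walk: repeatedly hop to a randomly chosen neighbor.
    walk = [start]
    for _ in range(length):
        neighbors = list(adj[walk[-1]])
        if not neighbors:
            break  # dead end: no edges to sample from this node
        walk.append(random.choice(neighbors))
    return walk
```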
Definition 3: Given an input graph $G = (V, E)$, a graph
representation algorithm $f : G \rightarrow \mathbb{R}^d$ maps
the nodes of the graph into a $d$-dimensional space that captures
the properties and closeness of the nodes.
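Tying Definitions 1-3 together, a toy run that reuses the illustrative helpers sketched above (TemporalGraph, random_walk, learn_node_embeddings), none of which are part of the released benchmark:

```python
# Build a tiny temporal graph, sample walks on snapshot G_0, and learn a
# 16-dimensional embedding, i.e., an instance of f : G -> R^d.
g = TemporalGraph()
for va, vb, t in [(0, 1, 0), (1, 2, 0), (0, 2, 1)]:
    g.add_edge(va, vb, t)
adj = g.snapshot(0)
walks = [random_walk(adj, v, length=4) for v in (0, 1, 2)]
emb = learn_node_embeddings(walks, d=16)  # dict: node id (str) -> R^16 vector
```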