the execution time is dominated by the FNN training phase.
Although several parallelization methods are proposed for
random walks and word2vec in [17], the FNN training im-
plementation is left unoptimized and deferred to future work.
In this paper, we explore fine-grain parallelization of FNN
training for performance acceleration. Our strategy considers
parallelization opportunities in the individual matrix kernels
along the forward and backward propagation paths of each iteration
of the FNN model. We implement four state-of-the-art matrix
multiplication (MM) algorithms (i.e., inner, outer, row-wise
and column-wise product) for each low-dimensional kernel in
the FNN model. The evaluation on a shared-memory multicore
with a large core count shows that using the right parallelization
strategy yields significant potential for performance scaling.
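The sketch below illustrates, under simplifying assumptions, how the four MM formulations expose different loops for parallelization. It assumes dense row-major matrices and OpenMP; the function names (mm_inner, mm_outer, mm_row, mm_col) are illustrative and are not taken from the released benchmark code.

```cpp
// Four formulations of C = A * B with A (MxK), B (KxN), C (MxN), all dense
// and row-major. mm_inner overwrites C; the other variants accumulate into a
// zero-initialized C. Each variant parallelizes a different loop, which
// changes how work is distributed across cores.
#include <vector>
#include <omp.h>

using Mat = std::vector<float>;  // row-major: mat[i * cols + j]

// Inner product: each C(i,j) is a dot product; parallelize over output elements.
void mm_inner(const Mat& A, const Mat& B, Mat& C, int M, int K, int N) {
  #pragma omp parallel for collapse(2)
  for (int i = 0; i < M; ++i)
    for (int j = 0; j < N; ++j) {
      float acc = 0.f;
      for (int k = 0; k < K; ++k) acc += A[i * K + k] * B[k * N + j];
      C[i * N + j] = acc;
    }
}

// Outer product: C is built from K rank-1 updates A(:,k) * B(k,:). The k loop
// carries a reduction into C, so here each rank-1 update is parallelized over
// rows instead, keeping the accumulation race-free.
void mm_outer(const Mat& A, const Mat& B, Mat& C, int M, int K, int N) {
  for (int k = 0; k < K; ++k) {
    #pragma omp parallel for
    for (int i = 0; i < M; ++i)
      for (int j = 0; j < N; ++j)
        C[i * N + j] += A[i * K + k] * B[k * N + j];
  }
}

// Row-wise product: row i of C is a linear combination of rows of B;
// each thread owns a disjoint set of output rows.
void mm_row(const Mat& A, const Mat& B, Mat& C, int M, int K, int N) {
  #pragma omp parallel for
  for (int i = 0; i < M; ++i)
    for (int k = 0; k < K; ++k)
      for (int j = 0; j < N; ++j)
        C[i * N + j] += A[i * K + k] * B[k * N + j];
}

// Column-wise product: column j of C is a linear combination of columns of A;
// each thread owns a disjoint set of output columns.
void mm_col(const Mat& A, const Mat& B, Mat& C, int M, int K, int N) {
  #pragma omp parallel for
  for (int j = 0; j < N; ++j)
    for (int k = 0; k < K; ++k)
      for (int i = 0; i < M; ++i)
        C[i * N + j] += A[i * K + k] * B[k * N + j];
}
```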
In summary, we make the following contributions:
• We propose an end-to-end temporal random walk-based graph learning algorithm for online processing of temporal graphs. The proposed real-time GRL algorithm is implemented in Python and released as an open-source benchmark at https://github.com/grvndnz/ST-Graph-Learning. Our performance analysis identifies FNN training as the main performance bottleneck.
• We provide a C++ parallel implementation of the N-layer Feed-forward Neural Network (FNN) training pipeline. The individual matrix kernels in the forward and backward propagation steps of FNN training are evaluated for in-depth performance analysis (see the sketch after this list). We implement four state-of-the-art parallelization strategies for the low-dimensional matrix multiplications in the FNN pipeline to evaluate the performance scaling potential on multicore processors. The FNN pipeline implementations are released as an open-source benchmark at https://github.com/grvndnz/Parallel-FNN.
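As a point of reference for the kernels mentioned above, the following is a minimal single-layer sketch of the matrix operations in one FNN training iteration (forward pass, backward pass, and weight update). The shapes, sigmoid activation, learning-rate update, and all names here are illustrative assumptions rather than the released Parallel-FNN code; matmul() stands in for any of the four parallel MM variants.

```cpp
// One layer's worth of matrix kernels: Z = X*W, A = sigmoid(Z),
// dZ = dA .* A .* (1-A), dW = X^T * dZ, dX = dZ * W^T, W -= lr * dW.
#include <cmath>
#include <vector>

using Mat = std::vector<float>;  // row-major: mat[i * cols + j]

// C (MxN) = A (MxK) * B (KxN); stand-in for the parallel MM kernels.
void matmul(const Mat& A, const Mat& B, Mat& C, int M, int K, int N) {
  for (int i = 0; i < M; ++i)
    for (int j = 0; j < N; ++j) {
      float acc = 0.f;
      for (int k = 0; k < K; ++k) acc += A[i * K + k] * B[k * N + j];
      C[i * N + j] = acc;
    }
}

Mat transpose(const Mat& A, int rows, int cols) {
  Mat T(rows * cols);
  for (int i = 0; i < rows; ++i)
    for (int j = 0; j < cols; ++j) T[j * rows + i] = A[i * cols + j];
  return T;
}

// One training step for a single layer: X (BxI) inputs, W (IxO) weights,
// dA (BxO) gradient from the next layer; returns dX (BxI) for the previous layer.
Mat layer_step(const Mat& X, Mat& W, const Mat& dA, int B, int I, int O,
               float lr) {
  // Forward kernels.
  Mat Z(B * O), A(B * O);
  matmul(X, W, Z, B, I, O);
  for (int i = 0; i < B * O; ++i) A[i] = 1.f / (1.f + std::exp(-Z[i]));

  // Backward kernels.
  Mat dZ(B * O);
  for (int i = 0; i < B * O; ++i) dZ[i] = dA[i] * A[i] * (1.f - A[i]);
  Mat dW(I * O), dX(B * I);
  matmul(transpose(X, B, I), dZ, dW, I, B, O);
  matmul(dZ, transpose(W, I, O), dX, B, O, I);

  // Parameter update.
  for (int i = 0; i < I * O; ++i) W[i] -= lr * dW[i];
  return dX;
}
```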
II. RELATED WORK
In recent years, there has been a surge in research in
the area of graph representation learning (GRL), which aims
to encode graph structure into a low-dimensional embedding
space [25]. The main goal of GRL research is to optimize this
encoding in a way that the representations in the learned space
reflect the original graph structure. Some of the early works
in GRL include DeepWalk [4] and Node2Vec [26], which
leverage node proximity in graphs using the idea of word
proximity from NLP [19]. Later works have incorporated this
idea to learn graph structural properties such as similarity in
degree sequences [6] and the behavioral roles of the nodes
[5]. These embedding algorithms train node embeddings for
individual nodes, and therefore, require additional training via
stochastic gradient descent to make predictions on new nodes.
Other works learn inductive node embeddings by combining
external node features with the graph structure. These
works include graph convolutional networks (GCN) [15],
GNNs [1], GraphSAGE [2], and Graph Attention Networks
(GAT) [3].
Most research for deep learning on graphs assumes that the
underlying graph is static. However, temporally changing
graphs are a more realistic model of most real-life systems.
While static graph learning models can be applied
to temporal graphs by ignoring the temporal evolution [27],
temporal graph structure contains significant insights about
the system. There have been works that explore processing
temporal graphs as a sequence of snapshots [28]–[31] that can
capture the evolution of temporal graph dynamics. Similarly,
streaming graph approaches process temporally changing data
at the finest time granularity, which is computationally much
more expensive than processing snapshots [32].
For graph representation learning on temporal graphs, Talati
et al. [17] previously proposed an implementation based on a
temporal random-walk algorithm. That work performed a detailed
algorithm- and hardware-level performance characterization of
the pipeline and identified several execution bottlenecks.
However, the analyzed GRL and FNN training algorithms operate
on pre-collected graph data rather than on temporally evolving
snapshots of the graph.
Consequently, the FNN model is trained with node embed-
dings that capture the entire history of the graph structure
(past and future node interactions). In this work, we propose
an implementation that evaluates prediction tasks in an online
setting on temporally changing graphs.
III. RUN-TIME TEMPORAL GRAPH LEARNING
In this section, we give the details of the proposed run-time
GRL algorithm for temporal graphs. We begin with the
following definitions.
Definition 1: A graph G is defined as a tuple G = (V, E),
where V = {v_0, v_1, ..., v_n} is the set of n nodes and
E = {e_0, e_1, ..., e_m} is the set of m edges, where an edge
connects two nodes.
In the case of temporal graphs, where nodes are continuously
added or removed and edges change dynamically over time, the
graph is processed into a sequence of discrete graph snapshots
in order to maintain these temporal updates. Therefore, a
temporal network is traditionally represented as a sequence of
static graphs (G_1, G_2, ..., G_T) over T timestamps. Similarly,
an edge between node v_a and node v_b at timestamp
t ∈ {0, 1, ..., T-1} is represented as (v_a, v_b, t).
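A minimal sketch of this representation is shown below: a stream of timestamped edges (v_a, v_b, t) is bucketed into discrete snapshots G_1, ..., G_T. The adjacency-list layout and names are illustrative assumptions, not the released implementation.

```cpp
// Bucket timestamped edges into one adjacency-list snapshot per timestamp.
#include <cstdint>
#include <unordered_map>
#include <vector>

struct TemporalEdge {
  uint32_t src, dst;
  uint32_t t;  // timestamp in {0, 1, ..., T-1}
};

// One static snapshot G_t stored as adjacency lists.
using Snapshot = std::unordered_map<uint32_t, std::vector<uint32_t>>;

std::vector<Snapshot> build_snapshots(const std::vector<TemporalEdge>& edges,
                                      uint32_t T) {
  std::vector<Snapshot> snapshots(T);
  for (const auto& e : edges) {
    if (e.t >= T) continue;                   // ignore out-of-range timestamps
    snapshots[e.t][e.src].push_back(e.dst);
    snapshots[e.t][e.dst].push_back(e.src);   // undirected edge
  }
  return snapshots;
}
```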
Definition 2: In a graph G = (V, E), a random walk from
node v_a to node v_b is defined as a sequence of connected
edges w = {(v_a, v_1), (v_1, v_2), ..., (v_k, v_b)}, where k + 1
is the length of w.
The main concept of random walks is that they capture the
structure of the graph and node properties by randomly visiting
neighboring nodes and sampling the graph. To represent these
properties mathematically, a GRL algorithm maps the nodes of
a graph into a low-dimensional space.
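Definition 2 translates directly into a neighbor-sampling loop. The sketch below samples a uniform random walk over an adjacency-list snapshot; the Snapshot alias matches the bucketing sketch above, and the uniform neighbor choice is an illustrative simplification of the temporal random walks used in the actual pipeline.

```cpp
// Sample a uniform random walk of at most `num_edges` edges starting at
// `start`; returns the visited node sequence (v_a, v_1, ..., v_b).
#include <cstdint>
#include <random>
#include <unordered_map>
#include <vector>

using Snapshot = std::unordered_map<uint32_t, std::vector<uint32_t>>;

std::vector<uint32_t> random_walk(const Snapshot& g, uint32_t start,
                                  int num_edges, std::mt19937& rng) {
  std::vector<uint32_t> walk{start};
  uint32_t cur = start;
  for (int step = 0; step < num_edges; ++step) {
    auto it = g.find(cur);
    if (it == g.end() || it->second.empty()) break;  // dead end: stop early
    std::uniform_int_distribution<size_t> pick(0, it->second.size() - 1);
    cur = it->second[pick(rng)];  // jump to a uniformly chosen neighbor
    walk.push_back(cur);
  }
  return walk;
}
```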
Definition 3: Given an input graph G = (V, E), a graph
representation algorithm f : G → R^d maps the nodes of the
graph into a d-dimensional space that captures the properties
and closeness of the nodes.
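Concretely, the output of such a mapping can be stored as an n x d table with one row per node; the layout and names below are illustrative assumptions, not the representation used in the released code.

```cpp
// Learned representation from Definition 3: node v is mapped to row v of an
// n x d table, i.e., a d-dimensional embedding vector.
#include <cstdint>
#include <vector>

struct NodeEmbeddings {
  int d = 0;                  // embedding dimension
  std::vector<float> table;   // n x d, row-major

  // f(v): pointer to the d-dimensional representation of node v.
  const float* lookup(uint32_t v) const { return table.data() + v * d; }
};
```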