the execution time is dominated by the FNN training phase.
Although several parallelization methods are proposed for
random walks and word2vec in [17], the FNN training im-
plementation is left unoptimized and deferred to future work.
In this paper, we explore fine-grain parallelization of FNN
training for performance acceleration. Our strategy considers
parallelization opportunities in the individual matrix kernels
along the forward and backward propagation paths of each iteration
of the FNN model. We implement four state-of-the-art matrix
multiplication (MM) algorithms (i.e., inner, outer, row-wise
and column-wise product) for each low-dimensional kernel in
the FNN model. The evaluation on a shared-memory multicore
with a large core count shows that using the right parallelization
strategy yields significant potential for performance scaling.
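The sketch below illustrates, under simplifying assumptions, how the four MM formulations expose different loops for parallelization. It assumes dense row-major matrices and OpenMP; the function names (mm_inner, mm_outer, mm_row, mm_col) are illustrative and are not taken from the released benchmark code.

```cpp
// Four formulations of C = A * B with A (MxK), B (KxN), C (MxN), all dense
// and row-major. mm_inner overwrites C; the other variants accumulate into a
// zero-initialized C. Each variant parallelizes a different loop, which
// changes how work is distributed across cores.
#include <vector>
#include <omp.h>

using Mat = std::vector<float>;  // row-major: mat[i * cols + j]

// Inner product: each C(i,j) is a dot product; parallelize over output elements.
void mm_inner(const Mat& A, const Mat& B, Mat& C, int M, int K, int N) {
  #pragma omp parallel for collapse(2)
  for (int i = 0; i < M; ++i)
    for (int j = 0; j < N; ++j) {
      float acc = 0.f;
      for (int k = 0; k < K; ++k) acc += A[i * K + k] * B[k * N + j];
      C[i * N + j] = acc;
    }
}

// Outer product: C is built from K rank-1 updates A(:,k) * B(k,:). The k loop
// carries a reduction into C, so here each rank-1 update is parallelized over
// rows instead, keeping the accumulation race-free.
void mm_outer(const Mat& A, const Mat& B, Mat& C, int M, int K, int N) {
  for (int k = 0; k < K; ++k) {
    #pragma omp parallel for
    for (int i = 0; i < M; ++i)
      for (int j = 0; j < N; ++j)
        C[i * N + j] += A[i * K + k] * B[k * N + j];
  }
}

// Row-wise product: row i of C is a linear combination of rows of B;
// each thread owns a disjoint set of output rows.
void mm_row(const Mat& A, const Mat& B, Mat& C, int M, int K, int N) {
  #pragma omp parallel for
  for (int i = 0; i < M; ++i)
    for (int k = 0; k < K; ++k)
      for (int j = 0; j < N; ++j)
        C[i * N + j] += A[i * K + k] * B[k * N + j];
}

// Column-wise product: column j of C is a linear combination of columns of A;
// each thread owns a disjoint set of output columns.
void mm_col(const Mat& A, const Mat& B, Mat& C, int M, int K, int N) {
  #pragma omp parallel for
  for (int j = 0; j < N; ++j)
    for (int k = 0; k < K; ++k)
      for (int i = 0; i < M; ++i)
        C[i * N + j] += A[i * K + k] * B[k * N + j];
}
```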
In summary, we make the following contributions:
• We propose an end-to-end temporal random walk-based graph learning algorithm for online processing of temporal graphs. The proposed real-time GRL algorithm is implemented in Python and released as an open-source benchmark at https://github.com/grvndnz/ST-Graph-Learning. Our performance analysis identifies FNN training as the main performance bottleneck.
• We provide a C++ parallel implementation of the N-layer Feed-forward Neural Network (FNN) training pipeline. The individual matrix kernels in the forward and backward propagation steps of FNN training are evaluated for in-depth performance analysis (see the sketch after this list). We implement four state-of-the-art parallelization strategies for the low-dimensional matrix multiplications in the FNN pipeline to evaluate the performance scaling potential on multicore processors. The FNN pipeline implementations are released as an open-source benchmark at https://github.com/grvndnz/Parallel-FNN.
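As a point of reference for the kernels mentioned above, the following is a minimal single-layer sketch of the matrix operations in one FNN training iteration (forward pass, backward pass, and weight update). The shapes, sigmoid activation, learning-rate update, and all names here are illustrative assumptions rather than the released Parallel-FNN code; matmul() stands in for any of the four parallel MM variants.

```cpp
// One layer's worth of matrix kernels: Z = X*W, A = sigmoid(Z),
// dZ = dA .* A .* (1-A), dW = X^T * dZ, dX = dZ * W^T, W -= lr * dW.
#include <cmath>
#include <vector>

using Mat = std::vector<float>;  // row-major: mat[i * cols + j]

// C (MxN) = A (MxK) * B (KxN); stand-in for the parallel MM kernels.
void matmul(const Mat& A, const Mat& B, Mat& C, int M, int K, int N) {
  for (int i = 0; i < M; ++i)
    for (int j = 0; j < N; ++j) {
      float acc = 0.f;
      for (int k = 0; k < K; ++k) acc += A[i * K + k] * B[k * N + j];
      C[i * N + j] = acc;
    }
}

Mat transpose(const Mat& A, int rows, int cols) {
  Mat T(rows * cols);
  for (int i = 0; i < rows; ++i)
    for (int j = 0; j < cols; ++j) T[j * rows + i] = A[i * cols + j];
  return T;
}

// One training step for a single layer: X (BxI) inputs, W (IxO) weights,
// dA (BxO) gradient from the next layer; returns dX (BxI) for the previous layer.
Mat layer_step(const Mat& X, Mat& W, const Mat& dA, int B, int I, int O,
               float lr) {
  // Forward kernels.
  Mat Z(B * O), A(B * O);
  matmul(X, W, Z, B, I, O);
  for (int i = 0; i < B * O; ++i) A[i] = 1.f / (1.f + std::exp(-Z[i]));

  // Backward kernels.
  Mat dZ(B * O);
  for (int i = 0; i < B * O; ++i) dZ[i] = dA[i] * A[i] * (1.f - A[i]);
  Mat dW(I * O), dX(B * I);
  matmul(transpose(X, B, I), dZ, dW, I, B, O);
  matmul(dZ, transpose(W, I, O), dX, B, O, I);

  // Parameter update.
  for (int i = 0; i < I * O; ++i) W[i] -= lr * dW[i];
  return dX;
}
```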
II. RELATED WORK
In recent years, there has been a surge in research in
the area of graph representation learning (GRL), which aims
to encode graph structure into a low-dimensional embedding
space [25]. The main goal of GRL research is to optimize this
encoding in a way that the representations in the learned space
reflect the original graph structure. Some of the early works
in GRL include DeepWalk [4] and Node2Vec [26], which
leverage node proximity in graphs using the idea of word
proximity from NLP [19]. Later works have incorporated this
idea to learn graph structural properties such as similarity in
degree sequences [6] and the behavioral roles of the nodes
[5]. These embedding algorithms train node embeddings for
individual nodes, and therefore, require additional training via
stochastic gradient descent to make predictions on new nodes.
Other works learn inductive node embeddings by combining
external node features with the graph structure. These
works include graph convolutional networks (GCN) [15],
GNNs [1], GraphSAGE [2], and Graph Attention Networks
(GAT) [3].
Most research for deep learning on graphs assumes that the
underlying graph is static. However, temporally changing
graphs are a more realistic model of most real-life systems.
While static graph learning models can be applied
to temporal graphs by ignoring the temporal evolution [27],
temporal graph structure contains significant insights about
the system. There have been works that explore processing
temporal graphs as a sequence of snapshots [28]–[31] that can
capture the evolution of temporal graph dynamics. Similarly,
streaming graph approaches process temporally changing data
at the finest time granularity, which is computationally much
more expensive than processing snapshots [32].
For graph representation learning on temporal graphs, Talati
et al. [17] previously proposed an implementation based on a
temporal random-walk algorithm. That work performed a detailed
algorithm- and hardware-level performance characterization of
the pipeline and identified several execution bottlenecks.
However, the analyzed GRL and FNN training algorithms operate
on pre-collected graph data rather than on temporally evolving
snapshots of the graph.
Consequently, the FNN model is trained with node embed-
dings that capture the entire history of the graph structure
(past and future node interactions). In this work, we propose
an implementation that evaluates prediction tasks in an online
setting on temporally changing graphs.
III. RUN-TIME TEMPORAL GRAPH LEARNING
In this section, we give the details of the proposed run-time
GRL algorithm for temporal graphs. We begin with the
following definitions.
Definition 1: A graph G is defined as a tuple G = (V, E),
where V = {v_0, v_1, ..., v_n} is the set of n nodes and
E = {e_0, e_1, ..., e_m} is the set of m edges, where an edge
connects two nodes.
In the case of temporal graphs, where nodes are continuously
added or removed and edges change dynamically over time, the
graph is processed into a sequence of discrete graph snapshots
in order to maintain these temporal updates. Therefore, a
temporal network is traditionally represented as a sequence of
static graphs (G_1, G_2, ..., G_T) over T timestamps. Similarly,
an edge between node v_a and node v_b at timestamp
t ∈ {0, 1, ..., T-1} is represented as (v_a, v_b, t).
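A minimal sketch of this representation is shown below: a stream of timestamped edges (v_a, v_b, t) is bucketed into discrete snapshots G_1, ..., G_T. The adjacency-list layout and names are illustrative assumptions, not the released implementation.

```cpp
// Bucket timestamped edges into one adjacency-list snapshot per timestamp.
#include <cstdint>
#include <unordered_map>
#include <vector>

struct TemporalEdge {
  uint32_t src, dst;
  uint32_t t;  // timestamp in {0, 1, ..., T-1}
};

// One static snapshot G_t stored as adjacency lists.
using Snapshot = std::unordered_map<uint32_t, std::vector<uint32_t>>;

std::vector<Snapshot> build_snapshots(const std::vector<TemporalEdge>& edges,
                                      uint32_t T) {
  std::vector<Snapshot> snapshots(T);
  for (const auto& e : edges) {
    if (e.t >= T) continue;                   // ignore out-of-range timestamps
    snapshots[e.t][e.src].push_back(e.dst);
    snapshots[e.t][e.dst].push_back(e.src);   // undirected edge
  }
  return snapshots;
}
```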
Definition 2: In a graph G = (V, E), a random walk from
node v_a to node v_b is defined as a sequence of connected
edges w = {(v_a, v_1), (v_1, v_2), ..., (v_k, v_b)}, where k + 1
is the length of w.
The main concept of random walks is that they capture the
structure of the graph and node properties by randomly visiting
neighboring nodes and sampling the graph. To represent these
properties mathematically, a GRL algorithm maps the nodes of
a graph into a low-dimensional space.
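Definition 2 translates directly into a neighbor-sampling loop. The sketch below samples a uniform random walk over an adjacency-list snapshot; the Snapshot alias matches the bucketing sketch above, and the uniform neighbor choice is an illustrative simplification of the temporal random walks used in the actual pipeline.

```cpp
// Sample a uniform random walk of at most `num_edges` edges starting at
// `start`; returns the visited node sequence (v_a, v_1, ..., v_b).
#include <cstdint>
#include <random>
#include <unordered_map>
#include <vector>

using Snapshot = std::unordered_map<uint32_t, std::vector<uint32_t>>;

std::vector<uint32_t> random_walk(const Snapshot& g, uint32_t start,
                                  int num_edges, std::mt19937& rng) {
  std::vector<uint32_t> walk{start};
  uint32_t cur = start;
  for (int step = 0; step < num_edges; ++step) {
    auto it = g.find(cur);
    if (it == g.end() || it->second.empty()) break;  // dead end: stop early
    std::uniform_int_distribution<size_t> pick(0, it->second.size() - 1);
    cur = it->second[pick(rng)];  // jump to a uniformly chosen neighbor
    walk.push_back(cur);
  }
  return walk;
}
```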
Definition 3: Given an input graph G = (V, E), a graph
representation algorithm f : G → R^d maps the nodes of the
graph into a d-dimensional space that captures the properties
and closeness of the nodes.
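Concretely, the output of such a mapping can be stored as an n x d table with one row per node; the layout and names below are illustrative assumptions, not the representation used in the released code.

```cpp
// Learned representation from Definition 3: node v is mapped to row v of an
// n x d table, i.e., a d-dimensional embedding vector.
#include <cstdint>
#include <vector>

struct NodeEmbeddings {
  int d = 0;                  // embedding dimension
  std::vector<float> table;   // n x d, row-major

  // f(v): pointer to the d-dimensional representation of node v.
  const float* lookup(uint32_t v) const { return table.data() + v * d; }
};
```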