to-end way; the second is how to regularize the learned graph to be sparse, which filters out redundant and useless edges, thereby improving overall performance and making the model more valuable for real-world applications. To address these issues, we first introduce the Regularized Graph Generation (RGG) module to learn the implicit graph, which adopts the Gumbel Softmax trick to sparsify the dense similarity matrix computed from node embeddings. Second, we introduce the Laplacian Matrix Mixed-up Module (LM3) to incorporate the explicit relationship from domain knowledge with the implicit graph from RGG. Figure 1 shows the graph structure learned from only the explicit relationship in (a), from both implicit and explicit relationships without regularization in (b), and from our proposed RGSL in (c). We observe that RGSL discovers implicit time-series relationships ignored by the naive graph structure learning algorithm (shown in red boxes in Figure 1(a)). Besides, compared to Figure 1(b), the regularization module in RGSL automatically removes noisy and redundant edges, making the learned graph sparser and more effective than a dense graph.
To summarize, our work presents the following contributions.
• We propose a novel and efficient model named RGSL, which first exploits both explicit and implicit time-series relationships to assist graph structure learning; our proposed LM3 module effectively mixes the two kinds of Laplacian matrices.
• Besides, to regularize the learned matrix, we also propose an RGG module, which formulates the discrete graph structure as an independent matrix variable and exploits the Gumbel softmax trick to optimize the parameters of the probabilistic graph distribution.
• Extensive experiments show that the proposed RGSL consistently and significantly outperforms benchmarks on three datasets. Moreover, both the LM3 module and the RGG module can be easily generalized to different spatio-temporal graph models.
2 Methodology
In this section, we first introduce the problem definition and notations, and then describe the detailed implementation of the proposed RGSL. The overall pipeline is shown in Figure 2. RGSL consists of three major modules. The first is the regularized graph generation module, named RGG (Section 2.2), which learns the discrete graph structure from trainable node embeddings with the Gumbel softmax trick. The second is the Laplacian matrix mixed-up module, named LM3 (Section 2.3), which captures both explicit and implicit time-series correlations between nodes via a convex combination. Finally, in Section 2.4, we utilize a recurrent graph network to perform time-series forecasting, considering both spatial correlation and temporal dependency simultaneously.
2.1 Preliminary
Traffic series forecasting aims to predict future time series from historical traffic records. Denote the training data by $X_{0:T} = \{X_0, X_1, \ldots, X_t, \ldots, X_T\}$ and $X_t = \{X_t^0, X_t^1, \ldots, X_t^N\}$, where the superscript refers to the series and the subscript refers to time. There are $T$ timestamps in total for training and $\tau$ timestamps to forecast. We denote $G^{(0)}$ as the explicit graph constructed from prior time-series relationships and $G^{(l)}$ as the implicit graph learned from trainable node embeddings. Each vertex of graph $G^{(l)}$ represents a traffic series $X$, and $A \in \mathbb{R}^{N \times N}$ is the adjacency matrix of graph $G$ representing the similarity between time series. Thus, the time-series forecasting task with the explicit graph can be defined as:
$$\min_{W_\theta} \mathcal{L}\big(X_{T+1:T+\tau}, \hat{X}_{T+1:T+\tau};\, X_{0:T}, G^{(0)}, G^{(l)}\big) \tag{1}$$
where $W_\theta$ denotes all the learnable parameters, $\hat{X}_{T+1:T+\tau}$ denotes the ground-truth future values, and $\mathcal{L}$ is the loss function.
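To make the setup concrete, the following minimal sketch illustrates the tensor shapes and the objective in Equation 1. The node count, window length, horizon, and the naive stand-in forecaster are illustrative assumptions, not RGSL's actual configuration.

```python
# A minimal sketch of the data layout and the objective in Equation 1.
# num_nodes, window, horizon and the naive forecaster are illustrative
# assumptions, not RGSL's actual configuration.
import torch
import torch.nn.functional as F

num_nodes, window, horizon = 207, 12, 12      # N series, T history steps, tau future steps

x_hist = torch.randn(window, num_nodes)       # X_{0:T}: historical observations
y_future = torch.randn(horizon, num_nodes)    # X_{T+1:T+tau}: future values

def forecast(x: torch.Tensor) -> torch.Tensor:
    """Stand-in for the RGSL forecaster: repeat the last observation."""
    return x[-1:].repeat(horizon, 1)

y_hat = forecast(x_hist)                      # predicted future window
loss = F.l1_loss(y_hat, y_future)             # L in Equation 1, e.g. MAE
```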
2.2 Regularized Graph Generation
The regularization method Dropout [Srivastava et al., 2014] aims at preventing neural networks from overfitting by randomly dropping connections during training. However, traditional Dropout treats every connection equally and drops them with the same probability acquired from cross-validation, which does not consider the different significance of different edges. In our Regularized Graph Generation (RGG) module, inspired by [Shang et al., 2021] and works in reinforcement learning, we resolve the regularization problem by replacing Softmax with Gumbel Softmax, which is simple to employ, increases the explainability of predictions, and yields clear improvements. Another motivation for applying the Gumbel Softmax trick is to alleviate the density of the matrix learned by GNNs after training.
Let $E \in \mathbb{R}^{N \times d}$ be the learned node embedding matrix, where $d$ is the embedding dimension. Let $\theta$ be the probability matrix, where $\theta_{ij} \in \theta$ represents the probability of preserving the edge from time series $i$ to $j$, which is formulated as:
$$\theta = E E^{\top} \tag{2}$$
Let $\sigma$ be the activation function and $s$ the temperature variable; the sparse adjacency matrix $A^{(l)}$ is then defined as:
$$A^{(l)}_{ij} = \sigma\Big(\big(\log(\theta_{ij}/(1-\theta_{ij})) + (g^{1}_{ij} - g^{2}_{ij})\big)/s\Big), \quad \text{s.t.}\; g^{1}_{ij}, g^{2}_{ij} \sim \text{Gumbel}(0,1) \tag{3}$$
Equation 3 is the Gumbel softmax implementation for our task, where $A^{(l)}_{ij} = 1$ with probability $\theta_{ij}$ and 0 with the remaining probability. It can easily be shown that Gumbel Softmax shares the same probability distribution as the standard Softmax, which ensures that the graph forecasting network stays statistically consistent with the trainable probability matrix generation.
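For illustration, the sketch below shows one way Equations 2 and 3 could be implemented in PyTorch. Mapping $\theta$ into $(0,1)$ with a sigmoid, the straight-through rounding, and the temperature value are our assumptions, not details confirmed by the text.

```python
# A sketch of Equations 2-3: sampling a sparse adjacency matrix A^(l) from the
# node embedding matrix E with the Gumbel softmax trick. The sigmoid squashing,
# straight-through rounding, and temperature are illustrative assumptions.
import torch

def sample_adjacency(node_emb: torch.Tensor, s: float = 0.5) -> torch.Tensor:
    """node_emb: (N, d) embedding matrix E; returns an (N, N) sparse A^(l)."""
    eps = 1e-8
    theta = torch.sigmoid(node_emb @ node_emb.t())                    # Eq. 2, squashed to (0, 1)
    g1 = -torch.log(-torch.log(torch.rand_like(theta) + eps) + eps)   # g^1 ~ Gumbel(0, 1)
    g2 = -torch.log(-torch.log(torch.rand_like(theta) + eps) + eps)   # g^2 ~ Gumbel(0, 1)
    logits = torch.log(theta / (1.0 - theta) + eps)
    adj_soft = torch.sigmoid((logits + g1 - g2) / s)                  # Eq. 3 with sigma = sigmoid
    # Straight-through estimator: hard 0/1 edges in the forward pass,
    # gradients flow through the soft relaxation in the backward pass.
    adj_hard = (adj_soft > 0.5).float()
    return adj_hard + adj_soft - adj_soft.detach()

E = torch.nn.Parameter(torch.randn(10, 16))   # N = 10 nodes, d = 16
A_l = sample_adjacency(E)                     # re-sampled at every training iteration
```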
At each iteration, we calculate the probability matrix $\theta$ as Equation 2 suggests, and Gumbel-Max samples the adjacency matrix to determine which edges to preserve and which to discard, which is similar to Dropout. However, Dropout randomly selects edges or neurons with equal probability, whereas we drop useful edges with small likelihood and tend to get rid of the redundant ones. As shown in Figure 4(a), all the non-diagonal entries are non-zero, but a substantial fraction of them have small values and are regarded as useless or even noisy.
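The contrast with Dropout can be summarized in a few lines: Dropout keeps every edge with one shared probability, whereas RGG keeps edge $(i,j)$ with its learned probability $\theta_{ij}$. The hard Bernoulli draw below is a simplified, non-differentiable stand-in for the Gumbel-based sampling in Equation 3, used purely for illustration.

```python
# An illustrative contrast between Dropout-style masking and RGG's
# edge-specific sampling. The hard Bernoulli draw is a simplified,
# non-differentiable stand-in for the Gumbel-based sampling in Equation 3.
import torch

N = 10
theta = torch.sigmoid(torch.randn(N, N))        # learned, per-edge keep probabilities
p = 0.5                                         # a single keep probability shared by Dropout

dropout_mask = (torch.rand(N, N) < p).float()   # every edge kept with the same probability
rgg_mask = (torch.rand(N, N) < theta).float()   # low-theta (redundant) edges dropped more often
```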