Regularized Graph Structure Learning with Semantic Knowledge
for Multi-variates Time-Series Forecasting
Hongyuan Yu1,2,3, Ting Li1, Weichen Yu2,3, Jianguo Li1,†, Yan Huang2,3, Liang Wang2,3,†, Alex Liu1
1Ant Group
2AI School, University of Chinese Academy of Sciences
3CRIPAC, NLPR, Institute of Automation, Chinese Academy of Sciences, China
{hongyuan.yu,weichen.yu}@cripac.ia.ac.cn, {lt317068, lijg.zero, alexliu}@antgroup.com,
{yhuang, wangliang}@nlpr.ia.ac.cn

Equal contribution. This work was partly done when Hongyuan Yu was an intern at Ant Group.
† Corresponding author.
Abstract
Multivariate time-series forecasting is a critical task for many applications, and graph time-series networks are widely studied due to their capability to capture spatial-temporal correlations simultaneously. However, most existing works focus on learning with the explicit prior graph structure, while ignoring potential information from the implicit graph structure, yielding incomplete structure modeling. Some recent works attempt to learn the intrinsic or implicit graph structure directly, but lack a way to combine the explicit prior structure with the implicit structure. In this paper, we propose the Regularized Graph Structure Learning (RGSL) model to incorporate both the explicit prior structure and the implicit structure, and to learn the forecasting deep networks along with the graph structure. RGSL consists of two innovative modules. First, we derive an implicit dense similarity matrix through node embeddings, and learn the sparse graph structure using Regularized Graph Generation (RGG) based on the Gumbel Softmax trick. Second, we propose a Laplacian Matrix Mixed-up Module (LM3) to fuse the explicit graph and the implicit graph together. We conduct experiments on three real-world datasets. Results show that the proposed RGSL model outperforms existing graph forecasting algorithms by a notable margin, while learning meaningful graph structures simultaneously. Our code and models are publicly available at https://github.com/alipay/RGSL.git.
1 Introduction
The spatial-temporal graph network [Yu et al., 2018; Cao et al., 2020; Chen et al., 2019; Shang et al., 2021] enhances time-series forecasting by modelling the correlations and relationships between multivariate time-series, and it has many applications.

Figure 1: Graph visualization. (a) Naive explicit GSL trained from prior knowledge; (b) implicit GSL from the popular network AGCRN; (c) our proposed RGSL.

For example, traffic flow forecasting is a basic yet important application in intelligent transportation systems, which constructs the spatial dependency graph from road distances. Cloud service flow forecasting is a fundamental task in cloud serving systems and the e-commerce domain, which builds the relationship graph based on request region or zone information. Existing graph time-series forecasting
networks like STGCN [Yu et al., 2018], DCRNN [Li et al., 2018] and ASTGCN [Guo et al., 2019] exploit a fixed graph structure constructed with domain expert knowledge to capture the relationships among multivariate time-series. However, the explicit graph is not always available in every application, or may be incomplete, as it is hard for human experts to capture latent or long-range dependencies among a large number of time-series. Thus, how to define an accurate dynamic relationship graph becomes a critical task for graph time-series forecasting.
As the quality of the graph structure greatly impacts the performance of graph time-series forecasting, many recent efforts [Lu et al., 2021; Bai et al., 2020; Chen et al., 2021a] have been made on Graph Structure Learning (GSL). For instance, GTS [Shang et al., 2021] has been proposed to learn the discrete graph structure simultaneously with the GNN. AGCRN [Bai et al., 2020] learns a similarity matrix derived from trainable adaptive node embeddings and forecasts in an end-to-end style. DGNN [Lu et al., 2021] is a dynamic graph construction method which first learns a time-specific spatial adjacency matrix and then exploits dynamic graph convolution to pass messages. However, these aforementioned methods go to the other extreme: they learn the intrinsic/implicit graph structure from time-series patterns directly, while ignoring the possibility of leveraging the a priori time-series relationships defined from domain expert knowledge.
In this paper, we focus on solving two problems: the first is how to effectively combine the explicit time-series relationships with the implicit correlations in an end-to-end way; the second is how to regularize the learned graph to be sparse, which filters out redundant and useless edges, improves overall performance, and is more valuable for real-world applications. To address these issues, we first introduce the Regularized Graph Generation (RGG) module to learn the implicit graph, which adopts the Gumbel Softmax trick to sparsify the dense similarity matrix derived from node embeddings. Second, we introduce the Laplacian Matrix Mixed-up Module (LM3) to incorporate the explicit relationships from domain knowledge with the implicit graph from RGG. Figure 1 shows the graph structure learned from only the explicit relationships in (a), from both implicit and explicit relationships without regularization in (b), and from our proposed RGSL in (c). We can observe that RGSL discovers implicit time-series relationships ignored by the naive graph structure learning algorithm (shown in red boxes in Figure 1(a)). Besides, compared to Figure 1(b), the regularization module in RGSL automatically removes noisy and redundant edges, making the learned graph sparser and more effective than a dense one.
To summarize, our work presents the following contributions.

• We propose a novel and efficient model named RGSL, which is the first to exploit both the explicit and implicit time-series relationships to assist graph structure learning; our proposed LM3 module effectively mixes up the two kinds of Laplacian matrices.

• To regularize the learned matrix, we also propose the RGG module, which formulates the discrete graph structure as a variable-independent matrix and exploits the Gumbel Softmax trick to optimize the probabilistic graph distribution parameters.

• Extensive experiments show that the proposed RGSL model significantly and consistently outperforms benchmarks on three datasets. Moreover, both the LM3 module and the RGG module can be easily generalized to other spatio-temporal graph models.
2 Methodology
In this section, we first introduce the problem definition and notation, and then describe the detailed implementation of the proposed RGSL. The overall pipeline is shown in Figure 2. RGSL consists of three major modules. The first is the Regularized Graph Generation (RGG) module, which learns the discrete graph structure from trainable node embeddings with the Gumbel Softmax trick (Section 2.2). The second is the Laplacian matrix mix-up module (LM3), which captures both the explicit and implicit time-series correlations between nodes in a convex way (Section 2.3). Finally, in Section 2.4, we utilize a recurrent graph network to perform time-series forecasting, considering both the spatial correlations and temporal dependencies simultaneously.
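The exact formulation of LM3 is given in Section 2.3; purely to illustrate the convex mix-up idea mentioned above, a minimal PyTorch sketch might look as follows, where the function name lm3_mixup and the learnable scalar lam are hypothetical, not part of the released implementation.

```python
import torch

def lm3_mixup(lap_explicit: torch.Tensor,
              lap_implicit: torch.Tensor,
              lam: torch.Tensor) -> torch.Tensor:
    """Convex combination of the explicit (prior) and implicit (learned)
    graph Laplacians. `lam` is an unconstrained learnable scalar squashed
    into (0, 1) so that the combination stays convex."""
    alpha = torch.sigmoid(lam)
    return alpha * lap_explicit + (1.0 - alpha) * lap_implicit
```

In such a sketch, lam would be registered as an nn.Parameter and optimized jointly with the forecasting network, so the model itself decides how much weight to put on prior versus learned structure.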
2.1 Preliminary
Traffic series forecasting aims to predict future time series from historical traffic records. Denote the training data by $X_{0:T} = \{X_0, X_1, \ldots, X_t, \ldots, X_T\}$ and $X_t = \{X_t^0, X_t^1, \ldots, X_t^N\}$, where the superscript indexes the series and the subscript indexes time. There are $T$ timestamps in total for training, and $\tau$ timestamps are required for forecasting. We denote by $G^{(0)}$ the explicit graph constructed from a priori time-series relationships, and by $G^{(l)}$ the implicit graph learned from trainable node embeddings; the vertices of $G^{(l)}$ represent the traffic series $X$, and $A \in \mathbb{R}^{N \times N}$ is the adjacency matrix of the graph $G$, representing the similarity between time-series. Thus, time-series forecasting with the explicit graph can be defined as:

$$\min_{W_\theta} \; \mathcal{L}(\hat{X}_{T+1:T+\tau}, X_{T+1:T+\tau}; X_{0:T}, G^{(0)}, G^{(l)}) \quad (1)$$

where $W_\theta$ denotes all the learnable parameters, $\hat{X}_{T+1:T+\tau}$ denotes the predicted future values, and $\mathcal{L}$ is the loss function.
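For concreteness, a minimal PyTorch sketch of Equation 1 viewed as a training objective follows; the batch dimension, the numeric shapes, the call signature of `model`, and the MAE instantiation of $\mathcal{L}$ are illustrative assumptions rather than the exact implementation.

```python
import torch
import torch.nn.functional as F

# Illustrative shapes in the notation of Section 2.1, with a batch
# dimension B added for practicality: N series, T historical steps,
# tau forecasting steps.
B, T, tau, N = 32, 12, 12, 207
x_hist = torch.randn(B, T, N)    # X_{0:T}
y_true = torch.randn(B, tau, N)  # X_{T+1:T+tau}

def objective(model, x_hist, y_true, g_explicit, g_implicit):
    """Equation 1 as a training loss: the network predicts
    X_hat_{T+1:T+tau} conditioned on the history and both graphs.
    The signature of `model` is an assumption."""
    y_pred = model(x_hist, g_explicit, g_implicit)  # X_hat_{T+1:T+tau}
    return F.l1_loss(y_pred, y_true)  # MAE as one common choice of L
```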
2.2 Regularized Graph Generation
The regularization method Dropout [Srivastava et al., 2014] aims to prevent neural networks from overfitting by randomly dropping connections during training. However, traditional Dropout treats every connection equally and drops each with the same probability, acquired from cross-validation, which does not account for the differing significance of different edges. In our Regularized Graph Generation (RGG) module, inspired by [Shang et al., 2021] and by work in reinforcement learning, we resolve the regularization problem simply by replacing Softmax with Gumbel Softmax, which is convenient to employ, increases the explainability of predictions, and yields nice improvements. Another motivation for applying the Gumbel Softmax trick is to alleviate the density of the matrix learned by GNNs after training.
Let $E \in \mathbb{R}^{N \times d}$ be the learned node embedding matrix, where $d$ is the embedding dimension, and let $\theta$ be the probability matrix, whose entry $\theta_{ij} \in \theta$ represents the probability of preserving the edge from time-series $i$ to time-series $j$:

$$\theta = E E^{\top} \quad (2)$$

Let $\sigma$ be the activation function and $s$ the temperature variable; then the sparse adjacency matrix $A^{(l)}$ is defined as:

$$A^{(l)}_{ij} = \sigma\big(\big(\log(\theta_{ij}/(1-\theta_{ij})) + (g^{1}_{ij} - g^{2}_{ij})\big)/s\big), \quad \text{s.t.}\;\; g^{1}_{ij}, g^{2}_{ij} \sim \text{Gumbel}(0,1) \quad (3)$$

Equation 3 is the Gumbel Softmax implementation of our task, where $A^{(l)}_{ij} = 1$ with probability $\theta_{ij}$ and $0$ with the remaining probability. It can easily be proved that Gumbel Softmax shares the same probability distribution as the normal Softmax, which ensures that the graph forecasting network stays statistically consistent with the trainable probability matrix generation.
At each iteration, we calculate the probability matrix $\theta$ as Equation 2 suggests; Gumbel-Max then samples the adjacency matrix to determine which edges to preserve and which to discard, which is similar to Dropout. However, Dropout randomly selects edges or neurons with equal probability, while we drop useful edges only with small likelihood and tend to get rid of the redundant ones. As shown in Figure 4(a), all the non-diagonal entries are non-zero, but a substantial fraction of them have small values and can be regarded as useless or even noisy.
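To make the RGG procedure concrete, the following minimal PyTorch sketch implements Equations 2 and 3. Note that $\theta = E E^{\top}$ is not guaranteed to lie in $(0, 1)$, so this sketch squashes it with a sigmoid; that squashing, the class name, and the temperature default are our assumptions rather than the released implementation (see the repository linked in the abstract for the latter).

```python
import torch
import torch.nn as nn

class RegularizedGraphGeneration(nn.Module):
    """Sketch of RGG: a dense edge-probability matrix from node
    embeddings (Eq. 2), sparsified with the Gumbel Softmax trick (Eq. 3)."""

    def __init__(self, num_nodes: int, embed_dim: int, temperature: float = 0.5):
        super().__init__()
        # Trainable node embedding matrix E in R^{N x d}
        self.node_emb = nn.Parameter(torch.randn(num_nodes, embed_dim))
        self.temperature = temperature  # the temperature s in Eq. 3

    def forward(self) -> torch.Tensor:
        # Eq. 2, with an extra sigmoid so that theta_ij lies in (0, 1)
        # and the logit below is well defined (the squashing is assumed).
        theta = torch.sigmoid(self.node_emb @ self.node_emb.t())

        # Eq. 3: binary-concrete relaxation with g1, g2 ~ Gumbel(0, 1),
        # sampled via the inverse CDF -log(-log(U)), U ~ Uniform(0, 1).
        eps = 1e-8
        g1 = -torch.log(-torch.log(torch.rand_like(theta) + eps) + eps)
        g2 = -torch.log(-torch.log(torch.rand_like(theta) + eps) + eps)
        logits = torch.log(theta / (1.0 - theta) + eps)
        adj = torch.sigmoid((logits + g1 - g2) / self.temperature)
        return adj  # relaxed implicit adjacency A^{(l)}
```

Calling `RegularizedGraphGeneration(num_nodes=207, embed_dim=10)()` returns a relaxed adjacency matrix that approaches a binary edge mask as the temperature $s$ decreases, which is what lets edge selection stay differentiable during training.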