EDGE-VARYING FOURIER GRAPH NETWORKS FOR
MULTIVARIATE TIME SERIES FORECASTING
Kun Yi
Beijing Institute of Technology
yikun@bit.edu.cn
Qi Zhang
University of Technology Sydney
qi.zhang-13@student.uts.edu.au
Liang Hu
Tongji University
rainmilk@gmail.com
Hui He
Beijing Institute of Technology
hehui617@bit.edu.cn
Ning An
HeFei University of Technology
ning.g.an@acm.org
LongBing Cao
University of Technology Sydney
longbing.cao@uts.edu.au
ZhenDong Niu
Beijing Institute of Technology
zniu@bit.edu.cn
ABSTRACT
The key problem in multivariate time series (MTS) analysis and forecasting is to
disclose the underlying couplings between variables that drive their co-movements.
Many recent successful MTS methods are built with graph neural networks (GNNs)
owing to their essential capacity for relational modeling. However, previous work
often models MTS with a static graph structure over the time-series variables,
failing to capture their ever-changing correlations over time. To this end, we
adaptively learn a fully-connected supra-graph connecting any two variables at any
two timestamps to capture high-resolution variable dependencies via an efficient
graph convolutional network. Specifically, we construct Edge-Varying Fourier
Graph Networks (EV-FGN) equipped with a Fourier Graph Shift Operator (FGSO)
that efficiently performs graph convolution in the frequency domain. As a result,
a high-efficiency, scale-free parameter learning scheme is derived for MTS
analysis and forecasting according to the convolution theorem. Extensive
experiments show that EV-FGN outperforms state-of-the-art methods on seven
real-world MTS datasets.
1 INTRODUCTION
Multivariate time series (MTS) forecasting is a key ingredient in many real-world scenarios, including
weather forecasting Zheng et al. (2015), decision making Borovykh et al. (2017), traffic forecasting
Yu et al. (2018a); Bai et al. (2020), COVID-19 prediction Cao et al. (2020); Chen et al. (2022), etc.
Recently, deep neural networks, such as long short-term memory (LSTM) Hochreiter & Schmid-
huber (1997), convolutional neural network (CNN) Borovykh et al. (2017), Transformer Vaswani
et al. (2017), have dominated MTS modeling. In particular, graph neural networks (GNNs) have
demonstrated promising performance on MTS forecasting with their essential capability to capture
the complex couplings between time-series variables. Some studies adaptively learn the
graph for MTS forecasting even without an explicit graph structure, e.g., via node similarity Mateos
et al. (2019); Bai et al. (2020); Wu et al. (2019) and/or self-attention mechanisms Cao et al. (2020).
Despite the success of GNNs on MTS forecasting, three practical challenges remain to be
addressed: 1) the dependencies between each pair of time-series variables are generally non-static,
which demands modeling the full dependencies across all possible lags; 2) a high-efficiency dense-graph
learning method is needed to replace the costly graph convolution and attention operators
(whose time complexity is quadratic in the graph size); 3) the graph over MTS varies with different
temporal interactions, which demands an efficient encoding of the dynamic structure. Given
the increasing use of the convolution theorem from Fourier theory to design high-efficiency neural
units (operators) that replace costly attention computations Guibas et al. (2022), we are inspired to
perform graph computations in the Fourier space, providing a potential alternative for processing
supra-graphs that jointly attend to spatial-temporal dependencies.
Different from most GNN-based MTS models that construct graphs to model the spatial correlations between
variables (i.e., nodes) Bai et al. (2020); Wu et al. (2019; 2020), we attempt to build a supra-graph that
sheds light on the "high-resolution" correlations between any two variables at any two timestamps
and largely enhances the expressiveness of non-static spatial-temporal dependencies. Obviously,
the supra-graph heavily increases the computational complexity of a GNN-based model, so
a high-efficiency learning method is required to reduce the cost of model training. Inspired by the
Fourier Neural Operator (FNO) Li et al. (2021), we transform graph convolutions in the time domain
into much lower-complexity matrix multiplications in the frequency domain by defining an efficient
Fourier Graph Shift Operator (FGSO). In addition, it is necessary to consider multiple iterations
(layers) of graph convolution to expand the receptive neighborhood and mix the diffusion information in the
graph. To capture time-varying diffusion over the supra-graph, we introduce the edge-varying graph
filter, expressed as a polynomial of multiple graph shift operators (GSOs), to weigh the graph edges
differently at different iterations. We then reformulate the edge-varying graph filter with FGSOs in
the frequency domain to reduce the computational cost. Accordingly, we construct a complex-valued
feed-forward network, dubbed Edge-Varying Fourier Graph Networks (EV-FGN), stacked with
multiple FGSOs to perform high-efficiency multi-layer graph convolutions in the Fourier space.
The main contributions of this paper are summarized as follows:
• We adaptively learn a supra-graph, representing non-static correlations between any two variables
at any two timestamps, to capture high-resolution spatial-temporal dependencies.
• To efficiently compute graph convolutions over the supra-graph, we define FGSO, which has the
capacity of scale-free learning parameters in the Fourier space.
• We design EV-FGN, evolved from edge-varying graph filters, to capture the time-varying variable
dependencies, where FGSOs are used to reduce the complexity of graph convolutions.
• Extensive experimental results on seven MTS datasets demonstrate that EV-FGN achieves state-of-
the-art performance with high efficiency and fewer parameters. Multifaceted visualizations further
interpret the efficacy of EV-FGN in graph representation learning for MTS forecasting.
2 RELATED WORKS
2.1 MULTIVARIATE TIME SERIES FORECASTING
Classic time series forecasting methods are linear models, such as VAR Watson (1993), ARIMA
Asteriou & Hall (2011), and state space models (SSMs) Hyndman et al. (2008). Recently, deep learning
based methods Lai et al. (2018); Sen et al. (2019); Zhou et al. (2021) have dominated MTS forecasting
due to their capability of fitting complex nonlinear correlations Lim & Zohren (2021).
MTS with GNN. More recently, MTS forecasting has embraced GNNs Wu et al. (2019); Bai et al. (2020); Wu
et al. (2020); Yu et al. (2018b); Chen et al. (2022); Li et al. (2018) due to their strong capability of
modeling structural dependencies between variables. Most of these models, such as STGCN Yu et al.
(2018b), DCRNN Li et al. (2018) and TAMP-S2GCNets Chen et al. (2022), require a pre-defined
graph structure, which is unknown in most cases. In recent years, some GNN-based works
Kipf et al. (2018); Deng & Hooi (2021) account for dynamic dependencies through network design,
such as time-varying attention Deng & Hooi (2021). In comparison, our proposed model captures
dynamic dependencies by leveraging the high-resolution correlations in the supra-graph without
introducing specific networks.
MTS with Fourier transform. Recently, a growing number of MTS forecasting models have introduced
Fourier theory into neural networks as high-efficiency convolution operators Guibas et al. (2022);
Chi et al. (2020). SFM Zhang et al. (2017) decomposes the hidden state of an LSTM into multiple
frequencies by the discrete Fourier transform (DFT). mWDN Wang et al. (2018) decomposes the time
series into multilevel sub-series by discrete wavelet decomposition and feeds each to an LSTM network.
ATFN Yang et al. (2022) utilizes a time-domain block to learn the trend of
complicated non-stationary time series and a frequency-domain block to capture their dynamic and
complicated periodic patterns. However, these models only capture temporal
dependencies in the frequency domain. StemGNN Cao et al. (2020) takes advantage of both inter-
series correlations and temporal dependencies by modeling them in the spectral domain; however, it
captures the temporal and spatial dependencies separately. Different from these works, our model
jointly encodes spatial-temporal dependencies in the Fourier space.
2.2 GRAPH SHIFT OPERATOR
Graph shift operators (GSOs) (e.g., the adjacency matrix and the Laplacian matrix) are a general
set of linear operators used to encode neighbourhood topologies in a graph. Klicpera
et al. (2019) shows that applying varying GSOs in the message-passing step of GNNs can lead
to significant performance improvements. Dasoulas et al. (2021) proposes a parameterized graph
shift operator that automatically adapts to networks with varying sparsity. Isufi et al. (2021) allows
different nodes to use different GSOs to weigh the information of different neighbors. Hadou et al.
(2022) introduces a linear composition of the graph shift operator and the time-shift operator to design
space-time filters for time-varying graph signals. Inspired by these works, in this paper we design a
varying parameterized graph shift operator in the Fourier space.
2.3 FOURIER NEURAL OPERATOR
Different from classical neural networks which learn mappings between finite-dimensional Euclidean
spaces, neural operators learn mappings between infinite-dimensional function spaces Kovachki et al.
(2021b). Fourier neural operators (FNOs), currently the most promising of the neural operators, are
universal, in the sense that they can approximate any continuous operator to the desired accuracy
Kovachki et al. (2021a). Li et al. (2021) formulates a new neural operator by parameterizing the
integral kernel directly in the Fourier space, allowing for an expressive and efficient architecture for
partial differential equations. Guibas et al. (2022) proposes an efficient token mixer that learns to mix
in the Fourier domain via principled architectural modifications to FNO. In this paper, we learn
a Fourier graph shift operator by leveraging the Fourier neural operator.
3 METHODOLOGY
Let us denote the entire MTS raw data as $\mathcal{X} \in \mathbb{R}^{N \times L}$ with $N$ variables and $L$ timestamps. Under
the rolling setting, we have window-sized time-series inputs with $T$ timestamps, i.e.,
$\{X \mid X \subseteq \mathcal{X},\, X \in \mathbb{R}^{N \times T}\}$. Accordingly, we formulate the problem of MTS forecasting as learning the spatial-temporal dependencies simultaneously on a supra-graph $G = (X, S)$ attributed to each $X$. The
supra-graph $G$ contains $NT$ nodes that represent the values of each variable at each timestamp in
$X$, and $S \in \mathbb{R}^{(NT) \times (NT)}$ is a graph shift operator (GSO) representing the connection structure
of $G$. Since the underlying graph is unknown in most MTS scenarios, we assume all nodes in
the supra-graph are connected with each other, i.e., a fully-connected graph, and perform graph
convolutions on the graph to learn the spatial-temporal representation. Then, given the observed values of
the previous $T$ steps at timestamp $t$, i.e., $X_{t-T:t} \in \mathbb{R}^{N \times T}$, the task of multi-step multivariate time series
forecasting is to predict the values of the $N$ variables for the next $\tau$ steps, denoted as $\hat{X}_{t+1:t+\tau} \in \mathbb{R}^{N \times \tau}$, on
the supra-graph $G$, formulated as follows:

$$\hat{X}_{t+1:t+\tau} = F(X_{t-T:t};\, G;\, \Theta) \tag{1}$$

where $F$ is the forecasting model with parameters $\Theta$.
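To make the rolling setting concrete, the following is a minimal sketch (not from the paper; the function name and numpy dependency are our own) of slicing an MTS into input windows $X_{t-T:t}$ and forecast targets $X_{t+1:t+\tau}$:

```python
import numpy as np

def rolling_windows(X, T, tau):
    """Slice MTS data X of shape (N, L) into rolling pairs:
    inputs X_{t-T:t} of shape (N, T), targets X_{t+1:t+tau} of shape (N, tau)."""
    N, L = X.shape
    inputs, targets = [], []
    for t in range(T, L - tau + 1):
        inputs.append(X[:, t - T:t])
        targets.append(X[:, t:t + tau])
    return np.stack(inputs), np.stack(targets)

X = np.random.randn(7, 100)                # N = 7 variables, L = 100 timestamps
xb, yb = rolling_windows(X, T=12, tau=3)   # shapes (B, 7, 12) and (B, 7, 3)
```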
3.1 OVERALL ARCHITECTURE
The overall architecture of our model is illustrated in Fig. 1. Given input data $X \in \mathbb{R}^{N \times T}$, we first
embed the data into embeddings $\mathcal{X} \in \mathbb{R}^{N \times T \times d}$ by assigning a $d$-dimension vector to each node
using an embedding matrix $\Phi \in \mathbb{R}^{N \times T \times d}$, i.e., $\mathcal{X} = X \times \Phi$. Instead of directly learning the huge
embedding matrix, we introduce two small parameter matrices: 1) a variable embedding matrix
$\phi_v \in \mathbb{R}^{N \times 1 \times d}$, and 2) a temporal embedding matrix $\phi_u \in \mathbb{R}^{1 \times T \times d}$ to factorize $\Phi$, i.e., $\Phi = \phi_v \times \phi_u$.

Figure 1: The network architecture of our proposed model.

Subsequently, we perform the 2D discrete Fourier transform (DFT) on each discrete $N \times T$ spatial-temporal
plane of the embeddings $\mathcal{X}$ and obtain the frequency input $\tilde{\mathcal{X}} := \mathrm{DFT}(\mathcal{X}) \in \mathbb{C}^{N \times T \times d}$.
We then feed $\tilde{\mathcal{X}}$ to $K$-layer Edge-Varying Fourier Graph Networks (denoted as $\Psi^K$) to perform graph
convolutions, capturing the spatial-temporal dependencies simultaneously in the Fourier space.
To make predictions in the time domain, we perform the 2D inverse Fourier transform to generate representations
$\mathcal{X}_\Psi := \mathrm{IDFT}(\Psi^K(\tilde{\mathcal{X}})) \in \mathbb{R}^{N \times T \times d}$ in the time domain. The representations are then fed
to a two-layer feed-forward network (FFN, see more details in Appendix E.4), parameterized with
weights and biases denoted as $\phi_{ff}$, to make predictions for the future $\tau$ steps $\hat{X} \in \mathbb{R}^{N \times \tau}$ in one forward
procedure. The L2 loss function for multi-step forecasting can be formulated as:

$$L(\hat{X};\, \mathcal{X};\, \Theta) = \sum_t \big\| \hat{X}_{t+1:t+\tau} - X_{t+1:t+\tau} \big\|_2^2 \tag{2}$$

with parameters $\Theta = \{\phi_v, \phi_u, \phi_{ff}, \Psi^K\}$ and the ground truth $X_{t+1:t+\tau} \subseteq \mathcal{X}$ at timestamp $t$.
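For illustration, here is a compact sketch of the pipeline above under stated assumptions: PyTorch is used, `psi` stands in for the $K$-layer network $\Psi^K$ (sketched after Section 3.3), and the FFN width is an arbitrary choice; none of these details are prescribed by the paper.

```python
import torch
import torch.nn as nn

class Forecaster(nn.Module):
    """Pipeline skeleton: embed -> 2D DFT -> Psi^K (placeholder) -> 2D IDFT -> FFN."""
    def __init__(self, N, T, tau, d, psi=lambda z: z):
        super().__init__()
        # factorized embedding: Phi = phi_v x phi_u (broadcast product)
        self.phi_v = nn.Parameter(torch.randn(N, 1, d) * 0.02)  # variable embedding
        self.phi_u = nn.Parameter(torch.randn(1, T, d) * 0.02)  # temporal embedding
        self.psi = psi                          # K-layer EV-FGN, see Sec. 3.3
        self.ffn = nn.Sequential(nn.Linear(T * d, 4 * d), nn.ReLU(),
                                 nn.Linear(4 * d, tau))         # two-layer FFN

    def forward(self, x):                       # x: (B, N, T)
        emb = x.unsqueeze(-1) * (self.phi_v * self.phi_u)       # (B, N, T, d)
        z = torch.fft.fft2(emb, dim=(1, 2))     # 2D DFT over each N x T plane
        rep = torch.fft.ifft2(self.psi(z), dim=(1, 2)).real     # back to time domain
        return self.ffn(rep.flatten(2))         # (B, N, tau) predictions

model = Forecaster(N=7, T=12, tau=3, d=16)
pred = model(torch.randn(4, 7, 12))
loss = ((pred - torch.randn(4, 7, 3)) ** 2).sum()   # L2 loss of Eq. (2)
```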
3.2 FOURIER GRAPH SHIFT OPERATOR
According to discrete signal processing on graphs Sandryhaila & Moura (2013), a graph shift
operator (GSO) is defined as a general family of operators that enable the diffusion of information
over graph structures Gama et al. (2020); Dasoulas et al. (2021).

Definition 1 (Graph Shift Operator). Given a graph $G$ with $n$ nodes, a matrix $S \in \mathbb{R}^{n \times n}$ is called
a Graph Shift Operator (GSO) if $S_{ij} = 0$ whenever $i \neq j$ and nodes $i, j$ are not connected.
The graph shift operator includes the adjacency and Laplacian matrices and their normalisations as
instances of its class, and represents the connection structure of the graph. Accordingly, given the
graph $G$ attributed to $X \in \mathbb{R}^{n \times d}$, a general form of spatial-based graph convolution is defined as

$$\mathcal{O}(X) := SXW \tag{3}$$

with the parameter matrix $W \in \mathbb{R}^{d \times d}$. Regarding $S$ as $n \times n$ scores, we can define a matrix-valued
kernel $\kappa: [n] \times [n] \to \mathbb{R}^{d \times d}$ with $\kappa[i, j] = S_{ij} W$, where $[n] = \{1, 2, \cdots, n\}$. Then the graph
convolution can be viewed as a kernel summation:

$$\mathcal{O}(X)[i] = \sum_{j=1}^{n} X[j]\,\kappa[i, j] \quad \forall i \in [n]. \tag{4}$$

In the special case of the Green's kernel $\kappa[i, j] = \kappa[i - j]$, we can rewrite the kernel summation as

$$\mathcal{O}(X)[i] = \sum_{j=1}^{n} X[j]\,\kappa[i - j] = (X * \kappa)[i] \quad \forall i \in [n], \tag{5}$$

where $X * \kappa$ denotes the convolution of the discrete sequences $X$ and $\kappa$. According to the convolution
theorem Katznelson (1970) (see Appendix B), the graph convolution can be rewritten as

$$\mathcal{O}(X)[i] = \mathcal{F}^{-1}\big(\mathcal{F}(X)\mathcal{F}(\kappa)\big)[i] \quad \forall i \in [n], \tag{6}$$

where $\mathcal{F}$ and $\mathcal{F}^{-1}$ denote the discrete Fourier transform (DFT) and its inverse (IDFT), respectively.
The multiplication in the Fourier space is a lower-complexity computation than the graph
convolution, and the DFT can be efficiently implemented by the fast Fourier transform (FFT).
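As a sanity check on Equations 5 and 6, the following sketch (our own, assuming numpy and scalar features, i.e., $d = 1$) builds a circulant GSO from a Green's kernel and verifies that the vertex-domain convolution matches the FFT-based computation:

```python
import numpy as np

n = 8
kappa = np.random.randn(n)                 # Green's kernel values kappa[i - j]
X = np.random.randn(n)                     # scalar node signal (d = 1)
# Circulant GSO induced by the kernel: S[i, j] = kappa[(i - j) mod n]
S = np.array([[kappa[(i - j) % n] for j in range(n)] for i in range(n)])
direct = S @ X                             # vertex-domain convolution, Eq. (5)
spectral = np.fft.ifft(np.fft.fft(X) * np.fft.fft(kappa)).real   # Eq. (6)
assert np.allclose(direct, spectral)
```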
Definition 2 (Fourier Graph Shift Operator). Given a graph $G = (X, S)$ with input $X \in \mathbb{R}^{n \times d}$,
GSO $S \in \mathbb{R}^{n \times n}$, and weight matrix $W \in \mathbb{R}^{d \times d}$, the graph convolution is formulated as

$$\mathcal{F}(SXW) = \mathcal{F}(X) \times_n \mathcal{F}(\kappa) \tag{7}$$

where $\mathcal{F}$ denotes the DFT, $\kappa$ satisfies $\kappa[i, j] = \kappa[i - j]$, and $\times_n$ is matrix multiplication on all dimensions
except that of $n$. We define $\mathcal{S} := \mathcal{F}(\kappa) \in \mathbb{C}^{n \times d \times d}$ as a Fourier graph shift operator (FGSO).
In particular, turning to our case of the fully-connected supra-graph $G$ with an all-one GSO $S \in
\{1\}^{n \times n}$, it yields the space-invariant kernel $\kappa[i, j] = S_{ij} W = W$ and $\mathcal{F}(SXW) = \mathcal{F}(X)\mathcal{F}(\kappa)$.
Accordingly, we can parameterize the FGSO $\mathcal{S}$ with a complex-valued matrix in $\mathbb{C}^{d \times d}$, which is space-invariant
and computationally cheap compared to a varying kernel that results in a parameterized
matrix in $\mathbb{C}^{n \times d \times d}$. Furthermore, we can extend Definition 2 to the 2D discrete space, i.e., from $[n]$ to
$[N] \times [T]$, corresponding to the finite discrete spatial-temporal space of multivariate time series.
Remarks. Compared with the GSO and FNO, the space-invariant FGSO has several advantages. Assume a
graph with $n$ nodes and embedding dimension $d$ ($d \ll n$). 1) Efficiency: the time complexity of the
space-invariant FGSO is $O(nd \log n + nd^2)$ for the DFT, IDFT, and matrix multiplication, compared
with $O(n^2 d + nd^2)$ for a GSO. 2) Scale-free parameters: the space-invariant FGSO strategically
shares $O(d^2)$ parameters across all nodes, so the parameter volume is agnostic to the data scale, while
the parameter count of FNO is $O(nd^2)$.
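The complexity claims can be read off a direct implementation. Below is a sketch (our own, assuming PyTorch) of applying a space-invariant FGSO, whose parameter count is $O(d^2)$ regardless of the number of nodes $n$:

```python
import torch

n, d = 1024, 32
X = torch.randn(n, d)                              # graph input on n nodes
S = torch.randn(d, d, dtype=torch.cfloat) * 0.02   # space-invariant FGSO: O(d^2) params
Z = torch.fft.fft(X.to(torch.cfloat), dim=0)       # DFT over nodes: O(nd log n)
Y = torch.fft.ifft(Z @ S, dim=0).real              # F(X) x_n S then IDFT: O(nd^2 + nd log n)
```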
3.3 EDGE-VARYING FOURIER GRAPH NETWORKS
Graph filters, as core operations in graph signal processing, are linear transformations expressed as polynomials
of the graph shift operator Isufi et al. (2021); Mateos et al. (2019), and can be used to exactly
model graph convolutions and capture multi-order diffusion over graph structures Segarra et al. (2017).
To capture time-varying counterparts and adopt different weights to weigh the information of
different neighbors in each diffusion order, the edge-variant graph filters are defined as follows Isufi
et al. (2021); Segarra et al. (2017): given a GSO $S \in \mathbb{R}^{n \times n}$ corresponding to a graph with $n$ nodes,

$$H_{EV} = S_0 + S_1 S_0 + \dots + S_K S_{K-1} \cdots S_0 = \sum_{k=0}^{K} S_{k:0} \tag{8}$$

where $S_0$ denotes the identity matrix, $\{S_k\}_{k=1}^{K}$ is a collection of $K$ edge-weighting GSOs sharing
the sparsity pattern of $S$, and $S_k \in \mathbb{R}^{n \times n}$ corresponds to the $k$-th diffusion step. The edge-variant
graph filters are proven effective in yielding a highly discriminative model and lay the foundation for the
unification of GCNs and GATs Isufi et al. (2021). We can extend Equation 7 to $H_{EV}$ and reformulate
the multi-order graph convolution in a recursive form.
Proposition 1. Given a graph input $X \in \mathbb{R}^{n \times d}$, the $K$-order graph convolution under the edge-variant
graph filter $H_{EV} = \sum_{k=0}^{K} S_{k:0}$ with $\{S_k \in \mathbb{R}^{n \times n}\}_{k=0}^{K}$ is reformulated as follows:

$$H_{EV} X = \mathcal{F}^{-1}\!\left(\sum_{k=0}^{K} \mathcal{F}(X) \times_n \mathcal{S}_{0:k}\right) \quad \text{s.t.} \quad \mathcal{S}_{0:k} = \mathcal{S}_0 \times_n \cdots \times_n \mathcal{S}_{k-1} \times_n \mathcal{S}_k \tag{9}$$

where $\mathcal{S}_k \in \mathbb{C}^{n \times d \times d}$ is the $k$-th FGSO satisfying $\mathcal{F}(S_k X) = \mathcal{F}(X) \times_n \mathcal{S}_k$, $\mathcal{S}_0$ is the identity matrix,
and $\mathcal{F}$ and $\mathcal{F}^{-1}$ denote the discrete Fourier transform and its inverse, respectively.
Proposition 1, proved in Appendix C.1, states that we can rewrite the multi-order graph convolution
corresponding to $H_{EV}$ as a summation over a recursive multiplication of individual FGSOs in the Fourier
space. Considering our case of the supra-graph $G$, we can similarly adopt $K$ space-invariant FGSOs
and parameterize each FGSO $\mathcal{S}_k$ with a complex-valued matrix in $\mathbb{C}^{d \times d}$. This saves a large amount of
computation cost and results in a concise form:

$$H_{EV} X = \mathcal{F}^{-1}\!\left(\sum_{k=0}^{K} \mathcal{F}(X)\,\mathcal{S}_{0:k}\right) \quad \text{s.t.} \quad \mathcal{S}_{0:k} = \prod_{i=0}^{k} \mathcal{S}_i \tag{10}$$
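Below is a minimal sketch (our own, assuming PyTorch and treating each $\mathcal{S}_k$ as a space-invariant $d \times d$ complex matrix) of evaluating Equation 10 with the recursion $\mathcal{S}_{0:k} = \mathcal{S}_{0:k-1}\mathcal{S}_k$:

```python
import torch

def ev_fgn(z, fgsos):
    # z: frequency input F(X) with trailing dim d; fgsos: K complex (d, d) FGSOs.
    # Accumulates sum_k F(X) S_{0:k}; the k = 0 term is F(X) itself since S_0 = I.
    out, cur = z.clone(), z
    for Sk in fgsos:                 # layers k = 1..K
        cur = cur @ Sk               # F(X) S_{0:k} = (F(X) S_{0:k-1}) S_k
        out = out + cur
    return out                       # apply the IDFT outside to recover H_EV X

d, K = 32, 3
fgsos = [torch.randn(d, d, dtype=torch.cfloat) * 0.02 for _ in range(K)]
z = torch.fft.fft2(torch.randn(8, 16, d), dim=(0, 1))   # F(X) on an N x T plane
y = torch.fft.ifft2(ev_fgn(z, fgsos), dim=(0, 1)).real
```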
The recursive composition has the nice property that $\mathcal{S}_{0:k} = \mathcal{S}_{0:k-1} \mathcal{S}_k$, which inspires us to design a
complex-valued feed-forward network with $\mathcal{S}_k$ being the complex weights of the $k$-th layer. However,