EDGE-VARYING FOURIER GRAPH NETWORKS FOR
MULTIVARIATE TIME SERIES FORECASTING
Kun Yi
Beijing Institute of Technology
yikun@bit.edu.cn
Qi Zhang
University of Technology Sydney
qi.zhang-13@student.uts.edu.au
Liang Hu
Tongji University
rainmilk@gmail.com
Hui He
Beijing Institute of Technology
hehui617@bit.edu.cn
Ning An
HeFei University of Technology
ning.g.an@acm.org
LongBing Cao
University of Technology Sydney
longbing.cao@uts.edu.au
ZhenDong Niu
Beijing Institute of Technology
zniu@bit.edu.cn
ABSTRACT
The key problem in multivariate time series (MTS) analysis and forecasting is to
disclose the underlying couplings between variables that drive their co-movements.
Many recent successful MTS methods are built with graph neural networks (GNNs)
owing to their essential capacity for relational modeling. However, previous work
often models MTS with a static graph structure over the time-series variables,
failing to capture their ever-changing correlations over time. To this end, we
adaptively learn a fully-connected supra-graph connecting any two variables at any
two timestamps to capture high-resolution variable dependencies via an efficient
graph convolutional network. Specifically, we construct Edge-Varying Fourier
Graph Networks (EV-FGN) equipped with a Fourier Graph Shift Operator (FGSO)
that efficiently performs graph convolution in the frequency domain. As a result,
a high-efficiency, scale-free parameter learning scheme is derived for MTS
analysis and forecasting according to the convolution theorem. Extensive
experiments show that EV-FGN outperforms state-of-the-art methods on seven
real-world MTS datasets.
1 INTRODUCTION
Multivariate time series (MTS) forecasting is a key ingredient in many real-world scenarios, including
weather forecasting Zheng et al. (2015), decision making Borovykh et al. (2017), traffic forecasting
Yu et al. (2018a); Bai et al. (2020), COVID-19 prediction Cao et al. (2020); Chen et al. (2022), etc.
Recently, deep neural networks, such as long short-term memory (LSTM) Hochreiter & Schmid-
huber (1997), convolutional neural network (CNN) Borovykh et al. (2017), Transformer Vaswani
et al. (2017), have dominated MTS modeling. In particular, graph neural networks (GNNs) have
demonstrated promising performance on MTS forecasting with their essential capability to capture
the complex couplings between time-series variables. Some studies adaptively learn the
graph for MTS forecasting even without an explicit graph structure, e.g., via node similarity Mateos
et al. (2019); Bai et al. (2020); Wu et al. (2019) and/or self-attention mechanisms Cao et al. (2020).
Despite the success of GNNs on MTS forecasting, three practical challenges remain to be
addressed: 1) the dependencies between each pair of time-series variables are generally non-static,
which demands modeling the full dependencies across all possible lags; 2) a high-efficiency dense-graph
learning method is needed to replace the costly graph convolution and attention operators
(whose time complexity is quadratic in the graph size); 3) the graph over MTS varies with different
temporal interactions, which demands an efficient encoding of the dynamic structure. Given
the increasing use of the convolution theorem from Fourier theory to design high-efficiency neural
units (operators) that replace costly attention computations Guibas et al. (2022), we are inspired to
perform graph computations in the Fourier space, providing a potential alternative for processing
supra-graphs that jointly attend to spatial-temporal dependencies.
Different from most GNN-based MTS models that construct graphs to model the spatial correlations between
variables (i.e., nodes) Bai et al. (2020); Wu et al. (2019; 2020), we attempt to build a supra-graph that
sheds light on the "high-resolution" correlations between any two variables at any two timestamps
and largely enhances the expressiveness of non-static spatial-temporal dependencies. Obviously,
the supra-graph heavily increases the computational complexity of a GNN-based model, so
a high-efficiency learning method is required to reduce the cost of model training. Inspired by the
Fourier Neural Operator (FNO) Li et al. (2021), we transform graph convolutions in the time domain
into much lower-complexity matrix multiplications in the frequency domain by defining an efficient
Fourier Graph Shift Operator (FGSO). In addition, it is necessary to consider multiple iterations
(layers) of graph convolution to expand the receptive neighborhood and mix the diffusion information in the
graph. To capture time-varying diffusion over the supra-graph, we introduce the edge-varying graph
filter, expressed as a polynomial of multiple graph shift operators (GSOs), to weigh the graph edges
differently at different iterations. We then reformulate the edge-varying graph filter with FGSOs in
the frequency domain to reduce the computational cost. Accordingly, we construct a complex-valued
feed-forward network, dubbed Edge-Varying Fourier Graph Networks (EV-FGN), stacked with
multiple FGSOs to perform high-efficiency multi-layer graph convolutions in the Fourier space.
The main contributions of this paper are summarized as follows:
• We adaptively learn a supra-graph, representing non-static correlations between any two variables
at any two timestamps, to capture high-resolution spatial-temporal dependencies.
• To efficiently compute graph convolutions over the supra-graph, we define FGSO, which has the
capacity of scale-free learning parameters in the Fourier space.
• We design EV-FGN, evolved from edge-varying graph filters, to capture the time-varying variable
dependencies, where FGSOs are used to reduce the complexity of graph convolutions.
• Extensive experimental results on seven MTS datasets demonstrate that EV-FGN achieves state-of-
the-art performance with high efficiency and fewer parameters. Multifaceted visualizations further
interpret the efficacy of EV-FGN in graph representation learning for MTS forecasting.
2 RELATED WORKS
2.1 MULTIVARIATE TIME SERIES FORECASTING
Classic time series forecasting methods are linear models, such as VAR Watson (1993), ARIMA
Asteriou & Hall (2011), and state space models (SSMs) Hyndman et al. (2008). Recently, deep learning
based methods Lai et al. (2018); Sen et al. (2019); Zhou et al. (2021) have dominated MTS forecasting
due to their capability of fitting complex nonlinear correlations Lim & Zohren (2021).
MTS with GNN. More recently, MTS forecasting has embraced GNNs Wu et al. (2019); Bai et al. (2020); Wu
et al. (2020); Yu et al. (2018b); Chen et al. (2022); Li et al. (2018) due to their strong capability of
modeling structural dependencies between variables. Most of these models, such as STGCN Yu et al.
(2018b), DCRNN Li et al. (2018) and TAMP-S2GCNets Chen et al. (2022), require a pre-defined
graph structure, which is unknown in most cases. In recent years, some GNN-based works
Kipf et al. (2018); Deng & Hooi (2021) account for dynamic dependencies through network design,
such as time-varying attention Deng & Hooi (2021). In comparison, our proposed model captures
dynamic dependencies by leveraging the high-resolution correlations in the supra-graph without
introducing specific networks.
MTS with Fourier transform. Recently, a growing number of MTS forecasting models have introduced
Fourier theory into neural networks as high-efficiency convolution operators Guibas et al. (2022);
Chi et al. (2020). SFM Zhang et al. (2017) decomposes the hidden state of an LSTM into multiple
frequencies by the discrete Fourier transform (DFT). mWDN Wang et al. (2018) decomposes the time
series into multilevel sub-series by discrete wavelet decomposition and feeds each to an LSTM network.
ATFN Yang et al. (2022) utilizes a time-domain block to learn the trend of
complicated non-stationary time series and a frequency-domain block to capture their dynamic and
complicated periodic patterns. However, these models only capture temporal
dependencies in the frequency domain. StemGNN Cao et al. (2020) takes advantage of both inter-
series correlations and temporal dependencies by modeling them in the spectral domain; however, it
captures the temporal and spatial dependencies separately. Different from these works, our model
jointly encodes spatial-temporal dependencies in the Fourier space.
2.2 GRAPH SHIFT OPERATOR
Graph shift operators (GSOs) (e.g., the adjacency matrix and the Laplacian matrix) are a general
set of linear operators used to encode neighbourhood topologies in a graph. Klicpera
et al. (2019) shows that applying varying GSOs in the message-passing step of GNNs can lead
to significant performance improvements. Dasoulas et al. (2021) proposes a parameterized graph
shift operator that automatically adapts to networks with varying sparsity. Isufi et al. (2021) allows
different nodes to use different GSOs to weigh the information of different neighbors. Hadou et al.
(2022) introduces a linear composition of the graph shift operator and the time-shift operator to design
space-time filters for time-varying graph signals. Inspired by these works, in this paper we design a
varying parameterized graph shift operator in the Fourier space.
2.3 FOURIER NEURAL OPERATOR
Different from classical neural networks which learn mappings between finite-dimensional Euclidean
spaces, neural operators learn mappings between infinite-dimensional function spaces Kovachki et al.
(2021b). Fourier neural operators (FNOs), currently the most promising of the neural operators, are
universal, in the sense that they can approximate any continuous operator to the desired accuracy
Kovachki et al. (2021a). Li et al. (2021) formulates a new neural operator by parameterizing the
integral kernel directly in the Fourier space, allowing for an expressive and efficient architecture for
partial differential equations. Guibas et al. (2022) proposes an efficient token mixer that learns to mix
in the Fourier domain via principled architectural modifications to FNO. In this paper, we learn
a Fourier graph shift operator by leveraging the Fourier neural operator.
3 METHODOLOGY
Let us denote the entire MTS raw data as $\mathcal{X} \in \mathbb{R}^{N \times L}$ with $N$ variables and $L$ timestamps. Under
the rolling setting, we have window-sized time-series inputs with $T$ timestamps, i.e.,
$\{X \mid X \subseteq \mathcal{X},\, X \in \mathbb{R}^{N \times T}\}$. Accordingly, we formulate the problem of MTS forecasting as learning the spatial-temporal dependencies simultaneously on a supra-graph $G = (X, S)$ attributed to each $X$. The
supra-graph $G$ contains $NT$ nodes that represent the values of each variable at each timestamp in
$X$, and $S \in \mathbb{R}^{(NT) \times (NT)}$ is a graph shift operator (GSO) representing the connection structure
of $G$. Since the underlying graph is unknown in most MTS scenarios, we assume all nodes in
the supra-graph are connected with each other, i.e., a fully-connected graph, and perform graph
convolutions on the graph to learn the spatial-temporal representation. Then, given the observed values of
the previous $T$ steps at timestamp $t$, i.e., $X_{t-T:t} \in \mathbb{R}^{N \times T}$, the task of multi-step multivariate time series
forecasting is to predict the values of the $N$ variables for the next $\tau$ steps, denoted as $\hat{X}_{t+1:t+\tau} \in \mathbb{R}^{N \times \tau}$, on
the supra-graph $G$, formulated as follows:

$$\hat{X}_{t+1:t+\tau} = F(X_{t-T:t};\, G;\, \Theta) \tag{1}$$

where $F$ is the forecasting model with parameters $\Theta$.
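To make the rolling setting concrete, the following is a minimal sketch (not from the paper; the function name and numpy dependency are our own) of slicing an MTS into input windows $X_{t-T:t}$ and forecast targets $X_{t+1:t+\tau}$:

```python
import numpy as np

def rolling_windows(X, T, tau):
    """Slice MTS data X of shape (N, L) into rolling pairs:
    inputs X_{t-T:t} of shape (N, T), targets X_{t+1:t+tau} of shape (N, tau)."""
    N, L = X.shape
    inputs, targets = [], []
    for t in range(T, L - tau + 1):
        inputs.append(X[:, t - T:t])
        targets.append(X[:, t:t + tau])
    return np.stack(inputs), np.stack(targets)

X = np.random.randn(7, 100)                # N = 7 variables, L = 100 timestamps
xb, yb = rolling_windows(X, T=12, tau=3)   # shapes (B, 7, 12) and (B, 7, 3)
```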
3.1 OVERALL ARCHITECTURE
The overall architecture of our model is illustrated in Fig. 1. Given input data $X \in \mathbb{R}^{N \times T}$, we first
embed the data into embeddings $\mathcal{X} \in \mathbb{R}^{N \times T \times d}$ by assigning a $d$-dimension vector to each node
using an embedding matrix $\Phi \in \mathbb{R}^{N \times T \times d}$, i.e., $\mathcal{X} = X \times \Phi$. Instead of directly learning the huge
embedding matrix, we introduce two small parameter matrices: 1) a variable embedding matrix
$\phi_v \in \mathbb{R}^{N \times 1 \times d}$, and 2) a temporal embedding matrix $\phi_u \in \mathbb{R}^{1 \times T \times d}$ to factorize $\Phi$, i.e., $\Phi = \phi_v \times \phi_u$.

Figure 1: The network architecture of our proposed model.

Subsequently, we perform the 2D discrete Fourier transform (DFT) on each discrete $N \times T$ spatial-temporal
plane of the embeddings $\mathcal{X}$ and obtain the frequency input $\tilde{\mathcal{X}} := \mathrm{DFT}(\mathcal{X}) \in \mathbb{C}^{N \times T \times d}$.
We then feed $\tilde{\mathcal{X}}$ to $K$-layer Edge-Varying Fourier Graph Networks (denoted as $\Psi^K$) to perform graph
convolutions, capturing the spatial-temporal dependencies simultaneously in the Fourier space.
To make predictions in the time domain, we perform the 2D inverse Fourier transform to generate representations
$\mathcal{X}_\Psi := \mathrm{IDFT}(\Psi^K(\tilde{\mathcal{X}})) \in \mathbb{R}^{N \times T \times d}$ in the time domain. The representations are then fed
to a two-layer feed-forward network (FFN, see more details in Appendix E.4), parameterized with
weights and biases denoted as $\phi_{ff}$, to make predictions for the future $\tau$ steps $\hat{X} \in \mathbb{R}^{N \times \tau}$ in one forward
procedure. The L2 loss function for multi-step forecasting can be formulated as:

$$L(\hat{X};\, \mathcal{X};\, \Theta) = \sum_t \big\| \hat{X}_{t+1:t+\tau} - X_{t+1:t+\tau} \big\|_2^2 \tag{2}$$

with parameters $\Theta = \{\phi_v, \phi_u, \phi_{ff}, \Psi^K\}$ and the ground truth $X_{t+1:t+\tau} \subseteq \mathcal{X}$ at timestamp $t$.
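For illustration, here is a compact sketch of the pipeline above under stated assumptions: PyTorch is used, `psi` stands in for the $K$-layer network $\Psi^K$ (sketched after Section 3.3), and the FFN width is an arbitrary choice; none of these details are prescribed by the paper.

```python
import torch
import torch.nn as nn

class Forecaster(nn.Module):
    """Pipeline skeleton: embed -> 2D DFT -> Psi^K (placeholder) -> 2D IDFT -> FFN."""
    def __init__(self, N, T, tau, d, psi=lambda z: z):
        super().__init__()
        # factorized embedding: Phi = phi_v x phi_u (broadcast product)
        self.phi_v = nn.Parameter(torch.randn(N, 1, d) * 0.02)  # variable embedding
        self.phi_u = nn.Parameter(torch.randn(1, T, d) * 0.02)  # temporal embedding
        self.psi = psi                          # K-layer EV-FGN, see Sec. 3.3
        self.ffn = nn.Sequential(nn.Linear(T * d, 4 * d), nn.ReLU(),
                                 nn.Linear(4 * d, tau))         # two-layer FFN

    def forward(self, x):                       # x: (B, N, T)
        emb = x.unsqueeze(-1) * (self.phi_v * self.phi_u)       # (B, N, T, d)
        z = torch.fft.fft2(emb, dim=(1, 2))     # 2D DFT over each N x T plane
        rep = torch.fft.ifft2(self.psi(z), dim=(1, 2)).real     # back to time domain
        return self.ffn(rep.flatten(2))         # (B, N, tau) predictions

model = Forecaster(N=7, T=12, tau=3, d=16)
pred = model(torch.randn(4, 7, 12))
loss = ((pred - torch.randn(4, 7, 3)) ** 2).sum()   # L2 loss of Eq. (2)
```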
3.2 FOURIER GRAPH SHIFT OPERATOR
According to discrete signal processing on graphs Sandryhaila & Moura (2013), a graph shift
operator (GSO) is defined as a general family of operators that enable the diffusion of information
over graph structures Gama et al. (2020); Dasoulas et al. (2021).

Definition 1 (Graph Shift Operator). Given a graph $G$ with $n$ nodes, a matrix $S \in \mathbb{R}^{n \times n}$ is called
a Graph Shift Operator (GSO) if $S_{ij} = 0$ whenever $i \neq j$ and nodes $i, j$ are not connected.
The graph shift operator includes the adjacency and Laplacian matrices and their normalisations as
instances of its class, and represents the connection structure of the graph. Accordingly, given the
graph $G$ attributed to $X \in \mathbb{R}^{n \times d}$, a general form of spatial-based graph convolution is defined as

$$\mathcal{O}(X) := SXW \tag{3}$$

with the parameter matrix $W \in \mathbb{R}^{d \times d}$. Regarding $S$ as $n \times n$ scores, we can define a matrix-valued
kernel $\kappa: [n] \times [n] \to \mathbb{R}^{d \times d}$ with $\kappa[i, j] = S_{ij} W$, where $[n] = \{1, 2, \cdots, n\}$. Then the graph
convolution can be viewed as a kernel summation:

$$\mathcal{O}(X)[i] = \sum_{j=1}^{n} X[j]\,\kappa[i, j] \quad \forall i \in [n]. \tag{4}$$

In the special case of the Green's kernel $\kappa[i, j] = \kappa[i - j]$, we can rewrite the kernel summation as

$$\mathcal{O}(X)[i] = \sum_{j=1}^{n} X[j]\,\kappa[i - j] = (X * \kappa)[i] \quad \forall i \in [n], \tag{5}$$

where $X * \kappa$ denotes the convolution of the discrete sequences $X$ and $\kappa$. According to the convolution
theorem Katznelson (1970) (see Appendix B), the graph convolution can be rewritten as

$$\mathcal{O}(X)[i] = \mathcal{F}^{-1}\big(\mathcal{F}(X)\mathcal{F}(\kappa)\big)[i] \quad \forall i \in [n], \tag{6}$$

where $\mathcal{F}$ and $\mathcal{F}^{-1}$ denote the discrete Fourier transform (DFT) and its inverse (IDFT), respectively.
The multiplication in the Fourier space is a lower-complexity computation than the graph
convolution, and the DFT can be efficiently implemented by the fast Fourier transform (FFT).
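As a sanity check on Equations 5 and 6, the following sketch (our own, assuming numpy and scalar features, i.e., $d = 1$) builds a circulant GSO from a Green's kernel and verifies that the vertex-domain convolution matches the FFT-based computation:

```python
import numpy as np

n = 8
kappa = np.random.randn(n)                 # Green's kernel values kappa[i - j]
X = np.random.randn(n)                     # scalar node signal (d = 1)
# Circulant GSO induced by the kernel: S[i, j] = kappa[(i - j) mod n]
S = np.array([[kappa[(i - j) % n] for j in range(n)] for i in range(n)])
direct = S @ X                             # vertex-domain convolution, Eq. (5)
spectral = np.fft.ifft(np.fft.fft(X) * np.fft.fft(kappa)).real   # Eq. (6)
assert np.allclose(direct, spectral)
```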
Definition 2 (Fourier Graph Shift Operator). Given a graph $G = (X, S)$ with input $X \in \mathbb{R}^{n \times d}$,
GSO $S \in \mathbb{R}^{n \times n}$, and weight matrix $W \in \mathbb{R}^{d \times d}$, the graph convolution is formulated as

$$\mathcal{F}(SXW) = \mathcal{F}(X) \times_n \mathcal{F}(\kappa) \tag{7}$$

where $\mathcal{F}$ denotes the DFT, $\kappa$ satisfies $\kappa[i, j] = \kappa[i - j]$, and $\times_n$ is matrix multiplication on all dimensions
except that of $n$. We define $\mathcal{S} := \mathcal{F}(\kappa) \in \mathbb{C}^{n \times d \times d}$ as a Fourier graph shift operator (FGSO).
In particular, turning to our case of the fully-connected supra-graph $G$ with an all-one GSO $S \in
\{1\}^{n \times n}$, it yields the space-invariant kernel $\kappa[i, j] = S_{ij} W = W$ and $\mathcal{F}(SXW) = \mathcal{F}(X)\mathcal{F}(\kappa)$.
Accordingly, we can parameterize the FGSO $\mathcal{S}$ with a complex-valued matrix in $\mathbb{C}^{d \times d}$, which is space-invariant
and computationally cheap compared to a varying kernel that results in a parameterized
matrix in $\mathbb{C}^{n \times d \times d}$. Furthermore, we can extend Definition 2 to the 2D discrete space, i.e., from $[n]$ to
$[N] \times [T]$, corresponding to the finite discrete spatial-temporal space of multivariate time series.
Remarks. Compared with the GSO and FNO, the space-invariant FGSO has several advantages. Assume a
graph with $n$ nodes and embedding dimension $d$ ($d \ll n$). 1) Efficiency: the time complexity of the
space-invariant FGSO is $O(nd \log n + nd^2)$ for the DFT, IDFT, and matrix multiplication, compared
with $O(n^2 d + nd^2)$ for a GSO. 2) Scale-free parameters: the space-invariant FGSO strategically
shares $O(d^2)$ parameters across all nodes, so the parameter volume is agnostic to the data scale, while
the parameter count of FNO is $O(nd^2)$.
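The complexity claims can be read off a direct implementation. Below is a sketch (our own, assuming PyTorch) of applying a space-invariant FGSO, whose parameter count is $O(d^2)$ regardless of the number of nodes $n$:

```python
import torch

n, d = 1024, 32
X = torch.randn(n, d)                              # graph input on n nodes
S = torch.randn(d, d, dtype=torch.cfloat) * 0.02   # space-invariant FGSO: O(d^2) params
Z = torch.fft.fft(X.to(torch.cfloat), dim=0)       # DFT over nodes: O(nd log n)
Y = torch.fft.ifft(Z @ S, dim=0).real              # F(X) x_n S then IDFT: O(nd^2 + nd log n)
```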
3.3 EDGE-VARYING FOURIER GRAPH NETWORKS
Graph filters, as core operations in graph signal processing, are linear transformations expressed as polynomials
of the graph shift operator Isufi et al. (2021); Mateos et al. (2019), and can be used to exactly
model graph convolutions and capture multi-order diffusion over graph structures Segarra et al. (2017).
To capture time-varying counterparts and adopt different weights to weigh the information of
different neighbors in each diffusion order, the edge-variant graph filters are defined as follows Isufi
et al. (2021); Segarra et al. (2017): given a GSO $S \in \mathbb{R}^{n \times n}$ corresponding to a graph with $n$ nodes,

$$H_{EV} = S_0 + S_1 S_0 + \dots + S_K S_{K-1} \cdots S_0 = \sum_{k=0}^{K} S_{k:0} \tag{8}$$

where $S_0$ denotes the identity matrix, $\{S_k\}_{k=1}^{K}$ is a collection of $K$ edge-weighting GSOs sharing
the sparsity pattern of $S$, and $S_k \in \mathbb{R}^{n \times n}$ corresponds to the $k$-th diffusion step. The edge-variant
graph filters are proven effective in yielding a highly discriminative model and lay the foundation for the
unification of GCNs and GATs Isufi et al. (2021). We can extend Equation 7 to $H_{EV}$ and reformulate
the multi-order graph convolution in a recursive form.
Proposition 1. Given a graph input $X \in \mathbb{R}^{n \times d}$, the $K$-order graph convolution under the edge-variant
graph filter $H_{EV} = \sum_{k=0}^{K} S_{k:0}$ with $\{S_k \in \mathbb{R}^{n \times n}\}_{k=0}^{K}$ is reformulated as follows:

$$H_{EV} X = \mathcal{F}^{-1}\!\left(\sum_{k=0}^{K} \mathcal{F}(X) \times_n \mathcal{S}_{0:k}\right) \quad \text{s.t.} \quad \mathcal{S}_{0:k} = \mathcal{S}_0 \times_n \cdots \times_n \mathcal{S}_{k-1} \times_n \mathcal{S}_k \tag{9}$$

where $\mathcal{S}_k \in \mathbb{C}^{n \times d \times d}$ is the $k$-th FGSO satisfying $\mathcal{F}(S_k X) = \mathcal{F}(X) \times_n \mathcal{S}_k$, $\mathcal{S}_0$ is the identity matrix,
and $\mathcal{F}$ and $\mathcal{F}^{-1}$ denote the discrete Fourier transform and its inverse, respectively.
Proposition 1, proved in Appendix C.1, states that we can rewrite the multi-order graph convolution
corresponding to $H_{EV}$ as a summation over a recursive multiplication of individual FGSOs in the Fourier
space. Considering our case of the supra-graph $G$, we can similarly adopt $K$ space-invariant FGSOs
and parameterize each FGSO $\mathcal{S}_k$ with a complex-valued matrix in $\mathbb{C}^{d \times d}$. This saves a large amount of
computation cost and results in a concise form:

$$H_{EV} X = \mathcal{F}^{-1}\!\left(\sum_{k=0}^{K} \mathcal{F}(X)\,\mathcal{S}_{0:k}\right) \quad \text{s.t.} \quad \mathcal{S}_{0:k} = \prod_{i=0}^{k} \mathcal{S}_i \tag{10}$$
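Below is a minimal sketch (our own, assuming PyTorch and treating each $\mathcal{S}_k$ as a space-invariant $d \times d$ complex matrix) of evaluating Equation 10 with the recursion $\mathcal{S}_{0:k} = \mathcal{S}_{0:k-1}\mathcal{S}_k$:

```python
import torch

def ev_fgn(z, fgsos):
    # z: frequency input F(X) with trailing dim d; fgsos: K complex (d, d) FGSOs.
    # Accumulates sum_k F(X) S_{0:k}; the k = 0 term is F(X) itself since S_0 = I.
    out, cur = z.clone(), z
    for Sk in fgsos:                 # layers k = 1..K
        cur = cur @ Sk               # F(X) S_{0:k} = (F(X) S_{0:k-1}) S_k
        out = out + cur
    return out                       # apply the IDFT outside to recover H_EV X

d, K = 32, 3
fgsos = [torch.randn(d, d, dtype=torch.cfloat) * 0.02 for _ in range(K)]
z = torch.fft.fft2(torch.randn(8, 16, d), dim=(0, 1))   # F(X) on an N x T plane
y = torch.fft.ifft2(ev_fgn(z, fgsos), dim=(0, 1)).real
```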
The recursive composition has the nice property that $\mathcal{S}_{0:k} = \mathcal{S}_{0:k-1} \mathcal{S}_k$, which inspires us to design a
complex-valued feed-forward network with $\mathcal{S}_k$ being the complex weights of the $k$-th layer. However,