SPECTRANET: MULTIVARIATE FORECASTING AND IMPUTATION UNDER DISTRIBUTION SHIFTS AND MISSING DATA
Cristian Challu
School of Computer Science
Carnegie Mellon University
cchallu@andrew.cmu.edu
Peihong Jiang
AWS AI Labs
jpeihong@amazon.com
Ying Nian Wu
AWS AI Labs
wunyin@amazon.com
Laurent Callot
AWS AI Labs
lcallot@amazon.com
ABSTRACT
In this work, we tackle two widespread yet largely understudied challenges in real time-series forecasting applications: distribution shifts and missing data. We propose SpectraNet, a novel multivariate time-series forecasting model that dynamically infers a latent space spectral decomposition to capture the current temporal dynamics and correlations in the recently observed history. A convolutional neural network maps the learned representation by sequentially mixing its components and refining the output. Our proposed approach can simultaneously produce forecasts and interpolate past observations and can, therefore, greatly simplify production systems by unifying the imputation and forecasting tasks into a single model. SpectraNet achieves SoTA performance on both tasks simultaneously on five benchmark datasets, compared to forecasting and imputation models, with up to 92% fewer parameters and comparable training times. In settings with up to 80% missing data, SpectraNet achieves average performance improvements of almost 50% over the second-best alternative. Our code is available at https://github.com/cchallu/spectranet.
1 INTRODUCTION
Multivariate time-series forecasting is an essential task in a wide range of domains. Forecasts are a key input to optimizing the production and distribution of goods (Böse et al., 2017), predicting healthcare patient outcomes (Chen et al., 2015), planning electricity production (Olivares et al., 2022), and building financial portfolios (Emerson et al., 2019), among other examples. Due to these high potential benefits, researchers have dedicated many efforts to improving the capabilities of forecasting models, with breakthroughs in model architectures and performance (Benidis et al., 2022).
The main focus of research in multivariate forecasting has been on accuracy and scalability, to
which the present paper contributes. In addition, we identify two widespread challenges for real
applications which have been largely understudied: distribution shifts and missing data.
We refer to distribution shifts as changes in the time-series behavior. In particular, we focus on discrepancies in distribution between the train and test data, which can considerably degrade accuracy (Kuznetsov & Mohri, 2014; Du et al., 2021). This has become an increasingly common problem in recent years with the COVID-19 pandemic, which disrupted all aspects of human activity. Missing values are a widespread problem in applications, with common causes including faulty sensors, the impossibility of gathering data, and the corruption or misplacement of information. As we demonstrate in our experiments, these challenges hinder the performance of current state-of-the-art (SoTA) models, limiting their use and potential benefits in applications where these problems are predominant.
Work completed during internship at Amazon AWS AI Labs.
arXiv:2210.12515v2 [cs.LG] 25 Oct 2022
Figure 1: SpectraNet architecture. The Latent Space Spectral Decomposition (LSSD) encodes shared temporal dynamics of the target window into Fourier waves and polynomial functions. The latent vector z is inferred with gradient descent, minimizing the reconstruction error on the reference window. The Convolutional Network (CNN) generates the time-series window by sequentially mixing the components of the embedding and refining the output.
In this work, we propose SpectraNet, a novel multivariate forecasting model that achieves SoTA performance on benchmark datasets and is also intrinsically robust to distribution shifts and extreme cases of missing data. SpectraNet achieves its high accuracy and robustness by dynamically inferring a latent vector projected on a temporal basis, a process we name latent space spectral decomposition (LSSD). A series of convolution layers then synthesizes both the reference window, which is used to infer the latent vectors, and the forecast window.
To the best of our knowledge, SpectraNet is also the first solution that can simultaneously forecast the future values of a multivariate time series and accurately impute past missing data. In practice, imputation models are first used to fill in the missing information for all downstream tasks, including forecasting. SpectraNet can greatly simplify production systems by unifying the imputation and forecasting tasks into a single model.
The main contributions are:
• Latent Vector Inference: a methodology to dynamically capture the current dynamics of the target time-series in a latent space, replacing parametric encoders.
• Latent Space Spectral Decomposition: a representation of a multivariate time-series window on a shared latent space with temporal dynamics.
• SpectraNet: a novel multivariate forecasting model that simultaneously imputes missing data and forecasts future values, with SoTA performance on several benchmark datasets and demonstrated robustness to distribution shifts and missing values. We will make our code publicly available upon acceptance.
The remainder of this paper is structured as follows. Section 2 introduces notation and the problem definition, Section 3 presents our method, and Section 4 describes our experiments and presents our empirical findings. Finally, Section 5 concludes the paper. The literature review is included in Appendix A.1.
2 NOTATION AND PROBLEM DEFINITION
We introduce a new notation that we believe is lighter than the standard notation while remaining intuitive and formally correct. Let Y ∈ R^{M×T} be a multivariate time-series with M features and T timestamps. Let Y_{a:b} ∈ R^{M×(b−a)} be the observed values for the interval [a, b); that is, Y_{0:t} is the set of t observations of Y from timestamp 0 to timestamp t−1, while Y_{t:t+H} is the set of H observations of Y from timestamp t to timestamp t+H−1. Let y_{m,t} ∈ R be the value of feature m at timestamp t.
In this work we consider the multivariate point forecasting task, which consists of predicting the future values of a multivariate time-series sequence based on past observations. The main task of a model F_Θ with parameters Θ at a timestamp t is to produce forecasts for the future H values, denoted by Ŷ_{t:t+H}, based on the previous history Y_{0:t}:

Ŷ_{t:t+H} = F_Θ(Y_{0:t})    (1)
For the imputation task, to impute a missing value y_{m,t}, models are not constrained to use only past observations. Moreover, they are evaluated on how well they approximate only the missing values. We evaluate performance with two common metrics used in the literature, the mean squared error (MSE) and the mean absolute error (MAE), given by equation 2 (Hyndman & Athanasopoulos, 2018):

MSE = (1/(MH)) Σ_{h=0}^{H−1} Σ_{m=1}^{M} (y_{m,t+h} − ŷ_{m,t+h})²,    MAE = (1/(MH)) Σ_{h=0}^{H−1} Σ_{m=1}^{M} |y_{m,t+h} − ŷ_{m,t+h}|.    (2)
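Equation 2 averages the errors over all M features and H horizons, which reduces to a plain mean over the forecast window. A minimal sketch (the array shapes and function name are illustrative, not from the paper):

```python
import numpy as np

def mse_mae(y_true: np.ndarray, y_pred: np.ndarray) -> tuple:
    """MSE and MAE of equation 2, averaged over an (M, H) forecast window."""
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))     # (1/MH) * sum of squared errors
    mae = float(np.mean(np.abs(err)))  # (1/MH) * sum of absolute errors
    return mse, mae
```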
3 SPECTRANET
We start the description of our approach with a general outline of the model and explain each major
component in detail in the following subsections. The overall architecture is illustrated in Figure 1.
SpectraNet is a top-down model that generates a multivariate time-series window of fixed size s_w = L + H, where L is the length of the reference window and H is the forecast horizon, from a latent vector z ∈ R^d. To produce the forecasts at timestamp t, the model first infers the optimal latent vector on the reference window, consisting of the last L values, Y_{t−L:t}, by minimizing the reconstruction error. This inference step is the main difference between our approach and existing models, which map the input into an embedding or latent space using an encoder network. The model generates the full time-series window of size s_w, which includes the forecast window Y_{t:t+H}, with a spectral decomposition and a Convolutional Neural Network (CNN). The main steps of SpectraNet are given by

z* = argmin_z L(Y_{t−L:t}, Ŷ_{t−L:t}(z))    (3)

where z* is the inferred latent vector, L is a reconstruction error metric, and Ŷ_{t−L:t}(z) is given by

E = LSSD(z, B)    (4)
Ŷ_{t−L:t}, Ŷ_{t:t+H} = CNN_Θ(E)    (5)

where LSSD (latent space spectral decomposition) is a basis expansion operation of z over the predefined temporal basis B that produces a temporal embedding E ∈ R^{d×d_t}, and CNN is a top-down Convolutional Neural Network with learnable parameters Θ. The CNN simultaneously produces both the reconstruction of the past reference window Ŷ_{t−L:t}, used to find the optimal latent vector for the full window, and the forecast Ŷ_{t:t+H}.
3.1 LATENT VECTORS INFERENCE
The main difference between our approach and existing models is the inference of the latent vectors instead of relying on encoders. The latent vectors are inferred by minimizing the MSE between the observed values Y_{t−L:t} and the reconstruction Ŷ_{t−L:t}(z) on the reference window. By doing this, the model dynamically captures the temporal dynamics and correlations between features that best explain the target time series' current behavior. Figure 2 demonstrates how SpectraNet's output evolves during the inference of z, adapting to current behaviours on the reference window.
This optimization problem is non-convex, as the reconstruction follows equations 4 and 5. However, given that all operations are differentiable, we can compute the gradient of the objective function w.r.t. z, allowing the use of gradient-based methods. In particular, we rely on gradient descent (GD), randomly initializing the latent vector from an i.i.d. Gaussian distribution and fixing the learning rate and the number of iterations as hyperparameters.
Figure 2: SpectraNet's output evolution during latent vector inference with gradient descent. The model maps the latent vector to the complete window, including both the reference window (of size 104) and the forecasting window (of size 24), using only information from the former. The temporal basis B imposes strict dependencies between both windows. This inference process allows SpectraNet to dynamically adapt to new behaviours and to forecast with missing data.
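The inference loop above can be sketched with a toy decoder. As an assumption for illustration, a fixed linear map G stands in for the LSSD-plus-CNN decoder (this keeps the MSE gradient analytic); z is then refined by plain gradient descent from a Gaussian initialization, exactly as described:

```python
import numpy as np

rng = np.random.default_rng(0)
d, L = 8, 32                         # latent size and reference-window length (illustrative)
G = rng.normal(size=(L, d))          # toy stand-in for the LSSD + CNN decoder
y_ref = rng.normal(size=L)           # observed reference window (one feature)

z = rng.normal(size=d)               # i.i.d. Gaussian initialization of the latent vector
lr, n_iters = 0.01, 500              # fixed learning rate and iteration count
mse0 = np.mean((G @ z - y_ref) ** 2)
for _ in range(n_iters):
    resid = G @ z - y_ref            # reconstruction residual on the reference window
    z -= lr * 2.0 * G.T @ resid / L  # gradient of the reconstruction MSE w.r.t. z
mse = np.mean((G @ z - y_ref) ** 2)
```

With a nonlinear decoder the same loop applies unchanged, with the gradient obtained by automatic differentiation; the objective is then non-convex, as noted above.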
3.2 LATENT SPACE SPECTRAL DECOMPOSITION
The second component of SpectraNet is the mapping from the latent vector z to the temporal embedding E. Each element of the latent vector z corresponds to the coefficient of one element of the temporal basis B ∈ R^{d×d_t}, where d is the number of elements and d_t is the temporal length. The i-th row of E is given by

E_{i,:} = z_i B_{i,:}    (6)

Each element of the basis consists of a predefined template function. Similarly to the N-BEATS model (Oreshkin et al., 2019), we split the basis into two types of patterns commonly found in time series: trends, represented by polynomial functions, and seasonalities, represented by harmonic functions. The final basis matrix B is the row-wise concatenation of the three following matrices:

B^{trd}_{i,t} = t^i,       for i ∈ {0, ..., p}, t ∈ {0, ..., d_t}
B^{cos}_{i,t} = cos(2πit), for i ∈ {0, ..., s_w/2}, t ∈ {0, ..., d_t}
B^{sin}_{i,t} = sin(2πit), for i ∈ {0, ..., s_w/2}, t ∈ {0, ..., d_t}    (7)
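A sketch of building B and the embedding E of equation 6. The time grid is assumed to be normalized to [0, 1), so that harmonic i completes i cycles over the window; the paper does not state the grid explicitly, and the function names are illustrative:

```python
import numpy as np

def make_basis(p: int, s_w: int, d_t: int) -> np.ndarray:
    """Row-wise concatenation of polynomial and harmonic templates (equation 7)."""
    t = np.arange(d_t) / d_t                # normalized time grid (assumption)
    trend = np.stack([t ** i for i in range(p + 1)])
    freqs = np.arange(s_w // 2 + 1)         # harmonics i = 0, ..., s_w/2
    angles = 2 * np.pi * np.outer(freqs, t)
    return np.concatenate([trend, np.cos(angles), np.sin(angles)], axis=0)

def lssd(z: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Temporal embedding E: each basis row scaled by its coefficient (equation 6)."""
    return z[:, None] * B                   # E[i, :] = z[i] * B[i, :]
```

The latent dimension d is thus tied to the basis: d = (p + 1) + 2(s_w/2 + 1) rows in total.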
The temporal embedding E corresponds to a latent space spectral decomposition that encodes the shared temporal dynamics of all features in the target window, as the latent vector z selects the relevant trend and frequency bands. Another crucial reason for using a predefined basis is to impose strict temporal dependencies between the reference and forecasting windows. While inferring the latent vector, the forecasting window does not provide information (gradients). If all the elements of E were inferred directly (without a basis), then, given that we use a CNN with short kernels, the last values of the temporal embedding, which determine the forecasting window, could not be optimized.
3.3 TOP-DOWN CONVOLUTION NETWORK
The last component of the model's architecture is a top-down CNN, which simultaneously produces the final forecast and the reconstruction of the reference window for the M features, Ŷ_{t−L:t+H}, from the