SPECTRANET: MULTIVARIATE FORECASTING AND IMPUTATION UNDER DISTRIBUTION SHIFTS AND MISSING DATA
Cristian Challu
School of Computer Science
Carnegie Mellon University
cchallu@andrew.cmu.edu
Peihong Jiang
AWS AI Labs
jpeihong@amazon.com
Ying Nian Wu
AWS AI Labs
wunyin@amazon.com
Laurent Callot
AWS AI Labs
lcallot@amazon.com
ABSTRACT
In this work, we tackle two widespread yet largely understudied challenges in real time-series forecasting applications: distribution shifts and missing data. We propose SpectraNet, a novel multivariate time-series forecasting model that dynamically infers a latent space spectral decomposition to capture the current temporal dynamics and correlations in the recently observed history. A convolutional neural network maps the learned representation by sequentially mixing its components and refining the output. Our proposed approach can simultaneously produce forecasts and interpolate past observations and can, therefore, greatly simplify production systems by unifying the imputation and forecasting tasks into a single model. SpectraNet achieves SoTA performance on both tasks simultaneously on five benchmark datasets, compared to forecasting and imputation models, with up to 92% fewer parameters and comparable training times. In settings with up to 80% missing data, SpectraNet achieves average performance improvements of almost 50% over the second-best alternative. Our code is available at https://github.com/cchallu/spectranet.
1 INTRODUCTION
Multivariate time-series forecasting is an essential task in a wide range of domains. Forecasts are a key input to optimizing the production and distribution of goods (Böse et al., 2017), predicting healthcare patient outcomes (Chen et al., 2015), planning electricity production (Olivares et al., 2022), and building financial portfolios (Emerson et al., 2019), among other examples. Due to these high potential benefits, researchers have dedicated many efforts to improving the capabilities of forecasting models, with breakthroughs in model architectures and performance (Benidis et al., 2022).
The main focus of research in multivariate forecasting has been on accuracy and scalability, to
which the present paper contributes. In addition, we identify two widespread challenges for real
applications which have been largely understudied: distribution shifts and missing data.
We refer to distribution shifts as changes in the time-series behavior. In particular, we focus on discrepancies in distribution between the train and test data, which can considerably degrade accuracy (Kuznetsov & Mohri, 2014; Du et al., 2021). This has become an increasingly common problem in recent years with the COVID-19 pandemic, which disrupted all aspects of human activity. Missing values are a widespread problem in applications, with common causes including faulty sensors, the impossibility of gathering data, and the corruption or misplacement of information. As we demonstrate in our experiments, these challenges hinder the performance of current state-of-the-art (SoTA) models, limiting their use and potential benefits in applications where these problems are predominant.
Work completed during internship at Amazon AWS AI Labs.
arXiv:2210.12515v2 [cs.LG] 25 Oct 2022
Figure 1: SpectraNet architecture. The Latent Space Spectral Decomposition (LSSD) encodes shared temporal dynamics of the target window into Fourier waves and polynomial functions. The latent vector z is inferred with gradient descent, minimizing the reconstruction error on the reference window. The Convolutional Network (CNN) generates the time-series window by sequentially mixing the components of the embedding and refining the output.
In this work, we propose SpectraNet, a novel multivariate forecasting model that achieves SoTA performance on benchmark datasets and is also intrinsically robust to distribution shifts and extreme cases of missing data. SpectraNet achieves its high accuracy and robustness by dynamically inferring a latent vector projected on a temporal basis, a process we name latent space spectral decomposition (LSSD). A series of convolution layers then synthesizes both the reference window, which is used to infer the latent vectors, and the forecast window.
To the best of our knowledge, SpectraNet is also the first solution that can simultaneously forecast the future values of a multivariate time series and accurately impute past missing data. In practice, imputation models are first used to fill in the missing information for all downstream tasks, including forecasting. SpectraNet can greatly simplify production systems by unifying the imputation and forecasting tasks into a single model.
The main contributions are:
• Latent Vector Inference: a methodology to dynamically capture the current dynamics of the target time-series in a latent space, replacing parametric encoders.
• Latent Space Spectral Decomposition: a representation of a multivariate time-series window on a shared latent space with temporal dynamics.
• SpectraNet: a novel multivariate forecasting model that simultaneously imputes missing data and forecasts future values, with SoTA performance on several benchmark datasets and demonstrated robustness to distribution shifts and missing values. We will make our code publicly available upon acceptance.
The remainder of this paper is structured as follows. Section 2 introduces notation and the problem definition, Section 3 presents our method, and Section 4 describes our experiments and presents our empirical findings. Finally, Section 5 concludes the paper. The literature review is included in Appendix A.1.
2 NOTATION AND PROBLEM DEFINITION
We introduce a new notation that we believe is lighter than the standard notation while remaining intuitive and formally correct. Let Y ∈ R^{M×T} be a multivariate time-series with M features and T timestamps. Let Y_{a:b} ∈ R^{M×(b−a)} be the observed values for the interval [a, b); that is, Y_{0:t} is the set of t observations of Y from timestamp 0 to timestamp t−1, while Y_{t:t+H} is the set of H observations of Y from timestamp t to timestamp t+H−1. Let y_{m,t} ∈ R be the value of feature m at timestamp t.
In this work we consider the multivariate point forecasting task, which consists of predicting the future values of a multivariate time-series sequence based on past observations. The main task of a model F_Θ with parameters Θ at a timestamp t is to produce forecasts for the future H values, denoted by Ŷ_{t:t+H}, based on the previous history Y_{0:t}:

Ŷ_{t:t+H} = F_Θ(Y_{0:t})    (1)
For the imputation task, to impute a missing value y_{m,t}, models are not constrained to use only past observations. Moreover, they are evaluated on how well they approximate only the missing values. We evaluate performance with two common metrics used in the literature, the mean squared error (MSE) and the mean absolute error (MAE), given by equation 2 (Hyndman & Athanasopoulos, 2018):

MSE = (1/(MH)) Σ_{h=0}^{H−1} Σ_{m=1}^{M} (y_{m,t+h} − ŷ_{m,t+h})²,    MAE = (1/(MH)) Σ_{h=0}^{H−1} Σ_{m=1}^{M} |y_{m,t+h} − ŷ_{m,t+h}|.    (2)
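Equation 2 averages the errors over all M features and H horizons, which reduces to a plain mean over the forecast window. A minimal sketch (the array shapes and function name are illustrative, not from the paper):

```python
import numpy as np

def mse_mae(y_true: np.ndarray, y_pred: np.ndarray) -> tuple:
    """MSE and MAE of equation 2, averaged over an (M, H) forecast window."""
    err = y_true - y_pred
    mse = float(np.mean(err ** 2))     # (1/MH) * sum of squared errors
    mae = float(np.mean(np.abs(err)))  # (1/MH) * sum of absolute errors
    return mse, mae
```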
3 SPECTRANET
We start the description of our approach with a general outline of the model and explain each major
component in detail in the following subsections. The overall architecture is illustrated in Figure 1.
SpectraNet is a top-down model that generates a multivariate time-series window of fixed size s_w = L + H, where L is the length of the reference window and H is the forecast horizon, from a latent vector z ∈ R^d. To produce the forecasts at timestamp t, the model first infers the optimal latent vector on the reference window, consisting of the last L values, Y_{t−L:t}, by minimizing the reconstruction error. This inference step is the main difference between our approach and existing models, which map the input into an embedding or latent space using an encoder network. The model generates the full time-series window of size s_w, which includes the forecast window Y_{t:t+H}, with a spectral decomposition and a Convolutional Neural Network (CNN). The main steps of SpectraNet are given by

z* = argmin_z L(Y_{t−L:t}, Ŷ_{t−L:t}(z))    (3)

where z* is the inferred latent vector, L is a reconstruction error metric, and Ŷ_{t−L:t}(z) is given by

E = LSSD(z, B)    (4)
Ŷ_{t−L:t}, Ŷ_{t:t+H} = CNN_Θ(E)    (5)

where LSSD (latent space spectral decomposition) is a basis expansion operation of z over the predefined temporal basis B that produces a temporal embedding E ∈ R^{d×d_t}, and CNN is a top-down Convolutional Neural Network with learnable parameters Θ. The CNN simultaneously produces both the reconstruction of the past reference window Ŷ_{t−L:t}, used to find the optimal latent vector for the full window, and the forecast Ŷ_{t:t+H}.
3.1 LATENT VECTORS INFERENCE
The main difference between our approach and existing models is the inference of the latent vectors instead of relying on encoders. The latent vectors are inferred by minimizing the MSE between the observed values Y_{t−L:t} and the reconstruction Ŷ_{t−L:t}(z) on the reference window. By doing this, the model dynamically captures the temporal dynamics and correlations between features that best explain the target time series' current behavior. Figure 2 demonstrates how SpectraNet's output evolves during the inference of z, adapting to current behaviours on the reference window.
This optimization problem is non-convex, as the reconstruction follows equations 4 and 5. However, given that all operations are differentiable, we can compute the gradient of the objective function w.r.t. z, allowing the use of gradient-based methods. In particular, we rely on gradient descent (GD), randomly initializing the latent vector from an i.i.d. Gaussian distribution and fixing the learning rate and the number of iterations as hyperparameters.
Figure 2: SpectraNet's output evolution during latent vector inference with gradient descent. The model maps the latent vector to the complete window, including both the reference window (of size 104) and the forecasting window (of size 24), using only information from the former. The temporal basis B imposes strict dependencies between both windows. This inference process allows SpectraNet to dynamically adapt to new behaviours and to forecast with missing data.
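The inference loop above can be sketched with a toy decoder. As an assumption for illustration, a fixed linear map G stands in for the LSSD-plus-CNN decoder (this keeps the MSE gradient analytic); z is then refined by plain gradient descent from a Gaussian initialization, exactly as described:

```python
import numpy as np

rng = np.random.default_rng(0)
d, L = 8, 32                         # latent size and reference-window length (illustrative)
G = rng.normal(size=(L, d))          # toy stand-in for the LSSD + CNN decoder
y_ref = rng.normal(size=L)           # observed reference window (one feature)

z = rng.normal(size=d)               # i.i.d. Gaussian initialization of the latent vector
lr, n_iters = 0.01, 500              # fixed learning rate and iteration count
mse0 = np.mean((G @ z - y_ref) ** 2)
for _ in range(n_iters):
    resid = G @ z - y_ref            # reconstruction residual on the reference window
    z -= lr * 2.0 * G.T @ resid / L  # gradient of the reconstruction MSE w.r.t. z
mse = np.mean((G @ z - y_ref) ** 2)
```

With a nonlinear decoder the same loop applies unchanged, with the gradient obtained by automatic differentiation; the objective is then non-convex, as noted above.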
3.2 LATENT SPACE SPECTRAL DECOMPOSITION
The second component of SpectraNet is the mapping from the latent vector z to the temporal embedding E. Each element of the latent vector z corresponds to the coefficient of one element of the temporal basis B ∈ R^{d×d_t}, where d is the number of elements and d_t is the temporal length. The i-th row of E is given by

E_{i,:} = z_i B_{i,:}    (6)

Each element of the basis consists of a predefined template function. Similarly to the N-BEATS model (Oreshkin et al., 2019), we split the basis into two types of patterns commonly found in time series: trends, represented by polynomial functions, and seasonalities, represented by harmonic functions. The final basis matrix B is the row-wise concatenation of the three following matrices:

B^{trd}_{i,t} = t^i,       for i ∈ {0, ..., p}, t ∈ {0, ..., d_t}
B^{cos}_{i,t} = cos(2πit), for i ∈ {0, ..., s_w/2}, t ∈ {0, ..., d_t}
B^{sin}_{i,t} = sin(2πit), for i ∈ {0, ..., s_w/2}, t ∈ {0, ..., d_t}    (7)
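A sketch of building B and the embedding E of equation 6. The time grid is assumed to be normalized to [0, 1), so that harmonic i completes i cycles over the window; the paper does not state the grid explicitly, and the function names are illustrative:

```python
import numpy as np

def make_basis(p: int, s_w: int, d_t: int) -> np.ndarray:
    """Row-wise concatenation of polynomial and harmonic templates (equation 7)."""
    t = np.arange(d_t) / d_t                # normalized time grid (assumption)
    trend = np.stack([t ** i for i in range(p + 1)])
    freqs = np.arange(s_w // 2 + 1)         # harmonics i = 0, ..., s_w/2
    angles = 2 * np.pi * np.outer(freqs, t)
    return np.concatenate([trend, np.cos(angles), np.sin(angles)], axis=0)

def lssd(z: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Temporal embedding E: each basis row scaled by its coefficient (equation 6)."""
    return z[:, None] * B                   # E[i, :] = z[i] * B[i, :]
```

The latent dimension d is thus tied to the basis: d = (p + 1) + 2(s_w/2 + 1) rows in total.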
The temporal embedding E corresponds to a latent space spectral decomposition that encodes the shared temporal dynamics of all features in the target window, as the latent vector z selects the relevant trend and frequency bands. Another crucial reason for using a predefined basis is to impose strict temporal dependencies between the reference and forecasting windows. While inferring the latent vector, the forecasting window does not provide information (gradients). If all the elements of E were inferred directly (without a basis), then, given that we use a CNN with short kernels, the last values of the temporal embedding, which determine the forecasting window, could not be optimized.
3.3 TOP-DOWN CONVOLUTION NETWORK
The last component of the model's architecture is a top-down CNN, which simultaneously produces the final forecast and the reconstruction of the reference window for the M features, Ŷ_{t−L:t+H}, from the