MTSMAE: Masked Autoencoders for Multivariate
Time-Series Forecasting
Peiwang Tang1,2, Xianchao Zhang3,4
1Institute of Advanced Technology, University of Science and Technology of China, China
2G60 STI Valley Industry & Innovation Institute, Jiaxing University, China
3Key Laboratory of Medical Electronics and Digital Health of Zhejiang Province,
Jiaxing University, China
4Engineering Research Center of Intelligent Human Health Situation Awareness of Zhejiang Province,
Jiaxing University, China
{tpw}@mail.ustc.edu.cn, {zhangxianchao}@zjxu.edu.cn
Abstract—Large-scale self-supervised pre-trained Transformer architectures have significantly boosted performance on various tasks in natural language processing (NLP) and computer vision (CV). However, there is little research on processing multivariate time-series with pre-trained Transformers, and in particular, masking time-series for self-supervised learning remains largely unexplored. Unlike language and images, the information density of time-series makes this line of research more difficult, and previous patch embedding and masking methods do not carry over directly. In this paper, we propose a patch embedding method tailored to the characteristics of multivariate time-series and present a self-supervised pre-training approach based on Masked Autoencoders (MAE), called MTSMAE, which significantly improves performance over supervised learning without pre-training. We evaluate our method on several common multivariate time-series datasets from different fields and with different characteristics; the experimental results demonstrate that our method clearly outperforms the best methods currently available.
Index Terms—Autoencoder, Pre-Training, Time-Series, Forecasting
I. INTRODUCTION
With the rapid development of deep learning in recent years [1]–[3], models are now expected to be trained on hundreds of millions of labeled examples [4]. In natural language processing (NLP) and computer vision (CV), this appetite for large-scale data has been addressed by self-supervised pre-training [5], [6]. Most of these solutions are based on masked modeling, such as masked language modeling in NLP [7], [8] or masked image modeling in CV [9]–[11]. The idea is conceptually simple: mask parts of the original data and then learn to recover the masked parts [12], [13]. Masked modeling encourages the model to infer the deleted parts from contextual information, so that it learns deep semantics, and it has become the standard recipe for self-supervised pre-training in NLP and CV [5], [6]. These pre-trained masked models have been shown to transfer well to various downstream tasks, and one of the simpler and more effective variants is the masked autoencoder (MAE) [5]. However, despite wide interest in this idea from academia and industry following the success of MAE, progress on autoencoder methods for multivariate time-series data (MTSD) lags behind other fields.
One of the main reasons is that the information density of MTSD differs from that of images and language. The local information in MTSD is heavily redundant along the time axis, whereas the multivariate information within each time point is highly specific. Missing values can often be recovered from adjacent time points with little high-level understanding. To overcome this difference and encourage the model to learn more useful features, we follow the idea of the Vision Transformer (ViT) [4], divide the MTSD into patches, and mask a larger fraction of random patches than the original MAE, e.g. 85%. This simple strategy works well for MTSD: it effectively reduces redundancy and pushes the model toward a holistic understanding beyond low-level information. A minimal masking sketch is given below.
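As an illustration only, not the paper's actual implementation, the following PyTorch sketch splits a multivariate series into non-overlapping patches along the time axis and keeps a random 15% of them visible before the encoder; the patch length, helper names, and shapes are assumptions.

```python
import torch

def patchify(x, patch_len):
    """Split a series of shape (batch, seq_len, n_vars) into
    non-overlapping patches of shape (batch, n_patches, patch_len * n_vars)."""
    b, seq_len, n_vars = x.shape
    n_patches = seq_len // patch_len
    x = x[:, :n_patches * patch_len]                      # drop the remainder
    return x.reshape(b, n_patches, patch_len * n_vars)

def random_masking(patches, mask_ratio=0.85):
    """Keep a random subset of patches (MAE-style random shuffling)."""
    b, n, d = patches.shape
    n_keep = max(1, int(n * (1 - mask_ratio)))
    noise = torch.rand(b, n)                              # one score per patch
    ids_shuffle = noise.argsort(dim=1)                    # random permutation
    ids_keep = ids_shuffle[:, :n_keep]
    visible = torch.gather(
        patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, d))
    mask = torch.ones(b, n)
    mask.scatter_(1, ids_keep, 0)                         # 0 = visible, 1 = masked
    return visible, mask, ids_shuffle

# Example: 7-variable series of length 96, patch length 12, 85% masking.
x = torch.randn(4, 96, 7)
visible, mask, _ = random_masking(patchify(x, patch_len=12), mask_ratio=0.85)
print(visible.shape)   # torch.Size([4, 1, 84]) -- only ~15% of 8 patches kept
```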
Another reason is the design of the decoder, which maps the latent representation back to the input. In CV, the decoder reconstructs pixel-level representations within each patch; in NLP, it predicts missing words. In MTSD, the decoder must recover data with highly variable-specific information whose dimensionality can be as high as 321. Different types of MTSD have different dimensionality at each time point, ranging from 321 dimensions down to only 7. We find that the decoder design plays a key role in learning good latent representations for MTSD.
Based on the above analysis, we propose MTSMAE, a very simple and effective method for MTSD representation learning. The idea is straightforward: during pre-training we patch the MTSD, mask random patches of the input, and recover the missing patches; during fine-tuning we reuse the encoder trained in the previous stage and redesign the input of the decoder. In addition, our encoder operates only on the visible patches. Unlike BERT [6], whose decoder is a single MLP layer, we design decoders of different depths for different MTSD, but compared with MAE our decoders are all lightweight. We conduct extensive experiments on four datasets of three types; the results show that MTSMAE significantly improves prediction accuracy and outperforms other state-of-the-art models.
[Figure 1: block diagrams of the MTSMAE architecture; panel (a) Pre-Training, panel (b) Fine-Tuning. Legible block labels: input, patch embedding, non-patch embedding, position embedding, scalar embedding, feature map, global time / local time, encoder, encoder input/output, label part, zero part, Rdecoder, Pdecoder, prediction.]
Fig. 1. Our MTSMAE architecture. In the pre-training, our model consists of the encoder and the Rdecoder (the decoder responsible for recovering the original input); in the fine-tuning, it consists of the encoder and the Pdecoder (the decoder responsible for predicting future data).
II. RELATED WORK
A. Masked modeling for Self-supervised learning
Self-supervised learning approaches have attracted great interest in natural language processing and computer vision, often built around different pretext tasks for pre-training [14]–[16]. BERT [6] and GPT [7], [8], [17] are highly successful masked-modeling methods for pre-training in NLP. They learn representations from inputs corrupted by masking, an approach that a large body of evidence shows to be highly scalable [17], and the resulting pre-trained representations transfer well to various downstream tasks. As an early adaptation of BERT to CV, BEIT [10] first "tokenizes" the original image into visual tokens, then randomly masks some image patches and feeds them into a Transformer backbone [2]; the pre-training objective is to recover the original visual tokens from the corrupted image patches. Like all autoencoders [18], MAE [5] uses an encoder to map the observed signal to a latent representation and a decoder to reconstruct the original signal from that representation. Unlike classic autoencoders, however, MAE adopts an asymmetric design: the encoder operates on only the visible part of the signal (without mask tokens), and a lightweight decoder reconstructs the complete signal from the latent representation together with mask tokens. A minimal sketch of this asymmetric design is given below.
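To make the asymmetric design concrete, here is a minimal, self-contained sketch (not MAE's actual code): the encoder sees only the visible tokens, and the decoder receives the encoded tokens plus a shared learnable mask token placed back at the masked positions. All layer sizes are arbitrary choices.

```python
import torch
import torch.nn as nn

class TinyMAE(nn.Module):
    """Minimal asymmetric masked autoencoder over pre-embedded tokens."""

    def __init__(self, d_model=128, d_decoder=64, out_dim=84, n_tokens=8):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        dec_layer = nn.TransformerEncoderLayer(d_decoder, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=4)
        self.decoder = nn.TransformerEncoder(dec_layer, num_layers=1)   # lightweight
        self.enc_to_dec = nn.Linear(d_model, d_decoder)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, d_decoder))
        self.dec_pos = nn.Embedding(n_tokens, d_decoder)
        self.head = nn.Linear(d_decoder, out_dim)   # reconstruct raw patch values

    def forward(self, visible_tokens, ids_shuffle, n_tokens):
        b, n_keep, _ = visible_tokens.shape
        latent = self.enc_to_dec(self.encoder(visible_tokens))   # encode visible only
        # Re-insert mask tokens at the masked positions (MAE-style unshuffle).
        masks = self.mask_token.expand(b, n_tokens - n_keep, -1)
        full = torch.cat([latent, masks], dim=1)
        ids_restore = ids_shuffle.argsort(dim=1)
        full = torch.gather(
            full, 1, ids_restore.unsqueeze(-1).expand(-1, -1, full.size(-1)))
        full = full + self.dec_pos(torch.arange(n_tokens, device=full.device))
        return self.head(self.decoder(full))        # (b, n_tokens, out_dim)

# Usage: ids_shuffle is the permutation used when masking (kept indices first).
model = TinyMAE()
visible = torch.randn(2, 1, 128)
ids_shuffle = torch.stack([torch.randperm(8) for _ in range(2)])
print(model(visible, ids_shuffle, n_tokens=8).shape)   # torch.Size([2, 8, 84])
```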
B. Models for Time-Series Forecasting
Forecasting is one of the most important applications of
time-series. LSTNet [19] uses convolutional neural network
(CNN) [1] and recurrent neural network (RNN) to extract
short-term local dependence patterns between variables and
find long-term patterns of time-series trends. In addition, the
traditional autoregressive model is used to solve the scale
insensitivity problem of neural network model. The temporal
convolutional network (TCN) [20] is proposed to make the
CNN have time-series characteristics, as a variant of CNN
that deals with sequence modeling tasks. It mixes RNN and
CNN architecture to use the causal convolution to simulate
temporal causality. Reformer [21] is proposed to replace the
original dot-product attention with a new one using locality-
sensitive hashing. It decreases the complexity from O(L2)
to O(Llog L)and makes the storage activated only once in
the training process rather than ntimes (here nrefers to
the number of layers) by using reversible residual layers to
replace standard residuals. LogTrans [22] proposes a convolu-
tional self-attention mechanism, which uses causal convolution
to process the local context of the sequence and calculate
the query / key of self-attention. Further more, it uses the
logsparse Transformer architecture to ensure that each cell can
receive signals from other cells in the sequence data, while
reducing the time complexity of the architecture. Informer
[23] is proposed ProbSparse self-attention and encoder with
self-attention distilling. The former is based on query and
key similarity sampling dot-product pairs, which reduces the
computational complexity of Transformer and allows it to
accept longer input. The latter adopts the concept of distillation
to design encoder, so that the model can continuously extract
the feature vectors that focus on, while reducing the memory
occupation.
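As an illustration of the convolutional self-attention idea mentioned above, the generic sketch below (not LogTrans's actual code) replaces the pointwise linear projections that produce queries and keys with a causal 1-D convolution, so each position only sees its local past context; the kernel size is an arbitrary choice.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalQKProjection(nn.Module):
    """Compute attention queries/keys with a causal 1-D convolution instead of
    a pointwise linear map, so each position summarizes its local past."""

    def __init__(self, d_model, kernel_size=3):
        super().__init__()
        self.kernel_size = kernel_size
        self.q_conv = nn.Conv1d(d_model, d_model, kernel_size)
        self.k_conv = nn.Conv1d(d_model, d_model, kernel_size)

    def forward(self, x):
        # x: (batch, seq_len, d_model) -> Conv1d expects (batch, d_model, seq_len)
        x = x.transpose(1, 2)
        x = F.pad(x, (self.kernel_size - 1, 0))   # left-pad so the convolution is causal
        q = self.q_conv(x).transpose(1, 2)
        k = self.k_conv(x).transpose(1, 2)
        return q, k

proj = CausalQKProjection(d_model=128)
q, k = proj(torch.randn(2, 96, 128))
print(q.shape, k.shape)   # torch.Size([2, 96, 128]) twice
```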
III. METHODOLOGY
The problem of multivariate time-series forecasting is to take as input the past sequence $\mathcal{X}^t = \{x^t_1, \cdots, x^t_{L_x} \mid x^t_i \in \mathbb{R}^{d_x}\}$ at time $t$ and predict the corresponding future sequence $\mathcal{Y}^t = \{y^t_1, \cdots, y^t_{L_y} \mid y^t_i \in \mathbb{R}^{d_y}\}$, where $L_x$ and $L_y$ are the lengths of the input and output sequences respectively, and $d_x$ and $d_y$ are the feature dimensions of the input $\mathcal{X}$ and output $\mathcal{Y}$ respectively.
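As a concrete illustration of this formulation (the sequence lengths below are only example choices; the dimensionalities 7 and 321 are those mentioned in the introduction):

```python
import torch

L_x, L_y = 96, 24     # example input/output lengths (hypothetical choices)
d_x = d_y = 7         # e.g. a 7-dimensional MTSD; another dataset might have 321

X = torch.randn(L_x, d_x)   # past sequence  X^t: L_x time points, d_x features each
Y = torch.randn(L_y, d_y)   # future sequence Y^t to be predicted

print(X.shape, Y.shape)     # torch.Size([96, 7]) torch.Size([24, 7])
```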
Our masked autoencoder (MTSMAE) is a simple autoencoding method whose training process is divided into two stages, as shown in Fig. 1. As with all autoencoders, there is an encoder that maps the observed input to a latent representation and a decoder that reconstructs or predicts from that representation; a high-level sketch of the two stages follows.
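The skeleton below is only a rough sketch of how the two stages could be wired together, not the authors' released code: the module interfaces (`embed`, `encoder`, `r_decoder`, `p_decoder`), the data loaders, and the hyperparameters are all assumptions, and `patchify`/`random_masking` refer to the helper sketch given in the introduction.

```python
import torch
import torch.nn as nn

def pretrain(embed, encoder, r_decoder, loader, mask_ratio=0.85, epochs=10):
    """Stage 1: mask random patches and train encoder + Rdecoder to recover them."""
    params = list(embed.parameters()) + list(encoder.parameters()) + list(r_decoder.parameters())
    opt = torch.optim.AdamW(params, lr=1e-4)
    for _ in range(epochs):
        for x, x_mark in loader:                           # raw values + time covariates
            patches = embed(x, x_mark)                     # (b, n_patches, d_model)
            visible, mask, ids_shuffle = random_masking(patches, mask_ratio)
            recon = r_decoder(encoder(visible), ids_shuffle, patches.size(1))
            target = patchify(x, embed.patch_len)          # reconstruct raw patch values
            # MSE averaged over the masked patches only (MAE-style).
            loss = ((recon - target) ** 2 * mask.unsqueeze(-1)).sum() / (mask.sum() * target.size(-1))
            opt.zero_grad(); loss.backward(); opt.step()

def finetune(embed, encoder, p_decoder, loader, epochs=10):
    """Stage 2: keep the pre-trained encoder and train a Pdecoder to forecast."""
    params = list(embed.parameters()) + list(encoder.parameters()) + list(p_decoder.parameters())
    opt = torch.optim.AdamW(params, lr=1e-5)
    for _ in range(epochs):
        for x, x_mark, y in loader:                        # y: future values to predict
            tokens = encoder(embed(x, x_mark))             # no masking at this stage
            pred = p_decoder(tokens)                       # (b, L_y, d_y)
            loss = nn.functional.mse_loss(pred, y)
            opt.zero_grad(); loss.backward(); opt.step()
```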