TILDE-Q: a Transformation Invariant Loss Function
for Time-Series Forecasting
Hyunwook Lee¹, Chunggi Lee², Hongkyu Lim³, Sungahn Ko¹
Abstract
Time-series forecasting has gained increasing attention in the field of artificial intelligence due to its potential to address real-world problems across various domains, including energy, weather, traffic, and economics. While time-series forecasting is a well-researched field, predicting complex temporal patterns such as sudden changes in sequential data still poses a challenge with current models. This difficulty stems from minimizing $L_p$-norm distances as loss functions, such as mean absolute error (MAE) or mean squared error (MSE), which struggle both to model intricate temporal dynamics and to capture signal shape. Furthermore, these functions often cause models to behave aberrantly and generate results uncorrelated with the original time-series. Consequently, the development of a shape-aware loss function that goes beyond mere point-wise comparison is essential. In this paper, we examine the definitions of shape and distortion, which are crucial for shape-awareness in time-series forecasting, and provide a design rationale for a shape-aware loss function. Based on this rationale, we propose a novel, compact loss function called TILDE-Q (Transformation Invariant Loss function with Distance EQuilibrium) that considers not only amplitude and phase distortions but also allows models to capture the shape of time-series sequences. Furthermore, TILDE-Q supports the simultaneous modeling of periodic and nonperiodic temporal dynamics. We evaluate the efficacy of TILDE-Q by conducting extensive experiments under both periodic and nonperiodic conditions with various models, ranging from naive to state-of-the-art. The experimental results show that models trained with TILDE-Q surpass those trained with other metrics, such as MSE and DILATE, in various real-world applications, including electricity, traffic, illness, economics, weather, and electricity transformer temperature (ETT). Official code is available at https://github.com/HyunWookL/TILDE-Q.

¹Department of Artificial Intelligence, Ulsan National Institute of Science and Technology, Ulsan, Republic of Korea. ²School of Engineering and Applied Sciences, Harvard University, Massachusetts, United States of America. ³Hyundai, Seoul, Republic of Korea. Correspondence to: Sungahn Ko <sako@unist.ac.kr>.
1. Introduction
Time-series forecasting has been a core problem across various domains, including traffic (Li et al., 2018; Lee et al., 2020), economics (Zhu & Shasha, 2002), and disease propagation analysis (Matsubara et al., 2014). One of the key challenges in time-series forecasting is the modeling of complex temporal dynamics (e.g., non-stationary signals and periodicity). Temporal dynamics (intuitively, shape) are among the most emphasized concepts in time-series domains, exemplified by rush hours in traffic data or abnormal electricity usage (Keogh et al., 2003; Bakshi & Stephanopoulos, 1994; Weigend & Gershenfeld, 1994; Wu et al., 2021; Zhou et al., 2022).
Although deep learning methods are an appealing solution for modeling complex non-linear temporal dependencies and nonstationary signals, recent studies have revealed that even deep learning is often inadequate for modeling temporal dynamics. To properly model temporal dynamics, novel deep learning approaches, such as Autoformer (Wu et al., 2021) and FEDformer (Zhou et al., 2022), have introduced input sequence decomposition. Still, they are trained with $L_p$-norm-based loss functions, which cannot properly model temporal dynamics, as shown in Fig. 1 (top). On the other hand, Le Guen & Thome (2019) attempt to model sudden changes in a timely and accurate manner with dynamic time warping (DTW), and Bica et al. (2020) adopt domain adversarial training to learn balanced representations, i.e., treatment-invariant representations over time. Le Guen & Thome (2019) and Bica et al. (2020) try to capture the shape but still have limitations, as depicted in Fig. 1 (middle), implying the need for further investigation of shape.
The identification of shape, denoting the pattern in time-series data within a given time interval, plays an important role in addressing the aforementioned limitations in the time-series
arXiv:2210.15050v2 [cs.LG] 13 Mar 2024
Under Review for International Conference on Machine Learning 2024
Figure 1. Ground truth and forecasting results of the Informer model under three training metrics: (top) MSE, (middle) a DTW-based loss, and (bottom) the TILDE-Q loss function. In the top and middle rows, the blue boxes indicate the loss function's original intention (desired) and the model's misbehavior.
forecasting problem. It can provide valuable information, such as rises, drops, troughs, peaks, and plateaus. We refer to a prediction as informative when it appropriately models the shape. In real-world applications, including economics, informative prediction is invaluable for decision-making. To achieve such informative forecasting, a model should account for shape instead of solely aiming to forecast an accurate value at each time step. However, existing methods inadequately consider shape (Wu et al., 2021; Zhou et al., 2022; Bica et al., 2020; Le Guen & Thome, 2019). Moreover, deep learning models tend to opt for an easy learning path (Karras et al., 2019), yielding inaccurate and uninformative forecasts that disregard the characteristics of time-series data. Fig. 1 illustrates three real forecasting results obtained with Informer (Zhou et al., 2021) and different training metrics. When the mean squared error (MSE) is used as the objective, the model aims to reduce the gap between prediction and ground truth at each time step. This “point-wise” distance-based optimization has little ability to model shape, generating uninformative predictions regardless of temporal dynamics (Fig. 1 (top)); the model rarely provides information about the time-series. In contrast, if both the gap and the shape between prediction and ground truth are taken into account, the model can achieve high accuracy with proper temporal dynamics, as shown in Fig. 1 (bottom). Consequently, time-series forecasting requires a loss function that considers both point-wise distance (i.e., the traditional goal) and shape.
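The shape-blindness of point-wise losses is easy to reproduce. In the synthetic sketch below (NumPy; the series and "predictions" are invented for exposition, not outputs of any model in this paper), MSE scores a flat, uninformative prediction better than one that preserves the peak but places it three steps late:

```python
import numpy as np

# Ground truth: a flat series with a single peak at t = 50.
y = np.zeros(100)
y[50] = 1.0

flat = np.zeros(100)       # uninformative: ignores the peak entirely
shifted = np.zeros(100)
shifted[53] = 1.0          # informative shape, but the peak is 3 steps late

mse = lambda a, b: np.mean((a - b) ** 2)

# The shifted prediction is penalized twice (missed peak + false peak),
# so point-wise MSE prefers the flat prediction.
print(mse(y, flat))     # 0.01
print(mse(y, shifted))  # 0.02
```

A shape-aware loss should rank these two predictions the other way around.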
In this work, we aim to design a novel objective function
that guides models in improving forecasting performance
by learning shapes in time-series data. To design a shape-
aware loss function, we review existing literature (Esling
& Agon, 2012; Bakshi & Stephanopoulos, 1994; Keogh,
2003) and explore the concepts of shapes and distortions
that impede appropriate measurement of similarity between
two time-series data in terms of shapes (Sec. 3.1, Sec. 3.2,
and Sec. 3.3). Based on our investigation, we propose the
necessary conditions for constructing an objective function
for shape-aware time-series forecasting (Sec. 4.1). Subse-
quently, we present a novel loss function, TILDE-Q (Transformation Invariant Loss function with Distance EQuilibrium), which enables shape-aware representation learning
by utilizing three loss terms that are invariant to distortions
(Sec. 4.2). For evaluation, we conduct extensive experiments
with state-of-the-art deep learning models with TILDE-Q.
The experimental results indicate that TILDE-Q is model-
agnostic and outperforms MSE and DILATE in MSE and
shape-related metrics.
Contributions In summary, our study makes the follow-
ing contributions. (1) We delve into the concept of shape
awareness and distortion invariances in the context of time-
series forecasting. By thoroughly investigating these distor-
tions, we enhance our understanding of their impact on time-
series forecasting problems. (2) We propose and implement
TILDE-Q, which is invariant to three distortions and achieves shape-awareness, empowering informative forecasting in a timely manner. (3) We empirically demonstrate
that the proposed TILDE-Q allows models to have higher
accuracy compared to the models trained with other existing
metrics, such as MSE and DILATE.
2. Related Work
2.1. Time-Series Forecasting
Many time-series forecasting methods are available, rang-
ing from traditional models, such as ARIMA model (Box
et al., 2015) and hidden Markov model (Pesaran et al., 2004),
to recent deep learning models. In this section, we briefly
describe the recent deep learning models for time-series
forecasting. Motivated by the huge success of recurrent neu-
ral networks (RNNs) (Clevert et al., 2016; Li et al., 2018; Yu
et al., 2017), many novel deep learning architectures have
been developed for improving forecasting performance. To
effectively capture long-term dependencies, a limitation of RNNs, Stoller et al. (2020) have proposed convolutional neural network (CNN) architectures. However, many identical CNN layers must be stacked to capture long-term dependencies (Zhou et al., 2021). Attention-based models, including Transformer (Vaswani et al., 2017) and Informer (Zhou
et al., 2021), have been another popular research direction in
time-series forecasting. Although these models effectively
capture temporal dependencies, they incur high computa-
tional costs and often struggle to obtain appropriate temporal
information (Wu et al., 2021). To cope with the problem,
Wu et al. (2021); Zhou et al. (2022) have adopted the input
decomposition method, which helps models better encode
appropriate information. Other state-of-the-art models adopt
neural memory networks (Kaiser et al., 2017; Sukhbaatar
et al., 2015; Madotto et al., 2018; Lee et al., 2022), which
refer to historical data stored in memory to generate meaningful representations.
2.2. Training Metrics
Conventionally, mean squared error (MSE), the $L_p$ norm, and their variants are the mainstream metrics used to optimize forecasting models. However, they are not optimal for training forecasting models (Esling & Agon, 2012) because a time-series is temporally continuous. Moreover, the $L_p$ norm provides little information about the temporal correlation within time-series data.
researchers have used differentiable, approximated dynamic
time warping (DTW) as an alternative metric to MSE (Cuturi & Blondel, 2017; Abid & Zou, 2018; Mensch & Blondel,
2018). However, using DTW as a loss function results in
temporal localization of changes being ignored. Recently,
Le Guen & Thome (2019) have suggested DILATE, a train-
ing metric to catch sudden changes of nonstationary signals
in a timely manner with smooth approximation of DTW
and penalized temporal distortion index (TDI). To guaran-
tee DILATE’s operation in a timely manner, penalized TDI
issues a harsh penalty when predictions showed high tem-
poral distortion. However, the TDI relies on the DTW path,
and DTW often showed misalignment because of noise and
scale sensitivity. Thus, DILATE often loses its advantage
with complex data, showing disadvantages at the training. In
this paper, we discuss distortions and transformation invari-
ances and design a new loss function that enables models to
learn shapes in the data and produce noise-robust forecasting
results.
3. Preliminary
In this section, we investigate common distortions focus-
ing on the goal of time-series forecasting (i.e., modeling
temporal dynamics and accurate forecasting). To clarify the
concepts of time-series forecasting and related terms, we
first define the notations and terms used (Sec. 3.1). We then
discuss common distortions in time-series from the transfor-
mation perspective that need to be considered for building
a shape-aware loss function (Sec. 3.2) and describe how
other loss functions (e.g., dynamic time warping (DTW)
and temporal distortion index (TDI)) handle shapes during
learning (Sec. 3.3). We will discuss the conditions for effective time-series forecasting in the next section (Sec. 4.1).
3.1. Notations and Definitions
Let $X_t$ denote a data point at a time step $t$. We define the time-series forecasting problem as follows:

Definition 3.1. Given a $T$-length historical time-series $X = [X_{t-T+1}, \dots, X_t]$, $X_i \in \mathbb{R}^F$ at time $i$, and a corresponding $T$-length future time-series $Y = [Y_{t+1}, \dots, Y_{t+T}]$, $Y_i \in \mathbb{R}^C$, time-series forecasting aims to learn the mapping function $f: \mathbb{R}^{T \times F} \to \mathbb{R}^{T \times C}$.
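Definition 3.1 fixes the shapes of the objects involved. As a minimal sketch (the lengths $T$, $F$, $C$ and the linear map below are illustrative placeholders, not values or models from this paper), a forecaster is any function from a $(T, F)$ history to a $(T, C)$ future:

```python
import numpy as np

T, F, C = 96, 7, 1               # illustrative: window length, input/output feature counts
rng = np.random.default_rng(0)

X = rng.standard_normal((T, F))  # historical window [X_{t-T+1}, ..., X_t]

# Simplest possible mapping f: R^{T x F} -> R^{T x C}: one linear layer on the
# flattened history (a stand-in for any forecasting model).
W = rng.standard_normal((T * F, T * C)) * 0.01
Y_hat = (X.reshape(-1) @ W).reshape(T, C)

assert Y_hat.shape == (T, C)     # prediction has the shape required by Definition 3.1
```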
To distinguish between the label (i.e., ground truth) and prediction time-series data, we denote the label data as $Y$ and the prediction data as $\hat{Y}$. Next, we set up two goals for time-series forecasting, which require not only precise but also informative forecasting (Wu et al., 2021; Zhou et al., 2022; Le Guen & Thome, 2019), as follows:

- The mapping function $f$ should be learned to point-wisely reduce the distance between $\hat{Y}$ and $Y$;
- The output $\hat{Y}$ should have temporal dynamics similar to those of $Y$.
Temporal dynamics are informative patterns in a time-series,
such as rise, drop, peak, and plateau. The optimization for
point-wise distance reduction is a conventional method used
in the deep learning domain, which can be obtained using
the MAE or MSE. However, in a real-world problem, such as
traffic speed or stock market prediction, accurate forecasting
of temporal dynamics is required. Esling & Agon (2012)
also emphasized the measurement of temporal dynamics, as
“...allowing the recognition of perceptually similar objects
even though they are not mathematically identical.In this
paper, we define temporal dynamics as follows:
Definition 3.2. Temporal dynamics (or shapes) are informa-
tive periodic and nonperiodic patterns in time-series data.
In this work, we aim to design a shape-aware loss function
that satisfies both goals. To this end, we first discuss distor-
tions that two time-series with similar shapes can have.
Figure 2. Examples of the six distortions on the amplitude axis (top) and the temporal axis (bottom).
Definition 3.3. Given two time-series $F$ and $G$ having similar shapes but not being mathematically identical, let $H$ be a transformation that satisfies $F = H(G)$. Then, the time-series $F$ and $G$ are considered to have a distortion, which can be represented by the transformation $H$.
A distortion can generally be classified as a temporal distortion (i.e., warping) or an amplitude distortion (i.e., scaling), depending on its dimension: time or amplitude. Distortions in the data lead to misbehavior of the model, as they render measurements inaccurate. For example, if we have two time-series $F$ and $G = F + k$, which have similar shapes but different means, $G$ could reproduce much of the temporal dynamics of $F$. However, measurements often evaluate $F$ and $G$ as completely different signals and misguide the model in training (e.g., when measuring the distance between $F$ and $G$ with MSE). As such, it is important to have measurements that recognize similar shapes by being invariant to distortion. We define a measurement for distortion as follows:
Definition 3.4. Let transformation $H$ represent a distortion. Then, we call a measurement $D$ invariant to $H$ if $\exists \delta > 0 : D(T, H(T)) < \delta$ for any time-series $T$.
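Definition 3.4 can be checked numerically. In the sketch below (an illustration constructed for this text, not a measure proposed in the paper), plain MSE fails the invariance test for amplitude shifting $H(T) = T + k$, since its value grows as $k^2$, while a mean-centered variant satisfies $D(T, H(T)) = 0$ for every $k$ and is therefore invariant in the sense of Definition 3.4:

```python
import numpy as np

def mse(a, b):
    return np.mean((a - b) ** 2)

def centered_mse(a, b):
    # Compare shapes after removing each series' mean: invariant to H(T) = T + k.
    return mse(a - a.mean(), b - b.mean())

t = np.linspace(0, 2 * np.pi, 100)
T = np.sin(t)

for k in [1.0, 5.0, 100.0]:
    shifted = T + k                              # amplitude-shifting distortion H(T)
    assert abs(mse(T, shifted) - k ** 2) < 1e-9  # grows without bound: not invariant
    assert centered_mse(T, shifted) < 1e-12      # bounded by any delta > 0: invariant
```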
3.2. Time-Series Distortions in Transformation
Perspectives
Distortion, a gap between two similar time-series, affects shape capturing in time-series data. Thus, it is important to investigate different distortions and their impacts from a representation learning perspective. There are six common time-series distortions that models encounter during learning (Esling & Agon, 2012; Batista et al., 2014; Berkhin, 2006; Warren Liao, 2005; Kerr et al., 2008): Amplitude Shifting, Phase Shifting, Uniform Amplification, Uniform Time Scaling, Dynamic Amplification, and Dynamic Time Scaling. Next, we explain each common time-series distortion in terms of transformation with an $n$-length time-series $F(t) = [f(t_1), f(t_2), \dots, f(t_n)]$, where $t = [t_1, t_2, \dots, t_n]$. Fig. 2 presents example distortions, categorized by the amplitude and time dimensions.
Amplitude Shifting describes how much a time-series shifts against another time-series. It can be described with two time-series and the degree of shifting $k$: $G(t) = F(t) + k = [f(t_1) + k, \dots, f(t_n) + k]$, where $k \in \mathbb{R}$ is constant.
Phase Shifting is the same type of transformation (i.e., translation) as amplitude shifting, but it occurs along the temporal dimension. This distortion can be represented by two time-series functions with the degree of shift $k$: $G(t) = F(t + k) = [f(t_1 + k), \dots, f(t_n + k)]$, where $k \in \mathbb{R}$ is constant. Cross-correlation (Paparrizos & Gravano, 2015; Vlachos et al., 2005) is the most popular measure that is invariant to this distortion.
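A cross-correlation-based similarity of this kind is easy to sketch. The toy measure below (an illustration for this text: the maximum Pearson correlation over circular lags, in the spirit of shape-based cross-correlation measures but not the exact formulation of the cited works) stays at its maximum under a pure phase shift, while MSE reports a large error:

```python
import numpy as np

def max_ncc(a, b):
    """Maximum normalized cross-correlation over all circular lags."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return max(np.mean(a * np.roll(b, k)) for k in range(len(a)))

n = 100
t = np.arange(n)
F = np.sin(2 * np.pi * t / n)
G = np.roll(F, 10)                  # phase shifting: same shape, 10 steps later

print(round(max_ncc(F, G), 6))      # 1.0  -- invariant to the phase shift
print(np.mean((F - G) ** 2) > 0.1)  # True -- MSE treats the shift as a large error
```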
Uniform Amplification is a transformation that changes the amplitude through multiplication by $k \in \mathbb{R}$. This distortion can be described with two functions and a multiplication factor $k$: $G(t) = k \cdot F(t) = [k \cdot f(t_1), \dots, k \cdot f(t_n)]$.
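Invariance to uniform amplification (and, simultaneously, amplitude shifting) is commonly obtained by z-normalizing both series before comparison, a standard preprocessing step in time-series mining. A small sketch of this idea (an illustration for this text, not a measure proposed in the paper; note that z-normalization flips the sign for negative $k$, so the identity below assumes $k > 0$):

```python
import numpy as np

def znorm(x):
    return (x - x.mean()) / x.std()

t = np.linspace(0, 2 * np.pi, 100)
F = np.sin(t)
G = 3.5 * F + 2.0                    # uniform amplification (k = 3.5) plus a shift

# After z-normalization the two series coincide, so any distance computed on the
# normalized series is invariant to positive uniform amplification.
assert np.allclose(znorm(F), znorm(G))
assert np.mean((F - G) ** 2) > 1.0   # the raw series, however, are far apart
```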
Uniform Time Scaling refers to a uniformly shortened or lengthened $F(t)$ on the temporal axis. This distortion can be represented as $G(t) = [g(t_1), \dots, g(t_m)]$, where $g(t_i) = f(t_{k \cdot i})$ and $k \in \mathbb{R}^+$. Although Keogh et al. (2004) have proposed uniform time warping methods to handle this distortion, it remains a challenging distortion type to measure because of the difficulty of identifying the scaling factor $k$ without testing all possible cases (Keogh, 2003).
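Uniform time scaling can be constructed explicitly by resampling, which also shows why point-wise losses cannot even be evaluated against the distorted series: the lengths differ. A sketch (with an illustrative scaling factor $k = 0.5$, i.e., the series is stretched to twice its length):

```python
import numpy as np

n, k = 100, 0.5
t = np.arange(n, dtype=float)
F = np.sin(2 * np.pi * t / n)

# g(t_i) = f(t_{k*i}): sample F at positions k*i using linear interpolation.
m = int(n / k)                       # scaled length (200 here)
G = np.interp(k * np.arange(m), t, F)

assert len(G) == 2 * n               # stretched: no point-wise pairing with F exists
assert np.allclose(G[::2], F)        # every other sample recovers F exactly
```

Because $m \neq n$, an $L_p$-norm loss between $F$ and $G$ is not even defined without resampling, which is one reason this distortion is hard to measure.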
Dynamic Amplification is any distortion that occurs
through non-zero multiplication along the amplitude
dimension. This distortion can be described as follows: