
to recent deep learning models. In this section, we briefly describe recent deep learning models for time-series forecasting. Motivated by the success of recurrent neural networks (RNNs) (Clevert et al., 2016; Li et al., 2018; Yu et al., 2017), many novel deep learning architectures have been developed to improve forecasting performance. To effectively capture long-term dependencies, a known limitation of RNNs, Stoller et al. (2020) have employed convolutional neural networks (CNNs). However, many identical CNN blocks must be stacked to cover long-term dependencies (Zhou et al., 2021). Attention-based models, including Transformer (Vaswani et al., 2017) and Informer (Zhou et al., 2021), have been another popular research direction in time-series forecasting. Although these models effectively capture temporal dependencies, they incur high computational costs and often struggle to extract appropriate temporal information (Wu et al., 2021). To cope with this problem, Wu et al. (2021) and Zhou et al. (2022) have adopted input decomposition, which helps models better encode the appropriate information. Other state-of-the-art models adopt neural memory networks (Kaiser et al., 2017; Sukhbaatar et al., 2015; Madotto et al., 2018; Lee et al., 2022), which refer to historical data stored in memory to generate meaningful representations.
2.2. Training Metrics
Conventionally, the mean squared error (MSE), the $L_p$ norm, and their variants are the mainstream metrics used to optimize forecasting models. However, they are not optimal for training forecasting models (Esling & Agon, 2012) because time-series are temporally continuous. Moreover, the $L_p$ norm provides little information about the temporal correlation among time-series data.
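For concreteness (stated here in generic notation, since the paper's notation is only introduced in Sec. 3.1), the $L_p$ loss between a prediction $\hat{y}$ and a target $y$ of length $n$ is
$$\mathcal{L}_p(\hat{y}, y) = \Big(\sum_{i=1}^{n} |\hat{y}_i - y_i|^p\Big)^{1/p},$$
with MSE corresponding to the scaled squared $L_2$ case, $\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i - y_i)^2$. Every term compares $\hat{y}_i$ and $y_i$ at the same index only, which is why such metrics are blind to alignment across time.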
To better model temporal dynamics in time-series data, researchers have used differentiable, approximated dynamic time warping (DTW) as an alternative to MSE (Cuturi & Blondel, 2017; Abid & Zou, 2018; Mensch & Blondel, 2018). However, using DTW as a loss function ignores the temporal localization of changes.
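To make the smoothed-DTW idea concrete, the following is a minimal NumPy sketch of the soft-DTW recursion in the spirit of Cuturi & Blondel (2017); the function name, the squared-Euclidean cost, and the plain Python loops are our illustrative choices, and practical implementations are batched and provide analytic gradients:

```python
import numpy as np

def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW discrepancy between 1-D series x and y.

    gamma > 0 controls the smoothing; gamma -> 0 recovers classic DTW.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    n, m = len(x), len(y)
    D = (x[:, None] - y[None, :]) ** 2   # pairwise squared-Euclidean costs
    R = np.full((n + 1, m + 1), np.inf)  # soft alignment costs of prefixes
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Smoothed minimum over the three DTW moves
            # (match, insertion, deletion), in a stable log-sum-exp form.
            r = np.array([R[i - 1, j - 1], R[i - 1, j], R[i, j - 1]])
            rmin = r.min()
            softmin = rmin - gamma * np.log(np.exp(-(r - rmin) / gamma).sum())
            R[i, j] = D[i - 1, j - 1] + softmin
    return R[n, m]
```

Minimizing such a loss rewards shape alignment, but because the optimal warping path may match points that are far apart in time, it carries little information about when a change occurs, which motivates the temporal term discussed next.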
Recently, Le Guen & Thome (2019) have proposed DILATE, a training metric designed to catch sudden changes in nonstationary signals in a timely manner via a smooth approximation of DTW and a penalized temporal distortion index (TDI). To guarantee timely operation, the penalized TDI imposes a harsh penalty when predictions exhibit high temporal distortion. However, the TDI relies on the DTW path, and DTW is often misaligned because of its sensitivity to noise and scale. Thus, DILATE often loses its advantage on complex data, degrading training.
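For reference, DILATE combines the two terms as a convex combination (our paraphrase of Le Guen & Thome (2019), with $\alpha \in [0, 1]$ a balancing hyperparameter):
$$\mathcal{L}_{\mathrm{DILATE}}(\hat{\mathbf{Y}}, \mathbf{Y}) = \alpha\,\mathcal{L}_{\mathrm{shape}}(\hat{\mathbf{Y}}, \mathbf{Y}) + (1-\alpha)\,\mathcal{L}_{\mathrm{temporal}}(\hat{\mathbf{Y}}, \mathbf{Y}),$$
where $\mathcal{L}_{\mathrm{shape}}$ is the smooth DTW approximation and $\mathcal{L}_{\mathrm{temporal}}$ is the penalized TDI computed from the soft alignment path; any misalignment in that path therefore propagates directly into the temporal term.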
In this paper, we discuss distortions and transformation invariances and design a new loss function that enables models to learn shapes in the data and produce noise-robust forecasting results.
3. Preliminary
In this section, we investigate common distortions, focusing on the goal of time-series forecasting (i.e., modeling temporal dynamics and accurate forecasting). To clarify the concepts of time-series forecasting and related terms, we first define the notations and terms used (Sec. 3.1). We then discuss common distortions in time-series, viewed from the transformation perspective, that need to be considered for building a shape-aware loss function (Sec. 3.2), and describe how other loss functions (e.g., dynamic time warping (DTW) and the temporal distortion index (TDI)) handle shapes during learning (Sec. 3.3). We discuss the conditions for effective time-series forecasting in the next section (Sec. 4.1).
3.1. Notations and Definitions
Let $X_t$ denote a data point at time step $t$. We define the time-series forecasting problem as follows:
Definition 3.1. Given a $T$-length historical time-series $\mathbf{X} = [X_{t-T+1}, \ldots, X_t]$, $X_i \in \mathbb{R}^F$ at time $i$, and a corresponding $T'$-length future time-series $\mathbf{Y} = [Y_{t+1}, \ldots, Y_{t+T'}]$, $Y_i \in \mathbb{R}^C$, time-series forecasting aims to learn the mapping function $f: \mathbb{R}^{T \times F} \rightarrow \mathbb{R}^{T' \times C}$.
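As an illustration of the input/output contract in Definition 3.1, the sketch below shows a toy PyTorch module whose forward pass maps $\mathbb{R}^{T \times F}$ to $\mathbb{R}^{T' \times C}$; the class name and the single linear layer are ours, purely for illustration, and any forecasting architecture could take its place:

```python
import torch
import torch.nn as nn

class Forecaster(nn.Module):
    """Toy mapping f: R^{T x F} -> R^{T' x C} from Definition 3.1."""

    def __init__(self, T: int, F: int, T_out: int, C: int):
        super().__init__()
        self.T_out, self.C = T_out, C
        # One linear map over the flattened history; real models
        # (RNNs, CNNs, Transformers, ...) would replace this layer.
        self.proj = nn.Linear(T * F, T_out * C)

    def forward(self, X: torch.Tensor) -> torch.Tensor:
        # X: (batch, T, F)  ->  Y_hat: (batch, T', C)
        return self.proj(X.flatten(1)).view(-1, self.T_out, self.C)

# Usage: a 96-step history of 7 features -> a 24-step forecast of 1 target.
f = Forecaster(T=96, F=7, T_out=24, C=1)
Y_hat = f(torch.randn(32, 96, 7))  # Y_hat.shape == (32, 24, 1)
```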
To distinguish between the label (i.e., ground truth) and the predicted time-series, we denote the label data as $\mathbf{Y}$ and the prediction as $\hat{\mathbf{Y}}$. Next, we set up two goals for time-series forecasting, which require not only precise but also informative forecasting (Wu et al., 2021; Zhou et al., 2022; Le Guen & Thome, 2019), as follows:
• The mapping function $f$ should be learned to reduce the point-wise distance between $\hat{\mathbf{Y}}$ and $\mathbf{Y}$;
• The output $\hat{\mathbf{Y}}$ should have temporal dynamics similar to those of $\mathbf{Y}$.
Temporal dynamics are informative patterns in a time-series, such as a rise, drop, peak, or plateau. Optimizing for point-wise distance reduction is the conventional approach in the deep learning domain and can be achieved with MAE or MSE. However, real-world problems, such as traffic speed or stock market prediction, also require accurate forecasting of the temporal dynamics. Esling & Agon (2012) likewise emphasized the measurement of temporal dynamics, as “...allowing the recognition of perceptually similar objects even though they are not mathematically identical.” In this paper, we define temporal dynamics as follows:
Definition 3.2. Temporal dynamics (or shapes) are informative periodic and nonperiodic patterns in time-series data.
In this work, we aim to design a shape-aware loss function that satisfies both goals. To this end, we first discuss the distortions that two time-series with similar shapes can have.
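As a simple numerical illustration of the tension between the two goals (a toy example of ours, not from the original text): a quarter-period-shifted copy of a sine wave preserves its shape exactly, yet point-wise MSE scores it worse than a flat, shapeless prediction:

```python
import numpy as np

t = np.linspace(0, 4 * np.pi, 200)
y = np.sin(t)                      # ground-truth signal
y_shifted = np.sin(t - np.pi / 2)  # same shape, delayed by a quarter period
y_flat = np.zeros_like(t)          # constant line with no shape at all

mse = lambda a, b: float(np.mean((a - b) ** 2))
print(mse(y, y_shifted))  # ~1.0: heavily penalized despite identical shape
print(mse(y, y_flat))     # ~0.5: favored despite having no temporal dynamics
```

A shape-aware loss function should instead tolerate such benign distortions while still penalizing the shapeless prediction.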