
[Figure 1: plots of training loss versus forecast step for (a) Flooding (original), (b) Flooding (modified), and (c) WaveBound (ours); panel (a) also marks the average loss.]
Figure 1: Conceptual examples of the different methods. (a) The original flooding provides a lower bound on the average loss, rather than considering each time step and feature individually. (b) Even if lower bounds on the training loss are provided for each time step and feature, a constant-valued bound cannot reflect the nature of time series forecasting. (c) Our proposed WaveBound method provides a lower bound on the training loss for each time step and feature. This lower bound is dynamically adjusted to give a tighter error bound during training.
data, error bounds should be adjusted dynamically for different patterns. Intuitively, a higher error should be tolerated for unpredictable patterns.
To properly address the overfitting issue in time series forecasting, the difficulty of prediction, i.e., how unpredictable the current label is, should be measured during training. To this end, we introduce a target network updated with an exponential moving average of the original network, i.e., the source network. At each iteration, the target network can guide a reasonable level of training loss for the source network: the larger the error of the target network, the more unpredictable the pattern. In current studies, a slow-moving average target network is commonly used to produce stable targets in the self-supervised setting [11, 12]. By using the training loss of the target network as our lower bound, we derive a novel regularization method called WaveBound, which faithfully estimates the error bounds for each time step and feature. By dynamically adjusting the error bounds, our regularization prevents the model from overly fitting to a certain pattern and further improves generalization. Figure 1 shows the conceptual difference between the original flooding and our WaveBound method. The originally proposed flooding determines the direction of the update step for all points by comparing the average loss against its flood level. In contrast, WaveBound individually decides the direction of the update step for each point by using a dynamic error bound on the training loss (sketched in code after the contribution list below). The difference between these methods is further discussed in Section 3. Our main contributions are threefold:
• We propose a simple yet effective regularization method called WaveBound that dynamically provides the error bounds of the training loss in time series forecasting.
• We show that our proposed regularization method consistently improves upon existing state-of-the-art time series forecasting models on six real-world benchmarks.
• Through extensive experiments, we verify the significance of adjusting the error bounds for each time step, feature, and pattern, thus addressing the overfitting issue in time series forecasting.
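To make the distinction above concrete, below is a minimal PyTorch-style sketch; it is our illustration, not the authors' reference implementation, and the flood level `b`, margin `eps`, and EMA decay `alpha` are assumed hyperparameters. Original flooding bounds the single scalar average loss, whereas the WaveBound-style loss bounds the loss of each time step and feature using the slow-moving target network.

```python
import torch
import torch.nn.functional as F

def flooding_loss(pred, target, b=0.1):
    # Original flooding: bound the scalar *average* loss with a constant
    # flood level b; gradients ascend whenever the mean loss dips below b.
    loss = F.mse_loss(pred, target)
    return (loss - b).abs() + b

def wavebound_loss(src_pred, tgt_pred, target, eps=0.01):
    # WaveBound-style bound: each time step and feature is bounded by the
    # (detached) target-network loss minus a small margin eps.
    src_loss = (src_pred - target) ** 2              # element-wise source loss
    bound = (tgt_pred.detach() - target) ** 2 - eps  # dynamic per-element bound
    return ((src_loss - bound).abs() + bound).mean()

@torch.no_grad()
def ema_update(target_net, source_net, alpha=0.999):
    # Target network as an exponential moving average of the source network.
    for t_p, s_p in zip(target_net.parameters(), source_net.parameters()):
        t_p.mul_(alpha).add_(s_p, alpha=1 - alpha)
```

In a training loop, one would compute `wavebound_loss` with predictions from both networks, step the optimizer on the source network only, and then call `ema_update`, mirroring the slow-moving-average targets used in self-supervised learning.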
2 Preliminary
2.1 Time Series Forecasting
We consider the rolling forecasting setting with a fixed window size [5–7]. The aim of time series forecasting is to learn a forecaster $g : \mathbb{R}^{L \times K} \to \mathbb{R}^{M \times K}$ which predicts the future series $y_t = \{z_{t+1}, z_{t+2}, \ldots, z_{t+M} : z_i \in \mathbb{R}^K\}$ given the past series $x_t = \{z_{t-L+1}, z_{t-L+2}, \ldots, z_t : z_i \in \mathbb{R}^K\}$ at time $t$, where $K$ is the feature dimension, and $L$ and $M$ are the input length and output length,
respectively.
We mainly address error bounding in the multivariate regression problem where the input series $x$ and output series $y$ jointly come from the underlying density $p(x, y)$. For a given loss function $\ell$, the risk of $g$ is $R(g) := \mathbb{E}_{(x, y) \sim p(x, y)}[\ell(g(x), y)]$. Since we cannot directly access the distribution $p$, we instead minimize its empirical version $\hat{R}(g) := \frac{1}{N} \sum_{i=1}^{N} \ell(g(x_i), y_i)$ using training data $X := \{(x_i, y_i)\}_{i=1}^{N}$.
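As a minimal sketch of this quantity (the function name `empirical_risk` is ours), the empirical risk under a squared-error loss can be estimated as:

```python
def empirical_risk(g, x, y):
    # R_hat(g) = (1/N) * sum_i loss(g(x_i), y_i), here with MSE as loss;
    # x has shape (N, L, K) and y has shape (N, M, K).
    preds = g(x)
    return float(((preds - y) ** 2).mean())
```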
In the analysis, we assume that the errors are independent and identically distributed. We mainly consider using the mean squared error (MSE) loss, which is widely used as an