Explicit Learning: As mentioned in the introduction, there
are a variety of approaches to learning from historical data.
For instance, support vector regression [10] and feed-forward
ANNs [18] were employed to capture temporal correlations via
multiple previous bus travel times. Employing link length (a
static input) together with the rate of road usage and speed
(dynamic inputs), [20] proposes an SVR-based prediction.
However, neither the current bus position nor inputs from the
previous bus are considered there.
A speed-based prediction scheme is proposed in [29], which
uses a weighted average of the current bus speed and the
historically averaged section speed as inputs. Like the previous
method, it ignores information from the previous bus. A
dynamic SVR-based prediction scheme is proposed in [9],
which exploits spatio-temporal (ST) correlations in a minimal
manner. In particular, it considers the current bus's travel time
at the previous section and the previous bus's travel time at
the current section.
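The minimal ST inputs of [9] can be sketched as a feature construction (the toy travel-time table below is illustrative; in [9] these features would feed an SVR, which is omitted here):

```python
# Toy travel-time table: T[bus][section] in seconds,
# with buses ordered by departure time (illustrative values).
T = [[60.0, 75.0, 90.0],
     [62.0, 80.0, 95.0],
     [58.0, 72.0, 88.0],
     [65.0, 85.0, 99.0]]

def st_features(T, bus, section):
    """The two spatio-temporal inputs used in [9]: the current bus's
    travel time on the previous section (spatial correlation) and the
    previous bus's travel time on the current section (temporal)."""
    return [T[bus][section - 1], T[bus - 1][section]]

# Training pairs for predicting section-2 travel times.
X = [st_features(T, b, 2) for b in range(1, len(T))]
y = [T[b][2] for b in range(1, len(T))]
```

Each feature vector pairs one spatial and one temporal measurement, which is why the scheme is described as exploiting ST correlations only in a minimal manner.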
A single feed-forward ANN model is built in [12] to predict
travel times between any two bus stops on the route. On
account of this, the dynamic range of the target travel-time
variable becomes very large, which can lead to poor predictions
for very short and very long routes. An approach using (non-
stationary) linear statistical models which captures ST correlations
was proposed in [4]. It uses a linear Kalman filter for prediction.
Linear models here are used to capture spatial correlations.
The temporal correlations come from the (currently plying)
previous bus section travel time. Another approach using
linear statistical models and exploiting real-time temporal
correlations (from previous buses) was proposed in [8]. A
nonlinear generalization of [4] using support vector function
approximators capturing ST correlations was proposed in [11].
Recently, a CNN approach capturing ST correlations was
proposed in [5]. It uses masked-CNNs to parameterize the
predictive distribution, with quantized travel times as the
CNN outputs.
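The quantized-output idea can be sketched as follows (the bin width and bin count are illustrative choices, not the settings of [5]):

```python
BIN_W = 30.0    # bin width in seconds (illustrative)
N_BINS = 20     # number of travel-time classes (illustrative)

def quantize(t):
    """Map a real-valued travel time to its class index,
    clipping to the last bin."""
    return min(int(t // BIN_W), N_BINS - 1)

def dequantize(k):
    """Recover a point estimate from a class index: the bin centre."""
    return (k + 0.5) * BIN_W

# A 94 s travel time falls in bin 3 and is reconstructed as 105 s,
# illustrating the sensitivity to the quantization level.
```

The softmax layer of [5] would predict a distribution over these class indices; the round-trip error above is the price of coarse quantization discussed later.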
[13] proposes a novel approach that combines CNNs and
RNNs. In particular, spatial correlations from the adjacent
sections of the 1-D route are captured
by the convolutional layer, while the recurrent structure cap-
tures the temporal correlations. It employs a convolutional-
RNN based ED architecture to make multi-step predictions in
time. [14] considers an attention-based extension of [13]. [30]
employs a simplified RNN with attention but no state feedback
(even though weight sharing is present across time-steps). It
makes only single time-step predictions. A common feature
of all these RNN approaches is that the time axis is uniformly
partitioned into time bins of a fixed width.
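This uniform partitioning can be sketched as follows (the 600 s bin width is an illustrative choice, not taken from the cited works):

```python
def time_bin(timestamp_s, bin_width_s=600):
    """Map a time-of-day timestamp (seconds since midnight) to the
    index of its fixed-width time bin, as in the uniformly
    partitioned time axis of the RNN approaches above."""
    return int(timestamp_s // bin_width_s)

# With 10-minute bins, 08:05 and 08:09 share a bin; 08:15 is the next.
```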
A recent computationally interesting approach where BATP
is recast as a value function estimation problem under a suit-
ably constructed Markov reward process is proposed in [15].
This enables exploring a family of value-function predictors
using temporal-difference (TD) learning.
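A hedged sketch of this value-function view (not the exact construction of [15]): treating each section's travel time as a reward, TD(0) learns V[s], the expected remaining travel time from section s to the end of the route.

```python
def td0_value_estimates(trips, n_sections, alpha=0.1):
    """TD(0) on a Markov reward process over route sections.
    Each trip is a list of per-section travel times; V[s] estimates
    the remaining travel time from section s to the route's end.
    An illustrative sketch, not the exact model of [15]."""
    V = [0.0] * (n_sections + 1)      # V[n_sections] = 0 at route end
    for trip in trips:
        for s, r in enumerate(trip):  # r = travel time of section s
            V[s] += alpha * (r + V[s + 1] - V[s])
    return V[:n_sections]

# 400 historical trips over a 2-section route (times in seconds).
trips = [[60.0, 70.0], [62.0, 72.0]] * 200
V = td0_value_estimates(trips, 2)
```

Here V[0] converges toward the mean full-route time and V[1] toward the mean last-section time, which is exactly the arrival-time quantity BATP needs.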
B. Related ED based RNN approaches
The ED architecture was first successfully proposed for
language translation applications [23], [24]. The proposed ar-
chitecture was relatively simple, with the context from the last
time-step of the encoder fed as the initial state and as an explicit
input at each time-step of the decoder. Over the years, the
machine translation literature has seen intelligent improvements
over this base structure, e.g., by employing attention layers and
bidirectional layers in the encoder. Further, the ED framework
has been successfully applied to many other tasks such as
speech recognition [31] and image captioning.
Given its variable-length Seq2Seq mapping ability, the ED
framework can naturally be utilized for multi-step (target)
time-series prediction, where the raw data is real-valued and
the target vector length can be independent of the input vector
length. An attention-based ED approach (with a bidirectional
layer in the encoder) for multi-step TS prediction was proposed
in [22], which could potentially capture seasonal correlations
as well. However, this architecture does not consider exogenous
inputs.
An approach to incorporate exogenous inputs into the predictive
model was proposed in [21], where the exogenous inputs
in the forecast horizon are fed in a synchronized fashion at
the decoder steps. Our approach is close to the above TS
approaches.
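A minimal, untrained sketch of this ED pattern (random weights and an illustrative hidden size; not the actual architectures of [21], [22]): the encoder's last state initializes the decoder, and the known exogenous inputs over the forecast horizon are fed one per decoder step.

```python
import math
import random

random.seed(0)
H = 8  # hidden size (illustrative)

def rand_matrix(rows, cols):
    return [[random.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

W_enc = rand_matrix(H, H + 1)  # encoder cell: [state; scalar input]
W_dec = rand_matrix(H, H + 1)  # decoder cell: [state; exogenous input]
w_out = [random.gauss(0, 0.1) for _ in range(H)]  # scalar readout

def step(W, h, x):
    """One vanilla-RNN step on the concatenated [state; input] vector."""
    v = h + [x]
    return [math.tanh(sum(wij * vj for wij, vj in zip(row, v))) for row in W]

def encode(past_values):
    h = [0.0] * H
    for x in past_values:          # observed series values
        h = step(W_enc, h, x)
    return h                       # context = last encoder state

def decode(context, exogenous):
    h, preds = context, []
    for u in exogenous:            # known future inputs, one per step
        h = step(W_dec, h, u)
        preds.append(sum(wi * hi for wi, hi in zip(w_out, h)))
    return preds

# 3 observed values in, 4 predictions out: the target length is
# independent of the input length.
preds = decode(encode([60.0, 62.0, 58.0]), exogenous=[0.3, 0.4, 0.5, 0.6])
```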
C. Proposed approach in perspective of related approaches
From the prior discussion, one can summarize that many
existing approaches either fail to exploit historical data suffi-
ciently or fail to capture spatial or temporal correlations. The
rest of the approaches do exploit spatio-temporal correlations
in different ways [9], [4], [11], [5], [13], [14], but suffer from
their own drawbacks. For instance, while [9] exploits the
previous bus's travel time at the current section (temporal
correlation), it completely ignores when (the time of day) the
traversal happened.
The spatial correlation here comes from the current bus's
travel time on only one previous section. [4] (denoted as LNKF
in our experiments) addresses the issues of [9] as follows.
To better capture spatial correlations, it considers current bus
travel time measurements from multiple previous sections. The
temporal correlations here also take into account the previous
bus’s proximity by assuming a functional (parameterized) form
dependent on the current section travel time and the start time
difference. It adopts a predominantly linear modelling approach
culminating in a linear Kalman filter for prediction. As ex-
plained earlier, a support-vector based nonlinear generalization
of [4] is considered in [11] (referred to as SVKF in our
experiments). It learns the potentially non-linear spatial and
temporal correlations at a single-step level and then employs
an extended Kalman filter for spatial multi-step prediction.
Compared to our non-linear ED (Seq2Seq) approach, [4]
adopts a mainly linear model. While [11] adopts non-linear
modelling, the training happens with single-step targets in
both [4] and [11]. Moreover, both these KF approaches
adopt recursive sequential multi-step prediction, which can
be prone to error accumulation. Our ED approach, on the
other hand, circumvents this issue of both these KFs by training
with vector targets, where the predictions across all subsequent
sections are stacked together into one target vector.
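The contrast between the recursive KF-style rollout and direct vector-target prediction can be illustrated with a toy one-step model (the coefficients a, b and the bias eps below are illustrative assumptions, not values from any of the cited works):

```python
# Toy ground truth: next section's time = a * current section's time + b.
a, b = 1.1, 5.0

def rollout_errors(t0, horizon, eps=1.0):
    """Recursive (KF-style) multi-step prediction with a slightly biased
    one-step model: each prediction is fed back as the next input,
    so the bias compounds over the horizon."""
    truth, pred, errs = t0, t0, []
    for _ in range(horizon):
        truth = a * truth + b
        pred = a * pred + b + eps   # bias eps applied to its own output
        errs.append(abs(pred - truth))
    return errs

def direct_errors(t0, horizon, eps=1.0):
    """Direct vector-target prediction: each horizon step is predicted
    from the true t0 (here via the exact k-step map plus the same bias),
    so the bias is incurred once per step with no feedback."""
    truth, errs = t0, []
    for _ in range(horizon):
        truth = a * truth + b
        errs.append(abs((truth + eps) - truth))
    return errs

rec_errs = rollout_errors(60.0, 5)
dir_errs = direct_errors(60.0, 5)
```

In the recursive case the error grows geometrically with the horizon, while the direct (vector-target) errors stay flat, which is the advantage claimed for the ED training scheme above.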
The CNN approach of [5] models travel time targets as categor-
ical values via a soft-max output layer. Hence it is sensitive
to the quantization level. A coarse quantization can lead to
high errors when the true target value is exactly between two