Explainable classification of Astronomical Time Series
result was the same. We also tried to take uncertainty into account by using the Uncertain Shapelet Transform algorithm
[11], but as expected, this method also failed since it is an extension of STC for uncertain time series.
In this paper, we propose the Uncertain Scalable and Accurate Subsequence Transform (or uSAST for short) method
which is able to achieve an F1-score of
70%
while providing faithful explanation similarly to STC. The rest of this
paper is organized as follows: we start by presenting the background and the related works. We continue by describing
the uSAST method. Finally, we detail our experiments and the obtained results before concluding this work.
2 Background
Definition 1 (Time series).A time series (TS) of length mis a finite sequence of ordered values.
T= (t1, t2, .., tm), ti∈R, m > 0
Definition 2
(Uncertain time series)
.
An uncertain time series (uTS) is defined similarly to a time series, but each value
has an uncertainty represented by a positive real number.
T= (t1±δt1, t2±δt2, .., tm±δtm), ti∈R, m > 0, δti∈R+
Definition 3
(Subsequence)
.
A subsequence (respectively an uncertain subsequence) is a sequence of consecutive
values extracted from a TS (respectively an uTS).
Definition 4
(Distance)
.
The distance between a subsequence
S
of length
l
and a time series of length
m
is defined as
follows:
Dist(S, T ) = min
P∈Tldist(S, P ),
where Tl={(ti, ti+1, ..., ti+l)|1≤i≤m−l+ 1}
The
dist(·,·)
function in Definition 4 could be any distance metric. In practice the Euclidean Distance (ED) and the
Dynamic Time Warping (DTW) are generally used. The definition is also applicable between uTS and uncertain
subsequence by ignoring the uncertainty or by taking it into account using an uncertain distance, the UED distance [
11
].
Definition 5
(Uncertain Euclidean Distance)
.
The Uncertain Euclidean Distance (UED) between two uncertain
subsequences S1and S2of same length lis defined as:
UED(S1, S2) =
l
X
i=1
(s1,i −s2,i)2±2
l
X
i=1 |s1,i −s2,i|(δs1,i +δs2,i)
Let
D={(Ti, ci)|1≤i≤n}
be a dataset of
n
time series
Ti
(repectively uncertain time series) with their class labels
ci
taken from a discrete finite set
C
such that the cardinality of
C
is much less than
n
. We can define the notions of
separator and shapelet for this dataset.
Definition 6
(Separator)
.
A separator (respectively uncertain separator) is a pair of a subsequence
S
(respectively
uncertain subsequence) and a threshold that divide the dataset in two groups Dlef t and Dright such that:
Dlef t ={(Ti, ci)|Dist(S, Ti)< , 1≤i≤n}
Dright ={(Ti, ci)|Dist(S, Ti)≥, 1≤i≤n}
Definition 7
(Shapelet)
.
A shapelet (respectively uncertain shapelet) is a separator (respectively uncertain separator)
that maximizes the information gain similarly to splitting nodes in decision trees [14].
3 Related works
Time series classification is performed regarding global features, local features, or both. Historically, only global
features were considered; in particular, the classification was done using the one nearest neighbor (1-NN) classifier and
the Dynamic Time Warping (DTW) distance. The Elastic Ensemble (EE) is an improvement of the global features
classification, obtained by ensembling several distance measures [
15
]. The Fast Ensemble of Elastic Distances (FastEE)
significantly reduces the computation time of the Elastic Ensemble [16].
Local feature-based methods are organized as dictionary-based, interval-based or subsequence-based. Dictionary-based
methods proceeds by representing each time series using a finite set of discrete symbols using techniques such as
Symbolic Fourier Approximation (SFA) [
17
] and Symbolic Aggregate approXimation (SAX) [
18
]. Some methods
that implement these techniques are BOSS [
19
], MUSE [
20
] and TDE [
21
]. Interval-based methods assume that the
3