EXPLAINABLE CLASSIFICATION OF ASTRONOMICAL
UNCERTAIN TIME SERIES
Michael Franklin MBOUOPDA
LIMOS
University Clermont Auvergne
Clermont-Ferrand
michael.mbouopda@uca.fr
Emille E. O. ISHIDA
Laboratory of Physics of Clermont
University Clermont Auvergne
Clermont-Ferrand
emille.ishida@clermont.in2p3.fr
Engelbert MEPHU NGUIFO
LIMOS
University Clermont Auvergne
Clermont-Ferrand
engelbert.mephu_nguifo@uca.fr
Emmanuel GANGLER
Laboratory of Physics of Clermont
University Clermont Auvergne
Clermont-Ferrand
emmanuel.gangler@clermont.in2p3.fr
ABSTRACT
Exploring the expansion history of the universe, understanding its evolutionary stages, and predicting
its future evolution are important goals in astrophysics. Today, machine learning tools are used
to help achieve these goals by analyzing transient sources, which are modeled as uncertain time
series. Although black-box methods achieve appreciable performance, existing interpretable time
series methods have failed to obtain acceptable performance on this type of data. Furthermore, data
uncertainty is rarely taken into account in these methods. In this work, we propose an
uncertainty-aware, subsequence-based model which achieves classification performance comparable to
that of state-of-the-art methods. Unlike conformal learning, which estimates model uncertainty on
predictions, our method takes data uncertainty as additional input. Moreover, our approach is
explainable-by-design, giving domain experts the ability to inspect the model and explain its
predictions. The explainability of the proposed method also has the potential to inspire new
developments in theoretical astrophysics modeling by suggesting important subsequences which depict
details of light curve shapes. The dataset, the source code of our experiment, and the results are
made available on a public repository.
Keywords Time series · Classification · Explainability · Astronomy · Uncertainty
1 Introduction
Machine learning (ML) has become an indispensable tool for analyzing and extracting meaningful information
from data. Once applied almost exclusively to tabular data, it is now also effective on image, video, text, and
time series data. The latter is the type of data we focus on in this paper. Specifically, this work is about
time series classification, an ML task whose goal is to learn a function (i.e., a classifier) that maps time series to a set
of discrete classes. A time series is an ordered and finite sequence of values. Some examples of time series are the
daily COVID cases and the monthly groundwater level. Time series classification has been applied in several domains,
including online harassment detection [1], medicine [2, 3], emotion recognition [4], anomaly detection [5], and
physics [6, 7, 8, 9]. This usability is facilitated by toolkits such as Sktime [10], which unifies the existing time series
classification algorithms under the same user-friendly API. However, the existing methods are generally not applicable
to uncertain time series; in fact, as far as we know, the uncertain shapelet transform (or simply UST) method [11] is the
only one that has been designed for uncertain time series classification.
An uncertain time series (uTS) is a time series of imprecise values. Unlike a regular time series, which is an ordered
sequence of real numbers, an uTS is a sequence of pairs of numbers such that the first number of a pair is the best
estimate and the second one is the error on that estimate; therefore the exact values of an uTS are unknown. Figure
1a illustrates a simulated uTS: the blue line is the best estimate and the vertical red bars represent the uncertainty
intervals (i.e., the exact unknown values are somewhere on the vertical red bars). Figure 1b is a real uTS extracted from
the PLAsTiCC [6] dataset: any time series that lies in the red region could be the exact unknown time series. uTS
classification should not be confused with an application of conformal learning: in conformal learning, the
input data is assumed to be uncertainty-free, and the goal is to compute the uncertainty of the model's predictions. In
contrast, the goal of uTS classification is to infer a classifier from and for uncertain data.
Figure 1: Uncertain time series illustrations. (a) A simulated uTS. (b) An uTS from PLAsTiCC.
arXiv:2210.00869v1 [cs.LG] 28 Sep 2022
Uncertain time series are prevalent in transient astrophysics. Astronomical objects whose brightness varies with time
(a.k.a. transients) are primarily characterized by the presence or absence of specific chemical elements found in their
spectra. This data-taking process (called spectroscopy) is very time-consuming and requires very good observation
conditions. Moreover, since transients are objects which appear in the sky for a limited period of time
and then disappear forever, there is only a small window of opportunity in which such measurements can be taken.
Alternatively, we can also associate different classes of astronomical transients with the respective shapes of their light
curves (brightness variation as a function of time). In this case, we need to repeatedly measure the brightness of the
source in a relatively broad region of the wavelength spectrum. This process, called photometry, is less expensive
and imposes more manageable constraints on observation conditions. However, measurements are more prone to
uncertainties (due to moonlight, twilight, clouds, etc.) in the flux determination, and the distinction between light curves
from different classes is subtle, resulting in less accurate classifications. Nevertheless, since there are not enough
spectroscopic resources to provide a definite label for all photometrically observed objects, being able to effectively analyze
uncertain photometric light curves means that a wider range of the universe can be understood quickly and at a lower
cost.
The Vera C. Rubin Observatory¹ is a ground-based observatory, currently under construction in Chile, whose goal is to
conduct the 10-year Legacy Survey of Space and Time (LSST) in order to produce the deepest and widest images of the
universe. The observatory is expected to start producing data in early 2024, and in order to prepare the community for
the arrival of its data, one important data challenge was put in place: the Photometric LSST Astronomical Time-Series
Classification Challenge, or simply PLAsTiCC [6]. The goal was to identify machine learning models able to classify
14 types of transients in simulated data, represented by uncertain time series, or light curves. The ultimate goal behind
the challenge was to understand which methods are expected to perform better on LSST-like data, thus preparing
the community for the arrival of its data and helping to understand the universe's expansion history. Therefore, using
interpretable approaches was very important. However, contributors focused on minimizing the classification loss by
employing techniques such as mixture of classifiers and data augmentation [12] while neglecting explainability. In this
paper, we address this problem with explainability in mind.
We consider two approaches to classify uTS in an explainable manner: the first one ignores uncertainty and uses only
the best estimates, while the second one takes uncertainty into account. Ignoring uncertainty makes the task a regular
time series classification task, allowing the use of Shapelet Transform Classification, or simply STC [13], an effective
and explainable regular time series classification algorithm. This model failed to find any valid shapelet on PLAsTiCC,
and therefore could not perform the classification task. We performed extensive hyper-parameter tuning tests, but the
result was the same. We also tried to take uncertainty into account by using the Uncertain Shapelet Transform algorithm
[11], but as expected, this method also failed since it is an extension of STC for uncertain time series.
¹ https://lsst.org/
In this paper, we propose the Uncertain Scalable and Accurate Subsequence Transform (or uSAST for short) method,
which is able to achieve an F1-score of 70% while providing faithful explanations, similarly to STC. The rest of this
paper is organized as follows: we start by presenting the background and the related work. We continue by describing
the uSAST method. Finally, we detail our experiments and the obtained results before concluding this work.
2 Background
Definition 1 (Time series). A time series (TS) of length $m$ is a finite sequence of ordered values:
$$T = (t_1, t_2, \ldots, t_m), \quad t_i \in \mathbb{R},\ m > 0$$
Definition 2 (Uncertain time series). An uncertain time series (uTS) is defined similarly to a time series, but each value
has an uncertainty represented by a positive real number:
$$T = (t_1 \pm \delta t_1, t_2 \pm \delta t_2, \ldots, t_m \pm \delta t_m), \quad t_i \in \mathbb{R},\ m > 0,\ \delta t_i \in \mathbb{R}^+$$
Definition 3 (Subsequence). A subsequence (respectively an uncertain subsequence) is a sequence of consecutive
values extracted from a TS (respectively an uTS).
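As an illustration of Definitions 2 and 3, the following minimal Python sketch stores an uTS as a list of (best estimate, error) pairs and enumerates its subsequences with a sliding window. The names `make_uts` and `subsequences` are hypothetical helpers introduced here for illustration; they are not taken from the paper's code, and accepting a zero error is an assumption of this sketch.

```python
from typing import List, Sequence, Tuple

# An uncertain value is a (best_estimate, error) pair.
UncertainValue = Tuple[float, float]


def make_uts(values: Sequence[float], errors: Sequence[float]) -> List[UncertainValue]:
    """Pair each best estimate with its uncertainty (Definition 2)."""
    if len(values) != len(errors) or len(values) == 0:
        raise ValueError("values and errors must be non-empty and of equal length")
    if any(e < 0 for e in errors):
        # Assumption of this sketch: an error of exactly 0 is tolerated.
        raise ValueError("uncertainties must be non-negative")
    return list(zip(values, errors))


def subsequences(series: Sequence, length: int) -> List[list]:
    """All runs of `length` consecutive values of `series` (Definition 3)."""
    if not 0 < length <= len(series):
        raise ValueError("length must be between 1 and len(series)")
    return [list(series[i:i + length]) for i in range(len(series) - length + 1)]


# Hypothetical toy data: four observations with their measurement errors.
uts = make_uts([1.0, 2.0, 4.0, 3.0], [0.1, 0.2, 0.1, 0.3])
windows = subsequences(uts, 3)
print(len(windows))  # a series of length 4 has 2 subsequences of length 3
```

The same `subsequences` helper works on a regular time series, since it only relies on slicing.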
Definition 4 (Distance). The distance between a subsequence $S$ of length $l$ and a time series $T$ of length $m$ is defined as
follows:
$$Dist(S, T) = \min_{P \in T_l} dist(S, P), \quad \text{where } T_l = \{(t_i, t_{i+1}, \ldots, t_{i+l-1}) \mid 1 \le i \le m - l + 1\}$$
The $dist(\cdot, \cdot)$ function in Definition 4 could be any distance metric. In practice, the Euclidean Distance (ED) and
Dynamic Time Warping (DTW) are generally used. The definition is also applicable between an uTS and an uncertain
subsequence, either by ignoring the uncertainty or by taking it into account using an uncertain distance such as the UED [11].
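Definition 4 can be sketched in a few lines of Python. This is a minimal illustration assuming the Euclidean distance as the inner `dist` function; the function names are hypothetical and the brute-force scan is chosen for clarity, not speed.

```python
import math
from typing import Callable, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dist_to_series(
    s: Sequence[float],
    t: Sequence[float],
    dist: Callable[[Sequence[float], Sequence[float]], float] = euclidean,
) -> float:
    """Dist(S, T): smallest dist between S and any length-l window of T."""
    l, m = len(s), len(t)
    if not 0 < l <= m:
        raise ValueError("the subsequence must be non-empty and no longer than the series")
    # There are m - l + 1 windows, starting at offsets 0 .. m - l.
    return min(dist(s, t[i:i + l]) for i in range(m - l + 1))


shape = [1.0, 3.0, 1.0]
series = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0]
print(dist_to_series(shape, series))  # 0.0, the shape occurs exactly at offset 2
```

Because the minimum is taken over all windows, the result does not depend on where in the series the best match occurs.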
Definition 5 (Uncertain Euclidean Distance). The Uncertain Euclidean Distance (UED) between two uncertain
subsequences $S_1$ and $S_2$ of the same length $l$ is defined as:
$$UED(S_1, S_2) = \sum_{i=1}^{l} (s_{1,i} - s_{2,i})^2 \pm 2 \sum_{i=1}^{l} |s_{1,i} - s_{2,i}| (\delta s_{1,i} + \delta s_{2,i})$$
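The UED formula can be evaluated directly. The sketch below is an illustration only: it assumes each uncertain subsequence is a list of (best estimate, error) pairs and returns the result as a (best estimate, error) pair as well, which is a representation chosen here and not necessarily the one used in [11].

```python
from typing import Sequence, Tuple

UncertainValue = Tuple[float, float]  # (best_estimate, error)


def ued(s1: Sequence[UncertainValue], s2: Sequence[UncertainValue]) -> Tuple[float, float]:
    """Return (best, error) such that UED(S1, S2) = best ± error."""
    if len(s1) != len(s2):
        raise ValueError("uncertain subsequences must have the same length")
    # First sum: squared differences of the best estimates.
    best = sum((a - b) ** 2 for (a, _), (b, _) in zip(s1, s2))
    # Second sum: absolute differences weighted by the combined errors.
    error = 2 * sum(abs(a - b) * (da + db) for (a, da), (b, db) in zip(s1, s2))
    return best, error


s1 = [(1.0, 0.5), (2.0, 0.5)]
s2 = [(3.0, 0.5), (2.0, 1.0)]
print(ued(s1, s2))  # (4.0, 4.0)
```

Note that positions where the best estimates coincide contribute nothing to either term, whatever their individual errors.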
Let $D = \{(T_i, c_i) \mid 1 \le i \le n\}$ be a dataset of $n$ time series $T_i$ (respectively uncertain time series) with their class labels
$c_i$ taken from a discrete finite set $C$ such that the cardinality of $C$ is much less than $n$. We can define the notions of
separator and shapelet for this dataset.
Definition 6 (Separator). A separator (respectively uncertain separator) is a pair of a subsequence $S$ (respectively
uncertain subsequence) and a threshold $\epsilon$ that divides the dataset into two groups $D_{left}$ and $D_{right}$ such that:
$$D_{left} = \{(T_i, c_i) \mid Dist(S, T_i) < \epsilon,\ 1 \le i \le n\}$$
$$D_{right} = \{(T_i, c_i) \mid Dist(S, T_i) \ge \epsilon,\ 1 \le i \le n\}$$
Definition 7 (Shapelet). A shapelet (respectively uncertain shapelet) is a separator (respectively uncertain separator)
that maximizes the information gain, similarly to splitting nodes in decision trees [14].
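To make Definitions 6 and 7 concrete, here is a minimal Python sketch that scores a separator by its information gain. It assumes the distances Dist(S, T_i) have already been computed, uses Shannon entropy in bits, and sends a series to the left group when its distance is strictly below the threshold; the function names and toy values are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split(distances: Sequence[float], labels: Sequence[str],
          threshold: float) -> Tuple[List[str], List[str]]:
    """Labels of D_left (distance < threshold) and D_right (distance >= threshold)."""
    left = [c for d, c in zip(distances, labels) if d < threshold]
    right = [c for d, c in zip(distances, labels) if d >= threshold]
    return left, right


def information_gain(distances: Sequence[float], labels: Sequence[str],
                     threshold: float) -> float:
    """Entropy of the dataset minus the size-weighted entropy of the two groups."""
    left, right = split(distances, labels, threshold)
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


# Hypothetical distances from one candidate subsequence to four labelled series.
distances = [0.1, 0.2, 0.9, 1.1]
labels = ["a", "a", "b", "b"]
print(information_gain(distances, labels, 0.5))  # 1.0, a perfect split of two balanced classes
```

Under these assumptions, a shapelet search would evaluate this score for every candidate pair of subsequence and threshold and keep the best one.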
3 Related works
Time series classification is performed using global features, local features, or both. Historically, only global
features were considered; in particular, the classification was done using the one nearest neighbor (1-NN) classifier and
the Dynamic Time Warping (DTW) distance. The Elastic Ensemble (EE) is an improvement of the global features
classification, obtained by ensembling several distance measures [15]. The Fast Ensemble of Elastic Distances (FastEE)
significantly reduces the computation time of the Elastic Ensemble [16].
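For readers unfamiliar with the 1-NN and DTW baseline mentioned above, the following is a minimal Python sketch. It assumes the textbook dynamic-programming formulation of DTW with a squared pointwise cost and no warping-window constraint; it is not the implementation used by any of the cited methods, and the toy data is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """DTW distance computed with an O(len(a) * len(b)) cost table."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible warping steps.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def predict_1nn(train: List[Tuple[Sequence[float], str]], query: Sequence[float]) -> str:
    """Label of the training series closest to `query` under DTW."""
    _, label = min(train, key=lambda pair: dtw(pair[0], query))
    return label


train = [
    ([0.0, 0.0, 1.0, 2.0, 1.0, 0.0], "peak"),
    ([2.0, 2.0, 1.0, 0.0, 1.0, 2.0], "dip"),
]
query = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]  # the same peak, shifted in time
print(predict_1nn(train, query))  # peak
```

The example shows why DTW is popular for global comparison: the shifted peak is at distance 0 from its training counterpart, whereas a point-by-point Euclidean comparison would penalize the shift.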
Local feature-based methods are organized as dictionary-based, interval-based, or subsequence-based. Dictionary-based
methods proceed by representing each time series with a finite set of discrete symbols, using techniques such as
Symbolic Fourier Approximation (SFA) [17] and Symbolic Aggregate approXimation (SAX) [18]. Some methods
that implement these techniques are BOSS [19], MUSE [20], and TDE [21]. Interval-based methods assume that the