EXPLAINABLE CLASSIFICATION OF ASTRONOMICAL UNCERTAIN TIME SERIES

Michael Franklin MBOUOPDA

LIMOS

University Clermont Auvergne

Clermont-Ferrand

michael.mbouopda@uca.fr

Emille E. O. ISHIDA

Laboratory of Physics of Clermont

University Clermont Auvergne

Clermont-Ferrand

emille.ishida@clermont.in2p3.fr

Engelbert MEPHU NGUIFO

LIMOS

University Clermont Auvergne

Clermont-Ferrand

engelbert.mephu_nguifo@uca.fr

Emmanuel GANGLER

Laboratory of Physics of Clermont

University Clermont Auvergne

Clermont-Ferrand

emmanuel.gangler@clermont.in2p3.fr

ABSTRACT

Exploring the expansion history of the universe, understanding its evolutionary stages, and predicting its future evolution are important goals in astrophysics. Today, machine learning tools are used to help achieve these goals by analyzing transient sources, which are modeled as uncertain time series. Although black-box methods achieve appreciable performance, existing interpretable time series methods have failed to obtain acceptable performance on this type of data. Furthermore, data uncertainty is rarely taken into account in these methods. In this work, we propose an uncertainty-aware subsequence-based model which achieves classification performance comparable to that of state-of-the-art methods. Unlike conformal learning, which estimates model uncertainty on predictions, our method takes data uncertainty as additional input. Moreover, our approach is explainable-by-design, giving domain experts the ability to inspect the model and explain its predictions. The explainability of the proposed method also has the potential to inspire new developments in theoretical astrophysics modeling by suggesting important subsequences which depict details of light curve shapes. The dataset, the source code of our experiment, and the results are made available on a public repository.

Keywords Time series · Classification · Explainability · Astronomy · Uncertainty

1 Introduction

Machine learning (ML) has become an indispensable tool for analyzing and extracting meaningful information from data. Once applied almost exclusively to tabular data, it is now also effective on image, video, text, and time series data. The latter is the type of data we focus on in this paper. Specifically, this work is about time series classification, an ML task whose goal is to learn a function (i.e., a classifier) that maps time series to a set of discrete classes. A time series is an ordered and finite sequence of values. Some examples of time series are the daily COVID cases and the monthly groundwater level. Time series classification has been applied in several domains including online harassment detection [ ], medicine [ ], emotion recognition [ ], anomaly detection [ ], and physics [ ]. This usability is facilitated by toolkits such as Sktime [ ], which unifies the existing time series classification algorithms under the same user-friendly API. However, the existing methods are generally not applicable to uncertain time series; in fact, as far as we know, the uncertain shapelet transform (or simply UST) method [ ] is the only one that has been designed for uncertain time series classification.

An uncertain time series (uTS) is a time series of imprecise values. Unlike a regular time series, which is an ordered sequence of real numbers, a uTS is a sequence of pairs of numbers such that the first number of each pair is the best estimate and the second one is the error on that estimate; therefore, the exact values of a uTS are unknown. Figure 1a illustrates a simulated uTS: the blue line is the best estimate and the vertical red bars represent the uncertainty intervals (i.e., the exact unknown values lie somewhere on the vertical red bars). Figure 1b is a real uTS extracted from the PLAsTiCC [ ] dataset: any time series that lies in the red region could be the exact unknown time series. uTS classification should not be confused with an application of conformal learning; in fact, in conformal learning, the input data is assumed to be uncertainty-free, and the goal is to compute the uncertainty of the model's predictions. In contrast, the goal of uTS classification is to infer a classifier from and for uncertain data.

Figure 1: Uncertain time series illustrations. (a) A simulated uTS. (b) A uTS from PLAsTiCC.

arXiv:2210.00869v1 [cs.LG] 28 Sep 2022

Uncertain time series are prevalent in transient astrophysics. Astronomical objects whose brightness varies with time (a.k.a. transients) are primarily characterized by the presence or absence of specific chemical elements found in their spectra. This data-taking process (called spectroscopy) is very time-consuming and requires very good observation conditions. Moreover, since transients are objects which appear in the sky for a limited period of time and then disappear forever, there is only a short window of opportunity in which such measurements can be taken.

Alternatively, we can associate different classes of astronomical transients with the respective shapes of their light curves (brightness variation as a function of time). In this case, we need to repeatedly measure the brightness of the source in a relatively broad region of the wavelength spectrum. This process, called photometry, is less expensive and imposes more manageable constraints on observation conditions. However, measurements are more prone to uncertainties in the flux determination (due to moonlight, twilight, clouds, etc.), and the distinction between light curves from different classes is subtle, resulting in less accurate classifications. Nevertheless, since there are not enough spectroscopic resources to provide a definite label for every photometrically observed object, being able to effectively analyze uncertain photometric light curves means that a wider range of the universe can be understood quickly and at a lower cost.

The Vera C. Rubin Observatory¹ is a ground-based observatory, currently under construction in Chile, whose goal is to conduct the 10-year Legacy Survey of Space and Time (LSST) in order to produce the deepest and widest images of the universe. The observatory is expected to start producing data in early 2024, and in order to prepare the community for the arrival of its data, one important data challenge was put in place: the Photometric LSST Astronomical Time-Series Classification Challenge, or simply PLAsTiCC [ ]. The goal was to identify machine learning models able to classify types of transients in simulated data, represented by uncertain time series, or light curves. The ultimate goal behind the challenge was to understand which methods are expected to perform better on LSST-like data, thus preparing the community for the arrival of its data and helping to understand the universe's expansion history. Therefore, using interpretable approaches was very important. However, contributors focused on minimizing the classification loss by employing techniques such as mixtures of classifiers and data augmentation [ ] while neglecting explainability. In this paper, we address this problem with explainability in mind.

We consider two approaches to classify uTS in an explainable manner: the first one ignores uncertainty and uses only the best estimates, while the second one takes uncertainty into account. Ignoring uncertainty makes the task a regular time series classification task, allowing the use of Shapelet Transform Classification, or simply STC [ ], an effective and explainable regular time series classification algorithm. This model failed to find any valid shapelet on PLAsTiCC, and therefore could not perform the classification task. We performed extensive hyper-parameter tuning tests, but the result was the same. We also tried to take uncertainty into account by using the Uncertain Shapelet Transform algorithm [11], but as expected, this method also failed since it is an extension of STC for uncertain time series.

¹ https://lsst.org/

In this paper, we propose the Uncertain Scalable and Accurate Subsequence Transform (or uSAST for short) method, which achieves an F1-score of 70% while providing faithful explanations, similarly to STC. The rest of this paper is organized as follows: we start by presenting the background and the related works. We continue by describing the uSAST method. Finally, we detail our experiments and the obtained results before concluding this work.

2 Background

Definition 1 (Time series). A time series (TS) of length $m$ is a finite sequence of ordered values.

\[ T = (t_1, t_2, \ldots, t_m), \quad t_i \in \mathbb{R}, \; m > 0 \]

Definition 2 (Uncertain time series). An uncertain time series (uTS) is defined similarly to a time series, but each value has an uncertainty represented by a positive real number.

\[ T = (t_1 \pm \delta t_1, t_2 \pm \delta t_2, \ldots, t_m \pm \delta t_m), \quad t_i \in \mathbb{R}, \; m > 0, \; \delta t_i \in \mathbb{R}^+ \]
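As an illustration of Definition 2, a uTS can be held in memory as a list of (best estimate, uncertainty) pairs. The sketch below is only one possible representation; the names `make_uts` and `bounds` are hypothetical helpers introduced here, and accepting a zero uncertainty is an assumption about how $\mathbb{R}^+$ should be read.

```python
from typing import List, Sequence, Tuple

# One uncertain value: (best estimate t_i, uncertainty delta t_i).
UncertainSeries = List[Tuple[float, float]]


def make_uts(values: Sequence[float], errors: Sequence[float]) -> UncertainSeries:
    """Pair each best estimate with its uncertainty (hypothetical helper)."""
    if len(values) == 0 or len(values) != len(errors):
        raise ValueError("values and errors must be non-empty and of equal length")
    if any(e < 0 for e in errors):
        # Assumption: zero is accepted, negative uncertainties are rejected.
        raise ValueError("uncertainties must be non-negative")
    return [(float(v), float(e)) for v, e in zip(values, errors)]


def bounds(uts: UncertainSeries) -> List[Tuple[float, float]]:
    """Interval [t_i - delta t_i, t_i + delta t_i] that contains each exact value."""
    return [(v - e, v + e) for v, e in uts]
```

For example, `bounds(make_uts([1.0, 2.0], [0.5, 0.0]))` gives the intervals in which the exact, unknown values of a two-point uTS are assumed to lie.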

Definition 3 (Subsequence). A subsequence (respectively an uncertain subsequence) is a sequence of consecutive values extracted from a TS (respectively a uTS).

Definition 4 (Distance). The distance between a subsequence $S$ of length $l$ and a time series $T$ of length $m$ is defined as follows:

\[ Dist(S, T) = \min_{P \in T^l} dist(S, P), \quad \text{where } T^l = \{(t_i, t_{i+1}, \ldots, t_{i+l-1}) \mid 1 \le i \le m - l + 1\} \]
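Definition 4 can be sketched as a sliding-window minimum. The code below is a minimal illustration rather than the implementation used in this work: it assumes the Euclidean distance as the default for $dist(\cdot,\cdot)$ and windows of exactly $l$ consecutive values; the function names are hypothetical.

```python
import math
from typing import Callable, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dist_to_series(
    s: Sequence[float],
    t: Sequence[float],
    dist: Callable[[Sequence[float], Sequence[float]], float] = euclidean,
) -> float:
    """Dist(S, T): smallest dist(S, P) over every length-l window P of T."""
    l, m = len(s), len(t)
    if l == 0 or l > m:
        raise ValueError("subsequence must be non-empty and no longer than the series")
    # There are m - l + 1 windows, starting at indices 0 .. m - l.
    return min(dist(s, t[i:i + l]) for i in range(m - l + 1))
```

A subsequence that occurs exactly in the series, such as `[1, 2]` in `[0, 1, 2, 5]`, has distance zero. Any other pairwise distance can be supplied through the `dist` argument.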

The $dist(\cdot, \cdot)$ function in Definition 4 could be any distance metric. In practice, the Euclidean Distance (ED) and Dynamic Time Warping (DTW) are generally used. The definition also applies between a uTS and an uncertain subsequence, either by ignoring the uncertainty or by taking it into account using an uncertain distance, the UED distance [ ].

Definition 5 (Uncertain Euclidean Distance). The Uncertain Euclidean Distance (UED) between two uncertain subsequences $S_1$ and $S_2$ of the same length $l$ is defined as:

\[ UED(S_1, S_2) = \sum_{i=1}^{l} (s_{1,i} - s_{2,i})^2 \pm 2 \sum_{i=1}^{l} |s_{1,i} - s_{2,i}| \, (\delta s_{1,i} + \delta s_{2,i}) \]
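The UED of Definition 5 is itself an uncertain value: a best estimate together with an uncertainty. A minimal sketch, assuming each uncertain subsequence is stored as a list of (best estimate, uncertainty) pairs and using the hypothetical function name `ued`, could look as follows.

```python
from typing import List, Tuple

UncertainSeq = List[Tuple[float, float]]  # (best estimate, uncertainty) pairs


def ued(s1: UncertainSeq, s2: UncertainSeq) -> Tuple[float, float]:
    """Return (value, uncertainty) of the UED between two uncertain subsequences."""
    if len(s1) != len(s2):
        raise ValueError("subsequences must have the same length")
    # Best estimate: sum of squared differences of the best estimates.
    value = sum((a - b) ** 2 for (a, _), (b, _) in zip(s1, s2))
    # Uncertainty: 2 * sum of |difference| * (sum of the two uncertainties).
    error = 2 * sum(abs(a - b) * (da + db) for (a, da), (b, db) in zip(s1, s2))
    return value, error
```

When both inputs have zero uncertainty everywhere, the second component vanishes and the first reduces to the squared Euclidean distance.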

Let $D = \{(T_i, c_i) \mid 1 \le i \le n\}$ be a dataset of $n$ time series (respectively uncertain time series) with their class labels $c_i$ taken from a discrete finite set whose cardinality is much less than $n$. We can define the notions of separator and shapelet for this dataset.

Definition 6 (Separator). A separator (respectively uncertain separator) is a pair of a subsequence $S$ (respectively uncertain subsequence) and a threshold $\epsilon$ that divides the dataset into two groups $D_{left}$ and $D_{right}$ such that:

\[ D_{left} = \{(T_i, c_i) \mid Dist(S, T_i) < \epsilon, \; 1 \le i \le n\} \]

\[ D_{right} = \{(T_i, c_i) \mid Dist(S, T_i) \ge \epsilon, \; 1 \le i \le n\} \]

Definition 7 (Shapelet). A shapelet (respectively uncertain shapelet) is a separator (respectively uncertain separator) that maximizes the information gain similarly to splitting nodes in decision trees [14].
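To make Definitions 6 and 7 concrete, the sketch below scores one candidate separator. It assumes the distances $Dist(S, T_i)$ have already been computed for every series in the dataset, and it uses the Shannon entropy in base 2 as the impurity measure, which is the usual choice for decision trees but an assumption here. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (base 2) of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split(
    distances: Sequence[float], labels: Sequence[Hashable], threshold: float
) -> Tuple[List[Hashable], List[Hashable]]:
    """Labels of D_left (distance < threshold) and D_right (distance >= threshold)."""
    left = [c for d, c in zip(distances, labels) if d < threshold]
    right = [c for d, c in zip(distances, labels) if d >= threshold]
    return left, right


def information_gain(
    distances: Sequence[float], labels: Sequence[Hashable], threshold: float
) -> float:
    """Entropy reduction obtained by splitting the dataset with this separator."""
    left, right = split(distances, labels, threshold)
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted
```

Under this reading, a shapelet is the (subsequence, threshold) pair with the highest `information_gain` among all candidates. A separator that isolates each class perfectly in a balanced two-class dataset reaches the maximum gain of one bit.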

3 Related works

Time series classification is performed using global features, local features, or both. Historically, only global features were considered; in particular, classification was done using the one-nearest-neighbor (1-NN) classifier with the Dynamic Time Warping (DTW) distance. The Elastic Ensemble (EE) improves on global-feature classification by ensembling several distance measures [ ]. The Fast Ensemble of Elastic Distances (FastEE) significantly reduces the computation time of the Elastic Ensemble [16].
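The 1-NN classifier with DTW mentioned above can be sketched in a few lines. This is a textbook dynamic-programming version given for illustration only: it assumes a squared pointwise cost with a final square root and no warping-window constraint, and the names `dtw` and `predict_1nn` are hypothetical.

```python
import math
from typing import Hashable, List, Sequence, Tuple


def dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic Time Warping distance between two series (unconstrained)."""
    n, m = len(a), len(b)
    # cost[i][j]: cheapest alignment of the first i values of a with the first j of b.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return math.sqrt(cost[n][m])


def predict_1nn(
    train: List[Tuple[Sequence[float], Hashable]], query: Sequence[float]
) -> Hashable:
    """Label of the training series closest to the query under DTW."""
    return min(train, key=lambda item: dtw(item[0], query))[1]
```

Because DTW allows values to be repeated during alignment, `[1, 2, 3]` and `[1, 2, 2, 3]` are at distance zero, which is why a shifted peak is still matched to a peaked training series rather than to a flat one.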

Local feature-based methods are organized as dictionary-based, interval-based, or subsequence-based. Dictionary-based methods proceed by representing each time series with a finite set of discrete symbols, using techniques such as Symbolic Fourier Approximation (SFA) [ ] and Symbolic Aggregate approXimation (SAX) [ ]. Some methods that implement these techniques are BOSS [ ], MUSE [ ] and TDE [ ]. Interval-based methods assume that the
