TRACKING CHANGES USING KULLBACK-LEIBLER DIVERGENCE
FOR THE CONTINUAL LEARNING
(ACCEPTED MANUSCRIPT AT SMC’2022)
Sebastián Basterrech
Faculty of Electrical Engineering and Computer Science
VŠB-Technical University of Ostrava
Ostrava, Czech Republic
Sebastian.Basterrech@vsb.cz
Michał Woźniak
Department of Systems and Computer Networks
Wroclaw University of Science and Technology
Wroclaw, Poland
Michal.Wozniak@pwr.edu.pl
ABSTRACT
Recently, continual learning has received a lot of attention. One of the significant problems is the occurrence of concept drift, which consists of changes in the probabilistic characteristics of the incoming data. In the case of the classification task, this phenomenon destabilizes the model's performance and negatively affects the achieved prediction quality. Most current methods apply statistical learning and similarity analysis over the raw data. However, similarity analysis in streaming data remains a complex problem due to time limitations, imprecise values, the need for fast decisions, scalability requirements, etc. This article introduces a novel method for monitoring changes in the probabilistic distribution of multi-dimensional data streams. As a measure of the rapidity of changes, we analyze the popular Kullback-Leibler divergence. During the experimental study, we show how to use this metric to predict the occurrence of concept drift and to understand its nature. The obtained results encourage further work on the proposed method and its application in real tasks where predicting the future appearance of concept drift plays a crucial role, such as predictive maintenance.
Keywords Drift detection · Continual Learning · Relative Entropy · Data Stream Learning · Lifelong Learning
1 Introduction
One of the critical problems in the analysis of streaming data is the possibility that the probabilistic characteristics of the task change while the prediction model is running. This phenomenon, called concept drift [29], may deteriorate the classification quality of the predictor in use, and usually, its occurrence is unpredictable. Its appearance is typical in many daily decision-making tasks; for example, fraudsters may change the content of an e-mail to get past spam filters.
We may propose several concept drift taxonomies. The first one focuses on how the drift impacts the probability characteristics of the learning task. If it changes the decision boundary shapes, i.e., the posterior probabilities have changed [24], then we face so-called real concept drift. Virtual drift does not change the decision boundary shape; however, it changes the unconditional probability density function [28]. Nevertheless, Oliveira et al. [20] pointed out that while virtual concept drift does not change the shape of decision boundaries, it can reduce the usefulness of the decision boundaries used by classifiers.

This work was supported by the CEUS-UNISONO programme, which has received funding from the National Science Centre, Poland under grant agreement No. 2020/02/Y/ST6/00037, and by the GACR-Czech Science Foundation project No. 21-33574K "Lifelong Machine Learning on Data Streams". The authors are also thankful to Paweł Zyblewski for his helpful and constructive feedback.
Another taxonomy considers the speed of change. Mainly, we may distinguish sudden concept drift, when a new concept abruptly replaces an old one, and incremental concept drift, when we may observe a steady progression from an old concept toward a new one. We should also mention gradual concept drift, when, during the transition between an old and a new concept, the two concepts may occur with different intensities. An interesting phenomenon is that of periodic changes, referred to as recurring concept drift [27], when previously occurring concepts may recur, which is very typical for seasonal phenomena (cyclic concept [14]).
Currently, most approaches are reactive, i.e., they focus on the problem of concept drift detection or on continuous model adaptation to emerging changes. These approaches include concept drift detectors, which act as triggers for significant shifts in the probability distribution, and methods that continuously try to adapt to changes in the probability distribution. The second group of methods mainly uses classifier ensembles [15], where the model's consistency with the current probability distribution is ensured either by changing the composition of the ensemble or by updating individual base classifiers.
This paper focuses on the concept drift monitoring problem and proposes a novel drift detection method. The significant advantage of the proposed approach is that it can compare distributions of unknown type to detect a drift, without any assumptions about the form or parameters of those distributions.
This work offers the following contributions:

• Using the Kullback-Leibler divergence for ongoing monitoring of changes in probability distributions.
• Introducing a novel concept drift detector based on the mentioned metric over the raw data, the so-called KL-divergence-based concept drift detector (KLD); a minimal sketch follows this list.
• A fast and robust decision rule with a real-valued control parameter that can be dynamically adjusted to improve the performance of the matching matrix.
• An initial experimental evaluation of the proposed method on non-stationary data streams.
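To make the idea concrete before the formal description in Section 3, the following minimal sketch illustrates how the KL divergence can be monitored over raw data chunks. It is an illustrative approximation under our own assumptions, not the exact procedure of Section 3: densities are estimated here with shared-bin histograms, and the function and parameter names (detect_drift, n_bins, threshold) are hypothetical.

import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL divergence D(p || q) between two discrete distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def chunk_histogram(chunk, edges):
    """Per-feature histogram estimate of the chunk distribution (shared bin edges)."""
    hists = [np.histogram(chunk[:, j], bins=edges[j])[0] for j in range(chunk.shape[1])]
    return np.concatenate(hists)

def detect_drift(reference_chunk, current_chunk, n_bins=10, threshold=0.5):
    """Flag a drift when the KL divergence between chunks exceeds the threshold."""
    data = np.vstack([reference_chunk, current_chunk])
    edges = [np.histogram_bin_edges(data[:, j], bins=n_bins) for j in range(data.shape[1])]
    p = chunk_histogram(reference_chunk, edges)
    q = chunk_histogram(current_chunk, edges)
    score = kl_divergence(p, q)
    return score > threshold, score

In this sketch, the threshold plays the role of the real-valued control parameter mentioned in the contributions above.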
The remainder of the paper is organized as follows. The next section presents the background on the main concepts of this article. Section 3 fully describes our contribution. The experimental study is presented in Section 4. We close with a summary of our contributions and a discussion of directions for future work.
2 Preliminaries
The following section presents the key information about data streams, concept drift, and the Kullback-Leibler divergence necessary to explain the proposed approach.
2.1 Streaming data
Continual learning from streaming data is usually associated with problems where datasets become available in chunks. A data stream is often defined as an ordered sequence of chunks $\{S_1, S_2, \ldots, S_k, \ldots\}$, where the chunk $S_i$ consists of a finite data sequence $S_i = \{z_i^{(1)}, z_i^{(2)}, \ldots, z_i^{(K)}\}$. Since our goal is to analyze continual learning over a data stream in a supervised context, we assume data points $z_i^{(k)}$ defined as input-output pairs $(u_i^{(k)}, y_i^{(k)}) \in U \times Y$. The input vectors lie in a $p$-dimensional space $U$, and the outputs belong to a $d$-dimensional space $Y$. In the case of a classification problem, the space $Y$ describes a small number of classes. Otherwise, it is a regression problem where $Y$ is typically a subset of a $d$-dimensional real space.
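For illustration, a stream with the above chunk structure can be simulated as follows. This is a hypothetical sketch: the Gaussian inputs, the shift magnitude, and the drift position are arbitrary choices, and the fixed labeling rule makes the change a virtual drift (only $P(u)$ shifts).

import numpy as np

rng = np.random.default_rng(seed=42)

def make_stream(n_chunks=10, chunk_size=200, p=2, drift_at=5):
    """Yield chunks S_i of input-output pairs (u, y); after chunk `drift_at`
    the unconditional input distribution P(u) shifts while the labeling
    rule stays fixed, i.e., a virtual drift."""
    for i in range(n_chunks):
        shift = 0.0 if i < drift_at else 2.0
        u = rng.normal(loc=shift, scale=1.0, size=(chunk_size, p))
        y = (u.sum(axis=1) > 0.0).astype(int)  # fixed decision rule
        yield u, y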
Tracking changes in a data stream is often done using drift detection methods. Most drift detection techniques use a base predictor to classify incoming instances. In general, the methods work as follows. For each input instance, the base predictor outputs a class label, which is compared to the true class label. Then, the accuracy is evaluated and used as a tool for deciding whether a drift has occurred or not. There are several metrics to evaluate the accuracy of such a prediction. The choice of metric, conventionally called the loss function, depends on both the domain of $Y$ and the optimization method applied for adjusting the base predictor. Some commonly used functions are the $0/1$ loss function and cross-entropy in the case of classification problems, and quadratic errors in the case of regression problems [25].
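A generic performance-based monitoring loop of this kind might look as follows. This is a sketch under our own assumptions: the base predictor `predict`, the fixed baseline comparison, and the `tolerance` parameter are illustrative choices, not a specific method from the literature.

import numpy as np

def zero_one_loss(y_true, y_pred):
    """Mean 0/1 loss over one chunk."""
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))

def monitor(stream, predict, tolerance=0.10):
    """Flag a drift whenever the current chunk's loss exceeds the
    baseline (first-chunk) loss by more than `tolerance`."""
    baseline = None
    for u, y in stream:
        loss = zero_one_loss(y, predict(u))
        baseline = loss if baseline is None else baseline
        yield ("drift" if loss > baseline + tolerance else "ok"), loss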
2.2 Drift detection
A concept drift detector is an algorithm that can inform about changes in the data distribution. To detect a real drift, labeled data or a classifier's performance measures are usually required, but some propositions employing unlabeled data only, usually based on statistical tests such as Kolmogorov-Smirnov, have also been proposed [27]. Nevertheless, we should be aware of their limitations: such detectors are good at detecting virtual concept drift but are not guaranteed to detect real drift. However, changes in the unconditional probability density functions are rarely unaccompanied by changes in the conditional distributions. Moreover, as mentioned in [20], virtual drift often affects the utility of the prediction models used.
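For instance, an unlabeled-data detector of the kind mentioned above could apply a two-sample Kolmogorov-Smirnov test per input feature between a reference chunk and the current chunk. The significance level and the Bonferroni correction below are our assumptions rather than a prescription from [27].

from scipy.stats import ks_2samp

def ks_drift(reference_chunk, current_chunk, alpha=0.05):
    """Flag a drift if, for any feature, the two-sample KS test rejects
    equality of distributions (Bonferroni-corrected over the p features)."""
    p = reference_chunk.shape[1]
    for j in range(p):
        _, p_value = ks_2samp(reference_chunk[:, j], current_chunk[:, j])
        if p_value < alpha / p:
            return True
    return False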
Some drift detectors can also return a warning signal, i.e., a signal that the distribution is changing but the change is not yet significant. A warning signal could trigger the collection of new data for upcoming updates or for rebuilding the current predictor.
Detecting the drift should be done as quickly as possible to replace an outdated model, thus minimizing the restoration
time. On the other hand, false positives are also unacceptable because they can cause the model to be corrected
unnecessarily, resulting in model adjustment to unrepresentative samples and increased, unwarranted consumption of
computational resources [13].
Nevertheless, to the best of the authors' knowledge, the recommended way of testing a drift detector should focus on the predictive performance of the classifier that employs the examined detector; it is recommended to use the evaluation framework proposed in [12].
Let us shortly review the most important drift detection algorithms. CUSUM (Cumulative Sum) [21] is a simple sequential analysis technique based on measuring the mean value of the input data; if the mean deviates significantly from zero, a drift is signaled. The Exponentially Weighted Moving Average (EWMA) [23] combines current and historical observations, which allows detecting changes in the mean value quickly by using aggregated charting statistics; a weight factor is introduced that promotes the most recent observations. DDM (Drift Detection Method) [11] incrementally estimates a classifier's error rate, which (assuming the convergence of the learning algorithm) has to diminish as new instances from the given distribution are continuously presented to the learner [22]; if the reverse behavior is observed, we may suspect a change of probability distributions. EDDM (Early Drift Detection Method) [1] extends DDM: it proposed a heuristic window size selection procedure, used a different performance metric, and implemented new warning and drift levels. Blanco et al. [5] propose detectors that use a non-parametric estimation of the classifier error employing Hoeffding's and McDiarmid's inequalities. ADWIN [4] employs an adaptive sliding window, which makes it highly suitable for handling sudden drifts; when no change is apparent, it automatically grows the window, and it shrinks the window when the data stream distribution changes, testing the hypothesis of equal averages in two subwindows using Hoeffding bounds. Nishida and Yamauchi [19] developed STEPD, which employs a statistic equivalent to the chi-square test with Yates's continuity correction to infer whether the classification quality on the current observations has changed since the current model was deployed. We should also mention the compound detection models which employ an ensemble approach; these can be found, e.g., in [18], [10], [17]. Concept drift detection has also been developed for high-dimensional and sparse spaces (e.g., sparse time series) [30, 26].
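As a concrete reference point, the simplest of these detectors, CUSUM, can be sketched in a few lines; the slack `delta` and threshold `lam` are tuning parameters chosen here only for illustration.

class Cusum:
    """One-sided CUSUM detector: accumulate deviations of the monitored
    value above an allowed slack `delta`, and signal a drift when the
    cumulative sum exceeds the threshold `lam`."""

    def __init__(self, delta=0.005, lam=50.0):
        self.delta = delta
        self.lam = lam
        self.cum = 0.0

    def update(self, value):
        """Feed one observation; return True if a drift is signaled."""
        self.cum = max(0.0, self.cum + value - self.delta)
        return self.cum > self.lam

In practice, the value fed to update() would typically be the base predictor's per-instance error indicator or residual.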