TRACKING CHANGES USING KULLBACK-LEIBLER DIVERGENCE
FOR THE CONTINUAL LEARNING
(ACCEPTED MANUSCRIPT AT SMC’2022)
Sebastián Basterrech
Faculty of Electrical Engineering and Computer Science
VŠB-Technical University of Ostrava
Ostrava, Czech Republic
Sebastian.Basterrech@vsb.cz
Michał Woźniak
Department of Systems and Computer Networks
Wroclaw University of Science and Technology
Wroclaw, Poland
Michal.Wozniak@pwr.edu.pl
ABSTRACT
Recently, continual learning has received a lot of attention. One of the significant problems is the occurrence of concept drift, which consists of changes in the probabilistic characteristics of the incoming data. In the case of the classification task, this phenomenon destabilizes the model's performance and negatively affects the achieved prediction quality. Most current methods apply statistical learning and similarity analysis over the raw data. However, similarity analysis in streaming data remains a complex problem due to time limitations, imprecise values, the need for fast decisions, scalability requirements, etc. This article introduces a novel method for monitoring changes in the probabilistic distribution of multi-dimensional data streams. As a measure of the rapidity of changes, we analyze the popular Kullback-Leibler divergence. During the experimental study, we show how to use this metric to predict the occurrence of concept drift and to understand its nature. The obtained results encourage further work on the proposed method and its application in real tasks where predicting the future appearance of concept drift plays a crucial role, such as predictive maintenance.
Keywords Drift detection · Continual Learning · Relative Entropy · Data Stream Learning · Lifelong Learning
1 Introduction
One of the critical problems in the analysis of streaming data is the possibility that the probabilistic characteristics of the task change while the prediction model is running. This phenomenon, called concept drift [29], may deteriorate the classification quality of the predictor in use, and usually, its occurrence is unpredictable. Its appearance is typical in many daily decision-making tasks; for example, fraudsters may change the content of an e-mail to get past spam filters.
We may propose several concept drift taxonomies. The first one focuses on how the drift impacts the probability characteristics of the learning task. If it changes the decision boundary shapes, i.e., the posterior probabilities have changed [24], then we face so-called real concept drift. Virtual drift does not change the decision boundary shape; however, it changes the unconditional probability density function [28]. Nevertheless, Oliveira et al. [20] pointed out that while virtual concept drift does not change the shape of decision boundaries, it can reduce the usefulness of the decision boundaries used by classifiers.

This work was supported by the CEUS-UNISONO programme, which has received funding from the National Science Centre, Poland under grant agreement No. 2020/02/Y/ST6/00037, and by the GACR-Czech Science Foundation project No. 21-33574K "Lifelong Machine Learning on Data Streams". The authors are also thankful to Paweł Zyblewski for his helpful and constructive feedback.
Another taxonomy considers the speed of change. Mainly, we may distinguish sudden concept drift, when a new concept abruptly replaces an old one, and incremental concept drift, when we may observe a steady progression from an old concept toward a new one. We should also mention gradual concept drift, when, during the transition between an old and a new concept, the two concepts may occur with different intensities. An interesting phenomenon is that of periodic changes, referred to as recurring concept drift [27], when previously occurring concepts may recur, which is very typical for seasonal phenomena (cyclic concept [14]).
Currently, most approaches are reactive, i.e., they focus on the problem of concept drift detection or on continuous model adaptation to emerging changes. These approaches include concept drift detectors, which act as triggers for significant shifts in the probability distribution, and methods that continuously try to adapt to changes in the probability distribution. The second group of methods mainly uses classifier ensembles [15], where the model's consistency with the current probability distribution is ensured either by changing the composition of the ensemble or by updating individual base classifiers.
This paper focuses on the concept drift monitoring problem and proposes a novel drift detection method. The significant advantage of the proposed approach is that it can compare distributions of unknown type to detect a drift, without any assumptions about the form or parameters of those distributions.
This work offers the following contributions:

• Using the Kullback-Leibler divergence for ongoing monitoring of changes in probability distributions.
• Introducing a novel concept drift detector based on the mentioned metric over the raw data, the so-called KL-divergence-based concept drift detector (KLD); a minimal sketch follows this list.
• A fast and robust decision rule with a real-valued control parameter that can be dynamically adjusted to improve the performance of the matching matrix.
• An initial experimental evaluation of the proposed method on non-stationary data streams.
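To make the idea concrete before the formal description in Section 3, the following minimal sketch illustrates how the KL divergence can be monitored over raw data chunks. It is an illustrative approximation under our own assumptions, not the exact procedure of Section 3: densities are estimated here with shared-bin histograms, and the function and parameter names (detect_drift, n_bins, threshold) are hypothetical.

import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL divergence D(p || q) between two discrete distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def chunk_histogram(chunk, edges):
    """Per-feature histogram estimate of the chunk distribution (shared bin edges)."""
    hists = [np.histogram(chunk[:, j], bins=edges[j])[0] for j in range(chunk.shape[1])]
    return np.concatenate(hists)

def detect_drift(reference_chunk, current_chunk, n_bins=10, threshold=0.5):
    """Flag a drift when the KL divergence between chunks exceeds the threshold."""
    data = np.vstack([reference_chunk, current_chunk])
    edges = [np.histogram_bin_edges(data[:, j], bins=n_bins) for j in range(data.shape[1])]
    p = chunk_histogram(reference_chunk, edges)
    q = chunk_histogram(current_chunk, edges)
    score = kl_divergence(p, q)
    return score > threshold, score

In this sketch, the threshold plays the role of the real-valued control parameter mentioned in the contributions above.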
The remainder of the paper is organized as follows. The next section presents the background on the main concepts of this article. Section 3 fully describes our contribution. The experimental study is presented in Section 4. We close with a summary of our contributions and a discussion of directions for future work.
2 Preliminaries
The following section presents the key information about data streams, concept drift, and the Kullback-Leibler divergence necessary to explain the proposed approach.
2.1 Streaming data
Continual learning from streaming data is usually associated with problems where datasets become available in chunks. A data stream is often defined as an ordered sequence of chunks $\{S_1, S_2, \ldots, S_k, \ldots\}$, where the chunk $S_i$ consists of a finite data sequence $S_i = \{z_i^{(1)}, z_i^{(2)}, \ldots, z_i^{(K)}\}$. Since our goal is to analyze continual learning over a data stream in a supervised context, we assume data points $z_i^{(k)}$ defined as input-output pairs $(u_i^{(k)}, y_i^{(k)}) \in U \times Y$. The input vectors lie in a $p$-dimensional space $U$, and the outputs belong to a $d$-dimensional space $Y$. In the case of a classification problem, the space $Y$ describes a small number of classes. Otherwise, it is a regression problem where $Y$ is typically a subset of a $d$-dimensional real space.
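For illustration, a stream with the above chunk structure can be simulated as follows. This is a hypothetical sketch: the Gaussian inputs, the shift magnitude, and the drift position are arbitrary choices, and the fixed labeling rule makes the change a virtual drift (only $P(u)$ shifts).

import numpy as np

rng = np.random.default_rng(seed=42)

def make_stream(n_chunks=10, chunk_size=200, p=2, drift_at=5):
    """Yield chunks S_i of input-output pairs (u, y); after chunk `drift_at`
    the unconditional input distribution P(u) shifts while the labeling
    rule stays fixed, i.e., a virtual drift."""
    for i in range(n_chunks):
        shift = 0.0 if i < drift_at else 2.0
        u = rng.normal(loc=shift, scale=1.0, size=(chunk_size, p))
        y = (u.sum(axis=1) > 0.0).astype(int)  # fixed decision rule
        yield u, y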
Tracking changes in a data stream is often done using drift detection methods. Most drift detection techniques use a base predictor to classify incoming instances. In general, the methods work as follows. For each input instance, the base predictor outputs a class label, which is compared to the true class label. Then, the accuracy is evaluated and used as a tool for deciding whether a drift has occurred or not. There are several metrics to evaluate the accuracy of such a prediction. The choice of metric, conventionally called the loss function, depends on both the domain of $Y$ and the optimization method applied for adjusting the base predictor. Some commonly used functions are the $0/1$ loss function and cross-entropy in the case of classification problems, and quadratic errors in the case of regression problems [25].
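A generic performance-based monitoring loop of this kind might look as follows. This is a sketch under our own assumptions: the base predictor `predict`, the fixed baseline comparison, and the `tolerance` parameter are illustrative choices, not a specific method from the literature.

import numpy as np

def zero_one_loss(y_true, y_pred):
    """Mean 0/1 loss over one chunk."""
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))

def monitor(stream, predict, tolerance=0.10):
    """Flag a drift whenever the current chunk's loss exceeds the
    baseline (first-chunk) loss by more than `tolerance`."""
    baseline = None
    for u, y in stream:
        loss = zero_one_loss(y, predict(u))
        baseline = loss if baseline is None else baseline
        yield ("drift" if loss > baseline + tolerance else "ok"), loss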
2.2 Drift detection
A concept drift detector is an algorithm that can inform about changes in the data distribution. To detect a real drift, labeled data or a classifier's performance measures are usually required, but some propositions employing unlabeled data only, usually based on statistical tests such as Kolmogorov-Smirnov, have also been proposed [27]. Nevertheless, we should be aware of their limitations: such detectors are good at detecting virtual concept drift but are not guaranteed to detect real drift. However, changes in the unconditional probability density functions are rarely unaccompanied by changes in the conditional distributions. Moreover, as mentioned in [20], virtual drift often affects the utility of the prediction models used.
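For instance, an unlabeled-data detector of the kind mentioned above could apply a two-sample Kolmogorov-Smirnov test per input feature between a reference chunk and the current chunk. The significance level and the Bonferroni correction below are our assumptions rather than a prescription from [27].

from scipy.stats import ks_2samp

def ks_drift(reference_chunk, current_chunk, alpha=0.05):
    """Flag a drift if, for any feature, the two-sample KS test rejects
    equality of distributions (Bonferroni-corrected over the p features)."""
    p = reference_chunk.shape[1]
    for j in range(p):
        _, p_value = ks_2samp(reference_chunk[:, j], current_chunk[:, j])
        if p_value < alpha / p:
            return True
    return False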
Some drift detectors can also return a warning signal, i.e., a signal that the distribution is changing but the change is not yet significant. A warning signal could trigger the collection of new data for upcoming updates or for rebuilding the current predictor.
Detecting the drift should be done as quickly as possible to replace an outdated model, thus minimizing the restoration
time. On the other hand, false positives are also unacceptable because they can cause the model to be corrected
unnecessarily, resulting in model adjustment to unrepresentative samples and increased, unwarranted consumption of
computational resources [13].
Nevertheless, to the best of the authors' knowledge, the recommended way of testing a drift detector should focus on the predictive performance of the classifier that employs the examined detector; it is recommended to use the evaluation framework proposed in [12].
Let us shortly review the most important drift detection algorithms. CUSUM (Cumulative Sum) [21] is a simple sequential analysis technique based on measuring the mean value of the input data; if the mean deviates significantly from zero, a drift is signaled. The Exponentially Weighted Moving Average (EWMA) [23] combines current and historical observations, which allows detecting changes in the mean value quickly by using aggregated charting statistics; a weight factor is introduced that promotes the most recent observations. DDM (Drift Detection Method) [11] incrementally estimates a classifier's error rate, which (assuming the convergence of the learning algorithm) has to diminish as new instances from the given distribution are continuously presented to the learner [22]; if the reverse behavior is observed, we may suspect a change of probability distributions. EDDM (Early Drift Detection Method) [1] extends DDM: it proposed a heuristic window size selection procedure, used a different performance metric, and implemented new warning and drift levels. Blanco et al. [5] propose detectors that use a non-parametric estimation of the classifier error employing Hoeffding's and McDiarmid's inequalities. ADWIN [4] employs an adaptive sliding window, which makes it highly suitable for handling sudden drifts; when no change is apparent, it automatically grows the window, and it shrinks the window when the data stream distribution changes, testing the hypothesis of equal averages in two subwindows using Hoeffding bounds. Nishida and Yamauchi [19] developed STEPD, which employs a statistic equivalent to the chi-square test with Yates's continuity correction to infer whether the classification quality on the current observations has changed since the current model was deployed. We should also mention the compound detection models which employ an ensemble approach; these can be found, e.g., in [18], [10], [17]. Concept drift detection has also been developed for high-dimensional and sparse spaces (e.g., sparse time series) [30, 26].
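As a concrete reference point, the simplest of these detectors, CUSUM, can be sketched in a few lines; the slack `delta` and threshold `lam` are tuning parameters chosen here only for illustration.

class Cusum:
    """One-sided CUSUM detector: accumulate deviations of the monitored
    value above an allowed slack `delta`, and signal a drift when the
    cumulative sum exceeds the threshold `lam`."""

    def __init__(self, delta=0.005, lam=50.0):
        self.delta = delta
        self.lam = lam
        self.cum = 0.0

    def update(self, value):
        """Feed one observation; return True if a drift is signaled."""
        self.cum = max(0.0, self.cum + value - self.delta)
        return self.cum > self.lam

In practice, the value fed to update() would typically be the base predictor's per-instance error indicator or residual.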