Granular-Ball Fuzzy Set and Its Implementation in SVM

Shuyin Xia, Xiaoyu Lian, Guoyin Wang*, Xinbo Gao, Yabin Shao
Abstract—Most existing fuzzy set methods use points as their input, which is
the finest granularity from the perspective of granular computing.
Consequently, these methods are neither efficient nor robust to label noise.
Therefore, we propose a framework called the granular-ball fuzzy set by
introducing granular-ball computing into fuzzy set theory. Because the
computational framework takes granular-balls rather than points as its input,
it is more efficient and robust than traditional fuzzy methods, and its
extensibility allows it to be used in various fields of fuzzy data processing.
Furthermore, the framework is extended to the fuzzy support vector machine
(FSVM) classifier to derive the granular-ball fuzzy SVM (GBFSVM). The
experimental results demonstrate the effectiveness and efficiency of GBFSVM.
The source code and data sets are available at:
http://www.cquptshuyinxia.com/GBFSVM.html.
Index Terms—Fuzzy set, granular-ball, SVM, granular computing, label noise.
I. INTRODUCTION
In the real world, there are numerous fuzzy phenomena and concepts, such as
big and small, light and heavy, fast and slow, dynamic and static, deep and
shallow, beauty and ugliness, which cannot be clearly and completely
distinguished. In fact, fuzzy information is also reliable information. To
quantitatively describe the objective laws of fuzzy concepts and fuzzy
phenomena, Professor L.A. Zadeh, an American computer and cybernetics expert,
put forward the important concept of the fuzzy set [1] in 1965. He represented
fuzzy sets by membership functions, which take values in the closed interval
[0,1] and describe the degree to which elements belong to a fuzzy set. The
greater the function value, the greater the degree of membership. Since Zadeh
introduced fuzzy sets [2], they have been applied to various fields such as
control systems, pattern recognition, and machine learning, and a related
branch, the fuzzy rough set, has also developed rapidly. Several scholars have
conducted in-depth research on feature selection [3], [4], [5], [6], [7], [8],
clustering [9], decision making [10], [11], classification [12], and so on.
Considering the classification problem of fuzzy data sets, Lin et al. [1]
proposed a fuzzy support vector machine (FSVM) model by applying a fuzzy
membership to each input point. The model can make full use of the sample
information; however, the complexity of the training stage is still high for
classification problems with a large amount of data. Regarding fuzzy set
classification tasks in the field of machine learning, Aydogan et al. [13]
proposed a hybrid heuristic method based on the genetic algorithm (GA) and
integer programming formula (IPF) to solve the high-dimensional classification
problem in the classification system of linguistic fuzzy rules. The method can
find accurate and concise classification rules, but cannot flexibly consider
the number of rule sets generated in the classification. Sanz et al. [14]
directly learned interval-valued fuzzy rules by defining a packaging method to
obtain a classification system based on the interval-valued fuzzy principle.
Compared with the algorithms existing at that time, the accuracy of this method
was significantly improved, but it was not well tested on the unbalanced
classification problem, and the algorithm is inefficient owing to its two
evolutionary processes. Li et al. [15] proposed an interval extreme learning
machine for interval fuzzy set classification of continuous-valued attributes,
in which the discretization of conditional attributes and the fuzzification of
class labels are considered. Recently, an associative fuzzy classifier called
CFM-BD [16] has been developed, which has shown robust predictive performance
against more complex algorithms such as fuzzy decision trees [17]. To simplify
the rule set, Aghaeipoor et al. [18] proposed a new scalable fuzzy classifier
for big data, namely Chi-BD-DRF, which adds "dynamic rule filtering (DRF)" to
supplement fuzzy big data learning.

S. Xia, X. Lian, G. Wang, X. Gao and Y. Shao are with the Chongqing Key
Laboratory of Computational Intelligence, Chongqing University of
Telecommunications and Posts, 400065, Chongqing, China. E-mail:
xiasy@cqupt.edu.cn, 1258852995@qq.com, shaoyb@cqupt.edu.cn.

Fig. 1. In human cognition, the coarse-grained large scope is preferred.
arXiv:2210.11675v2 [cs.LG] 26 Nov 2022

Fig. 2. Comparison of the existing fuzzy set and the granular-ball fuzzy set.

The aforementioned methods are based on the finest granularity from the
perspective of granular computing [19], [20], as shown in Fig. 2(a);
therefore, they are neither efficient nor robust. Human cognition follows the
rule of "large scope first": the visual system is particularly sensitive to
global topological characteristics and proceeds from large to small, from
coarse-grained to fine-grained, as shown in Fig. 1 [21]. In granular
computing, the larger the granularity, the higher the efficiency and the
better the robustness to noise. However, a larger granularity is also more
likely to lead to a lack of detail and a loss of accuracy. A smaller
granularity allows more attention to detail, but may reduce the efficiency and
the robustness to label noise. Over the past decades, scholars worldwide have
continually studied how to granulate huge amounts of data and knowledge into
different granularities according to different tasks [22], [23], [24], [25].
The relationship between these granularities was then used to solve problems
[26], [27], [28], [29]. Selecting different granularities according to the
scenario can improve the performance of multi-granularity learning methods and
solve practical problems [30], [31], [32].
Therefore, Xia et al. [33] proposed granular-ball classifiers, which use
hyper-balls to granulate the dataset into granular-balls of different sizes
[34]. The granular-ball support vector machine (GBSVM) [35] was further
proposed, and exhibits higher accuracy and efficiency than the traditional
SVM. To improve the efficiency of fuzzy data processing, the idea of
granular-ball computing can be introduced into fuzzy data processing by
defining the fuzzy granular-ball, as shown in Fig. 2(b). The concept of fuzzy
granular-balls was briefly proposed in our previous work [36], but its
algorithm was not designed; moreover, its SVM model [36], [37] is incorrect,
too complex, and inconsistent with the SVM. To improve the efficiency and
robustness of fuzzy classifiers by incorporating granular-ball computing, the
main contributions of this paper are as follows:
• We propose a framework called the granular-ball fuzzy set by introducing the
  concept of the fuzzy granular-ball. It is different from the traditional
  fuzzy data processing method.
• GBFSVM is proposed based on the fuzzy granular-ball framework. The framework
  uses granular-balls as the basic analysis unit instead of data points.
• Considering the classification problem with the characteristics of
  triangular fuzzy numbers, the GBFSVM based on triangular fuzzy numbers is
  derived in detail using the possibility measure theory.
• Particle swarm optimization (PSO) is used to solve the dual model of GBSVM.
  Experimental results indicate that GBFSVM performs better than the
  traditional SVM and FSVM in both robustness and effectiveness.
The rest of this paper is organized as follows. Section II introduces the
concepts of fuzzy sets and the work related to granular-ball computing.
Section III details the granular-ball fuzzy set framework and the definition
of the fuzzy granular-ball. Section IV introduces the application of the
granular-ball fuzzy set to fuzzy support vector machines and to support vector
machines based on triangular fuzzy numbers. The experimental results and
analysis are presented in Section V. Finally, some concluding remarks are
given in Section VI.
II. RELATED WORK
A. Related concepts of fuzzy sets
With the development of modern science and technology, the systems we face are
becoming more and more complex. For complex problems in the humanities, social
sciences and other "soft sciences," it is often difficult to provide an
accurate evaluation owing to insufficient cognition or information content in
the decision-making process. For multi-attribute decision making without
specific decision information, it is difficult for decision makers to
accurately evaluate a scheme, which motivated the concept of the fuzzy set.
This concept is as follows:
Definition 1. (Fuzzy set [38]) If $X$ is a collection of objects denoted
generically by $x$, then a fuzzy set $\tilde{A}$ in $X$ is a set of ordered
pairs:

$$
\tilde{A} = \{ (x, \mu_{\tilde{A}}(x)) \mid x \in X \}. \tag{1}
$$

$\mu_{\tilde{A}}(x)$ is called the membership function (generalized
characteristic function), which maps $X$ to the membership space $M$.
Generally speaking, the range of the membership function is $[0,1]$.
The most important role of fuzzy sets is to represent various uncertainties in
data and data processing. In particular, the introduction of fuzzy sets in big
data improves the representation ability of the information samples.
The triangular fuzzy number builds on the concept of the fuzzy set proposed by
Professor Lotfi A. Zadeh in 1965 to solve problems in an uncertain
environment. The concept of the triangular fuzzy number is as follows:
Definition 2. (Triangular fuzzy number [39]) $\tilde{a}$ is called a
triangular fuzzy number if its membership function is expressed as follows:

$$
\mu_{\tilde{a}}(x) =
\begin{cases}
\dfrac{x - r_1}{r_2 - r_1}, & r_1 \leq x < r_2, \\
1, & x = r_2, \\
\dfrac{x - r_3}{r_2 - r_3}, & r_2 < x \leq r_3,
\end{cases}
\tag{2}
$$

where $r_1 \leq r_2 \leq r_3$ and $r_j \in R$ $(j = 1, 2, 3)$. The triangular
fuzzy number is denoted by $\tilde{a} = (r_1, r_2, r_3)$. The real numbers
$r_2$, $r_1$ and $r_3$ are called the center, left endpoint and right endpoint
of the triangular fuzzy number $\tilde{a}$, respectively. The center reflects
the main position of the triangular fuzzy number, and a real number $a$ can be
expressed as the special triangular fuzzy number $a = (a, a, a)$.
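As an illustration of Definition 2, the membership function in (2) can be sketched in a few lines of Python. The function name `triangular_membership` is hypothetical, the peak is taken at the center $r_2$, and returning 0 outside $[r_1, r_3]$ is an assumed convention that (2) leaves implicit.

```python
def triangular_membership(x, r1, r2, r3):
    """Membership degree of x in the triangular fuzzy number (r1, r2, r3)."""
    if not (r1 <= r2 <= r3):
        raise ValueError("expected r1 <= r2 <= r3")
    if x == r2:
        return 1.0                      # peak at the center r2
    if r1 <= x < r2:
        return (x - r1) / (r2 - r1)     # rising left branch of (2)
    if r2 < x <= r3:
        return (x - r3) / (r2 - r3)     # falling right branch of (2)
    return 0.0                          # assumed: zero outside [r1, r3]


# Example with the triangular fuzzy number (1, 2, 4).
left = triangular_membership(1.5, 1, 2, 4)    # halfway up the left branch
peak = triangular_membership(2, 1, 2, 4)      # the center
right = triangular_membership(3, 1, 2, 4)     # halfway down the right branch
crisp = triangular_membership(3, 3, 3, 3)     # real number 3 as (3, 3, 3)
```

Checking `x == r2` first means the degenerate case $a = (a, a, a)$ never reaches a division by zero.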
The likelihood of occurrence of a fuzzy event $\tilde{a}$ can be measured
using the possibility measure, which was proposed by Professor Lotfi A. Zadeh
in 1978. It is defined as follows:
Definition 3. (Possibility measure [40]) Let $(\Gamma, \mathcal{A})$ be a
backup domain space, and let Pos be a set function defined on the backup
domain $\mathcal{A}$. Suppose Pos satisfies the following conditions:
(1) $\mathrm{Pos}(\emptyset) = 0$ and $\mathrm{Pos}(\Gamma) = 1$;
(2) For any subclass $\{A_i \mid i \in I\}$ of $\mathcal{A}$,
$\mathrm{Pos}\left(\bigcup_{i \in I} A_i\right) = \sup_{i \in I} \mathrm{Pos}(A_i)$.
Then Pos is called a possibility measure, and the triple
$(\Gamma, \mathcal{A}, \mathrm{Pos})$ is called a possibility space.
When a triangular fuzzy number is used to represent the fuzzy event
$\tilde{a}$, the possibility of the fuzzy event is measured as follows:
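Definition 4. [41] Let $\tilde{a} = (r_1, r_2, r_3)$ be a triangular fuzzy
number, then:

$$
\mathrm{Pos}(\tilde{a} \leq 0) =
\begin{cases}
1, & r_2 \leq 0, \\
\dfrac{r_1}{r_1 - r_2}, & r_1 \leq 0,\ r_2 > 0, \\
0, & r_1 > 0.
\end{cases}
\tag{3}
$$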
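Lemma 1. [42] Let $\tilde{a} = (r_1, r_2, r_3)$ be a triangular fuzzy number,
then for any given confidence level $\lambda$ $(0 < \lambda \leq 1)$, we have:

$$
\mathrm{Pos}\{\tilde{a} \leq 0\} \geq \lambda \iff (1 - \lambda) r_1 + \lambda r_2 \leq 0. \tag{4}
$$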
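The equivalence in Lemma 1 can be checked numerically against Definition 4. The sketch below is only a sanity check on a small grid, not a proof. The helper names `pos_leq_zero`, `lemma_condition` and `check_lemma` are hypothetical, and exact rational arithmetic (`fractions.Fraction`) is used so that boundary cases are not blurred by floating-point rounding.

```python
from fractions import Fraction
from itertools import product


def pos_leq_zero(r1, r2, r3):
    """Pos(a <= 0) for the triangular fuzzy number (r1, r2, r3), as in (3)."""
    if r2 <= 0:
        return 1
    if r1 <= 0:                 # here r1 <= 0 < r2
        return r1 / (r1 - r2)
    return 0


def lemma_condition(r1, r2, lam):
    """Right-hand side of (4): (1 - lam) * r1 + lam * r2 <= 0."""
    return (1 - lam) * r1 + lam * r2 <= 0


def check_lemma(values, levels):
    """Return True if both sides of (4) agree for every valid triple and level."""
    for r1, r2, r3 in product(values, repeat=3):
        if not (r1 <= r2 <= r3):
            continue            # skip triples that are not triangular fuzzy numbers
        for lam in levels:
            if (pos_leq_zero(r1, r2, r3) >= lam) != lemma_condition(r1, r2, lam):
                return False
    return True


values = [Fraction(k, 2) for k in range(-4, 5)]    # -2, -1.5, ..., 2
levels = [Fraction(k, 10) for k in range(1, 11)]   # 0.1, 0.2, ..., 1
lemma_holds_on_grid = check_lemma(values, levels)
```

In the middle case $r_1 \leq 0 < r_2$, the check reduces to multiplying $r_1/(r_1 - r_2) \geq \lambda$ by the negative quantity $r_1 - r_2$, which flips the inequality and gives $(1 - \lambda) r_1 + \lambda r_2 \leq 0$.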
SVM is a powerful tool for solving classification problems; however, the
theory still has some limitations. In SVM, all training points of the same
class are treated uniformly. In various real-world applications, the effects
of the training points are different, and each point has an ambiguous
membership relationship; that is, a training point no longer belongs entirely
to one of the two classes. Since the slack variable $\xi_i$ measures the error
in SVM, FSVM assigns a different weight (i.e., a membership degree
$\delta_i$) to each error [1], and its model is as follows:

$$
\begin{aligned}
\min \quad & \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \delta_i \xi_i, \\
\text{s.t.} \quad & y_i (w \cdot x_i + b) \geq 1 - \xi_i, \\
& \xi_i \geq 0, \quad i = 1, 2, \ldots, n.
\end{aligned}
\tag{5}
$$
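To make the role of the membership degrees in model (5) concrete, the sketch below evaluates the objective for a fixed $(w, b)$. It assumes the standard constraints $y_i(w \cdot x_i + b) \geq 1 - \xi_i$ and $\xi_i \geq 0$, so the smallest feasible slack is $\xi_i = \max(0, 1 - y_i(w \cdot x_i + b))$. The function name `fsvm_objective` and the toy data are illustrative assumptions; the code does not solve the optimization problem.

```python
def fsvm_objective(w, b, X, y, delta, C):
    """Value of the FSVM objective (5) at (w, b) with the smallest feasible slacks."""
    total = 0.5 * sum(wj * wj for wj in w)          # (1/2) * ||w||^2
    for xi, yi, di in zip(X, y, delta):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        slack = max(0.0, 1.0 - margin)              # smallest xi_i satisfying both constraints
        total += C * di * slack                     # error weighted by membership delta_i
    return total


# Toy data (assumed): the third point lies on the wrong side of the hyperplane.
X = [(2.0, 0.0), (-2.0, 0.0), (0.5, 0.0)]
y = [1, -1, -1]
w, b, C = (1.0, 0.0), 0.0, 1.0

equal_weights = fsvm_objective(w, b, X, y, [1.0, 1.0, 1.0], C)
down_weighted = fsvm_objective(w, b, X, y, [1.0, 1.0, 0.5], C)
```

Lowering the membership of the suspicious third point reduces its contribution to the objective, which is how FSVM limits the influence of points that only weakly belong to their labeled class.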
B. Granular-ball Computing
Granular-ball computing is a big data processing method proposed by Wang and
Xia to meet the scalability requirements of high-dimensional data [33]. The
core idea of granular-ball computing is to use hyper-balls to cover all or
part of the sample space, and to use these "granular-balls" as the input that
represents the sample space, so as to achieve multi-granularity learning and
an accurate characterization of the sample space. A major advantage of this
method is that, in any dimension, a granular-ball is represented by only two
quantities: its center and its radius.
Fig. 3. Process of the existing granular-ball generation in granular-ball
computing.
In a space $\mathbb{R}^d$ of any dimension, each granular-ball can be
described by two parameters, i.e., its center $c$ and radius $r$. The detailed
definition is as follows:
Definition 5. [33] Given a data set
$D = \{x_1, x_2, \ldots, x_n\} \subset \mathbb{R}^d$, the center $c$ of a
granular-ball is the center of gravity of all sample points in the ball, and
$r$ is equal to the average distance from all points in the granular-ball to
$c$. Specifically, we have:

$$
c = \frac{1}{n} \sum_{i=1}^{n} x_i; \qquad r = \frac{1}{n} \sum_{i=1}^{n} |x_i - c|.
$$
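The center and radius of Definition 5 can be computed directly. The following is a minimal sketch that assumes $|x_i - c|$ denotes the Euclidean distance and that points are given as equal-length tuples; the function name `granular_ball` is hypothetical.

```python
import math


def granular_ball(points):
    """Return (center, radius) of the granular-ball covering the given points."""
    n = len(points)
    d = len(points[0])
    # Center: the mean of the points, coordinate by coordinate.
    center = tuple(sum(p[k] for p in points) / n for k in range(d))
    # Radius: the average (not maximum) Euclidean distance to the center.
    radius = sum(math.dist(p, center) for p in points) / n
    return center, radius


# Four corners of a square with side 2: the center is (1, 1).
square = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
center, radius = granular_ball(square)

# Adding a far-away outlier changes the average-distance radius
# much less than it would change a maximum-distance radius.
with_outlier = square + [(11.0, 1.0)]
center_o, radius_o = granular_ball(with_outlier)
max_distance_o = max(math.dist(p, center_o) for p in with_outlier)
```

Whatever the dimension $d$, the ball is stored as one $d$-dimensional center plus a single scalar radius.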
The radius $r$ is defined as the average distance rather than the maximum
distance. Balls generated with the average distance are not easily affected by
outlier samples and fit the data distribution better. The label of a
granular-ball is defined as the label that appears most often in the
granular-ball. To quantitatively analyze the quality of a split granular-ball,
the concept of a "purity threshold" is proposed, where the purity of a
granular-ball is defined as the percentage of samples in it that carry the
majority label.
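The label and purity of a granular-ball follow directly from the labels of the samples it contains. The sketch below uses a hypothetical helper `ball_label_and_purity`; breaking ties in favor of the label encountered first is an assumption, since the text does not specify a tie rule.

```python
from collections import Counter


def ball_label_and_purity(labels):
    """Return (majority label, fraction of samples carrying that label)."""
    if not labels:
        raise ValueError("a granular-ball must contain at least one sample")
    # most_common(1) returns the label with the highest count;
    # ties go to the label encountered first (assumed tie rule).
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


# A ball with three samples of class +1 and one (possibly noisy) sample of class -1.
label, purity = ball_label_and_purity([1, 1, -1, 1])

# Comparing the purity against a threshold T (the value 0.9 is illustrative only).
T = 0.9
meets_threshold = purity >= T
```

Because the ball takes the majority label, a single mislabeled sample inside an otherwise consistent ball does not change the ball's label.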
Given the training set $D = \{x_i, i = 1, 2, \ldots, n\}$, the reciprocal of
the granular-ball coverage is taken as the quantity to be minimized. The
optimization goal of the granular-balls can be expressed as:

$$
\begin{aligned}
\min \quad & \lambda_1 \cdot \frac{n}{\sum_{j=1}^{m} |GB_j|} + \lambda_2 \cdot m, \\
\text{s.t.} \quad & quality(GB_j) \geq T, \quad j = 1, 2, \ldots, m,
\end{aligned}
\tag{6}
$$

where $\lambda_1$ and $\lambda_2$ are the corresponding weight coefficients
and $T$ is the purity threshold. $m$ represents the number of granular-balls
$GB$, and $m < n$ [21]. The existing granular-ball splitting method uses the
efficient k-means method ($k$ is the number of labels in a given ball) to
ensure the efficiency of the granular-ball classification process. Fig. 3
illustrates a heuristic algorithm for solving model (6). The dataset as a
whole can be considered as a granular-ball at the beginning, as shown in