
Granular-Ball Fuzzy Set and Its Implementation in SVM

Shuyin Xia, Xiaoyu Lian, Guoyin Wang*, Xinbo Gao, Yabin Shao

Abstract: Most existing fuzzy set methods use points as their input, which is the finest granularity from the perspective of granular computing. Consequently, these methods are neither efficient nor robust to label noise. We therefore propose a framework called the granular-ball fuzzy set, which introduces granular-ball computing into fuzzy sets. The framework takes granular-balls rather than points as input; it is therefore more efficient and robust than traditional fuzzy methods, and its extensibility allows it to be used in various fields of fuzzy data processing. Furthermore, the framework is extended to the fuzzy support vector machine (FSVM) classifier to derive the granular-ball fuzzy SVM (GBFSVM). The experimental results demonstrate the effectiveness and efficiency of GBFSVM. The source code and data sets are available at http://www.cquptshuyinxia.com/GBFSVM.html.

Index Terms: Fuzzy set, granular-ball, SVM, granular computing, label noise.

I. INTRODUCTION

In the real world, there are numerous fuzzy phenomena and concepts, such as big and small, light and heavy, fast and slow, dynamic and static, deep and shallow, beauty and ugliness, etc., which cannot be clearly and completely distinguished. Fuzzy information is nevertheless reliable information. To describe fuzzy concepts and fuzzy phenomena quantitatively, Professor L. A. Zadeh, an American computer and cybernetics expert, put forward the concept of the fuzzy set [1] in 1965. He represented fuzzy sets by membership functions, which take values in the closed interval [0,1] and describe the degree to which elements belong to a fuzzy set. The greater the function value, the greater the degree of membership. Since Zadeh introduced fuzzy sets [2], they have been applied in various fields such as control systems, pattern recognition, and machine learning, and a related branch, the fuzzy rough set, has also developed rapidly. Scholars have conducted in-depth research on feature selection [3], [4], [5], [6], [7], [8], clustering [9], decision making [10], [11], classification [12], and so on.

Considering the classification problem of fuzzy data sets, Lin et al. [1] proposed a fuzzy support vector machine (FSVM) model by applying a fuzzy membership to each input point. The model can make full use of the sample information; however, the complexity of the training stage remains high for classification problems with a large amount of data. For fuzzy set classification tasks in the field of machine learning, Aydogan et al. [13] proposed a hybrid heuristic method based on the genetic algorithm (GA) and integer programming formula (IPF) to solve the high-dimensional classification problem in the classification system of linguistic fuzzy rules. The method can find accurate and concise classification rules, but cannot flexibly control the number of rule sets generated in the classification. Sanz et al. [14] directly learned interval-valued fuzzy rules by defining a packaging method to obtain a classification system based on the interval-valued fuzzy principle. Compared with the algorithms existing at that time, the accuracy of this method was significantly improved, but it was not well tested on imbalanced classification problems, and the algorithm is inefficient owing to its two evolutionary processes. Li et al. [15] proposed an interval extreme learning machine for interval fuzzy set classification of continuous-valued attributes, in which the discretization of conditional attributes and the fuzzification of class labels are considered. Recently, an associative fuzzy classifier called CFM-BD [16] was developed, which has shown robust predictive performance against more complex algorithms such as fuzzy decision trees [17]. To simplify the rule set, Aghaeipoor et al. [18] proposed a new scalable fuzzy classifier for big data, namely Chi-BD-DRF, which adds "dynamic rule filtering (DRF)" to supplement fuzzy big data learning.

S. Xia, X. Lian, G. Wang, X. Gao and Y. Shao are with the Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Telecommunications and Posts, 400065, Chongqing, China. E-mail: xiasy@cqupt.edu.cn, 1258852995@qq.com, shaoyb@cqupt.edu.cn.

Fig. 1. In human cognition, the coarse-grained large range is preferred.

The aforementioned processing methods are based on the finest granularity from the perspective of granular computing [19], [20], as shown in Fig. 2(a), and are therefore neither efficient nor robust. Human cognition follows the rule of "large scope first": the visual system is particularly sensitive to global topological characteristics and proceeds from large to small, from coarse-grained to fine-grained, as shown in Fig. 1 [21]. In granular computing, the larger the granularity, the higher the efficiency and the better the robustness to noise. However, a larger granularity is also more likely to lead to a lack of detail and a loss of accuracy. A smaller granularity allows more attention to detail, but may reduce the efficiency and the robustness to label noise. Over the past decades, scholars worldwide have studied how to granulate huge amounts of data and knowledge into different granularities according to different tasks [22], [23], [24], [25]. The relationship between these granularities was then used to solve this problem [26], [27], [28], [29]. Selecting different granularities according to various scenarios can improve the performance of multi-granularity learning methods and solve practical problems [30], [31], [32].

Fig. 2. Comparison of the existing fuzzy set and the granular-ball fuzzy set.

arXiv:2210.11675v2 [cs.LG] 26 Nov 2022

Therefore, Xia et al. [33] proposed granular-ball classifiers, which use hyper-balls to granulate the dataset into granular-balls of different sizes [34]. The granular-ball support vector machine (GBSVM) [35] was further proposed and exhibits higher accuracy and efficiency than the traditional SVM. To improve the efficiency of fuzzy data processing, the idea of granular-ball computing can be introduced into fuzzy data processing by defining the fuzzy granular-ball, as shown in Fig. 2(b). The concept of fuzzy granular-balls was briefly proposed in our previous work [36], but no algorithm was designed for it; moreover, the SVM model given there [36], [37] is incorrect, too complex, and inconsistent with the SVM. To improve the efficiency and robustness of fuzzy classifiers by incorporating granular-ball computing, the main contributions of this paper are as follows:

• We propose a framework called the granular-ball fuzzy set by introducing the concept of the fuzzy granular-ball. It is different from the traditional fuzzy data processing method.

• GBFSVM is proposed based on the fuzzy granular-ball framework. The framework uses granular-balls as the basic analysis unit instead of data points.

• Considering the classification problem with the characteristics of triangular fuzzy numbers, the GBFSVM based on triangular fuzzy numbers is derived in detail using the possibility measure theory.

• Particle swarm optimization (PSO) is used to solve the dual model of GBSVM. Experimental results indicate that GBFSVM performs better than the traditional SVM and FSVM in both robustness and effectiveness.

The rest of this paper is organized as follows. Section II introduces the concepts of fuzzy sets and the work related to granular-ball computing. Section III details the granular-ball fuzzy set framework and the definition of the fuzzy granular-ball. Section IV introduces the application of the granular-ball fuzzy set to fuzzy support vector machines and to support vector machines based on triangular fuzzy numbers. The experimental results and analysis are presented in Section V. Finally, some concluding remarks are given in Section VI.

II. RELATED WORK

A. Related Concepts of Fuzzy Sets

With the development of modern science and technology, the systems we face are becoming more and more complex. For complex problems in the humanities, social sciences and other "soft sciences," it is often difficult to provide an accurate evaluation owing to insufficient cognition or information content in the decision-making process. For multi-attribute decision making without specific decision information, it is difficult for decision makers to evaluate a scheme accurately, which motivated the concept of the fuzzy set. This concept is defined as follows:

Definition 1. (Fuzzy set [38]) If $X$ is a collection of objects denoted generically by $x$, then a fuzzy set $\tilde{A}$ in $X$ is a set of ordered pairs:

$$\tilde{A} = \{(x, \mu_{\tilde{A}}(x)) \mid x \in X\}. \tag{1}$$

$\mu_{\tilde{A}}(x)$ is called the membership function (generalized characteristic function), which maps $X$ to the membership space $M$. Generally speaking, the range of the membership function is $[0,1]$.

The most important role of fuzzy sets is to represent various uncertainties in the data and in data processing. In particular, introducing fuzzy sets into big data improves the representation ability of the information samples.

The triangular fuzzy number builds on the concept of the fuzzy set proposed by Professor Lotfi A. Zadeh in 1965 and is used to solve such problems in an uncertain environment. The concept of the triangular fuzzy number is as follows:

Definition 2. (Triangular fuzzy number [39]) $\tilde{a}$ is called a triangular fuzzy number if its membership function is expressed as follows:

$$
\mu_{\tilde{a}}(x) =
\begin{cases}
\dfrac{x - r_1}{r_2 - r_1}, & r_1 \leq x < r_2, \\
1, & x = r_2, \\
\dfrac{x - r_3}{r_2 - r_3}, & r_2 < x \leq r_3,
\end{cases} \tag{2}
$$

where $r_1 \leq r_2 \leq r_3$ and $r_j \in \mathbb{R}$ $(j = 1, 2, 3)$. The triangular fuzzy number is denoted by $\tilde{a} = (r_1, r_2, r_3)$. The real numbers $r_2$, $r_1$ and $r_3$ are called the center, left endpoint and right endpoint of $\tilde{a}$, respectively. The center reflects the main position of the triangular fuzzy number, and a real number $a$ can be expressed as the special triangular fuzzy number $a = (a, a, a)$.
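As a minimal sketch of Definition 2, the membership function in Eq. (2) can be evaluated directly. The function name `triangular_membership` is hypothetical, and returning 0 outside $[r_1, r_3]$ is an assumption, since Eq. (2) does not state that case explicitly.

```python
def triangular_membership(x, r1, r2, r3):
    """Membership degree of x in the triangular fuzzy number (r1, r2, r3)."""
    if not (r1 <= r2 <= r3):
        raise ValueError("expected r1 <= r2 <= r3")
    if x == r2:
        return 1.0  # the center has full membership
    if r1 <= x < r2:
        return (x - r1) / (r2 - r1)  # rising left edge
    if r2 < x <= r3:
        return (x - r3) / (r2 - r3)  # falling right edge
    return 0.0  # assumption: zero membership outside [r1, r3]


if __name__ == "__main__":
    for x in (1.0, 1.5, 2.0, 3.0, 5.0):
        print(x, triangular_membership(x, 1.0, 2.0, 4.0))
```

Checking `x == r2` first means the degenerate number $(a, a, a)$ is handled without a division by zero: it has membership 1 at $a$ and 0 elsewhere.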

The probability of occurrence of a fuzzy event $\tilde{a}$ can be measured using the possibility measure. This possibility measure was proposed by Professor Lotfi A. Zadeh in 1978. It is defined as follows:

Definition 3. (Possibility measure [40]) Let $(\Gamma, \mathcal{A})$ be a backup domain space, and let Pos be a set function defined on the backup domain $\mathcal{A}$. If Pos satisfies the following conditions:

(1) $\mathrm{Pos}(\emptyset) = 0$ and $\mathrm{Pos}(\Gamma) = 1$;

(2) for any subclass $\{A_i \mid i \in I\}$ of $\mathcal{A}$, $\mathrm{Pos}\left(\bigcup_{i \in I} A_i\right) = \sup_{i \in I} \mathrm{Pos}(A_i)$,

then Pos is called a possibility measure, and the triple $(\Gamma, \mathcal{A}, \mathrm{Pos})$ is called a possibility space.

When a triangular fuzzy number is used to represent the fuzzy event $\tilde{a}$, the likelihood of the fuzzy event is measured as follows:

Definition 4. [41] Let $\tilde{a} = (r_1, r_2, r_3)$ be a triangular fuzzy number, then:

$$
\mathrm{Pos}(\tilde{a} \leq 0) =
\begin{cases}
1, & r_2 \leq 0, \\
\dfrac{r_1}{r_1 - r_2}, & r_1 \leq 0,\ r_2 > 0, \\
0, & r_1 > 0.
\end{cases} \tag{3}
$$

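Lemma 1. [42] Let $\tilde{a} = (r_1, r_2, r_3)$ be a triangular fuzzy number, then for any given confidence level $\lambda$ $(0 < \lambda \leq 1)$, we have:

$$\mathrm{Pos}\{\tilde{a} \leq 0\} \geq \lambda \iff (1 - \lambda) r_1 + \lambda r_2 \leq 0. \tag{4}$$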
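The equivalence in Lemma 1 can be checked numerically on a few cases. The sketch below is illustrative only: the function names are hypothetical, and the middle case uses the value $r_1 / (r_1 - r_2)$, which is the left-edge membership $(x - r_1)/(r_2 - r_1)$ evaluated at $x = 0$.

```python
def possibility_leq_zero(r1, r2, r3):
    """Pos(a <= 0) for the triangular fuzzy number a = (r1, r2, r3), as in Eq. (3).

    r3 is accepted for completeness but does not affect the result.
    """
    if r2 <= 0:
        return 1.0
    if r1 <= 0:  # here r1 <= 0 < r2
        return r1 / (r1 - r2)
    return 0.0


def lemma_condition(r1, r2, lam):
    """Right-hand side of Eq. (4): (1 - lam) * r1 + lam * r2 <= 0."""
    return (1 - lam) * r1 + lam * r2 <= 0


if __name__ == "__main__":
    a = (-1.0, 1.0, 2.0)
    for lam in (0.25, 0.5, 0.75):
        lhs = possibility_leq_zero(*a) >= lam
        rhs = lemma_condition(a[0], a[1], lam)
        print(lam, lhs, rhs)  # the two booleans agree for each lam
```

The practical value of the lemma is that a possibility constraint on a fuzzy quantity becomes a single linear inequality in $r_1$ and $r_2$.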

SVM is a powerful tool for solving classification problems; however, the theory still has some limitations. SVM treats all training points of the same class uniformly. In many real-world applications, the training points differ in their effect, and each has an ambiguous membership relationship. Specifically, a training point no longer belongs entirely to one of the two classes. The parameter $\xi_i$ measures the error in SVM, and FSVM assigns a different weight (i.e., the membership degree $\delta_i$) to each error [1]. Its model is as follows:

$$
\begin{aligned}
\min\quad & \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \delta_i \xi_i, \\
\text{s.t.}\quad & y_i (w \cdot x_i + b) \geq 1 - \xi_i, \\
& \xi_i \geq 0, \quad i = 1, 2, \ldots, n.
\end{aligned} \tag{5}
$$
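To illustrate how the membership degrees $\delta_i$ enter model (5), the following sketch evaluates the objective for a fixed hyperplane, setting each slack to the hinge value $\max(0, 1 - y_i(w \cdot x_i + b))$. It is not a solver and is not taken from the authors' released code; the function name and the toy data are assumptions.

```python
def fsvm_objective(w, b, X, y, delta, C):
    """Value of objective (5) at (w, b), with each slack set to its hinge value."""
    margin_term = 0.5 * sum(wj * wj for wj in w)
    penalty = 0.0
    for xi, yi, di in zip(X, y, delta):
        score = sum(wj * xj for wj, xj in zip(w, xi)) + b
        slack = max(0.0, 1.0 - yi * score)  # xi_i = max(0, 1 - y_i (w.x_i + b))
        penalty += di * slack  # the membership degree scales the error
    return margin_term + C * penalty


if __name__ == "__main__":
    X = [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]]  # third point lies on the wrong side
    y = [1, -1, -1]
    w, b, C = [1.0, 0.0], 0.0, 1.0
    print(fsvm_objective(w, b, X, y, [1.0, 1.0, 1.0], C))  # plain soft-margin SVM
    print(fsvm_objective(w, b, X, y, [1.0, 1.0, 0.2], C))  # suspect point down-weighted
```

With all $\delta_i = 1$ the objective reduces to the usual soft-margin SVM; lowering the membership of the misplaced third point shrinks its contribution to the penalty, which is how FSVM limits the influence of doubtful samples.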

B. Granular-ball Computing

Granular-ball computing is a big data processing method proposed by Wang and Xia to address the scalability of high-dimensional data [33]. Its core idea is to use hyper-balls to cover all or part of the sample space and to use these "granular-balls" as the input that represents the sample space, thereby achieving multi-granularity learning and an accurate characterization of the sample space. A major advantage of this method is that, in any dimension, a ball is represented by only two quantities: its center and its radius.

Fig. 3. Process of the existing granular-ball generation in granular-ball computing.

In any dimensional space $\mathbb{R}^d$, each granular-ball can be described by two parameters, i.e., the center $c$ and the radius $r$. The detailed definition is as follows:

Definition 5. [33] Given a data set $D = \{x_1, x_2, \ldots, x_n\}$ with $x_i \in \mathbb{R}^d$, the center $c$ of a granular-ball is the center of gravity of all sample points in the ball, and $r$ is equal to the average distance from all points in the granular-ball to $c$. Specifically, we have:

$$c = \frac{1}{n} \sum_{i=1}^{n} x_i; \qquad r = \frac{1}{n} \sum_{i=1}^{n} |x_i - c|.$$

The radius $r$ is defined as the average distance rather than the maximum distance. Balls generated with the average distance are not easily affected by outlier samples and fit the data distribution better. The label of a granular-ball is defined as the label that appears most often in the granular-ball. To quantify the quality of a split granular-ball, the concept of "purity" is used, defined as the percentage of the majority samples with the same label in a granular-ball; a "purity threshold" sets the minimum purity a granular-ball must reach.
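The quantities above (center, average-distance radius, majority label and purity) can be computed in a few lines. This is a minimal sketch under the definitions in this section, not the authors' implementation; the function name `granular_ball` and the toy data are assumptions, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def granular_ball(points, labels):
    """Summarize labeled points as (center, radius, label, purity)."""
    n = len(points)
    dim = len(points[0])
    # Center: coordinate-wise mean (center of gravity) of the points.
    center = [sum(p[k] for p in points) / n for k in range(dim)]
    # Radius: average (not maximum) Euclidean distance to the center.
    radius = sum(math.dist(p, center) for p in points) / n
    # Label: most frequent label; ties are broken by first occurrence.
    label, count = Counter(labels).most_common(1)[0]
    # Purity: fraction of points that carry the majority label.
    purity = count / n
    return center, radius, label, purity


if __name__ == "__main__":
    pts = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
    lbl = [1, 1, 1, 0]
    print(granular_ball(pts, lbl))
```

For the four corner points the center is (1, 1), every point lies at distance $\sqrt{2}$ from it, and three of the four labels agree, so the purity is 0.75. A ball like this would be split further under any purity threshold above 0.75.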

Given the training set $D = \{x_i, i = 1, 2, \ldots, n\}$, the reciprocal of the granular-ball coverage is minimized. The optimization goal of the granular-balls can be expressed as:

$$
\begin{aligned}
\min\quad & \lambda_1 \ast \frac{n}{\sum_{j=1}^{m} |GB_j|} + \lambda_2 \ast m, \\
\text{s.t.}\quad & \mathrm{quality}(GB_j) \geq T, \quad j = 1, 2, \ldots, m,
\end{aligned} \tag{6}
$$

where $\lambda_1$ and $\lambda_2$ are the corresponding weight coefficients and $T$ is the purity threshold. $m$ represents the number of granular-balls $GB$, and $m < n$ [21]. The existing granular-ball splitting method uses the efficient $k$-means method ($k$ is the number of labels in a given ball) to keep the granular-ball classification process efficient. Fig. 3 illustrates a heuristic algorithm for solving model (6). The dataset as a whole can be considered as a granular-ball at the beginning, as shown in
