How To Overcome Richness Axiom Fallacy
Mieczysław A. Kłopotek 1[0000-0003-4685-7045] and
Robert A. Kłopotek 2[0000-0001-9783-4914]
1 Institute of Computer Science,
Polish Academy of Sciences, Warsaw, Poland
mieczyslaw.klopotek@ipipan.waw.pl
2 Faculty of Mathematics and Natural Sciences, School of Exact Sciences,
Cardinal Stefan Wyszyński University in Warsaw, Poland
r.klopotek@uksw.edu.pl
Abstract. The paper points at the grave problems implied by the richness axiom in Kleinberg's axiomatic system and suggests resolutions. Richness induces a learnability problem in general and leads to conflicts with the consistency axiom. As a resolution, learnability constraints, the usage of centric consistency, or a restriction of the domain of considered clusterings to super-ball-clusterings are proposed.
Keywords: richness axiom · consistency axiom · centric consistency axiom · clustering algorithms · learnability · theoretical foundations.
1 Introduction
This is an extended version of the ISMIS 2022 conference paper [14].
Kleinberg [8] introduced an axiomatic system for distance-based clustering functions consisting of three axioms: scale-invariance (the same clustering should be obtained if all distances between objects are multiplied by the same positive number), consistency (the same clustering should be obtained if distances within a cluster are decreased and distances between elements from distinct clusters are increased) and richness (for any partition of the dataset into non-empty subsets there should exist a set of distances between the datapoints such that the clustering function delivers this partition). We shall say that (1) decreasing distances within a cluster and increasing distances between elements from distinct clusters is a consistency transformation, and (2) multiplying distances between objects by the same positive number is a scale-invariance transformation.
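To make these transformations concrete, the following minimal Python sketch (our own illustration, not part of [8]; the helper names are hypothetical) checks whether a new distance function d' is a consistency transformation of d with respect to a given partition:

from itertools import combinations

def same_cluster(partition, a, b):
    # partition: a list of sets of datapoint identifiers
    return any(a in c and b in c for c in partition)

def is_consistency_transformation(d_old, d_new, partition, points):
    # d_old, d_new: symmetric distance functions of two datapoints;
    # within-cluster distances may only shrink, between-cluster distances may only grow
    for a, b in combinations(points, 2):
        if same_cluster(partition, a, b):
            if d_new(a, b) > d_old(a, b):
                return False
        elif d_new(a, b) < d_old(a, b):
            return False
    return True

A scale-invariance transformation is then simply d_new = lambda a, b: alpha * d_old(a, b) for some fixed alpha > 0.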
This set of axioms was controversial from the very beginning. Kleinberg himself proved that the entire axiomatic system is contradictory and that only pairs of these axioms are non-contradictory. But the axiom of consistency also turned out to be controversial, as it excludes the k-means algorithm from the family of clustering functions. It has been shown in [10] that consistency is contradictory by itself if we consider fixed-dimensional Euclidean spaces, which are the natural domain of k-means applications.
It seems to be disastrous for the domain of clustering algorithms if an axiomatic system consisting of "natural axioms" is self-contradictory. It renders the
entire domain questionable. Therefore numerous efforts have been made to cure such a situation by proposing different axiom sets or modifying Kleinberg's one, just to mention [1–4, 7, 16–18, 22] etc. Kleinberg himself introduced the concept of a partition Γ′ being a refinement of a partition Γ if for every set C′ ∈ Γ′ there is a set C ∈ Γ such that C′ ⊆ C. He defines Refinement-Consistency, a relaxation of Consistency, to require that if the distance function d′ is an f(d)-transformation of d (i.e. a consistency transformation with respect to the partition f(d)), then f(d′) should be a refinement of f(d). Though there is no clustering function that satisfies Scale-Invariance, Richness, and Refinement-Consistency, if one defines Near-Richness as Richness without the partition in which each element is in a separate cluster, then there exist clustering functions f that satisfy Scale-Invariance, Refinement-Consistency, and Near-Richness (e.g. single-linkage with the distance-(αδ) stopping condition, where δ = min_{i,j} d(i, j) and α ≥ 1).
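The refinement relation used above can be stated as a one-line predicate; the sketch below is our own illustration, with partitions represented as lists of Python sets:

def is_refinement(gamma_prime, gamma):
    # gamma_prime refines gamma iff every cluster C' of gamma_prime
    # is contained in some cluster C of gamma
    return all(any(c_prime <= c for c in gamma) for c_prime in gamma_prime)

# e.g. {{1}, {2}, {3, 4}} refines {{1, 2}, {3, 4}}:
print(is_refinement([{1}, {2}, {3, 4}], [{1, 2}, {3, 4}]))  # True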
In this paper, we look in more detail at the richness property and investigate its counter-intuitiveness. Richness induces a learnability problem in general (see Sec. 2) and leads to conflicts with the consistency axiom (see Sec. 4). In the past, we applied embedding into Euclidean space to resolve some problems with the consistency axiom, but it does not work for richness (Sec. 3). Therefore, as a resolution, the usage of centric consistency (Sec. 4) or a restriction of the domain of considered clusterings to super-ball-clusterings (Sec. 5) is proposed.
2 Richness and Learnability
Following [8] let us define: A partition Γ of the set of datapoints S into k parts is a set Γ = {C_1, . . . , C_k} such that C_i ∩ C_j = ∅ for i ≠ j, C_i ≠ ∅, and S = C_1 ∪ C_2 ∪ · · · ∪ C_k. A clustering function f is a function assigning a partition Γ to any dataset S with at least two datapoints. That is, given that a clustering function operates on the dataset X from which samples S are taken, f : 2^X → 2^(2^X), where for any S ⊆ X with card(S) ≥ 2, f(S) is a partition of S. A clustering quality function Q is a function assigning a real value to a partition. That is, Q : 2^(2^X) → R, where Q(Γ) is defined only if Γ is a partition of some set.
Richness is a concept from the realm of clustering, that is, unsupervised learning. Nonetheless, if we have an explicit clustering quality function Q, but the clustering function f has to be deduced from it by requiring that Q(f(S)) is optimal among all possible partitions of S, then we are dealing with a typical supervised learning task. Hence we are transferred to the supervised learning domain. In this domain, a function is deemed learnable if it can be discovered in time polynomial in the problem parameters.
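The optimisation view of deducing f from Q can be written down directly. The sketch below (our own illustration; the helper all_partitions and the choice to maximise Q are assumptions, not prescribed by [8]) enumerates every partition of S and returns an optimal one, which is exactly the exhaustive procedure whose cost Theorem 1 is concerned with:

def all_partitions(items):
    # yield every partition of the list `items` as a list of lists
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in all_partitions(rest):
        # put `first` into each existing block ...
        for i, block in enumerate(partition):
            yield partition[:i] + [[first] + block] + partition[i + 1:]
        # ... or into a new block of its own
        yield [[first]] + partition

def clustering_from_quality(S, Q):
    # the clustering function induced by Q via exhaustive search
    return max(all_partitions(list(S)), key=Q)

The number of partitions visited is the Bell number of card(S), which is what makes such an induced clustering function impractical in general.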
Theorem 1. There exist rich clustering functions that cannot be learned in polynomial time.
Proof. The proof is by construction of such a function. Assume we have a set S of cardinality n with a distance function d (assume "distances" can be defined as Kleinberg assumes, being greater than zero for distinct points, symmetric and arbitrary otherwise). Let G be the set of all possible partitions of S. Let m : G → N ∪ {0} be a function assigning to each element of G a unique consecutive
integer starting from 0. This mapping is created as follows. Let the number of all possible partitions of S into nonempty sets be M + 1. Assume P, R is the pair of points with the maximal distance under d. For each possible clustering C_j, j = 0, ..., M (possible partition into nonempty sets) we compute the average distance of datapoints within clusters, neglecting, however, the distance between the two points P, R if they happen to fall into the same cluster. We sort C_0, ..., C_M on these averages, assigning 0 to the clustering with the lowest average distance, and so on. On ties, for the clusterings affected, we try averages of squares of distances, and so on. If no resolution is possible, an arbitrary decision is made, based e.g. on alphabetic ordering of clusters, cluster sizes, etc. Create a function h assigning to each integer i ∈ [0, M] the interval (i/(M+1), (i+1)/(M+1)].
Imagine the following clustering function f. It computes the quotient q of the smallest distance between data points to the largest one, that is, to the distance between P and R. Then the clustering function is f(S, d) = m^{-1}(h^{-1}(q)). The richness of f is easy to achieve. In the case of equidistant points in a set S, q amounts to 1. By increasing the distance d(P, R) alone (permitted by Kleinberg's definition), q can take any value from the interval (0, 1]. So there always exists a set S and a distance function d such that f returns the desired clustering. Hence the clustering function has the richness property. Because it relies on the quotient q, it is also scale-invariant. Though the function is simple in principle (and also useless), and meets the axioms of richness and scale-invariance, we have a practical problem: as no other limitations are imposed, one has to check up to
M + 1 = Σ_{k=2}^{n} (1/k!) Σ_{j=1}^{k} (-1)^{k-j} C(k, j) j^n
possible partitions (a Bell number; C(k, j) denotes the binomial coefficient) in order to verify which one of them is the best for a given distance function, because there must exist at least one distance function suitable for each of them. This cannot be done in reasonable time even if the computation of q is polynomial (even linear) in the dimensions of the task (n).
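A short calculation (our own sketch; the lower summation limit k = 2 follows the formula above) shows how quickly this search space explodes with n:

from math import comb, factorial

def num_partitions(n, k_min=2):
    # sum of Stirling numbers of the second kind S(n, k) over k = k_min .. n
    return sum(
        sum((-1) ** (k - j) * comb(k, j) * j ** n for j in range(1, k + 1)) // factorial(k)
        for k in range(k_min, n + 1)
    )

for n in (5, 10, 15, 20):
    print(n, num_partitions(n))
# already for n = 20 the count exceeds 5 * 10**13 candidate partitions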
The idea of learnability is based on the assumption that the learned concept can be verified against new data. For example, if one learned a k-means clustering on a data sample, then with newly incoming elements the clustering should not change (too much). With the richness requirement, if you learnt the model from a sample of size n, then it consists of no more than n clusters. But there exists a sample of size 2n that defies the learned structure, because it has more than n clusters. That is, the clustering structure of the domain is not learnable, by the very definition of richness.
Furthermore, most algorithms of cluster analysis are constructed in an incremental way. But this can be useless if the clustering quality function is designed in a very unfriendly way, for example based on modulo computation.
Theorem 2. There exist rich clustering functions that cannot be learnt incrementally.
Proof. The proof is by construction of such a function. Consider the following clustering quality function Q_s(Γ). For each datapoint i ∈ S we compute its distance d_i to its cluster center under Γ. Let dmx = max_{i∈S} d_i and q = Σ_{i∈S} round(10000 d_i / dmx). Then Q_s(Γ) = p - (q mod p), where p is a prime
number (here 3041). This Q_s was applied in order to partition the first n points from the sample data in Table 1, as illustrated in Table 2. It turns out that the best partition for n points does not give any hint about the best partition for n + 1 points; therefore each possible partition needs to be investigated in order to find the best one.³
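A minimal Python reconstruction of Q_s is sketched below (our own illustration: the cluster centre is taken as the coordinate mean, and both this convention and the reading Q_s(Γ) = p - (q mod p) are assumptions where the original formula is ambiguous). Combined with the all_partitions helper from the earlier sketch, it reproduces the kind of brute-force search summarised in Table 2:

from math import dist  # Euclidean distance, Python 3.8+

def Qs(partition, points, p=3041):
    # partition: list of lists of point ids; points: dict id -> (x, y)
    dists = []
    for cluster in partition:
        cx = sum(points[i][0] for i in cluster) / len(cluster)
        cy = sum(points[i][1] for i in cluster) / len(cluster)
        dists += [dist(points[i], (cx, cy)) for i in cluster]
    dmx = max(dists)
    q = sum(round(10000 * d / dmx) for d in dists) if dmx > 0 else 0
    return p - q % p  # lower values are better

# best partition of the first n points (ids 1..n, coordinates from Table 1):
# best = min(all_partitions(list(range(1, n + 1))), key=lambda g: Qs(g, points))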
Table 1. Data points to be clustered using a ridiculous clustering quality function.
id  x coordinate  y coordinate
1 4.022346 5.142886
2 3.745942 4.646777
3 4.442992 5.164956
4 3.616975 5.188107
5 3.807503 5.010183
6 4.169602 4.874328
7 3.557578 5.248182
8 3.876208 4.507264
9 4.102748 5.073515
10 3.895329 4.878176
Table 2. Partition of the best quality (the lower the value the better) after including the first n points from Table 1.
n  quality  partition
2 1270 {1, 2}
3 1270 {1, 2} {3}
4 823 {1, 3, 4} {2}
5 315 {1, 4} {2, 3, 5}
6 13 {1, 5} {2, 4, 6} {3}
7 3 {1, 6} {2, 7} {3, 5} {4}
8 2 {1, 2, 4, 5, 6, 8} {3} {7}
9 1 {1, 2, 4, 5} {3, 8} {6, 9} {7}
10 1 {1, 2, 3, 5, 9} {4, 6} {7, 10} {8}
Under these circumstances, let us point at the implications of the so-called learnability theory [19], [6] for the richness axiom. On the one hand, the hypothesis space is too big for learning a clustering from a sample. On the other hand, an exhaustive search in this space is prohibitive, so that some theoretical clustering functions do not make practical sense.
³ Strict separation [5], mentioned earlier, is another kind of weird cluster quality function, requiring visits to all the partitions.