How To Overcome Richness Axiom Fallacy
Mieczysław A. Kłopotek 1[0000-0003-4685-7045] and
Robert A. Kłopotek 2[0000-0001-9783-4914]
1 Institute of Computer Science,
Polish Academy of Sciences, Warsaw, Poland
mieczyslaw.klopotek@ipipan.waw.pl
2 Faculty of Mathematics and Natural Sciences, School of Exact Sciences,
Cardinal Stefan Wyszyński University in Warsaw, Poland
r.klopotek@uksw.edu.pl
Abstract. The paper points at the grave problems implied by the richness axiom in Kleinberg's axiomatic system and suggests resolutions. Richness induces a learnability problem in general and leads to conflicts with the consistency axiom. As a resolution, learnability constraints, the usage of centric consistency, or a restriction of the domain of considered clusterings to super-ball-clusterings are proposed.
Keywords: richness axiom · consistency axiom · centric consistency axiom · clustering algorithms · learnability · theoretical foundations.
1 Introduction
This is an extended version of the ISMIS 2022 conference paper [14].
Kleinberg [8] introduced an axiomatic system for distance-based clustering functions consisting of three axioms: scale-invariance (the same clustering should be obtained if all distances between objects are multiplied by the same positive number), consistency (the same clustering should be obtained if distances within a cluster are decreased and distances between elements from distinct clusters are increased) and richness (for any partition of the dataset into non-empty subsets there should exist a set of distances between the datapoints such that the clustering function delivers this partition). We shall say that (1) decreasing distances within a cluster and increasing distances between elements from distinct clusters is a consistency transformation, and (2) multiplying distances between objects by the same positive number is a scale-invariance transformation.
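To make these transformations concrete, the following minimal Python sketch (our own illustration, not part of [8]; the helper names are hypothetical) checks whether a new distance function d' is a consistency transformation of d with respect to a given partition:

from itertools import combinations

def same_cluster(partition, a, b):
    # partition: a list of sets of datapoint identifiers
    return any(a in c and b in c for c in partition)

def is_consistency_transformation(d_old, d_new, partition, points):
    # d_old, d_new: symmetric distance functions of two datapoints;
    # within-cluster distances may only shrink, between-cluster distances may only grow
    for a, b in combinations(points, 2):
        if same_cluster(partition, a, b):
            if d_new(a, b) > d_old(a, b):
                return False
        elif d_new(a, b) < d_old(a, b):
            return False
    return True

A scale-invariance transformation is then simply d_new = lambda a, b: alpha * d_old(a, b) for some fixed alpha > 0.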
This set of axioms was controversial from the very beginning. Kleinberg himself proved that the entire axiomatic system is contradictory and that only pairs of these axioms are non-contradictory. But the axiom of consistency also turned out to be controversial, as it excludes the k-means algorithm from the family of clustering functions. It has been shown in [10] that consistency is contradictory by itself if we consider fixed-dimensional Euclidean spaces, which are the natural domain of k-means applications.
It seems to be disastrous for the domain of clustering algorithms if an axiomatic system consisting of "natural axioms" is self-contradictory. It renders the
entire domain questionable. Therefore numerous efforts have been made to cure such a situation by proposing different axiom sets or modifying Kleinberg's one, just to mention [1–4, 7, 16–18, 22] etc. Kleinberg himself introduced the concept of a partition Γ′ being a refinement of a partition Γ if for every set C′ ∈ Γ′ there is a set C ∈ Γ such that C′ ⊆ C. He defines Refinement-Consistency, a relaxation of Consistency, to require that if the distance function d′ is an f(d)-transformation of d (i.e. a consistency transformation with respect to the partition f(d)), then f(d′) should be a refinement of f(d). Though there is no clustering function that satisfies Scale-Invariance, Richness, and Refinement-Consistency, if one defines Near-Richness as Richness without the partition in which each element is in a separate cluster, then there exist clustering functions f that satisfy Scale-Invariance, Refinement-Consistency, and Near-Richness (e.g. single-linkage with the distance-(αδ) stopping condition, where δ = min_{i,j} d(i, j) and α ≥ 1).
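The refinement relation used above can be stated as a one-line predicate; the sketch below is our own illustration, with partitions represented as lists of Python sets:

def is_refinement(gamma_prime, gamma):
    # gamma_prime refines gamma iff every cluster C' of gamma_prime
    # is contained in some cluster C of gamma
    return all(any(c_prime <= c for c in gamma) for c_prime in gamma_prime)

# e.g. {{1}, {2}, {3, 4}} refines {{1, 2}, {3, 4}}:
print(is_refinement([{1}, {2}, {3, 4}], [{1, 2}, {3, 4}]))  # True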
In this paper, we look in more detail at the richness property and investigate its counter-intuitiveness. Richness induces a learnability problem in general (see Sec. 2) and leads to conflicts with the consistency axiom (see Sec. 4). In the past, we applied embedding into Euclidean space to resolve some problems with the consistency axiom, but it does not work for richness (Sec. 3). Therefore, as a resolution, the usage of centric consistency (Sec. 4) or a restriction of the domain of considered clusterings to super-ball-clusterings (Sec. 5) is proposed.
2 Richness and Learnability
Following [8] let us define: A partition Γ of the set of datapoints S into k parts is a set Γ = {C_1, . . . , C_k} such that C_i ∩ C_j = ∅ for i ≠ j, C_i ≠ ∅, and S = C_1 ∪ C_2 ∪ · · · ∪ C_k. A clustering function f is a function assigning a partition Γ to any dataset S with at least two datapoints. That is, given that a clustering function operates on the dataset X from which samples S are taken, f : 2^X → 2^(2^X), where for any S ⊆ X with card(S) ≥ 2, f(S) is a partition of S. A clustering quality function Q is a function assigning a real value to a partition. That is, Q : 2^(2^X) → R, where Q(Γ) is defined only if Γ is a partition of some set.
Richness is a concept from the realm of clustering, that is, unsupervised learning. Nonetheless, if we have an explicit clustering quality function Q, but the clustering function f has to be deduced from it by requiring that Q(f(S)) is optimal among all possible partitions of S, then we are dealing with a typical supervised learning task. Hence we are transferred to the supervised learning domain. In this domain, a function is deemed learnable if it can be discovered in time polynomial in the problem parameters.
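The optimisation view of deducing f from Q can be written down directly. The sketch below (our own illustration; the helper all_partitions and the choice to maximise Q are assumptions, not prescribed by [8]) enumerates every partition of S and returns an optimal one, which is exactly the exhaustive procedure whose cost Theorem 1 is concerned with:

def all_partitions(items):
    # yield every partition of the list `items` as a list of lists
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in all_partitions(rest):
        # put `first` into each existing block ...
        for i, block in enumerate(partition):
            yield partition[:i] + [[first] + block] + partition[i + 1:]
        # ... or into a new block of its own
        yield [[first]] + partition

def clustering_from_quality(S, Q):
    # the clustering function induced by Q via exhaustive search
    return max(all_partitions(list(S)), key=Q)

The number of partitions visited is the Bell number of card(S), which is what makes such an induced clustering function impractical in general.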
Theorem 1. There exist rich clustering functions that cannot be learned in polynomial time.
Proof. The proof is by construction of such a function. Assume we have a set S of cardinality n with a distance function d (assume "distances" can be defined as Kleinberg assumes, being greater than zero for distinct points, symmetric and arbitrary otherwise). Let G be the set of all possible partitions of S. Let m : G → N ∪ {0} be a function assigning to each element of G a unique consecutive
integer starting from 0. This mapping is created as follows. Let the number of all possible partitions of S into nonempty sets be M + 1. Assume P, R is the pair of points with the maximal distance under d. For each possible clustering C_j, j = 0, ..., M (possible partition into nonempty sets) we compute the average distance of datapoints within clusters, neglecting, however, the distance between the two points P, R if they happen to fall into the same cluster. We sort C_0, ..., C_M on these averages, assigning 0 to the clustering with the lowest average distance, and so on. On ties, for the clusterings affected, we try averages of squares of distances, and so on. If no resolution is possible, an arbitrary decision is made, based e.g. on alphabetic ordering of clusters, cluster sizes, etc. Create a function h assigning to each integer i ∈ [0, M] the interval (i/(M+1), (i+1)/(M+1)].
Imagine the following clustering function f. It computes the quotient q of the smallest distance between data points to the largest one, that is, to the distance between P and R. Then the clustering function is f(S, d) = m^{-1}(h^{-1}(q)). The richness of f is easy to achieve. In the case of equidistant points in a set S, q amounts to 1. By increasing the distance d(P, R) alone (permitted by Kleinberg's definition), q can take any value from the interval (0, 1]. So there always exists a set S and a distance function d such that f returns the desired clustering. Hence the clustering function has the richness property. Because it relies on the quotient q, it is also scale-invariant. Though the function is simple in principle (and also useless), and meets the axioms of richness and scale-invariance, we have a practical problem: as no other limitations are imposed, one has to check up to
M + 1 = Σ_{k=2}^{n} (1/k!) Σ_{j=1}^{k} (-1)^{k-j} C(k, j) j^n
possible partitions (a Bell number; C(k, j) denotes the binomial coefficient) in order to verify which one of them is the best for a given distance function, because there must exist at least one distance function suitable for each of them. This cannot be done in reasonable time even if the computation of q is polynomial (even linear) in the dimensions of the task (n).
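A short calculation (our own sketch; the lower summation limit k = 2 follows the formula above) shows how quickly this search space explodes with n:

from math import comb, factorial

def num_partitions(n, k_min=2):
    # sum of Stirling numbers of the second kind S(n, k) over k = k_min .. n
    return sum(
        sum((-1) ** (k - j) * comb(k, j) * j ** n for j in range(1, k + 1)) // factorial(k)
        for k in range(k_min, n + 1)
    )

for n in (5, 10, 15, 20):
    print(n, num_partitions(n))
# already for n = 20 the count exceeds 5 * 10**13 candidate partitions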
The idea of learnability is based on the assumption that the learned concept can be verified against new data. For example, if one learned a k-means clustering on a data sample, then with newly incoming elements the clustering should not change (too much). With the richness requirement, if you learnt the model from a sample of size n, then it consists of no more than n clusters. But there exists a sample of size 2n that defies the learned structure, because it has more than n clusters. That is, the clustering structure of the domain is not learnable, by the very definition of richness.
Furthermore, most algorithms of cluster analysis are constructed in an incremental way. But this can be useless if the clustering quality function is designed in a very unfriendly way, for example based on modulo computation.
Theorem 2. There exist rich clustering functions that cannot be learnt incrementally.
Proof. The proof is by construction of such a function. Consider the following clustering quality function Q_s(Γ). For each datapoint i ∈ S we compute its distance d_i to its cluster center under Γ. Let dmx = max_{i∈S} d_i and q = Σ_{i∈S} round(10000 d_i / dmx). Then Q_s(Γ) = p - (q mod p), where p is a prime
number (here 3041). This Q_s was applied in order to partition the first n points from the sample data in Table 1, as illustrated in Table 2. It turns out that the best partition for n points does not give any hint about the best partition for n + 1 points; therefore each possible partition needs to be investigated in order to find the best one.³
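A minimal Python reconstruction of Q_s is sketched below (our own illustration: the cluster centre is taken as the coordinate mean, and both this convention and the reading Q_s(Γ) = p - (q mod p) are assumptions where the original formula is ambiguous). Combined with the all_partitions helper from the earlier sketch, it reproduces the kind of brute-force search summarised in Table 2:

from math import dist  # Euclidean distance, Python 3.8+

def Qs(partition, points, p=3041):
    # partition: list of lists of point ids; points: dict id -> (x, y)
    dists = []
    for cluster in partition:
        cx = sum(points[i][0] for i in cluster) / len(cluster)
        cy = sum(points[i][1] for i in cluster) / len(cluster)
        dists += [dist(points[i], (cx, cy)) for i in cluster]
    dmx = max(dists)
    q = sum(round(10000 * d / dmx) for d in dists) if dmx > 0 else 0
    return p - q % p  # lower values are better

# best partition of the first n points (ids 1..n, coordinates from Table 1):
# best = min(all_partitions(list(range(1, n + 1))), key=lambda g: Qs(g, points))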
Table 1. Data points to be clustered using a ridiculous clustering quality function.
id  x coordinate  y coordinate
1 4.022346 5.142886
2 3.745942 4.646777
3 4.442992 5.164956
4 3.616975 5.188107
5 3.807503 5.010183
6 4.169602 4.874328
7 3.557578 5.248182
8 3.876208 4.507264
9 4.102748 5.073515
10 3.895329 4.878176
Table 2. Partition of the best quality (the lower the value the better) after including the first n points from Table 1.
n  quality  partition
2 1270 {1, 2}
3 1270 {1, 2} {3}
4 823 {1, 3, 4} {2}
5 315 {1, 4} {2, 3, 5}
6 13 {1, 5} {2, 4, 6} {3}
7 3 {1, 6} {2, 7} {3, 5} {4}
8 2 {1, 2, 4, 5, 6, 8} {3} {7}
9 1 {1, 2, 4, 5} {3, 8} {6, 9} {7}
10 1 {1, 2, 3, 5, 9} {4, 6} {7, 10} {8}
Under these circumstances, let us point at the implications of the so-called learnability theory [19], [6] for the richness axiom. On the one hand, the hypothesis space is too big for learning a clustering from a sample. On the other hand, an exhaustive search in this space is prohibitive, so that some theoretical clustering functions do not make practical sense.
³ Strict separation [5], mentioned earlier, is another kind of weird cluster quality function, requiring visits to all the partitions.