Towards Practical Explainability with Cluster Descriptors 1stXiaoyuan Liu

2025-05-06 0 0 361.28KB 10 页 10玖币

侵权投诉

Towards Practical Explainability with Cluster

Descriptors

1st Xiaoyuan Liu

Fujitsu Research of America, Inc.

Sunnyvale, CA, USA

xliu@fujitsu.com

2nd Ilya Tyagin

University of Delaware

Newark, DE, USA

tyagin@udel.edu

3rd Hayato Ushijima-Mwesigwa

Fujitsu Research of America, Inc.

Sunnyvale, CA, USA

hayato@fujitsu.com

4th Indradeep Ghosh

Fujitsu Research of America, Inc.

Sunnyvale, CA, USA

ighosh@fujitsu.com

5th Ilya Safro

University of Delaware

Newark, DE, USA

isafro@udel.edu

Abstract—With the rapid development of machine learning,

improving its explainability has become a crucial research goal.

We study the problem of making the clusters more explainable

by investigating the cluster descriptors. Given a set of objects

S, a clustering of these objects π, and a set of tags Tthat

have not participated in the clustering algorithm. Each object

in Sis associated with a subset of T. The goal is to ﬁnd a

representative set of tags for each cluster, referred to as the

cluster descriptors, with the constraint that these descriptors we

ﬁnd are pairwise disjoint, and the total size of all the descriptors

is minimized. In general, this problem is NP-hard. We propose a

novel explainability model that reinforces the previous models in

such a way that tags that do not contribute to explainability and

do not sufﬁciently distinguish between clusters are not added to

the optimal descriptors. The proposed model is formulated as

a quadratic unconstrained binary optimization problem which

makes it suitable for solving on modern optimization hardware

accelerators. We experimentally demonstrate how a proposed

explainability model can be solved on specialized hardware

for accelerating combinatorial optimization, the Fujitsu Digital

Annealer, and use real-life Twitter and PubMed datasets for use

cases. Reproducibility materials: Link

Index Terms—explainability, clustering, machine learning,

Ising model, combinatorial optimization

I. INTRODUCTION

Machine learning (ML) methods and artiﬁcial intelligence

(AI) technologies bring growing impact to our daily life, across

all domains, from healthcare to urban planning. As a result, the

need to make the obtained results more explainable is contin-

uously growing, and improving the explainability is extremely

important for many applications. Results of unsupervised ML

techniques are often hard to interpret, especially in a post-hoc

analysis. Unsupervised clustering is not an exception.

While most existing literature focuses on performing clus-

tering and ﬁnding an explanation simultaneously using the

same dataset, Davidson et al. [1] ﬁrst introduced the cluster

description problem in which they explored the idea of taking

a partitioned dataset Xfound by some clustering algorithm,

and explaining its result using another dataset Y. For instance,

This work is supported by Fujitsu Research of America, Inc.

in our experiments with the Twitter dataset [1], Xis a matrix

of user reposting behavior in Twitter, i.e., Xi,j is the number

of times user iwas retweeted by user j. We are given the

result found by spectral clustering, which partitions the users

into clusters. The dataset Yis a matrix of users’ hashtag usage,

i.e., Yi,j is the number of times user itweets with hashtag j.

For another example, we experiment with PubMed, the dataset

of scientiﬁc abstracts and metadata, for which matrix Xis

not strictly required for clustering. The clustering is naturally

performed based on the medical subject heading terms (MeSH

terms) that are manually curated and added to the document

metadata. Matrix Yrepresents a document-term matrix i.e.,

Yi,j is the number of times a term joccurs in document

i. Overall, there exist many motivating examples in which

we may need to restrict some features from participating in

clustering but then use them to analyze the clusters. One

important example is sensitive and protected features that

exhibit a major problem in modern data mining. These features

are not supposed to participate in clustering but if they still

clearly distinguish the obtained clusters, it could be a sign of a

wrong clustering model or hyperparameters’ choice. Another

use case is the third party proprietary data that cannot inform

clustering but can be used to analyze its results.

Let Sdenote the set of nobjects, S={si}n

i=1, and we are

given a clustering πthat partitions Sinto kdisjoint clusters

{C`}k

`=1. We are also given a set of tags T. Each object si∈S

is associated with a subset ti⊆Tof tags, 1≤i≤n. For each

cluster, we deﬁne T`⊆T, the descriptor of cluster C`. An

object siin cluster C`, is said to be covered by the descriptor

T`, if T`contains at least one of the tags in ti, the tag set

associated with si. The goal of cluster description problem

is to ﬁnd kpairwise disjoint descriptors, one descriptor for

each cluster, such that all objects are covered and the total

number of tags used in the descriptors is minimized. This

optimization problem is refered to as the disjoint tag descriptor

minimization (DTDM) problem. Figure 1(a) depicts a toy

example with 6 objects and a partition of 2 clusters.

However, the constraint that requires all objects to be

arXiv:2210.10662v2 [cs.LG] 20 Oct 2022

Tag 1 Tag 2 Tag 3 Tag 4 Tag 1 TAG2 TAG 3 TAG4 Tag 1 Tag 2 Tag 3 Tag 4

(a) (b) (c)

Fig. 1. (a) A toy example with 6 objects {si}6

i=1, and their partition into 2 clusters C1and C2. Tag set Tcontains 4 tags. Each object siis associated a subset

ti⊂T. In this example, t1={TAG 1,TAG 2},t2={TAG 1},t3={TAG 1},t4={TAG 2,TAG 3,TAG 4},t5={TAG 3},t6={TAG 3}. One

solution for DTDM is T1={TAG 1}, T2={TAG 3}. (b) Optimal solution for DTDM and MinConCD is T1={TAG 1}, T2={TAG 3}, however both

tags are generic and do not reveal speciﬁc information for each cluster, reducing the usefulness of the descriptors. (c) If select T1={TAG 1,TAG 2}, T2=

{TAG 3,TAG 4}as descriptors, they explain each cluster better albeit using more tags.

covered can sometimes be too strict, making even ﬁnding

a feasible solution very challenging. To address this issue,

Davidson et al. also discussed several variants of DTDM.

One such variant is referred to as the “cover-or-forget” ver-

sion, which relaxes the constraint that all objects must be

covered and allows ignoring some of the objects. Recently,

Sambaturu et al. [2], have pointed out that the “cover-or-

forget” variant does resolve the feasibility issue but might

have highly unbalanced coverage, i.e., for some clusters, the

descriptors can cover most of the objects, but for some cluster,

the descriptor might cover very few objects. The authors

proposed to address this issue by adding constraints on the

number of objects needed to be covered for each cluster, i.e.,

introduce additional parameter M`for each cluster C`, such

that the number of objects that are covered must be at least

M`. The extended cluster description formulation with more

balanced coverage guarantees for all clusters is referred to

as the minimum constrained cluster description (MinConCD)

problem. The DTDM is therefore a special case of MinConCD

when M`=|C`|for 1≤`≤k. However, we argue that this

model has another issue. Consider the following scenario: in

many datasets, we might have several tags that cover most

of the objects. For example, it has been noticed multiple

times that depending on the partially extracted information,

the Twitter hashtags may follow Zipf or approximate power

laws [3], [4]. If we only focus on minimizing the total number

of tags while satisfying the coverage constraints, the most

frequently used tags will be selected preferably. Figure 1(b)

shows such scenario, while the optimal solution uses only 2

tags. These tags might generally describe the entire dataset,

they do not reveal speciﬁc information about the cluster they

are assigned to. The descriptors of the clusters will therefore be

similar to each other, reducing the usefulness of the results. In

contrast, we wish to select the tags that reveal the information

of each cluster albeit potentially using more tags, as in the

scenario displayed in Figure 1(c).

To address this issue, we propose to further reinforce the

explainability formulations of MinConCD by adding an addi-

tional metric to the objective function which will fundamen-

tally change the outcomes. The novel metric, which we call

tag modularity (TM), is inspired by the notion of modularity

[5] in network science. Modularity is a measure of the graph

structure which quantiﬁes the strengths of the division of a

graph into clusters. A clustering with high modularity tends to

have dense connections between the nodes within clusters but

sparse connections between nodes in different clusters. Quanti-

tatively, the modularity on graphs measures a deviation of node

clusters from a random model. Maximizing the modularity

naturally punishes trivial clusterings which are practically

useless. Intuitively, by considering both the total number of

tags used for the descriptors and the connections between the

tags and the objects, we can better identify the cluster-speciﬁc

tags and ﬁlter out the general tags, making the descriptor

more interpretable and thus improving the explainability of

the descriptors.

Both DTDM and MinConCD are challenging combinatorial

optimization problems. Davidson et al. [1] showed that even

for two clusters, ﬁnding only a feasible solution of DTDM

is intractable not to mention the optimality. Sambaturu et

al. [2] showed that if the coverage constraints in MinConCD

for each cluster must be met, then unless P=NP , for

any ρ≥1, there is no polynomial-time algorithm that can

approximate the objective function value with a factor of ρ.

Typically, in order to tackle such challenging optimization

models, the problems are formulated in terms of integer linear

programming (ILP), as proposed in the works of Davidson

et al. and Sambaturu et al. Then, to tackle the computational

intractability of such problems, relaxations on the constraints

or approximation algorithms are applied.

We explore the way to deal with the hardness of such

problems from the optimization accelerator systems’ angle

whose recent advent opens new opportunities for the data

mining community. As we are approaching the end of Moore’s

law [6] due to the physical limitations, major efforts from the

文档加载中……请稍候！
如果长时间未打开，您也可以点击刷新试试。

下载文档到电脑，查找使用更方便

10 玖币 0人已下载

立即下载

摘要：

TowardsPracticalExplainabilitywithClusterDescriptors1stXiaoyuanLiuFujitsuResearchofAmerica,Inc.Sunnyvale,CA,USAxliu@fujitsu.com2ndIlyaTyaginUniversityofDelawareNewark,DE,USAtyagin@udel.edu3rdHayatoUshijima-MwesigwaFujitsuResearchofAmerica,Inc.Sunnyvale,CA,USAhayato@fujitsu.com4thIndradeepGhoshFujits...

展开>> 收起<<

Towards Practical Explainability with Cluster Descriptors 1stXiaoyuan Liu.pdf

共10页,预览2页

还剩页未读，继续阅读

声明：本站为文档C2C交易模式，即用户上传的文档直接被用户下载，本站只是中间服务平台，本站所有文档下载所得的收益归上传人(含作者)所有。玖贝云文库仅提供信息存储空间，仅对用户上传内容的表现方式做保护处理，对上载内容本身不做任何修改或编辑。若文档所含内容侵犯了您的版权或隐私，请立即通知玖贝云文库，我们立即给予删除！

Towards Practical Explainability with Cluster Descriptors 1stXiaoyuan Liu

相关推荐

开通VIP享超值会员特权

作者详情

相关内容

热门标签

举报选择: