Generating Hierarchical Explanations on Text Classification
Without Connecting Rules
Yiming Ju, Yuanzhe Zhang, Kang Liu, Jun Zhao
1 National Laboratory of Pattern Recognition, Institute of Automation, CAS, Beijing, China
2 School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
{yiming.ju, yzzhang, kliu, jzhao}@nlpr.ia.ac.cn
Abstract
The opaqueness of deep NLP models has motivated the development of methods for interpreting how deep models predict. Recently, work has introduced hierarchical attribution, which produces a hierarchical clustering of words along with an attribution score for each cluster. However, existing work on hierarchical attribution all follows the connecting rule, which limits each cluster to a continuous span of the input text. We argue that the connecting rule, as an additional prior, may undermine the ability to faithfully reflect the model's decision process. To this end, we propose to generate hierarchical explanations without the connecting rule and introduce a framework for generating hierarchical clusters. Experimental results and further analysis show the effectiveness of the proposed method in providing high-quality explanations that reflect the model's prediction process.
1 Introduction
The opaqueness of deep natural language processing (NLP) models has grown in tandem with their power (Doshi-Velez and Kim, 2017), which has motivated efforts to interpret how these black-box models work (Sundararajan et al., 2017; Belinkov and Glass, 2019). Post-hoc explanation aims to explain a trained model and reveal how the model arrives at a decision (Jacovi and Goldberg, 2020; Molnar, 2020). In NLP, this goal is usually approached with attribution methods, which assess the influence of inputs on model predictions.
Prior lines of work on post-hoc explanation usually focus on generating word-level or phrase-level attributions for deep NLP models. Recently, work has introduced the idea of hierarchical attribution (Singh et al., 2018; Jin et al., 2019; Chen et al., 2020). As shown in Figure 1, hierarchical attribution produces a hierarchical clustering of words and provides an attribution score for each cluster. By providing compositional semantic information, hierarchical attribution can give users a better understanding of the model's decision-making process. Since the attribution score of each cluster in hierarchical attribution is calculated separately, the key point of generating hierarchical attribution is how to obtain word clusters, which should be informative enough to capture meaningful feature interactions while displaying a sufficiently small subset of all feature groups to maintain simplicity (Singh et al., 2018).
Figure 1: An example of hierarchical attribution.
Existing work has proposed various algorithms to generate hierarchical clusters. For example, Singh et al. (2018) use the CD score (Murdoch et al., 2018) as a joining metric in an agglomerative clustering procedure; Chen et al. (2020) recursively divide large text spans into smaller ones by detecting feature interactions. However, previous work allows only adjacent clusters to be grouped into a new cluster, which we denote as the connecting rule. Under the connecting rule, generated clusters are always continuous text spans of the input. While this is consistent with human reading habits, we argue that the connecting rule, as an additional prior, may undermine the ability to faithfully reflect the model's decision process. Our concerns are summarized as follows:
First, modern NLP models such as BERT (Devlin et al., 2019) and GPT (Radford et al., 2018, 2019) are almost all transformer-based, using self-attention mechanisms (Vaswani et al., 2017) to build word relations. Since all word relations are computed in parallel in the self-attention mechanism, the connecting rule is inconsistent with the basic working mechanism of these models.
Figure 2: Examples of hierarchical explanations. The prediction for the MNLI example is entailment (the first sentence entails the second sentence). The prediction for the SST example is positive. '...' represents omitted words for clearer visualization.
Second, unlike the toy example in Figure 1, NLP tasks are becoming increasingly complex, often requiring joint reasoning over different parts of the input text (Chowdhary, 2020). For example, Figure 2 shows a sample from the natural language inference (NLI) task (NLI requires the model to predict whether the premise entails the hypothesis, contradicts it, or is neutral), in which 'has a' and 'available' form the key combinatorial semantics for making the prediction. However, hierarchical explanations under the connecting rule cannot identify this compositional information; they can only build such relations once the whole sentence is regarded as one cluster.
To this end, we propose to generate hierarchical explanations without the connecting rule and introduce a framework for generating hierarchical clusters, which recursively detects the strongest interactions among clusters and merges small clusters into bigger ones. Compared to previous methods with the connecting rule, our method can provide compositional semantic information about long-distance spans. We build systems based on two classic attribution methods: LOO (Lipton, 2018) and LIME (Ribeiro et al., 2016). Experimental results and further analysis show that our method captures higher-quality features for reflecting model predictions than existing competitive methods.
2 Method
2.1 Generating Hierarchical Clusters
For a classification task, let $X = (x_1, \dots, x_n)$ denote a sample with $n$ words, and let $c$ denote a word cluster containing a set of words in $X$. The procedure for generating hierarchical clusters works as follows. With the current cluster set $C_t$ initialized at step $t=0$ so that each word $x_i$ forms its own cluster, the algorithm selects two clusters from $C_t$ and merges them into one at each iteration. After $n-1$ steps, all words in $X$ have been merged into a single cluster, and the cluster sets $C_t$ from each time step together constitute the final hierarchical clustering.
To perform the merging procedure, we must decide which two clusters to merge at each step. In this work, we choose the pair of clusters with the maximum interaction, which can be formulated as the following optimization problem:

$$\max_{c_i, c_j \in C} \phi(c_i, c_j \mid C), \tag{1}$$

where $\phi(c_i, c_j \mid C)$ defines the interaction score between $c_i$ and $c_j$ given the current clusters $C$.
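To make the procedure concrete, here is a minimal Python sketch of the merging loop described above; it is an illustration, not the authors' released implementation, and the names `hierarchical_clusters` and `interaction` are assumptions for exposition. The `interaction` callable stands in for $\phi(c_i, c_j \mid C)$, whose computation is discussed in Section 2.2.

```python
from itertools import combinations

def hierarchical_clusters(n, interaction):
    """Greedy agglomerative clustering without the connecting rule.

    n           -- number of words in the input sample X
    interaction -- callable (ci, cj, clusters) -> float, standing in
                   for the interaction score phi(ci, cj | C)

    Returns the list of cluster sets C_t, one per time step.
    """
    # Step 0: every word index forms its own cluster.
    clusters = [frozenset([i]) for i in range(n)]
    history = [list(clusters)]

    # After n - 1 merges, all words belong to a single cluster.
    for _ in range(n - 1):
        # Eq. (1): pick the pair with maximum interaction. Any pair may
        # be merged; no adjacency (connecting rule) is imposed.
        ci, cj = max(
            combinations(clusters, 2),
            key=lambda pair: interaction(pair[0], pair[1], clusters),
        )
        clusters = [c for c in clusters if c not in (ci, cj)] + [ci | cj]
        history.append(list(clusters))
    return history
```

Because the argmax in Eq. (1) ranges over all cluster pairs, two clusters may be merged no matter how far apart their words lie in the input, which is precisely what the connecting rule forbids.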
2.2 Detecting Cluster Interaction
Unlike previous work, which calculates interactions between two words/phrases according to model predictions, we calculate the interaction between two clusters by considering the influence of one cluster on the explanation of the other. Given an attribution algorithm $\mathrm{Algo}$, the interaction score between $c_i$ and $c_j$ can be quantified from the change in each cluster's attribution when the other cluster is masked.
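As one plausible instantiation of this idea (an assumption for illustration, not necessarily the exact definition used in the paper), the interaction can be taken as the total shift in each cluster's attribution when the other cluster is masked out of the input; the helper name `cluster_interaction` and the symmetric form below are hypothetical.

```python
def cluster_interaction(ci, cj, clusters, attribution):
    """Hypothetical phi(ci, cj | C): how much masking one cluster
    shifts the attribution score assigned to the other.

    clusters    -- current cluster set C; accepted for compatibility
                   with the merging loop above, unused in this sketch
    attribution -- callable (cluster, masked) -> float, standing in
                   for an attribution method such as LOO or LIME
                   evaluated with the clusters in `masked` removed
                   from the input
    """
    # Attribution of each cluster on the intact input ...
    base_i = attribution(ci, masked=[])
    base_j = attribution(cj, masked=[])
    # ... versus the input with the other cluster masked out.
    shift_i = attribution(ci, masked=[cj])
    shift_j = attribution(cj, masked=[ci])
    # Symmetric interaction: total attribution shift in both directions.
    return abs(base_i - shift_i) + abs(base_j - shift_j)
```

A wrapper such as `lambda ci, cj, C: cluster_interaction(ci, cj, C, my_attribution)` would plug this into the merging loop sketched in Section 2.1.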