Generating Hierarchical Explanations on Text Classification
Without Connecting Rules
Yiming Ju, Yuanzhe Zhang, Kang Liu, Jun Zhao
1 National Laboratory of Pattern Recognition, Institute of Automation, CAS, Beijing, China
2 School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
{yiming.ju, yzzhang, kliu, jzhao}@nlpr.ia.ac.cn
Abstract
The opaqueness of deep NLP models has motivated the development of methods for interpreting how deep models predict. Recently, work has introduced hierarchical attribution, which produces a hierarchical clustering of words along with an attribution score for each cluster. However, existing work on hierarchical attribution all follows the connecting rule, which limits each cluster to a continuous span of the input text. We argue that the connecting rule, as an additional prior, may undermine the ability to faithfully reflect the model's decision process. To this end, we propose to generate hierarchical explanations without the connecting rule and introduce a framework for generating hierarchical clusters. Experimental results and further analysis show the effectiveness of the proposed method in providing high-quality explanations that reflect the model's prediction process.
1 Introduction
The opaqueness of deep natural language processing (NLP) models has grown in tandem with their power (Doshi-Velez and Kim, 2017), which has motivated efforts to interpret how these black-box models work (Sundararajan et al., 2017; Belinkov and Glass, 2019). Post-hoc explanation aims to explain a trained model and reveal how the model arrives at a decision (Jacovi and Goldberg, 2020; Molnar, 2020). In NLP, this goal is usually approached with attribution methods, which assess the influence of inputs on model predictions.
Prior lines of work on post-hoc explanation usually focus on generating word-level or phrase-level attributions for deep NLP models. Recently, work has introduced the idea of hierarchical attribution (Singh et al., 2018; Jin et al., 2019; Chen et al., 2020). As shown in Figure 1, hierarchical attribution produces a hierarchical clustering of words and provides an attribution score for each cluster. By providing compositional semantic information, hierarchical attribution can give users a better understanding of the model's decision-making process. Since the attribution score of each cluster in hierarchical attribution is calculated separately, the key point of generating hierarchical attribution is how to obtain word clusters, which should be informative enough to capture meaningful feature interactions while displaying a sufficiently small subset of all feature groups to maintain simplicity (Singh et al., 2018).
Figure 1: An example of hierarchical attribution.
Existing work has proposed various algorithms to generate hierarchical clusters. For example, Singh et al. (2018) use the CD score (Murdoch et al., 2018) as a joining metric in an agglomerative clustering procedure; Chen et al. (2020) recursively divide large text spans into smaller ones by detecting feature interactions. However, previous work allows only adjacent clusters to be grouped into a new cluster, which we denote as the connecting rule. Under the connecting rule, generated clusters are always continuous text spans of the input. While this is consistent with human reading habits, we argue that the connecting rule, as an additional prior, may undermine the ability to faithfully reflect the model's decision process. Our concerns are summarized as follows:
First, modern NLP models such as BERT (Devlin et al., 2019) and GPT (Radford et al., 2018, 2019) are almost all transformer-based, using self-attention mechanisms (Vaswani et al., 2017) to build word relations. Since all word relations are computed in parallel in the self-attention mechanism, the connecting rule is inconsistent with the basic working mechanism of these models.
Figure 2: Examples of hierarchical explanations. The prediction for the MNLI example is entailment (the first sentence entails the second sentence). The prediction for the SST example is positive. '...' represents omitted words for clearer visualization.
Second, unlike the toy example in Figure 1, NLP tasks are becoming increasingly complex, often requiring joint reasoning over different parts of the input text (Chowdhary, 2020). For example, Figure 2 shows a sample from the natural language inference (NLI) task (NLI requires the model to predict whether the premise entails the hypothesis, contradicts it, or is neutral), in which 'has a' and 'available' form the key combinatorial semantics for making the prediction. However, hierarchical explanations under the connecting rule cannot identify this compositional information; they can only build such relations once the whole sentence is regarded as one cluster.
To this end, we propose to generate hierarchical explanations without the connecting rule and introduce a framework for generating hierarchical clusters, which recursively detects the strongest interactions among clusters and merges small clusters into bigger ones. Compared to previous methods with the connecting rule, our method can provide compositional semantic information about long-distance spans. We build systems based on two classic attribution methods: LOO (Lipton, 2018) and LIME (Ribeiro et al., 2016). Experimental results and further analysis show that our method captures higher-quality features for reflecting model predictions than existing competitive methods.
2 Method
2.1 Generating Hierarchical Clusters
For a classification task, let $X = (x_1, \dots, x_n)$ denote a sample with $n$ words, and let $c$ denote a word cluster containing a set of words in $X$. The procedure for generating hierarchical clusters works as follows. With the current cluster set $C_t$ initialized at step $t=0$ so that each word $x_i$ forms its own cluster, the algorithm selects two clusters from $C_t$ and merges them into one at each iteration. After $n-1$ steps, all words in $X$ have been merged into a single cluster, and the cluster sets $C_t$ from each time step together constitute the final hierarchical clustering.
To perform the merging procedure, we must decide which two clusters to merge at each step. In this work, we choose the pair of clusters with the maximum interaction, which can be formulated as the following optimization problem:

$$\max_{c_i, c_j \in C} \phi(c_i, c_j \mid C), \tag{1}$$

where $\phi(c_i, c_j \mid C)$ defines the interaction score between $c_i$ and $c_j$ given the current clusters $C$.
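To make the procedure concrete, here is a minimal Python sketch of the merging loop described above; it is an illustration, not the authors' released implementation, and the names `hierarchical_clusters` and `interaction` are assumptions for exposition. The `interaction` callable stands in for $\phi(c_i, c_j \mid C)$, whose computation is discussed in Section 2.2.

```python
from itertools import combinations

def hierarchical_clusters(n, interaction):
    """Greedy agglomerative clustering without the connecting rule.

    n           -- number of words in the input sample X
    interaction -- callable (ci, cj, clusters) -> float, standing in
                   for the interaction score phi(ci, cj | C)

    Returns the list of cluster sets C_t, one per time step.
    """
    # Step 0: every word index forms its own cluster.
    clusters = [frozenset([i]) for i in range(n)]
    history = [list(clusters)]

    # After n - 1 merges, all words belong to a single cluster.
    for _ in range(n - 1):
        # Eq. (1): pick the pair with maximum interaction. Any pair may
        # be merged; no adjacency (connecting rule) is imposed.
        ci, cj = max(
            combinations(clusters, 2),
            key=lambda pair: interaction(pair[0], pair[1], clusters),
        )
        clusters = [c for c in clusters if c not in (ci, cj)] + [ci | cj]
        history.append(list(clusters))
    return history
```

Because the argmax in Eq. (1) ranges over all cluster pairs, two clusters may be merged no matter how far apart their words lie in the input, which is precisely what the connecting rule forbids.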
2.2 Detecting Cluster Interaction
Unlike previous work, which calculates interactions between two words/phrases according to model predictions, we calculate the interaction between two clusters by considering the influence of one cluster on the explanation of the other. Given an attribution algorithm $\mathrm{Algo}$, the interaction score between $c_i$ and $c_j$ can be quantified from the change in each cluster's attribution when the other cluster is masked.
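As one plausible instantiation of this idea (an assumption for illustration, not necessarily the exact definition used in the paper), the interaction can be taken as the total shift in each cluster's attribution when the other cluster is masked out of the input; the helper name `cluster_interaction` and the symmetric form below are hypothetical.

```python
def cluster_interaction(ci, cj, clusters, attribution):
    """Hypothetical phi(ci, cj | C): how much masking one cluster
    shifts the attribution score assigned to the other.

    clusters    -- current cluster set C; accepted for compatibility
                   with the merging loop above, unused in this sketch
    attribution -- callable (cluster, masked) -> float, standing in
                   for an attribution method such as LOO or LIME
                   evaluated with the clusters in `masked` removed
                   from the input
    """
    # Attribution of each cluster on the intact input ...
    base_i = attribution(ci, masked=[])
    base_j = attribution(cj, masked=[])
    # ... versus the input with the other cluster masked out.
    shift_i = attribution(ci, masked=[cj])
    shift_j = attribution(cj, masked=[ci])
    # Symmetric interaction: total attribution shift in both directions.
    return abs(base_i - shift_i) + abs(base_j - shift_j)
```

A wrapper such as `lambda ci, cj, C: cluster_interaction(ci, cj, C, my_attribution)` would plug this into the merging loop sketched in Section 2.1.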