Painting the black box white: experimental findings
from applying XAI to an ECG reading setting
Federico Cabitza1,2,*, Matteo Cameli3, Andrea Campagner1, Chiara Natali1 and Luca Ronzio4
1Department of Computer Science, Systems and Communication, University of Milano-Bicocca, Milan, Italy
2IRCCS Istituto Ortopedico Galeazzi, Milan, Italy
3Department of Medicine, Surgery and Neuroscience, University of Siena, Siena, Italy
4Department of Medicine and Surgery, University of Milano-Bicocca, Milan, Italy
Abstract
The shift from symbolic AI systems to black-box, sub-symbolic, and statistical ones has motivated a
rapid increase in the interest toward explainable AI (XAI), i.e. approaches to make black-box AI systems
explainable to human decision makers with the aim of making these systems more acceptable and more
usable tools and supports. However, we make the point that, rather than always making black boxes
transparent, these approaches are at risk of painting the black boxes white, thus failing to provide a level
of transparency that would increase the system’s usability and comprehensibility; or, even, at risk of
generating new errors, in what we termed the white-box paradox. To address these usability-related issues,
in this work we focus on the cognitive dimension of users’ perception of explanations and XAI systems.
To this aim, we designed and conducted a questionnaire-based experiment by which we involved 44
cardiology residents and specialists in an AI-supported ECG reading task. In doing so, we investigated
dierent research questions concerning the relationship between users’ characteristics (e.g. expertise)
and their perception of AI and XAI systems, including their trust, the perceived explanations’ quality
and their tendency to defer the decision process to automation (i.e. technology dominance), as well as
the mutual relationships among these dierent dimensions. Our ndings provide a contribution to the
evaluation of AI-based support systems from a Human-AI interaction-oriented perspective and lay the
ground for further investigation of XAI and its eects on decision making and user experience.
Keywords
Explainable AI, Decision Support Systems, ECG, Artificial Intelligence, XAI
1. Introduction
We are witnessing a continuous and indeed accelerating move from decision support systems
that are based on explicit rules conceived by domain experts (so called expert systems or
knowledge-based systems) to systems whose behaviors can be traced back to an innumerable
number of rules that have been automatically learnt on the basis of correlative and statistical
analyses of large quantities of data: this is the shift from symbolic AI systems to sub-symbolic
ones, which has made the black-box nature of these latter systems an object of a lively and
widespread debate in both technological and philosophical contexts [1]. The main assumption
motivating this debate is that making sub-symbolic systems explainable to human decision
makers makes them better and more acceptable tools and supports.
This assumption is widely accepted [2, 3, 4], although there are a few scattered voices against
it (see e.g. [5, 6, 7, 8]): for instance, explanations were found to increase complacency towards
the machine advice [9], to increase automation bias [10, 11], as well as to groundlessly increase
confidence in one’s own decision [12, 13]. Understanding, or participating in, this debate, which
characterizes the scientific community that recognizes itself in the expression “explainable AI”
and in the acronym “XAI”, is difficult because of the seemingly disarming heterogeneity of the
definitions of explanation, and of the variety of characteristics that are associated with “good
explanations” or with the systems that generate them [14].
In what follows, we adopt the simplifying approach recently proposed in [14], where an explanation
is defined as the meta-output (that is, an output that describes, enriches or complements another,
main output) of an XAI system. In this perspective, good explanations are those that make the XAI
system more usable, and therefore a more useful support. The reference to usability suggests that
we can assess explanations (and explainability) on different levels, by addressing complementary
questions, such as: do explanations make the socio-technical, decision-making setting more effective,
in that they help decision makers commit fewer errors? Do they make it more efficient, by making
decisions easier and faster, or simply by requiring fewer resources? And, last but not least, do they
make users more satisfied with the advice received, possibly because they have understood it better,
and this has made them more confident about their final say?
While some studies [15] have already considered the psychometric dimension of user satisfaction
(see, e.g., the concept of causability [16], related to the role of explanations in making advice more
understandable from a causal point of view), here we would like to focus on effectiveness (i.e.,
accuracy) and on other cognitive dimensions (besides understandability), both in regard to the
support (e.g., trust and utility) and to the explanations received. In fact, explanations can be either
clear or ambiguous (cf. comprehensibility); either tautological and placebic [17] or instructive
(cf. informativeness); either pertinent or off-topic (cf. pertinence); and, as obvious as it may seem,
either correct or wrong, as any AI output can be. Therefore, otherwise good explanations (that is,
persuasive, reassuring, comprehensible, etc.) could even mislead their target users: this is the
so-called white-box paradox, which we have already begun investigating in previous empirical
studies [18, 19]. Thus, investigating whether and to what extent users find explanations “good”
(in the next section we will make this term operationally clear) can be related to investigating the
possible determinants of machine influence (also called dominance), automation bias and other
negative effects of the output of decision support systems on decision performance and practices.
2. Methods
To investigate how human decision makers perceive explanations, we designed and conducted
a questionnaire-based experiment in which we involved 44 cardiology residents and specialists
of varying expertise and competence (namely, 25 and 19, respectively) from the School of Medicine
of the University Hospital of Siena (Italy) in an AI-supported ECG reading task, not connected to
their daily care. The readers were invited to classify and annotate 20 ECG cases, previously selected
by a cardiologist from a random set of cases extracted from the ECG Wave-Maven repository¹ on the
basis of their complexity (as recorded in the above repository), so as to have a balanced dataset in
terms of case type and difficulty. The study participants had to provide their diagnoses both with
and without the support of a simulated AI system, according to an asynchronous Wizard-of-Oz
protocol [20]: the support of the AI system included both a proposed diagnosis and a textual
explanation backing the former. The experiment was performed by means of a web-based
questionnaire set up through the LimeSurvey platform (version 3.23), to which the readers had
been individually invited by personal email.
The ECG readers were randomly divided into two groups, equivalent in terms of expertise, which
were supposed to interact with the AI system differently (see Fig. 1); in doing so, we could
comparatively evaluate potential differences between a human-first and an AI-first configuration.
In both groups, the first question of the questionnaire asked the readers to self-assess their
trust in AI-based diagnostic support systems for ECG reading. The same question was also
repeated at the end of the questionnaire, to evaluate potential differences in trust caused by the
interaction with the AI system.
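As a minimal illustrative sketch of such a stratified assignment (the reader identifiers, function name and seed below are hypothetical and not part of the study materials), a random split balanced by expertise could look as follows:

import random

def assign_arms(residents, specialists, seed=2022):
    """Split readers into two arms (human-first vs. AI-first), stratifying
    by expertise so that each arm receives a similar mix of residents and
    specialists."""
    rng = random.Random(seed)
    arms = {"human-first": [], "AI-first": []}
    for stratum in (residents, specialists):
        pool = list(stratum)
        rng.shuffle(pool)
        half = len(pool) // 2
        arms["human-first"].extend(pool[:half])
        arms["AI-first"].extend(pool[half:])
    return arms

# Hypothetical reader identifiers: 25 residents and 19 specialists.
residents = [f"R{i:02d}" for i in range(1, 26)]
specialists = [f"S{i:02d}" for i in range(1, 20)]
arms = assign_arms(residents, specialists)
print({arm: len(readers) for arm, readers in arms.items()})

Shuffling within each expertise stratum before splitting spreads residents and specialists roughly evenly across the two arms.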
Figure 1: A BPMN representation of the study design. The information collected is represented as data
objects, coming from collection tasks whose names denote the main actor. After the initial collection of
the perceived “trust in AI”, the subprocess is repeated for each ECG case, during which the HD1, AI, HD2,
XAI and FHD items are collected, together with the comprehensibility, appropriateness and utility ratings.
Finally, a post-test “trust in AI” is collected again.
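For concreteness, the items gathered in each repetition of the subprocess of Fig. 1 can be pictured as a per-case record like the following minimal sketch (the Python class and field names are ours and only mirror the labels in the figure, not the actual questionnaire items):

from dataclasses import dataclass
from typing import Optional

@dataclass
class CaseRecord:
    """Items collected for a single ECG case (field names are hypothetical
    and only mirror the labels used in Fig. 1)."""
    reader_id: str
    case_id: str
    hd1: Optional[str]      # initial human diagnosis, before any AI advice
    ai: str                 # diagnosis proposed by the (simulated) AI
    hd2: Optional[str]      # human diagnosis after seeing the AI advice
    xai: str                # textual explanation backing the AI advice
    fhd: str                # final human diagnosis, after the explanation
    comprehensibility: int  # perceived comprehensibility of the explanation
    appropriateness: int    # perceived appropriateness of the explanation
    utility: int            # perceived utility of the explanation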
For each ECG case, the readers in the human-first group were first shown the ECG trace together
with a brief case description, and then had to provide an initial diagnosis (in free-text format).
After this diagnosis had been recorded, these respondents were shown the diagnosis proposed by
the AI; after having considered this latter advice, the respondents could revise their initial diagnosis;
they were then shown the textual explanation (motivating the AI advice) and asked to provide their
final diagnosis in light of this additional information. In contrast, the participants enrolled in the
AI-first group were shown the AI-proposed diagnosis together with the ECG trace and case
description; only afterwards were they asked to provide
their own diagnosis in light of this advice only. Finally, these readers too were shown the textual
explanation and asked to provide their final diagnosis.
¹ https://ecg.bidmc.harvard.edu/maven/mavenmain.asp
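The two interaction protocols can be summarized schematically as follows (a hypothetical Python sketch of the question flow; the helper functions and dictionary keys are ours and do not reflect the actual LimeSurvey implementation):

def show(*items):
    """Display study material to the reader (stub, for illustration only)."""
    print(*items)

def ask(prompt):
    """Collect a free-text answer from the reader (stub, for illustration only)."""
    return input(prompt + ": ")

def human_first_flow(case, ai):
    """Human-first arm: the reader commits to a diagnosis before any AI advice."""
    show(case["ecg_trace"], case["description"])
    hd1 = ask("Initial diagnosis")
    show(ai["diagnosis"])        # AI advice is revealed only now
    hd2 = ask("Revised diagnosis")
    show(ai["explanation"])      # textual explanation backing the advice
    fhd = ask("Final diagnosis")
    return hd1, hd2, fhd

def ai_first_flow(case, ai):
    """AI-first arm: the AI advice is shown together with the case itself."""
    show(case["ecg_trace"], case["description"], ai["diagnosis"])
    hd2 = ask("Diagnosis in light of the AI advice")
    show(ai["explanation"])
    fhd = ask("Final diagnosis")
    return hd2, fhd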