
The question “Can the decision made by a black-box model be trusted for a context-sensitive problem?” has been asked not only by the scientific community, but also by society as a whole. For example, in 2018 the General Data Protection Regulation came into force in the European Union, aimed at guaranteeing every individual the right to an explanation of why an intelligent system made a given decision [20]. In this sense, for AI applications to keep advancing, the entire community faces the barrier of model explainability [9,11]. To address this issue, a new field of study is growing rapidly: Explainable Artificial Intelligence (XAI). Developed by AI and Human-Computer Interaction (HCI) researchers, XAI is a user-centric field of study aimed at developing techniques that make the functioning of these systems and models more transparent and, consequently, more trustworthy [2]. Recent research shows that calibrating trust in a model's decisions is very important, since either excessive or insufficient confidence can lead to critical problems depending on the context [19].
The models with high success rates on real-world problems are usually of the black-box type. In other words, they are not easily explained and, therefore, applying XAI techniques is required so that they can be explained and then interpreted by the end user [9,2]. XAI techniques based on different methodologies keep emerging, but there are still many gaps in the literature. For example, XAI methods based on Explanation-by-Example in a model-agnostic fashion⁶ are still underexplored by the scientific community [8,10,18]. Techniques based on Explanation-by-Example use previously known or model-generated data instances to explain a model, thus providing a good understanding of the model and of its decisions. This approach may feel natural to human beings, since humans explain the decisions they themselves make based on previously known examples and experiences [2].
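To make the idea concrete, the sketch below illustrates one common Explanation-by-Example strategy: explaining a prediction by retrieving the known (training) instances most similar to the instance being explained. The dataset, classifier and neighbourhood size are assumptions chosen only for illustration, not the specific methods of [8,10,18].

```python
# Minimal sketch of a model-agnostic Explanation-by-Example approach:
# explain a prediction by showing the most similar known (training) instances.
# Dataset, classifier and k are illustrative assumptions only.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import NearestNeighbors

X, y = load_iris(return_X_y=True)
black_box = RandomForestClassifier(random_state=0).fit(X, y)  # any model would do

def explain_by_example(instance, k=3):
    """Return the k training instances closest to `instance`, together with
    the black-box predictions for them, as an example-based explanation."""
    nn = NearestNeighbors(n_neighbors=k).fit(X)
    _, idx = nn.kneighbors([instance])
    neighbors = X[idx[0]]
    return list(zip(neighbors.tolist(), black_box.predict(neighbors)))

# Usage: explain the model's prediction for the first instance.
prediction = black_box.predict(X[:1])[0]
examples = explain_by_example(X[0])
print(f"Predicted class {prediction}; similar known examples: {examples}")
```

Because the explanation only requires querying the model for predictions on existing instances, the same procedure applies to any classifier, which is what makes the approach model-agnostic.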
This research explores a new XAI measure based on the working principles of Item Response Theory (IRT), which is commonly used in psychometric tests to assess the performance of individuals on a set of items (e.g., questions) with different levels of difficulty [3]. To this end, IRT was adapted for Machine Learning (ML) evaluation, treating classifiers as individuals and test instances as items [16]. In previous works [16,5], IRT was used to evaluate ML models and datasets for classification problems. By applying IRT concepts, the authors were able to provide new information about the data and the performance of the models, granting more robustness to the preexisting evaluation techniques. In addition, IRT's main feature is to explore the individual's performance on a specific item and then compute information about the individual's ability and the item's complexity in order to explain why a respondent got an item right or wrong. Thus, it is understood that IRT can be used as a means to comprehend the relationship between a model's performance and the data, thereby helping to explain models and to understand their predictions at a local level.
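As a reference point for this ability/difficulty reasoning, the sketch below evaluates the standard two-parameter logistic (2PL) item characteristic curve, interpreting a classifier's estimated ability and an instance's difficulty and discrimination as the quantities described above. The 2PL parameterization and the numeric values are assumptions for illustration; the exact IRT model used in [16,5] may differ.

```python
import math

def icc_2pl(theta, a, b):
    """Two-parameter logistic IRT model: probability that a respondent with
    ability `theta` answers correctly an item with discrimination `a` and
    difficulty `b`. Standard IRT form, shown here only as an illustration."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Illustrative (assumed) values: a classifier with ability 1.2 facing an
# easy instance (b = -0.5) and a hard one (b = 2.0), both with moderate
# discrimination (a = 1.0).
for difficulty in (-0.5, 2.0):
    p = icc_2pl(theta=1.2, a=1.0, b=difficulty)
    print(f"difficulty {difficulty:+.1f}: P(correct) = {p:.2f}")
```

Comparing such probabilities with the model's actual hits and misses is what allows an IRT-based analysis to say, at the level of a single instance, whether an error is expected (a hard item) or surprising (an easy item missed by an able model).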
Given the intrinsic characteristics of IRT, it is understood that it fits within the universe of techniques based on Explanation-by-Example. At the same time, IRT also provides concepts that allow one to explain and interpret
⁶ Model-Agnostic: it does not depend on the type of model to be explained [18].