
also suggests grouping XAI solutions according to the type of explanation produced. They list: feature summary,
model internals, data point, surrogate intrinsically interpretable model, rule sets, explanations in natural language, and
question-answering. Later, Liao et al. [2020] suggest that XAI explanations answer specific questions about the data, its
processing, and the resulting ML outputs. They map existing XAI solutions to these questions and create an XAI question
bank that supports the design of user-centered XAI applications. Overton [2011] defines an explanation as the combination
of an explanans, the answer to a question, and an explanandum, what is to be explained. These two elements provide a
user-friendly characterization of explanations and thus allow users to specify which explanation best suits their needs.
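To make this characterization concrete, an explanation can be read as a simple question-answer pair. The following minimal sketch is our own illustration; the class and field names are not taken from Overton [2011]:

```python
from dataclasses import dataclass

@dataclass
class Explanation:
    """Overton-style two-part explanation (names are illustrative)."""
    explanandum: str  # what is to be explained, phrased as a question
    explanans: str    # the answer to that question

# A feature-summary explanation for a hypothetical credit-scoring model:
why_rejected = Explanation(
    explanandum="Why was this loan application rejected?",
    explanans="The feature 'income' contributed most negatively to the score.",
)
```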
The diversity of existing XAI solutions makes it hard to find one adapted to a given need. Moreover, as the XAI
field grows, more and more XAI solutions proposed in the literature produce similar kinds of explanations. Hence,
it has become necessary to compare XAI solutions objectively by assessing the effectiveness of their explanations.
In this direction, the recent literature has focused on quantitative XAI evaluations Nauta et al. [2022].
2.2 Evaluation of XAI solutions
Doshi-Velez and Kim [2017] distinguish three strategies of evaluation: application-grounded evaluation, human-
grounded evaluation, and functionality-grounded evaluation, which does not involve human intervention. Application-
grounded evaluation tests the effectiveness of explanations in a real-world application with domain experts, while
human-grounded evaluations are carried out with lay users. Although explanations are ultimately intended for humans,
functionality-grounded evaluations are valuable because of their objectivity: this type of evaluation is inexpensive, fast,
and can lead to a formal comparison of explanation methods Zhou et al. [2021].
Since the notion of a "good explanation" is not trivial, quality properties have been proposed by Robnik-Šikonja
and Bohanec [2018]. These are human-defined criteria that attest to the quality of explanations. Functionality-grounded
evaluation metrics are constructed to calculate scores that measure how well a given property is met.
Nauta et al. [2022] focus on functionality-grounded evaluation and propose the Co-12 Explanation Quality
Properties to unify the diverse properties proposed in the literature. They review most existing XAI evaluation metrics
and associate each of them with these properties. Examples of the properties studied in this paper are as follows:
Continuity describes how continuous and generalizable the explanation function is, Correctness describes how faithful
the explanation is w.r.t. the black box, Compactness describes the size of the explanation, and Completeness describes
how much of the black-box behavior is described by the explanation.
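To illustrate how such metrics operate, the sketch below scores two of these properties for a feature-attribution explanation. Correctness is approximated by a deletion-style check (perturbing the features the explanation marks as important should change the prediction), and compactness by the number of features the explanation actually uses. The function names, the zero-baseline perturbation, and the assumption that `model` is a callable returning a scalar for one instance are ours, not a prescribed implementation:

```python
import numpy as np

def correctness_score(model, x, attribution, k=3):
    """Deletion-style proxy for correctness: if the explanation is faithful,
    perturbing its k most important features should change the output a lot."""
    top_k = np.argsort(np.abs(attribution))[::-1][:k]
    x_perturbed = x.copy()
    x_perturbed[top_k] = 0.0  # zero baseline; mean imputation is an alternative
    return abs(model(x) - model(x_perturbed))

def compactness_score(attribution, eps=1e-6):
    """Compactness: the fewer features an explanation uses, the higher the score."""
    n_used = np.sum(np.abs(attribution) > eps)
    return 1.0 / (1.0 + n_used)
```

Higher scores are better for both functions here; the metric suites surveyed by Nauta et al. [2022] define more refined variants of these checks.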
In practice, XAI evaluation metrics produce scores for the properties of interest, making it possible to compare XAI
solutions and choose among them. However, data scientists still have to find the desired XAI solutions and their
corresponding XAI evaluation metrics. This issue could be addressed with strategies that have been studied for
context-aware recommender systems.
2.3 Context-aware recommender systems
Recommender systems filter information to present the most relevant elements to a user. To the best of our knowledge,
there is no recommender system for XAI solutions. To recommend adapted XAI solutions, one should consider the
whole context of the data scientist. According to Adomavicius et al. [2011], context-aware recommender systems offer
more relevant recommendations by adapting them to the user's situation. They also state that context may be integrated
during three phases: contextual prefiltering, which selects a subset of candidate items before the recommendation;
contextual modeling, which uses the context within the recommendation process itself; and contextual postfiltering,
which adjusts the recommendations afterward (see the sketch below).
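As a minimal sketch of the first and last phases, assuming hypothetical dictionary-based candidates and context, prefiltering and postfiltering could look as follows (contextual modeling would instead inject the context into the scoring function itself):

```python
def prefilter(candidates, context):
    """Contextual prefiltering: discard candidates that do not match the
    user's situation before any scoring takes place."""
    return [c for c in candidates if c["task"] == context["task"]]

def postfilter(ranked, context):
    """Contextual postfiltering: adjust the ranked recommendations afterward,
    e.g., drop items the user's environment cannot run."""
    return [c for c in ranked if c["runtime"] in context["runtimes"]]

def recommend(candidates, context, score):
    # Scoring happens between the two contextual filtering phases.
    ranked = sorted(prefilter(candidates, context), key=score, reverse=True)
    return postfilter(ranked, context)
```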
These three phases require formally defining the elements of the context, which is one of our objectives for the
framework we propose in this paper. While recommending an adapted XAI solution is a first interesting step, the
data scientist ultimately wants a reliable explanation, i.e., an explanation that satisfies the properties of interest.
To achieve this, a possible approach is to use the previously detailed XAI evaluation metrics to
optimize hyperparameters of adapted XAI solutions. For this kind of approach, many strategies have been proposed in
the AutoML domain.
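A possible instantiation of this idea is a simple random search over an explainer's hyperparameters, guided by an XAI evaluation metric such as the correctness score sketched in Section 2.2. The `make_explainer` factory and the dictionary-based search space below are our assumptions:

```python
import random

def tune_explainer(make_explainer, search_space, metric, model, X, n_trials=50):
    """Random search: sample hyperparameter configurations and keep the one
    whose explanations score best on the chosen evaluation metric."""
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = {name: random.choice(values) for name, values in search_space.items()}
        explainer = make_explainer(**config)  # builds the XAI solution
        # Average the metric over a sample of instances to be explained.
        score = sum(metric(model, x, explainer(x)) for x in X) / len(X)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

Here `search_space` maps each hyperparameter name to its candidate values, e.g., {"n_samples": [100, 500, 1000]} for a perturbation-based explainer; more elaborate search strategies come from the AutoML literature discussed next.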
2.4 AutoML
Designing ML algorithms is an iterative task of testing and modifying both the architecture and the hyperparameters of
the algorithm. This repetitive task requires a lot of time. For this reason, part of the research effort has focused on
automating the design of ML algorithms, a field known as AutoML He et al. [2021]. AutoML frameworks look for the
best-performing ML pipeline to solve a task on a given dataset. According to He et al. [2021], AutoML consists of
several processes: data preparation, feature engineering, model generation, and model evaluation. They divide the model
generation process into two steps: search space and optimization methods. The first step defines the design principles of