
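$D = \{1, \dots, d\}$. For every subset of inputs $X_A = (X_i)_{i \in A}$, $A \in \mathcal{P}(D)$, the Sobol’ indices are defined as follows:

$$
S_A = \frac{\sum_{B \subset A} (-1)^{|A|-|B|}\, \mathbb{V}\left(\mathbb{E}[G(X) \mid X_B]\right)}{\mathbb{V}(G(X))} \tag{1}
$$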
where $|\cdot|$ denotes the number of elements in a subset. If the inputs are assumed to be independent, the FANOVA decomposition ensures that Sobol’ indices yield a well-defined allocation of a share of the output variance (i.e., $S_A$) to every subset of inputs $A \in \mathcal{P}(D)$. In this case, the variance shares $(S_A)_{A \in \mathcal{P}(D)}$ are nonnegative and sum up to one. As the indices can be interpreted as proportions of the output variance, they make it possible to determine which inputs of a numerical model contribute the most to the variability of the output, or, on the contrary, to identify those that are not influential, and possibly which inputs interact with each other. Therefore, Sobol’ indices can be directly used to address the factor fixing and factor prioritization settings (ii. and iii.).
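As a minimal sketch of Eq. (1), the Python snippet below computes Sobol’ indices exactly for a hypothetical toy model $G(X) = X_1 + 2X_2 + X_1X_3$ with independent inputs uniformly distributed on $\{-1, 1\}$; both the model and the input distribution are assumptions chosen for illustration. Conditional expectations are obtained by exhaustive enumeration (no estimation), and Eq. (1) is applied as an inclusion-exclusion over subsets. Inputs are indexed from 0 in the code.

```python
from fractions import Fraction
from itertools import combinations, product


def subsets(A):
    """All subsets of the tuple A (including the empty set and A itself), as sorted tuples."""
    return [B for r in range(len(A) + 1) for B in combinations(A, r)]


def closed_variance(G, supports, B):
    """V(E[G(X) | X_B]) for independent inputs, each uniform on a finite support.

    Every point of the product grid has the same probability, so conditional
    expectations are plain averages over the points sharing the same value of X_B.
    """
    points = list(product(*supports))
    n = len(points)
    mean = sum(Fraction(G(x)) for x in points) / n
    groups = {}
    for x in points:
        groups.setdefault(tuple(x[i] for i in B), []).append(Fraction(G(x)))
    # Variance of the conditional mean: each group weighted by its probability
    return sum(Fraction(len(g), n) * (sum(g) / len(g) - mean) ** 2
               for g in groups.values())


def sobol_indices(G, supports):
    """Sobol' indices S_A of Eq. (1) for every non-empty subset A of inputs."""
    D = tuple(range(len(supports)))
    closed = {B: closed_variance(G, supports, B) for B in subsets(D)}
    total = closed[D]  # E[G(X) | X_D] = G(X), so this is V(G(X))
    return {
        A: sum((-1) ** (len(A) - len(B)) * closed[B] for B in subsets(A)) / total
        for A in subsets(D) if A
    }


def G(x):
    """Hypothetical toy model: main effects of X1, X2 and an X1-X3 interaction."""
    return x[0] + 2 * x[1] + x[0] * x[2]


S = sobol_indices(G, supports=[(-1, 1)] * 3)
for A, s in S.items():
    print(A, s)
print("sum:", sum(S.values()))
```

For this model, $S_{\{1\}} = 1/6$, $S_{\{2\}} = 2/3$, $S_{\{1,3\}} = 1/6$ and all other indices vanish, so the shares are nonnegative and sum to one, as expected with independent inputs.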
However, in many applications, some inputs may have a statistical dependence structure, either initially imposed in their probabilistic modeling [27] or induced by physical constraints upon the input or the output space [26,29]. In these cases, estimating and interpreting Sobol’ indices is not trivial, as shown by the many different analyses and interpretations proposed in the past (see [22] or [6] for an overview of this topic). In order to circumvent this issue, [31] proposed a new approach based on the “Shapley value” [36], a solution concept developed in cooperative game theory and widely used in economic modeling. It consists in distributing both gains and costs among several players working in coalition in an egalitarian way, ensuring that each player gains as much as (or more than) they would have from playing individually. Building on Shapley values and Sobol’ indices, [31] thus proposed the so-called “Shapley effects” as new GSA indices in the context of dependent inputs. The underlying idea is to compute, as in a game involving coalitions of players, the value assigned to a coalition of inputs $X_A$ as the share of the output variance that it explains. This value corresponds to the so-called “closed Sobol’ indices”, defined as:
$$
S^{\mathrm{clos}}_A = \frac{\mathbb{V}\left(\mathbb{E}[G(X) \mid X_A]\right)}{\mathbb{V}(G(X))}. \tag{2}
$$
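The allocation rule itself is not written out in the text; a minimal sketch, assuming the standard Shapley value formula with the closed Sobol’ indices of Eq. (2) as the value of each coalition, is $\mathrm{Sh}_i = \frac{1}{d}\sum_{A \subset D \setminus \{i\}} \binom{d-1}{|A|}^{-1}\left(S^{\mathrm{clos}}_{A \cup \{i\}} - S^{\mathrm{clos}}_A\right)$. The Python function below applies it to a dictionary of closed indices. The example values are the closed indices of a hypothetical toy model $G(X) = X_1 + 2X_2 + X_1X_3$ with independent inputs uniform on $\{-1, 1\}$, worked out by hand (its nonzero Sobol’ indices are $S_{\{1\}} = 1/6$, $S_{\{2\}} = 2/3$ and $S_{\{1,3\}} = 1/6$); inputs are indexed from 0.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def shapley_effects(closed, d):
    """Shapley effects computed from closed Sobol' indices.

    `closed` maps every subset A of {0, ..., d-1}, stored as a sorted tuple,
    to S^clos_A (the empty tuple maps to 0).
    """
    effects = []
    for i in range(d):
        others = [j for j in range(d) if j != i]
        sh = Fraction(0)
        for r in range(d):
            for A in combinations(others, r):
                with_i = tuple(sorted(A + (i,)))
                # Marginal contribution of input i to coalition A,
                # weighted by 1 / (d * C(d-1, |A|))
                sh += (closed[with_i] - closed[A]) / (d * comb(d - 1, r))
        effects.append(sh)
    return effects


# Closed indices of the hypothetical toy model G(X) = X1 + 2*X2 + X1*X3
closed = {
    (): Fraction(0),
    (0,): Fraction(1, 6), (1,): Fraction(4, 6), (2,): Fraction(0),
    (0, 1): Fraction(5, 6), (0, 2): Fraction(2, 6), (1, 2): Fraction(4, 6),
    (0, 1, 2): Fraction(1),
}
Sh = shapley_effects(closed, d=3)
print(Sh)
```

This gives $(\mathrm{Sh}_1, \mathrm{Sh}_2, \mathrm{Sh}_3) = (1/4, 2/3, 1/12)$: the effects are nonnegative, sum to one, and the interaction share $S_{\{1,3\}} = 1/6$ is split equally, $1/12$ going to each of $X_1$ and $X_3$.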
In the GSA context, the two main properties and advantages of the Shapley effects are the following: firstly, they cannot be negative; secondly, they sum to one, even in the case of dependent inputs, since they bypass the intricate issue of variance decomposition [32,22]. Let us remark that these two properties correspond to the two main desirability criteria for importance measures of linear regression models, as reviewed in [14]. Moreover, the egalitarian principle driving the allocation rule implies that, in the independent inputs’ case, an interaction effect is equally apportioned to each input involved in the interaction. Finally, several works have studied the estimation of Shapley effects. Such estimates can be obtained via several techniques, such as Monte Carlo-based algorithms [39], k-nearest neighbors [3] or Möbius inverses [33].
In [22], the Shapley effects have been claimed to be suitable for the factor fixing setting, since an effect close to zero means that the input has no significant contribution to the variance of the output (neither through its interactions nor through its possible dependencies with other inputs). However, another phenomenon, observed by [22] and known as the “Shapley’s joke” [18], shows that the factor fixing setting cannot be fully achieved with Shapley effects: an exogenous variable (i.e., one that is not explicitly included in the structural equations of the model) can be granted a non-negligible share of the output variance as soon as it is sufficiently correlated with endogenous inputs. This means that Shapley effects do not satisfy the so-called “exclusion property” defined for the importance measures of linear regression models [24,14]. This exclusion property states that, if an input’s linear regression coefficient equals zero, then its importance measure should be zero too.
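The Shapley’s joke can be reproduced in a minimal sketch with hypothetical dependent inputs: the model $G(X) = X_1$ does not involve $X_2$ at all, but $(X_1, X_2)$ follow an assumed discrete joint distribution on $\{-1, 1\}^2$ with uniform marginals and correlation $1/2$. The closed Sobol’ indices of Eq. (2) are computed exactly by enumerating this distribution, then plugged into the standard Shapley value formula (an assumption of this sketch, since the formula is not spelled out in the text).

```python
from fractions import Fraction
from itertools import combinations
from math import comb

# Hypothetical joint law of (X1, X2) on {-1, 1}^2: equal values are more likely,
# so both marginals are uniform and Corr(X1, X2) = 1/2
joint = {(1, 1): Fraction(3, 8), (-1, -1): Fraction(3, 8),
         (1, -1): Fraction(1, 8), (-1, 1): Fraction(1, 8)}
d = 2


def G(x):
    """Structural model: X2 (index 1) is exogenous, it does not appear in G."""
    return x[0]


def closed_variance(B):
    """V(E[G(X) | X_B]) under the discrete joint distribution above."""
    mean = sum(p * G(x) for x, p in joint.items())
    prob, weighted = {}, {}
    for x, p in joint.items():
        key = tuple(x[i] for i in B)
        prob[key] = prob.get(key, 0) + p
        weighted[key] = weighted.get(key, 0) + p * G(x)
    # E[G | X_B = key] = weighted[key] / prob[key]
    return sum(prob[k] * (weighted[k] / prob[k] - mean) ** 2 for k in prob)


total = closed_variance(tuple(range(d)))  # V(G(X))
closed = {A: closed_variance(A) / total  # closed Sobol' indices, Eq. (2)
          for r in range(d + 1) for A in combinations(range(d), r)}


def shapley(i):
    """Standard Shapley value of input i with the closed indices as value function."""
    others = [j for j in range(d) if j != i]
    return sum((closed[tuple(sorted(A + (i,)))] - closed[A]) / (d * comb(d - 1, len(A)))
               for r in range(d) for A in combinations(others, r))


Sh = [shapley(i) for i in range(d)]
print(Sh)
```

Here $\mathrm{Sh}_2 = 1/8$ although $G$ does not depend on $X_2$, because $\mathbb{V}(\mathbb{E}[X_1 \mid X_2]) = 1/4 > 0$. Replacing the joint law by the independent one (probability $1/4$ on each point) makes $\mathrm{Sh}_2$ drop to zero.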
In the context of statistical learning, if $G$ is a linear regression model, an analogy can be made between the Sobol’ indices and the squared standardized regression coefficients (denoted by SRC$^2$). Moreover, the Shapley effects correspond to the so-called “LMG measure” (named after its authors, Lindeman-Merenda-Gold, see [28,4]), which partitions the explained variance