Proportional marginal effects for global sensitivity analysis

Margot Herin^a, Marouane Il Idrissi^{b,c,d}, Vincent Chabridon^{b,c}, Bertrand Iooss^{b,c,d,e}
^a Sorbonne Université, Laboratoire d’Informatique de Paris 6, 4 place Jussieu, 75005 Paris, France
^b EDF Lab Chatou, 6 Quai Watier, 78401 Chatou, France
^c SINCLAIR AI Lab., Saclay, France
^d Institut de Mathématiques de Toulouse, 31062 Toulouse, France
^e Corresponding author. Email: bertrand.iooss@edf.fr
Abstract
Performing (variance-based) global sensitivity analysis (GSA) with dependent inputs has recently
benefited from cooperative game theory concepts. By using this theory, despite the potential correlation
between the inputs, meaningful sensitivity indices can be defined via allocation shares of the model
output’s variance to each input. The “Shapley effects”, i.e., the Shapley values transposed to variance-
based GSA problems, allowed for this suitable solution. However, these indices exhibit a particular
behavior that can be undesirable: an exogenous input (i.e., which is not explicitly included in the
structural equations of the model) can be associated with a strictly positive index when it is correlated
to endogenous inputs. In the present work, the use of a different allocation, called the “proportional
values” is investigated. A first contribution is to propose an extension of this allocation, suitable for
variance-based GSA. Novel GSA indices are then proposed, called the “proportional marginal effects”
(PME). The notion of exogeneity is formally defined in the context of variance-based GSA, and it is
shown that the PME allow the distinction of exogenous variables, even when they are correlated to
endogenous inputs. Moreover, their behavior is compared to the Shapley effects on analytical toy-cases
and more realistic use-cases.
Keywords: Cooperative game theory, Dependence, Proportional values, Sobol’ indices, Shapley
effects.
1. Introduction
When using phenomenological numerical models in science and engineering, the uncertainty
quantification (UQ) process makes it possible to account for, and better quantify, the various
sources of uncertainty, most often by means of probabilistic modeling [13]. Global sensitivity
analysis (GSA) is a key step of this process, aiming to understand the effect of each uncertain model
input (or set of inputs) on the quantity of interest related to one (or more) output variables of
interest obtained from the numerical model [35,23]. From a practical viewpoint, GSA addresses four
major settings [6]: (i.) model exploration, i.e., investigating the input-output relationship;
(ii.) factor fixing, i.e., identifying non-influential inputs; (iii.) factor prioritization, i.e.,
ranking the most important inputs using quantitative importance measures; (iv.) robustness analysis,
i.e., quantifying the sensitivity of the quantity of interest with respect to probabilistic model
uncertainty of the input distributions. The present paper focuses on the first three settings; the
fourth one is not discussed in detail.
Preprint submitted to Elsevier, October 25, 2022
arXiv:2210.13065v1 [math.ST] 24 Oct 2022
Among a large panel of GSA indices, the variance-based sensitivity measures, also called “Sobol’
indices” [38], are derived from the functional analysis of variance (FANOVA) decomposition [7] between
all the independent inputs. These indices thus provide interpretable answers to some of the
previously mentioned GSA settings. Let $Y = G(X)$ denote the input-output relationship under
study, with $G(\cdot): \mathbb{R}^d \to \mathbb{R}$ a deterministic (often black-box) numerical model, $Y$ a scalar output
and $X = (X_1, \dots, X_d)$ a vector of $d$ scalar inputs. Moreover, let $\mathcal{P}(D)$ be the set of all subsets of
$D = \{1, \dots, d\}$. For every subset of inputs $X_A = (X_i)_{i \in A}$, $A \in \mathcal{P}(D)$, the Sobol’ indices are defined as
follows:
\[
S_A = \frac{\sum_{B \subseteq A} (-1)^{|A| - |B|} \, \mathbb{V}\left(\mathbb{E}[G(X) \mid X_B]\right)}{\mathbb{V}(G(X))} \tag{1}
\]
where $|\cdot|$ denotes the number of elements in a subset. If the inputs are assumed to be independent,
then, thanks to the FANOVA decomposition, Sobol’ indices lead to a well-defined allocation of a share
of the output’s variance (i.e., $S_A$) to every subset of inputs $A \in \mathcal{P}(D)$. In this case, the variance shares
$(S_A)_{A \in \mathcal{P}(D)}$ sum up to one while being nonnegative. As the indices can be interpreted as proportions
of the output variance, they make it possible to determine which inputs of a numerical model contribute
the most to the variability of the output or, on the contrary, to identify the ones that are not influential,
and possibly which inputs interact with each other. Therefore, Sobol’ indices can be directly used to
address the factor fixing and factor prioritization settings (ii. and iii.).
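To make Eq. (1) concrete, the following Python sketch recovers Sobol’ indices from the values $\mathbb{V}(\mathbb{E}[G(X) \mid X_B])$ through the alternating sum over sub-coalitions. The toy model $G(X) = X_1 + X_2 X_3$ with independent standard Gaussian inputs, the hand-computed table `closed`, and the helper names `subsets` and `sobol_indices` are illustrative assumptions and are not taken from the paper.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset (including the empty set)."""
    items = tuple(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def sobol_indices(closed):
    """Apply Eq. (1): alternating sum of V(E[Y | X_B]) over B in A, divided by V(Y).

    `closed` maps every coalition (frozenset) to V(E[Y | X_A]).
    """
    full = max(closed, key=len)   # the grand coalition D
    variance = closed[full]       # V(E[Y | X_D]) = V(Y) since Y = G(X)
    return {
        A: sum((-1) ** (len(A) - len(B)) * closed[B] for B in subsets(A)) / variance
        for A in subsets(full)
    }


# Hypothetical toy model: Y = X1 + X2 * X3 with X1, X2, X3 i.i.d. N(0, 1).
# The values of V(E[Y | X_A]) below were computed by hand for this sketch.
closed = {
    frozenset(): 0.0,
    frozenset({1}): 1.0,
    frozenset({2}): 0.0,
    frozenset({3}): 0.0,
    frozenset({1, 2}): 1.0,
    frozenset({1, 3}): 1.0,
    frozenset({2, 3}): 1.0,
    frozenset({1, 2, 3}): 2.0,
}

S = sobol_indices(closed)
for A, value in S.items():
    print(sorted(A), round(value, 3))
```

In this independent setting the indices are nonnegative and sum to one: half of the variance is attributed to $X_1$ alone and half to the interaction between $X_2$ and $X_3$.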
However, in many applications, some inputs may have a statistical dependence structure, either
initially imposed in their probabilistic modeling [27] or induced by physical constraints upon the input
or the output space [26,29]. In these cases, estimating and interpreting Sobol’ indices is not trivial
as shown by many different analyses and interpretations proposed in the past (see [22] or [6] for an
overview of this topic). In order to circumvent this issue, [31] proposed a new approach based on the
“Shapley value” [36], a solution concept developed in cooperative game theory and powerfully used
in economic modeling. It consists in distributing both gains and costs to several players working in
coalition in an egalitarian way, ensuring that each player gains as much (or more) as they would have
from playing individually. Therefore, based on Shapley values and Sobol’ indices, [31] proposed the
so-called “Shapley effects” as new GSA indices in the context of dependent inputs. The underlying
idea is to compute, similarly to a game involving coalitions of players, the value assigned to a coalition
of inputs $X_A$ as its explanatory power for a part of the output variance. This value corresponds to the
so-called “closed Sobol’ indices” defined as:
\[
S^{\mathrm{clos}}_A = \frac{\mathbb{V}\left(\mathbb{E}[G(X) \mid X_A]\right)}{\mathbb{V}(G(X))}. \tag{2}
\]
In the GSA context, the two main properties and advantages of the Shapley effects are the following:
firstly, they cannot be negative; secondly, their sum is equal to one, even in the case of dependent
inputs, since they bypass the intricate issue of variance decomposition [32,22]. Let us remark
that these two properties correspond to the two main desirability criteria for importance measures of
linear regression models as reviewed in [14]. Moreover, the egalitarian principle driving the allocation
rule implies that, in the case of independent inputs, an interaction effect is equally apportioned to each
input involved in the interaction. Finally, several works have studied the estimation of the Shapley
effects. Such estimates can be obtained via several techniques such as Monte Carlo-based algorithms
[39], k-nearest neighbors [3] or Möbius inverses [33].
In [22], the Shapley effects have been claimed to be suitable for the factor fixing setting, since an effect
close to zero means that the input has no significant contribution to the variance of the output (neither
through its interactions nor through its possible dependencies with other inputs). However, another phenomenon,
observed by [22] and known as the “Shapley’s joke” [18], proves that the factor fixing setting cannot
be fully achieved with Shapley effects: an exogenous variable (i.e., which is not explicitly included in
the structural equations of the model) can be granted a non-negligible share of the output variance,
as soon as it is sufficiently correlated with endogenous inputs. This means that Shapley effects do
not respect the so-called “exclusion property” defined for the importance measures of linear regression
models [24,14]. This exclusion property states that, if an input’s linear regression coefficient equals
zero, then its importance measure should be zero too.
In the context of statistical learning, if $G$ is a linear regression model, an analogy can be made
between the Sobol’ indices and the squared values of the standardized regression coefficients (denoted
by SRC$^2$). Moreover, the Shapley effects correspond to the so-called “LMG measure” (named after
its authors, Lindeman, Merenda and Gold, see [28,4]), which partitions the explained variance
percentage $R^2$ in the same way as the Shapley-based allocation rule. A weighted analog
of LMG, called proportional marginal variance decomposition (PMVD), has been proposed by [9] in
order to respect the exclusion property. It is based on the proportional value allocation rule coming
from cooperative game theory. Its usefulness in relation to LMG has been described in detail in [14,15]
and illustrated more recently in [20,21]. In addition to the exclusion property, the PMVD has been shown
to discriminate between the influential inputs more sharply than the Shapley effects do.
Therefore, the PMVD is a good tool (in the linear regression context) to address the factor fixing
setting.
In this paper, inspired on the one hand by the work achieved in the linear regression context leading
to the PMVD, and on the other hand by the Shapley effects, we build and propose a set of novel
sensitivity indices that respect the exclusion property and are not restricted to the linear model case. To
do so, the proportional marginal effects (PME) are introduced by using a new variance decomposition,
based on the proportional values concept [30,9], which makes it possible to detect exogenous
variables. For the sake of clarity, Table 1 provides a preliminary analogy to emphasize which
category of problem is addressed in the present paper.
R^2 decomposition (linear regression)    V(Y) decomposition (GSA)
SRC^2                                    Sobol’ indices
LMG                                      Shapley effects
PMVD                                     PME (proposed indices)
Table 1: Analogy between linear regression importance measures (R^2 decomposition) and variance-based GSA.
The rest of this paper is organized as follows. Section 2 focuses on the interaction between GSA and
cooperative game theory and on the existing literature. The Shapley effects are recalled, as well as their
main shortcoming: the inability to detect exogenous inputs. To that end, the notion of $L^2$-exogeneity
is formally defined. Then, Section 3 defines the proportional values and presents the main result of this
paper, an extension allowing for well-defined novel GSA indices: the PME. It is additionally shown
that these novel indices make it possible to detect exogenous inputs, while remaining inherently interpretable.
Section 4 illustrates the behavior of the novel PME by using analytical formulas obtained for analytical
forms of $G$. Section 5 briefly recalls several strategies for the estimation of PME and provides the
results obtained on several more challenging numerical test-cases. Section 6 discusses several possible
improvements as well as some perspectives about the proposed work. A few appendices provide extra
materials such as information about the reproducibility of numerical results (Appendix A) and
proofs (Appendix B).
Throughout this paper, let $\mathbb{E}[\cdot]$ and $\mathbb{V}(\cdot)$ denote the expectation and variance respectively. A
coalition of players is a subset of the grand coalition denoted $D = \{1, \dots, d\}$. Moreover, for any $A \subseteq D$, the
restricted set of indices $A \setminus \{i\}$, for any $i \in A$, is denoted by $A_{-i}$. Additionally, for any $A \subseteq D$, $X_{D \setminus A}$
is denoted by $X_{-A}$. The distribution of the random inputs $X$ is generically denoted by $P_X$ and the
marginal distribution of any subset of inputs $X_A$, for any $A \subseteq D$, is generically denoted by $P_{X_A}$. The
spaces $L^2(P_{X_A})$, for any $A \subseteq D$, denote the spaces of measurable functions with finite second-order
moments. When a function is referred to as being nonnegative (resp. positive), it entails that it takes
values in $\mathbb{R}_+$ (resp. $\mathbb{R}_+^*$). Whenever reference is made to a model $G$, it is always implicitly assumed
that $G \in L^2(P_X)$. In this paper, almost sure statements are followed by the acronym “a.s.”.
2. Cooperative game theory for variance-based global sensitivity analysis
This section aims at reviewing the usefulness of cooperative game theory in the process of designing
variance-based GSA indices. A particular class of allocations is presented: the random order model
allocations, which contains the Shapley values. The Sobol’ cooperative games are introduced, as a
formalization of the analogy between players and inputs of deterministic models. The Shapley effects
are presented as the application of Shapley values to a Sobol’ cooperative game. The notion of dual of
a cooperative game is also presented, and an analogy is drawn between backward-forward procedures
and the random order model allocations. Finally, a specific Shapley effects’ drawback (for factor fixing
setting) is presented as a motivation for the proposed work: their inability to detect exogenous inputs.
2.1. Analogy between allocation and variance-based GSA indices
A cooperative game is a tuple $(D, v)$ where $D = \{1, \dots, d\}$ is a set of $d$ players and $v: \mathcal{P}(D) \to \mathbb{R}$
is the value function, i.e., a mapping that assigns a value to every possible coalition of players.
Usually, $v$ is assumed to be monotonically increasing, meaning that, for any two sets $T$ and $A$ such
that $T \subseteq A \in \mathcal{P}(D)$, one has $v(T) \leq v(A)$. In other words, the value of a coalition $A$ cannot be
lower than the value of a sub-coalition $T \subseteq A$. In the following, cooperative games with monotonically
increasing value functions are referred to as “monotonic cooperative games”. Moreover, if the value
function $v$ takes values in $\mathbb{R}_+^*$ (resp. in $\mathbb{R}_+$), the corresponding cooperative game is referred to as a
“positive (resp. nonnegative) cooperative game”.
The analogy between the players $D$ of a cooperative game $(D, v)$ and the inputs $(X_i)_{i \in D}$ involved
in a numerical model was first used in [31]. The author proposed to use, as a value function, the
closed Sobol’ indices recalled in Eq. (2), which makes it possible to define the Sobol’ cooperative games.
Definition 1 (Sobol’ cooperative game). Let $X = (X_1, \dots, X_d)^\top$ be random inputs, let $G \in L^2(P_X)$
be a model and denote by $Y = G(X)$ the random output. A Sobol’ cooperative game is the cooperative
game with value function $S^{\mathrm{clos}}$ defined as follows:
\[
S^{\mathrm{clos}}: \mathcal{P}(D) \to \mathbb{R}_+, \qquad A \mapsto S^{\mathrm{clos}}_A = \frac{\mathbb{V}\left(\mathbb{E}[Y \mid X_A]\right)}{\mathbb{V}(Y)}.
\]
The Sobol’ cooperative game thus refers to the nonnegative, monotonic cooperative game $(D, S^{\mathrm{clos}})$.
By analogy with the cooperative game theory paradigm, the choice of $S^{\mathrm{clos}}$ as a value function
entails measuring the value of every subset of players $A \subseteq D$ as the variance of the best approximation
of $Y$ in $L^2(P_{X_A})$, i.e., $\mathbb{V}(\mathbb{E}[Y \mid X_A])$.
One of the key aspects of cooperative games is the notion of allocation. In general, allocations
can be understood as a decomposition of the quantity $v(D)$ into $d$ elements, each one being allocated
to a specific player. When it comes to Sobol’ cooperative games, it translates to assigning a share of
the output’s variance $\mathbb{V}(Y)$ to each input in the model, with limited assumptions on the probabilistic
structure between the inputs (in particular, no independence is assumed between the inputs). Formally,
an allocation can be understood as a mapping $\phi$ that associates, to a cooperative game $(D, v)$, a
real-valued vector $(\phi_1, \dots, \phi_d)^\top \in \mathbb{R}^d$.
The Shapley values are a particular example of allocation. For any cooperative game $(D, v)$, they are
uniquely characterized as the allocation $\phi((D, v))$ verifying a set of four distinct axioms:
1. Efficiency: $\sum_{i=1}^{d} \phi_i = v(D)$;
2. Symmetry: $\forall i, j \in D$ with $i \neq j$, if $v(A \cup \{i\}) = v(A \cup \{j\})$ for all $A \in \mathcal{P}(D)$, then $\phi_i = \phi_j$;
3. Null player: $\forall i \in D$, if $v(A \cup \{i\}) = v(A)$ for all $A \in \mathcal{P}(D)$, then $\phi_i = 0$;
4. Additivity: If two cooperative games $(D, v)$ and $(D, v')$ have Shapley values $\phi$ and $\phi'$ respectively,
then the cooperative game $(D, v + v')$ has Shapley values $\phi_j + \phi'_j$ for $j \in D$.
For any cooperative game $(D, v)$, its Shapley values can be expressed analytically, for any $i \in D$, as:
\[
\mathrm{Shap}_i((D, v)) = \frac{1}{d} \sum_{A \subseteq D_{-i}} \binom{d-1}{|A|}^{-1} \left[ v(A \cup \{i\}) - v(A) \right]. \tag{3}
\]
This original formulation attributed to [36] can be interpreted as a weighted average, over every possible
coalition $A$, of the contribution of a player $i$ to that coalition $A$. This contribution is quantified by the
quantity $v(A \cup \{i\}) - v(A)$, often called the “marginal contribution” of the player $i$ to the coalition $A$ in the
literature. The weighting scheme can be understood as the proportion of permutations (or orderings)
of $D$ such that $i$ appears right after the players in $A$, i.e., such that the players preceding $i$ are exactly
those in $A$. While this interpretation can be hard to grasp, defining the Shapley values in terms of
permutations of players allows for a better understanding of their underlying sharing mechanism, as is
done in the following.
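As a minimal sketch of the coalition formula in Eq. (3), the following Python code computes Shapley values for a small game given as a lookup table. The function name `shapley_values` and the three-player `table` (the unnormalized values $\mathbb{V}(\mathbb{E}[Y \mid X_A])$ of a hypothetical model $Y = X_1 + X_2 X_3$ with independent standard Gaussian inputs, computed by hand) are assumptions made for illustration only.

```python
from itertools import combinations
from math import comb


def shapley_values(players, v):
    """Shapley values via the coalition formula of Eq. (3).

    `v` is a callable mapping a frozenset coalition to its value.
    """
    players = tuple(players)
    d = len(players)
    result = {}
    for i in players:
        others = [j for j in players if j != i]        # D_{-i}
        total = 0.0
        for size in range(d):                          # |A| = 0, ..., d - 1
            weight = 1.0 / comb(d - 1, size)           # inverse binomial weight
            for combo in combinations(others, size):
                A = frozenset(combo)
                total += weight * (v(A | {i}) - v(A))  # marginal contribution
        result[i] = total / d
    return result


# Hypothetical three-player game (hand-computed values, assumption of this sketch).
table = {
    frozenset(): 0.0,
    frozenset({1}): 1.0,
    frozenset({2}): 0.0,
    frozenset({3}): 0.0,
    frozenset({1, 2}): 1.0,
    frozenset({1, 3}): 1.0,
    frozenset({2, 3}): 1.0,
    frozenset({1, 2, 3}): 2.0,
}

shap = shapley_values((1, 2, 3), table.__getitem__)
print(shap)
```

In this toy game, player 1 receives 1 while players 2 and 3, which only contribute jointly, receive 0.5 each; the three shares add up to $v(D) = 2$, as required by the efficiency axiom.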
A particular class of allocations, known as random order models [40,10], makes it possible to define
allocations based on orderings of players, instead of reasoning in terms of coalitions as in Eq. (3). Let $\mathcal{S}_D$ be
the symmetric group on $D$ (the set of all permutations of $D$). Let $\pi = (\pi_1, \dots, \pi_d) \in \mathcal{S}_D$ be a particular
permutation, and for any $i \in D$, denote by $\pi(i) = \pi^{-1}_i$ its inverse (i.e., the position of $i$ in $\pi$, such that
$\pi_{\pi(i)} = i$). Then, one can define the following set of players, for any $i \in \{0, \dots, d\}$:
\[
C_i(\pi) = \{\pi_j : j \leq i\}. \tag{4}
\]
$C_i(\pi)$ is the set of the first $i$ players in the ordering $\pi$, with the convention that, for any permutation,
$C_0(\pi) = \emptyset$. As an illustration, let $D = \{1, 2, 3\}$, and let $\pi = (2, 1, 3) \in \mathcal{S}_D$. Then,
\[
\pi(1) = 2, \quad \pi(2) = 1, \quad \text{and} \quad \pi(3) = 3.
\]
Moreover,
\[
C_{\pi(1)}(\pi) = C_2(\pi) = \{1, 2\}, \quad C_{\pi(2)}(\pi) = C_1(\pi) = \{2\}, \quad C_{\pi(3)}(\pi) = C_3(\pi) = \{1, 2, 3\}.
\]
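The illustration above can be reproduced with a few lines of Python. The helper names `position` and `first_players` are hypothetical and only mirror the notation $\pi(i)$ and $C_i(\pi)$ of Eq. (4); orderings are represented as tuples of player labels.

```python
def position(pi, i):
    """Return pi(i), the 1-based position of player i in the ordering pi."""
    return pi.index(i) + 1


def first_players(pi, i):
    """Return C_i(pi) = {pi_j : j <= i}; C_0(pi) is the empty set."""
    return set(pi[:i])


pi = (2, 1, 3)

for player in (1, 2, 3):
    k = position(pi, player)
    print(f"pi({player}) = {k}, C_{k}(pi) = {sorted(first_players(pi, k))}")
```

Note that $C_{\pi(i)}(\pi)$ always contains player $i$ itself, whereas $C_{\pi(i)-1}(\pi)$ contains only the players that precede $i$ in the ordering.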
As their name suggests, random order models endow $\mathcal{S}_D$ with a probabilistic structure. For any
game $(D, v)$, the set of random order model allocations (or probabilistic allocations) contains every
allocation $\phi((D, v))$ that can be written, for any $i \in D$, as:
\[
\phi_i = \sum_{\pi \in \mathcal{S}_D} p(\pi) \left[ v\left(C_{\pi(i)}(\pi)\right) - v\left(C_{\pi(i)-1}(\pi)\right) \right]
= \mathbb{E}_{\pi \sim p}\left[ v\left(C_{\pi(i)}(\pi)\right) - v\left(C_{\pi(i)-1}(\pi)\right) \right],
\]
where $p$ is a probability mass function over the orderings of $D$. For a player $i$, its random order
allocation can be interpreted as the expectation, over the permutations $\pi$ of $D$ with respect to $p$, of the
marginal contributions of $i$ to the coalitions formed by $C_{\pi(i)-1}(\pi)$. The random order model allocations
are always efficient and, when dealing with monotonic games, nonnegative (i.e., $\phi_i \geq 0$ for any $i \in D$)
[40]. The Shapley values, in particular, can be expressed as a random order model allocation, under
the particular choice of $p$ as a discrete uniform distribution over $\mathcal{S}_D$, which echoes Eq. (3):
\[
\mathrm{Shap}_i((D, v)) = \frac{1}{d!} \sum_{\pi \in \mathcal{S}_D} \left[ v\left(C_{\pi(i)}(\pi)\right) - v\left(C_{\pi(i)-1}(\pi)\right) \right]. \tag{5}
\]
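The random order formulation lends itself to a direct, if brute-force, implementation. The sketch below enumerates all orderings, which is only feasible for small $d$; the function name `random_order_allocation`, the three-player `table` (hand-computed for a hypothetical model $Y = X_1 + X_2 X_3$ with independent standard Gaussian inputs) and the two probability mass functions are illustrative assumptions.

```python
from itertools import permutations
from math import factorial


def random_order_allocation(players, v, p):
    """Random order model allocation: expectation under p of marginal contributions.

    `v` maps a frozenset coalition to its value; `p` maps an ordering (tuple)
    to its probability and must sum to one over all orderings.
    """
    players = tuple(players)
    allocation = {i: 0.0 for i in players}
    for pi in permutations(players):
        weight = p(pi)
        before = frozenset()                 # C_0(pi) is the empty set
        for i in pi:
            after = before | {i}             # C_{pi(i)}(pi)
            allocation[i] += weight * (v(after) - v(before))
            before = after
    return allocation


# Hypothetical three-player game (hand-computed values, assumption of this sketch).
table = {
    frozenset(): 0.0,
    frozenset({1}): 1.0,
    frozenset({2}): 0.0,
    frozenset({3}): 0.0,
    frozenset({1, 2}): 1.0,
    frozenset({1, 3}): 1.0,
    frozenset({2, 3}): 1.0,
    frozenset({1, 2, 3}): 2.0,
}
players = (1, 2, 3)

# Uniform p over the d! orderings gives the Shapley values, as in Eq. (5).
uniform = random_order_allocation(
    players, table.__getitem__, lambda pi: 1.0 / factorial(len(players))
)

# Degenerate p: all the mass on the single ordering (2, 3, 1).
single = random_order_allocation(
    players, table.__getitem__, lambda pi: 1.0 if pi == (2, 3, 1) else 0.0
)

print(uniform)
print(single)
```

Both allocations are efficient (they sum to $v(D) = 2$), but they share the value differently: the choice of $p$ encodes how the coalitions are assumed to form.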
Random order models make it possible to view allocations dynamically (see Section 2.2), meaning that
coalitions are formed according to orderings, as opposed to the pure coalition point of view displayed
in Eq. (3). In this setting, Shapley values can then be understood as a maximum entropy a priori
(i.e., uniform over $\mathcal{S}_D$) about this dynamic. In the light of this equivalent expression, L. S. Shapley
himself interpreted the Shapley values as “[...] an a priori assessment of the situation, based on either
ignorance or disregard of the social organization of the players” [37].
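When it comes to GSA, the Shapley values of the Sobol’ cooperative game $(D, S^{\mathrm{clos}})$ associated
with a numerical model $Y = G(X_1, \dots, X_d)$ define the so-called Shapley effects [31]. For any
$i \in D$, they can be written as:
\begin{align}
\mathrm{Sh}_i &:= \mathrm{Shap}_i((D, S^{\mathrm{clos}})) \tag{6a} \\
&= \frac{1}{d} \sum_{A \subseteq D_{-i}} \binom{d-1}{|A|}^{-1} \left[ S^{\mathrm{clos}}_{A \cup \{i\}} - S^{\mathrm{clos}}_A \right] \tag{6b} \\
&= \frac{1}{d!} \sum_{\pi \in \mathcal{S}_D} \left[ S^{\mathrm{clos}}_{C_{\pi(i)}(\pi)} - S^{\mathrm{clos}}_{C_{\pi(i)-1}(\pi)} \right]. \tag{6c}
\end{align}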
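As a hedged illustration of Eq. (6b), and of the behavior of the Shapley effects with a correlated exogenous input mentioned in the introduction, the following Python sketch treats a two-input linear model $Y = \beta_1 X_1 + \beta_2 X_2$ with standard Gaussian inputs of correlation $\rho$. This toy model, the closed-form expressions of its closed Sobol’ indices, and the function names are assumptions of the sketch rather than an example taken from the paper; the output variance is assumed to be strictly positive.

```python
def closed_sobol_bivariate(beta1, beta2, rho):
    """Closed Sobol' indices of Y = beta1*X1 + beta2*X2, with (X1, X2) standard
    Gaussian with correlation rho (closed forms assumed for this sketch)."""
    var_y = beta1**2 + beta2**2 + 2.0 * rho * beta1 * beta2   # assumed > 0
    return {
        frozenset(): 0.0,
        # E[Y | X1] = (beta1 + rho*beta2) * X1, and symmetrically for X2.
        frozenset({1}): (beta1 + rho * beta2) ** 2 / var_y,
        frozenset({2}): (beta2 + rho * beta1) ** 2 / var_y,
        frozenset({1, 2}): 1.0,
    }


def shapley_effects_bivariate(beta1, beta2, rho):
    """Shapley effects from Eq. (6b) with d = 2 (both binomial weights equal one)."""
    s = closed_sobol_bivariate(beta1, beta2, rho)
    s1, s2, s12 = s[frozenset({1})], s[frozenset({2})], s[frozenset({1, 2})]
    sh1 = 0.5 * ((s1 - 0.0) + (s12 - s2))
    sh2 = 0.5 * ((s2 - 0.0) + (s12 - s1))
    return sh1, sh2


# X2 does not appear in the model (beta2 = 0) but is correlated with X1.
sh1, sh2 = shapley_effects_bivariate(beta1=1.0, beta2=0.0, rho=0.8)
print(sh1, sh2)

# Without correlation, the same input receives a zero share.
sh1_indep, sh2_indep = shapley_effects_bivariate(beta1=1.0, beta2=0.0, rho=0.0)
print(sh1_indep, sh2_indep)
```

In this sketch, with $\beta_2 = 0$ the Shapley effect of $X_2$ equals $\rho^2 / 2$: it vanishes only when $\rho = 0$, and it is strictly positive as soon as $X_2$ is correlated with $X_1$, even though $X_2$ plays no role in the model.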