Proportional marginal effects for global sensitivity analysis


Margot Herin^a, Marouane Il Idrissi^{b,c,d}, Vincent Chabridon^{b,c}, Bertrand Iooss^{b,c,d,e}

^a Sorbonne Université, Laboratoire d’Informatique de Paris 6, 4 place Jussieu, 75005 Paris, France.

^b EDF Lab Chatou, 6 Quai Watier, 78401 Chatou, France

^c SINCLAIR AI Lab., Saclay, France

^d Institut de Mathématiques de Toulouse, 31062 Toulouse, France

^e Corresponding Author - Email: bertrand.iooss@edf.fr

Abstract

Performing (variance-based) global sensitivity analysis (GSA) with dependent inputs has recently
benefited from cooperative game theory concepts. By using this theory, despite the potential correlation
between the inputs, meaningful sensitivity indices can be defined via allocation shares of the model
output’s variance to each input. The “Shapley effects”, i.e., the Shapley values transposed to
variance-based GSA problems, allowed for this suitable solution. However, these indices exhibit a particular
behavior that can be undesirable: an exogenous input (i.e., which is not explicitly included in the
structural equations of the model) can be associated with a strictly positive index when it is correlated
to endogenous inputs. In the present work, the use of a different allocation, called the “proportional
values” is investigated. A first contribution is to propose an extension of this allocation, suitable for
variance-based GSA. Novel GSA indices are then proposed, called the “proportional marginal effects”
(PME). The notion of exogeneity is formally defined in the context of variance-based GSA, and it is
shown that the PME allow the distinction of exogenous variables, even when they are correlated to
endogenous inputs. Moreover, their behavior is compared to the Shapley effects on analytical toy-cases
and more realistic use-cases.

Keywords: Cooperative game theory, Dependence, Proportional values, Sobol’ indices, Shapley effects.

1. Introduction

When using phenomenological numerical models in science and engineering, the uncertainty
quantification (UQ) process makes it possible to consider and better quantify the various sources of
uncertainty, most often by way of probabilistic modeling [13]. Global sensitivity analysis (GSA) is a
key step of this process, aiming to understand the effects of each uncertain model input (or set of
inputs) on the quantity of interest related to one (or more) output variable of interest obtained from
the numerical model [35,23]. From a practical viewpoint, GSA aims at investigating four major settings
[6]: (i.) model exploration, i.e., investigating the input-output relationship; (ii.) factor fixing,
i.e., identifying non-influential inputs; (iii.) factor prioritization, i.e., quantifying the most
important inputs using quantitative importance measures; (iv.) robustness analysis, i.e., quantifying
the sensitivity of the quantity of interest with respect to probabilistic model uncertainty of the input
distributions. The present paper focuses on the first three settings; the fourth is not discussed in
detail.

Among a large panel of GSA indices, the variance-based sensitivity measures, also called “Sobol’
indices” [38], are derived from the functional analysis of variance (FANOVA) decomposition [7] between
all the independent inputs. Thus, these indices provide interpretable answers to some of the
previously mentioned GSA settings. Let $Y = G(X)$ denote the input-output relationship under
study, with $G(\cdot) : \mathbb{R}^d \to \mathbb{R}$ a deterministic (often black-box) numerical model,
$Y$ a scalar output and $X = (X_1, \dots, X_d)$ a vector of $d$ scalar inputs. Moreover, let
$\mathcal{P}(D)$ be the set of all subsets of

Preprint submitted to Elsevier October 25, 2022

arXiv:2210.13065v1 [math.ST] 24 Oct 2022

$D = \{1, \dots, d\}$. For every subset of inputs $X_A = (X_i)_{i \in A}$, $A \in \mathcal{P}(D)$, the
Sobol’ indices are defined as follows:

\[
S_A = \frac{\sum_{B \subset A} (-1)^{|A|-|B|} \, V(E[G(X) \mid X_B])}{V(G(X))} \tag{1}
\]

where $|\cdot|$ denotes the number of elements in a subset. If the inputs are assumed to be independent,
the FANOVA decomposition ensures that the Sobol’ indices yield a well-defined allocation of a share of
the output’s variance (i.e., $S_A$) to every subset of inputs $A \in \mathcal{P}(D)$. In this case, the
variance shares $(S_A)_{A \in \mathcal{P}(D)}$ sum up to one while being nonnegative. As the indices can
be interpreted as proportions of the output variance, they make it possible to determine which inputs of
a numerical model contribute the most to the variability of the output or, on the contrary, to identify
the ones that are not influential, and possibly which inputs interact with each other. Therefore, Sobol’
indices can be directly used to address the factor fixing and factor prioritization settings (ii. and
iii.).
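
As an illustration of Eq. (1) in the independent case, the following minimal sketch applies the alternating sum to a toy model that is assumed here and does not come from the paper: $Y = X_1 + X_2 + X_1 X_2$ with $X_1, X_2$ independent standard normal inputs, for which $V(E[Y \mid X_1]) = V(E[Y \mid X_2]) = 1$ and $V(Y) = 3$. The helper names `subsets` and `sobol_indices` are hypothetical, and the sum over $B$ is taken to include $B = A$ and the empty set.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset, including the empty set."""
    items = tuple(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def sobol_indices(cond_var, total_var):
    """Apply Eq. (1) to a table of V(E[Y | X_B]) indexed by the subsets B."""
    indices = {}
    for A in cond_var:
        if not A:
            continue  # no index is attached to the empty set
        acc = sum((-1) ** (len(A) - len(B)) * cond_var[B] for B in subsets(A))
        indices[A] = acc / total_var
    return indices


# Assumed toy model: Y = X1 + X2 + X1*X2 with X1, X2 independent N(0, 1).
cond_var = {
    frozenset(): 0.0,
    frozenset({1}): 1.0,
    frozenset({2}): 1.0,
    frozenset({1, 2}): 3.0,
}
S = sobol_indices(cond_var, total_var=3.0)
```

In this toy case the two main effects and the interaction each receive one third of the variance, so the shares are nonnegative and sum to one, as stated above for independent inputs.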

However, in many applications, some inputs may have a statistical dependence structure, either
initially imposed in their probabilistic modeling [27] or induced by physical constraints upon the input
or the output space [26,29]. In these cases, estimating and interpreting Sobol’ indices is not trivial,
as shown by the many different analyses and interpretations proposed in the past (see [22] or [6] for an
overview of this topic). In order to circumvent this issue, [31] proposed a new approach based on the
“Shapley value” [36], a solution concept developed in cooperative game theory and widely used
in economic modeling. It consists in distributing both gains and costs among several players working in
coalition in an egalitarian way, ensuring that each player gains as much as (or more than) they would
have from playing individually. Therefore, based on Shapley values and Sobol’ indices, [31] proposed the
so-called “Shapley effects” as new GSA indices in the context of dependent inputs. The underlying
idea is to assign, as in a game involving coalitions of players, a value to each coalition of inputs
$X_A$ that measures the part of the output variance it explains. This value corresponds to the
so-called “closed Sobol’ indices” defined as:

\[
S^{\mathrm{clos}}_A = \frac{V(E[G(X) \mid X_A])}{V(G(X))}. \tag{2}
\]

In the GSA context, the two main properties and advantages of the Shapley effects are the following:
firstly, they cannot be negative; secondly, their sum is equal to one, even in the case of dependent
inputs, since they bypass the intricate issue of variance decomposition [32,22]. Let us remark
that these two properties correspond to the two main desirability criteria for importance measures of
linear regression models as reviewed in [14]. Moreover, the egalitarian principle driving the allocation
rule states that, in the case of independent inputs, an interaction effect is equally apportioned to each
input involved in the interaction. Finally, several works have studied the estimation of the Shapley
effects. Estimates can be obtained via several techniques such as Monte Carlo-based algorithms
[39], k-nearest neighbors [3] or Möbius inverses [33].

In [22], the Shapley effects were argued to be suitable for the factor fixing setting, since an effect
close to zero means that the input has no significant contribution to the variance of the output (neither
through its interactions nor through its possible dependencies with other inputs). However, another
phenomenon, observed by [22] and known as the “Shapley’s joke” [18], shows that the factor fixing setting
cannot be fully achieved with Shapley effects: an exogenous variable (i.e., one which is not explicitly
included in the structural equations of the model) can be granted a non-negligible share of the output
variance, as soon as it is sufficiently correlated with endogenous inputs. This means that Shapley
effects do not respect the so-called “exclusion property” defined for the importance measures of linear
regression models [24,14]. This exclusion property states that, if an input’s linear regression
coefficient equals zero, then its importance measure should be zero too.

In the context of statistical learning, if $G$ is a linear regression model, an analogy can be made
between the Sobol’ indices and the squared values of the standardized regression coefficients (denoted
by SRC$^2$). Moreover, the Shapley effects correspond to the so-called “LMG measure” (named after
its authors, Lindeman-Merenda-Gold, see [28,4]), which partitions the explained variance
percentage $R^2$ in the same way as the Shapley-based allocation rule. A weighted analog
of LMG, called proportional marginal variance decomposition (PMVD), has been proposed by [9] in
order to respect the exclusion property. It is based on the proportional value allocation rule coming
from cooperative game theory. Its usefulness in relation to LMG has been described in detail in [14,15]
and illustrated more recently in [20,21]. In addition to the exclusion property, a greater power to
discriminate between the influential inputs than the one obtained with the Shapley effects is also shown.
Therefore, the PMVD is a good tool (in the linear regression context) to address the factor fixing
setting.

In this paper, inspired on the one hand by the work achieved in the linear regression context leading
to the PMVD, and on the other hand by the Shapley effects, we build and propose a set of novel
sensitivity indices that respect the exclusion property and are not restricted to the linear model case.
To do so, the proportional marginal effects (PME) are introduced by using a new variance decomposition,
based on the proportional values concept [30,9], which encompasses the ability to detect exogenous
variables. For the sake of clarity, Table 1 provides a preliminary analogy to emphasize which
category of problem is addressed in the present paper.

$R^2$ decomposition (linear regression)    $V(Y)$ decomposition (GSA)
---------------------------------------    --------------------------
SRC$^2$                                    Sobol’ indices
LMG                                        Shapley effects
PMVD                                       PME (proposed indices)

Table 1: Analogy between linear regression importance measures ($R^2$ decomposition) and variance-based GSA.

The rest of this paper is organized as follows. Section 2 focuses on the interaction between GSA and
cooperative game theory and on the existing literature. The Shapley effects are recalled, as well as
their main shortcoming: the inability to detect exogenous inputs. To that end, the notion of
$L^2$-exogeneity is formally defined. Then, Section 3 defines the proportional values and presents the
main result of this paper, an extension allowing for well-defined novel GSA indices: the PME. It is
additionally shown that these novel indices can detect exogenous inputs while remaining inherently
interpretable. Section 4 illustrates the behavior of the novel PME by using analytical formulas obtained
for analytical forms of $G$. Section 5 briefly recalls several strategies for the estimation of PME and
provides the results obtained on several more challenging numerical test-cases. Section 6 discusses
several possible improvements as well as some perspectives about the proposed work. A few appendices
provide extra material such as information about the reproducibility of numerical results (Appendix A)
and proofs (Appendix B).

Throughout this paper, let $E[\cdot]$ and $V(\cdot)$ denote the expectation and variance respectively. A
coalition of players is a subset of the grand coalition denoted $D = \{1, \dots, d\}$. Moreover,
$\forall A \subseteq D$, the restricted set of indices $A \setminus \{i\}$, for any $i \in A$, is denoted
by $A_{-i}$. Additionally, for any $A \subseteq D$, $X_{D \setminus A}$ is denoted by $X_{\overline{A}}$.
The distribution of the random inputs $X$ is generically denoted by $P_X$ and the marginal distribution
of any subset of inputs $X_A$, for any $A \subseteq D$, is generically denoted by $P_{X_A}$. The
spaces $L^2(P_{X_A})$, for any $A \subseteq D$, denote the spaces of measurable functions with finite
second-order moments. When a function is referred to as being nonnegative (resp. positive), it entails
that it takes values in $\mathbb{R}^+$ (resp. $\mathbb{R}^+_*$). Whenever reference is made to a model
$G$, it is always implicitly assumed that $G \in L^2(P_X)$. In this paper, almost sure statements are
followed by the acronym “a.s.”.

2. Cooperative game theory for variance-based global sensitivity analysis

This section reviews the usefulness of cooperative game theory in the process of designing
variance-based GSA indices. A particular class of allocations is presented: the random order model
allocations, which contain the Shapley values. The Sobol’ cooperative games are introduced as a
formalization of the analogy between players and inputs of deterministic models. The Shapley effects
are presented as the application of Shapley values to a Sobol’ cooperative game. The notion of the dual
of a cooperative game is also presented, and an analogy is drawn between backward-forward procedures
and the random order model allocations. Finally, a specific drawback of the Shapley effects (for the
factor fixing setting) is presented as a motivation for the proposed work: their inability to detect
exogenous inputs.

2.1. Analogy between allocation and variance-based GSA indices

A cooperative game is a tuple $(D, v)$ where $D = \{1, \dots, d\}$ is a set of $d$ players and
$v : \mathcal{P}(D) \to \mathbb{R}$ is the value function, i.e., a mapping that assigns a value to every
possible coalition of players. Usually, $v$ is assumed to be monotonically increasing, meaning that, for
any two sets $T$ and $A$ such that $T \subseteq A \in \mathcal{P}(D)$, one has $v(T) \le v(A)$. In other
words, the value of a coalition $A$ cannot be lower than the value of a sub-coalition $T \subseteq A$. In
the following, cooperative games with monotonically increasing value functions are referred to as
“monotonic cooperative games”. Moreover, if the value function $v$ takes values in $\mathbb{R}^+_*$
(resp. in $\mathbb{R}^+$), the corresponding cooperative game is referred to as a
“positive (resp. nonnegative) cooperative game”.

The analogy between the players $D$ of a cooperative game $(D, v)$ and the inputs $(X_i)_{i \in D}$
involved in a numerical model was first used in [31]. The author proposed to use, as a value function,
the closed Sobol’ indices recalled in Eq. (2), which defines the Sobol’ cooperative games.

Definition 1 (Sobol’ cooperative game). Let $X = (X_1, \dots, X_d)^\top$ be random inputs, let
$G \in L^2(P_X)$ be a model and denote $Y = G(X)$ the random output. A Sobol’ cooperative game is the
cooperative game with value function $S^{\mathrm{clos}}$ defined as follows:

\[
S^{\mathrm{clos}} : \mathcal{P}(D) \to \mathbb{R}^+, \qquad A \mapsto S^{\mathrm{clos}}_A = \frac{V(E[Y \mid X_A])}{V(Y)}.
\]

The Sobol’ cooperative game thus refers to the nonnegative, monotonic cooperative game $(D, S^{\mathrm{clos}})$.

By analogy with the cooperative game theory paradigm, the choice of $S^{\mathrm{clos}}$ as a value
function entails measuring the value of every subset of players $A \subseteq D$ as the variance of the
best approximation of $Y$ in $L^2(P_{X_A})$, i.e., $V(E[Y \mid X_A])$.
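
To make the Sobol’ cooperative game concrete, the sketch below tabulates $S^{\mathrm{clos}}_A$ for every coalition in a setting assumed here for illustration only: a linear model $Y = \beta^\top X$ with centered Gaussian inputs of covariance $\Sigma$. In that case the conditional expectation is linear and $V(E[Y \mid X_A]) = c_A^\top \Sigma_{AA}^{-1} c_A$, with $c_A = \mathrm{Cov}(X_A, Y)$. The function name `sobol_game` and the numerical values of $\beta$ and $\Sigma$ are hypothetical, and inputs are indexed from 0 in the code.

```python
from itertools import combinations

import numpy as np


def sobol_game(beta, sigma):
    """Tabulate S^clos_A for all coalitions A of a linear Gaussian model.

    Assumes Y = beta' X with X ~ N(0, sigma); inputs are indexed from 0.
    """
    beta = np.asarray(beta, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    d = len(beta)
    cov_xy = sigma @ beta          # Cov(X_j, Y) for each input j
    var_y = float(beta @ cov_xy)   # V(Y) = beta' sigma beta
    game = {frozenset(): 0.0}
    for size in range(1, d + 1):
        for combo in combinations(range(d), size):
            idx = list(combo)
            c = cov_xy[idx]
            explained = c @ np.linalg.solve(sigma[np.ix_(idx, idx)], c)
            game[frozenset(combo)] = float(explained / var_y)
    return game


# Assumed example: the third input has a zero coefficient but is correlated
# (0.9) with the second one.
beta = [1.0, 1.0, 0.0]
sigma = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.9],
         [0.0, 0.9, 1.0]]
game = sobol_game(beta, sigma)
```

In this assumed example the value of the grand coalition is one and the game is monotonic, while the singleton coalition made of the third input alone already has a strictly positive value purely through its correlation with the second input.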

One of the key aspects of cooperative games is the notion of allocation. In general, allocations
can be understood as a decomposition of the quantity $v(D)$ into $d$ elements, each one being allocated
to a specific player. When it comes to Sobol’ cooperative games, this translates to assigning a share of
the output’s variance $V(Y)$ to each input in the model, with limited assumptions on the probabilistic
structure between the inputs (in particular, no independence is assumed between the inputs). Formally,
an allocation can be understood as a mapping $\phi$ that associates, to a cooperative game $(D, v)$, a
real-valued vector $(\phi_1, \dots, \phi_d)^\top \in \mathbb{R}^d$.

The Shapley values are a particular example of allocations. For any cooperative game $(D, v)$, they are
uniquely characterized as the allocation $\phi(D, v)$ verifying a set of four distinct axioms:

1. Efficiency: $\sum_{i=1}^{d} \phi_i = v(D)$;

2. Symmetry: $\forall i, j \in D$ with $i \neq j$, if $v(A \cup \{i\}) = v(A \cup \{j\})$ for all $A \in \mathcal{P}(D)$, then $\phi_i = \phi_j$;

3. Null player: $\forall i \in D$, if $v(A \cup \{i\}) = v(A)$ for all $A \in \mathcal{P}(D)$, then $\phi_i = 0$;

4. Additivity: If two cooperative games $(D, v)$ and $(D, v')$ have Shapley values $\phi$ and $\phi'$ respectively, then the cooperative game $(D, v + v')$ has Shapley values $\phi_j + \phi'_j$ for $j \in D$.

For any cooperative game $(D, v)$, its Shapley values can be expressed analytically, for any $i \in D$, as:

\[
\mathrm{Shap}_i(D, v) = \frac{1}{d} \sum_{A \subseteq D_{-i}} \binom{d-1}{|A|}^{-1} \left[ v(A \cup \{i\}) - v(A) \right]. \tag{3}
\]

This original formulation, attributed to [36], can be interpreted as a weighted average, over every
possible coalition $A$, of the contribution of a player $i$ to that coalition $A$. This contribution is
quantified by $v(A \cup \{i\}) - v(A)$, often called the “marginal contribution” of the player $i$ to the
coalition $A$ in the literature. The weighting scheme can be understood as the proportion of permutations
(or orderings) of $D$ such that the players preceding $i$ are exactly those in $A$. While this
interpretation can be hard to understand, defining the Shapley values in terms of permutations of
players allows for a better understanding of their underlying sharing mechanism, as is done in the
following.
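
The coalition-based formula of Eq. (3) can be sketched in a few lines. The function name `shapley_values` and the three-player game below are assumptions made for illustration: players 1 and 2 are interchangeable and player 3 never changes the value of a coalition, so the symmetry and null player axioms can be checked directly.

```python
from itertools import combinations
from math import comb


def shapley_values(players, v):
    """Evaluate Eq. (3) for a game given as a dict {frozenset: value}."""
    players = tuple(players)
    d = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(d):  # |A| ranges over 0, ..., d - 1
            for combo in combinations(others, size):
                A = frozenset(combo)
                # Marginal contribution of i to A, weighted by 1 / C(d-1, |A|).
                total += (v[A | {i}] - v[A]) / comb(d - 1, size)
        phi[i] = total / d
    return phi


# Assumed toy game: players 1 and 2 are symmetric, player 3 is a null player.
v = {
    frozenset(): 0.0,
    frozenset({1}): 1.0, frozenset({2}): 1.0, frozenset({3}): 0.0,
    frozenset({1, 2}): 3.0, frozenset({1, 3}): 1.0, frozenset({2, 3}): 1.0,
    frozenset({1, 2, 3}): 3.0,
}
phi = shapley_values((1, 2, 3), v)
```

For this toy game the allocation splits $v(D) = 3$ equally between players 1 and 2 and gives nothing to player 3, in line with the efficiency, symmetry and null player axioms.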

A particular class of allocations, known as random order models [40,10], defines allocations
based on orderings of players, instead of reasoning in terms of coalitions as in Eq. (3). Let
$\mathcal{S}_D$ be the symmetric group on $D$ (the set of all permutations of $D$). Let
$\pi = (\pi_1, \dots, \pi_d) \in \mathcal{S}_D$ be a particular permutation and, for any $i \in D$,
denote by $\pi(i) = \pi^{-1}_i$ its inverse (i.e., the position of $i$ in $\pi$, such that
$\pi_{\pi(i)} = i$). Then, one can define the following set of players, for any $i \in \{0, \dots, d\}$:

\[
C_i(\pi) = \{\pi_j : j \le i\}. \tag{4}
\]

$C_i(\pi)$ is the set of the first $i$ players in the ordering $\pi$, with the convention that, for any
permutation, $C_0(\pi) = \emptyset$. As an illustration, let $D = \{1, 2, 3\}$, and let
$\pi = (2, 1, 3) \in \mathcal{S}_D$. Then,

\[
\pi(1) = 2, \quad \pi(2) = 1, \quad \text{and} \quad \pi(3) = 3.
\]

Moreover,

\[
C_{\pi(1)}(\pi) = C_2(\pi) = \{1, 2\}, \quad C_{\pi(2)}(\pi) = C_1(\pi) = \{2\}, \quad C_{\pi(3)}(\pi) = C_3(\pi) = \{1, 2, 3\}.
\]
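
The illustration above can be reproduced mechanically. In this small sketch the helper names `position` and `first_players` are hypothetical; an ordering is stored as a tuple and positions are counted from 1, as in the text.

```python
def position(pi, i):
    """Return pi(i), the 1-based position of player i in the ordering pi."""
    return pi.index(i) + 1


def first_players(pi, i):
    """Return C_i(pi), the set of the first i players of pi (empty for i = 0)."""
    return set(pi[:i])


pi = (2, 1, 3)
positions = {i: position(pi, i) for i in (1, 2, 3)}
coalitions = {i: first_players(pi, positions[i]) for i in (1, 2, 3)}
```

Here `coalitions[i]` holds the coalition formed by player $i$ together with all the players that precede it in the ordering.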

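As their name suggests, random order models endow $\mathcal{S}_D$ with a probabilistic structure. For any
game $(D, v)$, the set of random order model allocations (or probabilistic allocations) contains every
allocation $\phi(D, v)$ that can be written, for any $i \in D$, as:

\[
\phi_i = \sum_{\pi \in \mathcal{S}_D} p(\pi) \left[ v\left(C_{\pi(i)}(\pi)\right) - v\left(C_{\pi(i)-1}(\pi)\right) \right]
= E_{\pi \sim p}\left[ v\left(C_{\pi(i)}(\pi)\right) - v\left(C_{\pi(i)-1}(\pi)\right) \right]
\]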

where $p$ is a probability mass function over the orderings of $D$. For a player $i$, its random order
allocation can be interpreted as the expectation, over the permutations $\pi$ of $D$ with respect to $p$,
of the marginal contributions of $i$ to the coalitions $C_{\pi(i)-1}(\pi)$. The random order model
allocations are always efficient and, when dealing with monotonic games, nonnegative (i.e.,
$\phi_i \ge 0$ for any $i \in D$) [40]. The Shapley values, in particular, can be expressed as a random
order model allocation, under the particular choice of $p$ as a discrete uniform distribution over
$\mathcal{S}_D$, which echoes Eq. (3):

Shapi(D, v)=1

d!X

π∈SDvCπ(i)(π)−vCπ(i)−1(π).(5)

Random order models make it possible to view allocations dynamically (see Section 2.2), meaning that
coalitions are formed according to orderings, as opposed to the pure coalition point of view displayed
in Eq. (3). In this setting, Shapley values can then be understood as a maximum entropy a priori
(i.e., uniform over $\mathcal{S}_D$) about this dynamic. In the light of this equivalent expression,
L. S. Shapley himself interpreted the Shapley values as “[...] an a priori assessment of the situation,
based on either ignorance or disregard of the social organization of the players” [37].

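When it comes to GSA, the Shapley values of the Sobol’ cooperative game $(D, S^{\mathrm{clos}})$
associated with a numerical model $Y = G(X_1, \dots, X_d)$ define the so-called Shapley effects [31].
For any $i \in D$, they can be written as:

\begin{align}
\mathrm{Sh}_i &:= \mathrm{Shap}_i(D, S^{\mathrm{clos}}) \tag{6a} \\
&= \frac{1}{d} \sum_{A \subseteq D_{-i}} \binom{d-1}{|A|}^{-1} \left( S^{\mathrm{clos}}_{A \cup \{i\}} - S^{\mathrm{clos}}_A \right) \tag{6b} \\
&= \frac{1}{d!} \sum_{\pi \in \mathcal{S}_D} \left[ S^{\mathrm{clos}}_{C_{\pi(i)}(\pi)} - S^{\mathrm{clos}}_{C_{\pi(i)-1}(\pi)} \right]. \tag{6c}
\end{align}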
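
As a hedged illustration of the Shapley effects and of the behavior with exogenous inputs recalled in the introduction, the sketch below evaluates the permutation form on a linear model with centered Gaussian inputs, a setting assumed here because $V(E[Y \mid X_A])$ then has the closed form $c_A^\top \Sigma_{AA}^{-1} c_A$ with $c_A = \mathrm{Cov}(X_A, Y)$. The function names and the numerical values are hypothetical, and inputs are indexed from 0 in the code.

```python
from itertools import permutations
from math import factorial

import numpy as np


def closed_sobol(beta, sigma, subset):
    """S^clos_A for Y = beta' X with X ~ N(0, sigma) (assumed setting)."""
    if not subset:
        return 0.0
    idx = sorted(subset)
    cov_xy = sigma @ beta
    c = cov_xy[idx]
    explained = c @ np.linalg.solve(sigma[np.ix_(idx, idx)], c)
    return float(explained / (beta @ cov_xy))


def shapley_effects(beta, sigma):
    """Average the increments of S^clos along every ordering of the inputs."""
    beta = np.asarray(beta, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    d = len(beta)
    effects = np.zeros(d)
    for pi in permutations(range(d)):
        seen, previous = [], 0.0
        for i in pi:
            seen.append(i)
            current = closed_sobol(beta, sigma, seen)
            effects[i] += current - previous
            previous = current
    return effects / factorial(d)


# Assumed example: the third input does not appear in the model (zero
# coefficient) but has a 0.9 correlation with the second input.
sh = shapley_effects([1.0, 1.0, 0.0],
                     [[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.9],
                      [0.0, 0.9, 1.0]])
```

In this assumed example the effects are nonnegative and sum to one, yet the third input receives a strictly positive share although its coefficient is zero, which is the behavior that the exclusion property is meant to rule out.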
