Reparameterization of extreme value framework
for improved Bayesian workflow
Théo Moins∗  Julyan Arbel∗  Stéphane Girard∗  Anne Dutfoy†
June 12, 2023
Abstract
Using Bayesian methods for extreme value analysis offers an alternative to frequentist ones, with several advantages such as easily dealing with parametric uncertainty or studying irregular models. However, computations can be challenging and the efficiency of algorithms can be altered by poor parameterization choices. We focus on the Poisson process characterization of univariate extremes and outline two key benefits of an orthogonal parameterization. First, Markov chain Monte Carlo convergence is improved when applied to orthogonal parameters. This analysis relies on convergence diagnostics computed on several simulations. Second, orthogonalization also helps in deriving Jeffreys and penalized complexity priors, and in establishing posterior propriety thereof. The proposed framework is applied to return level estimation of Garonne flow data (France).
1 Introduction
Studying the long-term behavior of environmental variables is necessary to understand the risks
of hazardous meteorological events such as floods, storms, or droughts. To this end, models from
extreme value theory allow us to extrapolate data in the distribution tails, in order to estimate
extreme quantiles that may not have been observed (see Coles, 2001, for an introduction). In particular, a key quantity to estimate is the return level associated with a given period of $T$ years: the level that is exceeded on average once every $T$ years. Assessing the resistance of facilities to natural disasters, such as dams to floods that occur on average once every 100 or 1,000 years, is critical for companies such as Électricité de France (EDF). Moreover, characterizing the uncertainty in the estimation of this return level is also of interest, which encourages the choice
of the Bayesian paradigm. However, performing Bayesian inference requires multiple steps that
must be managed by the user, from the choice of the model to the evaluation and validation of
computations. This has been recently formalized by Gelman et al. (2020) in the form of a Bayesian
workflow. After introducing models stemming from extreme value theory in Section 1.1, we briefly
review in Section 1.2 one particular step of the workflow, reparameterization, and more specifically
the choice of an orthogonal parameterization.
∗Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France.
†EDF R&D dept. Périclès, 91120 Palaiseau, France.
1.1 Extreme-value models
Three different frameworks exist to model extreme events, leading to different likelihoods: one by
block maxima, one by peaks-over-threshold, and one that unifies both through a Poisson process
characterization.
Block maxima model Let $M_n$ be the maximum of $n$ i.i.d. random variables with cumulative distribution function (cdf) $F$. We assume that $F$ belongs to the maximum domain of attraction of a non-degenerate cdf $G$, meaning that there exist two sequences $a_n > 0$ and $b_n$ such that $(M_n - b_n)/a_n$ converges in distribution to the cdf $G$. The extreme value theorem (e.g., Haan and Ferreira, 2006, Chapter 1) states that $G$ is necessarily a generalized extreme-value (GEV) distribution, with cdf
\[
G(x) = \begin{cases} \exp\left(-\{1 + \xi x\}_+^{-1/\xi}\right) & \text{if } \xi \neq 0, \\ \exp(-\exp(-x)) & \text{if } \xi = 0, \end{cases} \tag{1}
\]
where $\{x\}_+ = \max\{0, x\}$. Consequently, for a finite value of $n$, one can consider the approximation $P(M_n \leq x) \approx G((x - b_n)/a_n) =: G(x \mid b_n, a_n, \xi)$, and focus on the estimation of the three parameters of the GEV distribution. Here, as the dataset is fixed, the dependence on $n$ of the location and scale parameters will be omitted. To obtain a sample of maxima, one can divide the dataset into $m$ blocks of size $n/m$ and extract the maximum from each of them.
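To make the block maxima construction concrete, here is a minimal Python sketch (an illustration of ours, not code from the paper) that splits a series into $m$ blocks and evaluates the GEV cdf of Equation (1) with SciPy; note that scipy.stats.genextreme parameterizes the shape as $c = -\xi$.

```python
# Illustrative sketch, not code from the paper or its repository.
import numpy as np
from scipy.stats import genextreme

def block_maxima(data, m):
    """Split `data` into m blocks of (almost) equal size and return each block's maximum."""
    blocks = np.array_split(np.asarray(data), m)
    return np.array([b.max() for b in blocks])

def gev_cdf(x, mu, sigma, xi):
    """GEV cdf G(x | mu, sigma, xi) of Equation (1); SciPy's shape convention is c = -xi."""
    return genextreme.cdf(x, c=-xi, loc=mu, scale=sigma)

# Toy example: 50 "annual" maxima of standard Gumbel daily data. The maximum of
# 365 standard Gumbel variables is exactly Gumbel(log 365, 1), i.e., GEV with xi = 0.
rng = np.random.default_rng(0)
daily = rng.gumbel(size=50 * 365)
maxima = block_maxima(daily, m=50)
print(gev_cdf(np.median(maxima), mu=np.log(365), sigma=1.0, xi=0.0))  # roughly 0.5
```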
Peaks-over-threshold model Alternatively, one can consider observations that exceed a high threshold $u$. Let $X$ be a random variable with cdf $F$. Pickands' theorem (Pickands, 1975) states that, if $F$ belongs to the maximum domain of attraction of $G$ with $P(M_n \leq x) \approx G(x \mid \mu, \sigma, \xi)$, then the distribution of the exceedances $X - u \mid X > u$ is, as $u$ converges to the upper endpoint of $F$, a generalized Pareto distribution (GPD), with cdf
\[
H(y \mid \tilde{\sigma}, \xi) = \begin{cases} 1 - \left(1 + \dfrac{\xi y}{\tilde{\sigma}}\right)_+^{-1/\xi} & \text{if } \xi \neq 0, \\ 1 - \exp\left(-\dfrac{y}{\tilde{\sigma}}\right) & \text{if } \xi = 0, \end{cases} \tag{2}
\]
where the shape parameter $\xi$ is the same as in (1) and the GPD and GEV scales are linked by $\tilde{\sigma} = \sigma + \xi(u - \mu)$. To obtain a sample of $n_u$ excesses, the peaks-over-threshold method focuses on the $n_u$ largest values of the dataset. It thus requires the estimation of the quantile of order $1 - n_u/n$, which can be seen as the third parameter to estimate, in addition to $\tilde{\sigma}$ and $\xi$. The most classical choice is to estimate this intermediate quantile by the $(n - n_u)$th order statistic.
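As a companion sketch (again illustrative, on hypothetical data), the peaks-over-threshold construction takes the threshold as the $(n - n_u)$th order statistic and fits the GPD of Equation (2) to the excesses; scipy.stats.genpareto uses $\xi$ directly as its shape parameter.

```python
# Illustrative sketch, not code from the paper or its repository.
import numpy as np
from scipy.stats import genpareto

def excesses_over_threshold(data, n_u):
    """Keep the n_u largest observations; the threshold u is the (n - n_u)th order statistic."""
    x = np.sort(np.asarray(data))
    u = x[-n_u - 1]            # (n - n_u)th order statistic, 0-indexed
    return u, x[-n_u:] - u     # threshold and the n_u positive excesses

# Toy example: Student-t data with 4 degrees of freedom, for which xi = 1/4 in theory.
rng = np.random.default_rng(1)
data = rng.standard_t(df=4, size=10_000)
u, y = excesses_over_threshold(data, n_u=200)

# Maximum likelihood GPD fit to the excesses, with location fixed at 0.
xi_hat, _, sigma_hat = genpareto.fit(y, floc=0.0)
print(f"u = {u:.2f}, xi_hat = {xi_hat:.2f}, sigma_hat = {sigma_hat:.2f}")
```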
Poisson process characterization of extremes Finally, these two approaches can be generalized by a third one, using a non-homogeneous Poisson process. We present here an intuitive way of obtaining this model, similarly to Coles (2001, Chapter 7), and refer to Leadbetter et al. (1983, Chapter 5) for theoretical details. We start by observing that, for large $n$, $F^n(x) \approx G(x \mid \mu, \sigma, \xi)$ for $x$ in the support of $G$, denoted by $\operatorname{supp}(G(\cdot \mid \mu, \sigma, \xi)) = \left\{x \in \mathbb{R} \text{ s.t. } 1 + \xi \frac{x - \mu}{\sigma} > 0\right\}$. Hence, considering a large threshold $u \in \operatorname{supp}(G(\cdot \mid \mu, \sigma, \xi))$, a Taylor expansion yields
\[
n \log F(u) \simeq -n(1 - F(u)) \simeq \log G(u \mid \mu, \sigma, \xi),
\]
or, equivalently,
\[
P(X > u) \simeq -\frac{1}{n} \log G(u \mid \mu, \sigma, \xi). \tag{3}
\]
Equation (3) can be seen as the probability for $X$ to belong to $I_u := [u, +\infty)$. In the case of $n$ i.i.d. random variables, one can deduce that the associated point process $N_n$ is such that $N_n(I_u) \sim \mathcal{B}(n, p_n)$ with $p_n$ given by Equation (3). As $n \to \infty$, the binomial distribution $\mathcal{B}(n, p_n)$ converges to the Poisson distribution $\mathcal{P}(\Lambda(I_u))$, with $\Lambda(I_u) = -\log G(u \mid \mu, \sigma, \xi)$. This property being valid for all $I_u$, together with the independence property on non-overlapping sets, implies that $N_n$ converges to a non-homogeneous Poisson process $N$ with intensity measure $\Lambda$:
\[
N_n \xrightarrow{d} N, \quad \text{with } N(I_u) \sim \mathcal{P}(\Lambda(I_u)).
\]
This model generalizes the block maxima one since, as $n \to \infty$,
\[
P(M_n < x) = P(N_n(I_x) = 0) \longrightarrow P(N(I_x) = 0) = \exp(-\Lambda(I_x)) = G(x \mid \mu, \sigma, \xi).
\]
However, an estimation of the parameters $(\mu, \sigma, \xi)$ with this model is related to the overall maximum $M_n$ of the dataset, and it is frequent to study instead the maxima of $m$ smaller blocks $M_{n/m}$, where $m$ is typically the number of years of observations, so that $M_{n/m}$ corresponds to annual maxima. To do so, the intensity measure is multiplied by $m$, which modifies the parameterization and in particular the values of $\mu$ and $\sigma$: Wadsworth et al. (2010) shows that, if $(\mu_{k_i}, \sigma_{k_i}, \xi)$, $i \in \{1, 2\}$, are parameters for $k_i$ GEV observations, then
\[
\mu_{k_2} = \mu_{k_1} - \frac{\sigma_{k_1}}{\xi}\left(1 - \left(\frac{k_2}{k_1}\right)^{-\xi}\right), \qquad \sigma_{k_2} = \sigma_{k_1} \left(\frac{k_2}{k_1}\right)^{-\xi}. \tag{4}
\]
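Equation (4) is straightforward to implement. The following sketch is ours, not the authors'; the Gumbel limit $\xi \to 0$, where the formulas reduce to $\mu_{k_2} = \mu_{k_1} - \sigma_{k_1}\log(k_2/k_1)$ and $\sigma_{k_2} = \sigma_{k_1}$, is handled separately.

```python
# Illustrative sketch, not code from the paper or its repository.
import numpy as np

def rescale_gev(mu_k1, sigma_k1, xi, k1, k2):
    """Rescale GEV location and scale from k1 blocks to k2 blocks, following Equation (4).

    The shape xi is invariant; as xi -> 0 the formulas reduce to
    mu_k2 = mu_k1 - sigma_k1 * log(k2 / k1) and sigma_k2 = sigma_k1.
    """
    r = k2 / k1
    if np.isclose(xi, 0.0):
        return mu_k1 - sigma_k1 * np.log(r), sigma_k1
    mu_k2 = mu_k1 - sigma_k1 / xi * (1.0 - r ** (-xi))
    sigma_k2 = sigma_k1 * r ** (-xi)
    return mu_k2, sigma_k2

# Toy example: parameters given for k1 = 100 GEV observations (e.g., 100 annual
# maxima), rescaled to k2 = 1, the overall maximum of the record.
print(rescale_gev(mu_k1=30.0, sigma_k1=5.0, xi=0.1, k1=100, k2=1))
```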
The threshold excess model can also be derived from the point process representation, since $P(X > y + u \mid X > u) \approx 1 - H(y \mid \tilde{\sigma}, \xi)$, with $\tilde{\sigma} = \sigma + \xi(u - \mu)$. Moreover, in contrast to the peaks-over-threshold model, where an intermediate quantile needs to be estimated, the Poisson model directly includes a third location parameter $\mu$.

In the following, we will focus mainly on this latter model, and treat the peaks-over-threshold method as a special case in Section 4.1.
Bayesian inference Using the Bayesian paradigm in extreme value models is advantageous in comparison to the frequentist approach; see Coles and Powell (1996) for a general review, and Stephenson (2016) or Bousquet (2021) for more recent overviews. For the Poisson process characterization of extremes, Bayesian inference consists in fixing a scaling factor $m$ and a threshold $u$ to get a number $n_u \geq 1$ of observations exceeding $u$, denoted by $\mathbf{x} = (x_1, \ldots, x_{n_u})$. The likelihood of these observations can be written as
\[
L(\mathbf{x}, n_u \mid \mu, \sigma, \xi) = e^{-m\left(1 + \xi\left(\frac{u - \mu}{\sigma}\right)\right)^{-1/\xi}} \sigma^{-n_u} \prod_{i=1}^{n_u} \left(1 + \xi \, \frac{x_i - \mu}{\sigma}\right)^{-\frac{1}{\xi} - 1}. \tag{5}
\]
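For samplers that only need the log-posterior up to an additive constant, Equation (5) can be coded directly. Below is a minimal sketch (our illustration, with made-up example values); it returns $-\infty$ outside the support, so that out-of-support proposals are rejected by an MCMC sampler.

```python
# Illustrative sketch, not code from the paper or its repository.
import numpy as np

def pp_log_likelihood(x, u, m, mu, sigma, xi):
    """Log of Equation (5) for exceedances x of threshold u, with scaling factor m.

    Returns -inf outside the support {sigma > 0 and 1 + xi * (z - mu) / sigma > 0
    for z = u and all exceedances}, which is convenient for MCMC samplers.
    """
    x = np.asarray(x)
    z_u = 1.0 + xi * (u - mu) / sigma
    z_x = 1.0 + xi * (x - mu) / sigma
    if sigma <= 0 or z_u <= 0 or np.any(z_x <= 0):
        return -np.inf
    if np.isclose(xi, 0.0):  # Gumbel limit of the intensity and of the density terms
        return (-m * np.exp(-(u - mu) / sigma)
                - x.size * np.log(sigma) - np.sum((x - mu) / sigma))
    return (-m * z_u ** (-1.0 / xi)
            - x.size * np.log(sigma) - (1.0 / xi + 1.0) * np.sum(np.log(z_x)))

# Example call with made-up values:
print(pp_log_likelihood(x=[31.0, 35.5, 40.2], u=30.0, m=50, mu=38.0, sigma=6.0, xi=0.1))
```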
A complete Bayesian model also requires the specification of a prior $p(\mu, \sigma, \xi)$, to obtain the posterior $p(\mu, \sigma, \xi \mid \mathbf{x}, n_u)$ using Bayes' theorem: $p(\mu, \sigma, \xi \mid \mathbf{x}, n_u) \propto p(\mu, \sigma, \xi) \, L(\mathbf{x}, n_u \mid \mu, \sigma, \xi)$. This posterior summarizes the information on the parameters after observation, and can be used to extract point estimators, build credible intervals, or write the probability of a new observation $\tilde{x}$ given data $\mathbf{x}$ using the posterior predictive:
\[
p(\tilde{x} \mid \mathbf{x}, n_u) = \int p(\tilde{x} \mid \theta) \, p(\theta \mid \mathbf{x}, n_u) \, d\theta, \qquad \theta = (\mu, \sigma, \xi). \tag{6}
\]
These quantities of interest are rarely explicit, and are often derived by sampling approaches. A recent survey of extreme value software (Belzile et al., 2022) contains a Bayesian section and a comparison with frequentist methods. In the general Bayesian case, an overview of the Bayesian workflow is given in Gelman et al. (2020); we focus here on the particular step of reparameterization for the likelihood $L(\mathbf{x}, n_u \mid \mu, \sigma, \xi)$ in the case where Markov chain Monte Carlo (MCMC) methods are used to approximate the posterior distribution.
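For instance, given $S$ posterior draws $\theta^{(1)}, \ldots, \theta^{(S)}$ from an MCMC run, the posterior predictive of Equation (6) is typically approximated by the Monte Carlo average $S^{-1} \sum_s p(\tilde{x} \mid \theta^{(s)})$. A minimal sketch follows, assuming, purely for illustration, GEV observations and an array of fake posterior draws.

```python
# Illustrative sketch, not code from the paper or its repository.
import numpy as np
from scipy.stats import genextreme

def posterior_predictive_pdf(x_new, samples):
    """Monte Carlo estimate of Equation (6): average the GEV density p(x_new | theta)
    over posterior draws theta = (mu, sigma, xi), one draw per row of `samples`."""
    mu, sigma, xi = samples[:, 0], samples[:, 1], samples[:, 2]
    return genextreme.pdf(x_new, c=-xi, loc=mu, scale=sigma).mean()

# Toy example with fake "posterior draws" centered at (mu, sigma, xi) = (30, 5, 0.1):
rng = np.random.default_rng(2)
samples = np.column_stack([
    rng.normal(30.0, 0.5, size=1000),   # mu draws
    rng.normal(5.0, 0.3, size=1000),    # sigma draws
    rng.normal(0.1, 0.05, size=1000),   # xi draws
])
print(posterior_predictive_pdf(40.0, samples))
```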
1.2 Reparameterization
Although the choice of parameterization of a statistical model does not alter the model per se, it
does reshape its geometry, which in turn may impact computational aspects of sampling algorithms
such as efficiency or accuracy. For these methods, a crucial complication for chain convergence is
parameter correlation. This notion of correlation between parameters can be related to a notion of asymptotic orthogonality, which leads to asymptotic independence of the posterior components.
Parameterization and Bayesian inference It has been known for several decades that parameterization is crucial for good mixing of MCMC chains, especially when the correlation between the coordinates is large. See Gilks et al. (1995, Chapter 6) for an introduction in the context of Gibbs sampling and the Metropolis–Hastings algorithm. More general computations are conducted by Roberts and Sahu (1997) in the normal case, but the convergence rate is less explicit in general; see for example Roberts and Polson (1994). For Metropolis–Hastings, if the structure of the kernel is not
similar to that of the target density (which is a typical case if there is a complex dependence
between parameters), then too many candidates generated by the kernel are rejected and the same
problem as for Gibbs sampling occurs. For more recent MCMC algorithms such as Hamiltonian
Monte Carlo (HMC, Neal, 2011) and its variant NUTS (Hoffman and Gelman, 2014), Betancourt
and Girolami (2015) gives an example of the benefit of reparameterization for hierarchical models.
More generally, Betancourt (2019) studies reparameterization from a geometric perspective, in order
to show its equivalence with adapted versions of HMC on Riemannian manifolds.
Due to the difficulty of obtaining general results on reparameterization and MCMC convergence, a significant part of the research focuses on specific models, such as hierarchical models (Papaspiliopoulos et al., 2003; Browne et al., 2009), linear regression (Gilks et al., 1995), or mixed models (Gelfand et al., 1995, 1996).
For extreme value models, Diebolt et al. (2005) uses a continuous mixture of exponential distributions in the GPD case. Opitz et al. (2018) also suggests using the median instead of the usual scale parameter to reduce correlation for the integrated nested Laplace approximation (INLA). An alternative Monte Carlo algorithm, the ratio-of-uniforms method, is also implemented for extreme value models in the revdbayes package (Northrop, 2023). The influence of parameterization is also considered in this framework, as the acceptance rate can be reduced by correlated parameters (see Appendix C.4). Parameter transformations are also studied by Jóhannesson et al. (2022) to make likelihood-based inference tractable in the high-dimensional case. Finally, Belzile et al. (2022) proposes a reparameterization trick that can be used to obtain a suitable initial value for optimization routines.
Orthogonal parameterization As seen before, reducing dependence between coordinates is
desirable for MCMC methods. Dependence can be characterized using asymptotic covariance and
the notion of orthogonality according to Jeffreys (1961): parameters are said to be orthogonal when
the Fisher information is diagonal. From this definition, having orthogonal parameters leads to
asymptotic posterior independence when a Bernstein–von Mises theorem holds (e.g., Van der Vaart, 2000, Chapter 10). However, finding an orthogonal parameterization is seldom feasible when there are more than three parameters, since the number of equations is then greater than the number of unknowns. In the case of three parameters, there are as many equations as there are unknowns, but the nonlinear system does not necessarily admit a solution (Huzurbazar, 1950).
The main use of orthogonal parameterization is to make parameters of interest independent of nuisance parameters (Cox and Reid, 1987). Other definitions of orthogonality have also been proposed, either better adapted to the inferential context (Tibshirani and Wasserman, 1994) or ensuring consistency for the parameter of interest (Woutersen, 2011). For Bayesian inference, Tibshirani and Wasserman (1994) compares different definitions and suggests a strong assumption of normality for the posterior. In the following, we keep the most popular definition of orthogonality, due to Jeffreys (1961), as we are not interested in properties associated with the estimation of a given parameter of interest, but rather in the dependence structure between parameters. However, to the best of our knowledge, there is no clear evidence in the literature of a direct link between parameter orthogonality and mixing properties of the corresponding MCMC chains, such as a better convergence rate. In Section 4, we provide some empirical evidence of the benefits of orthogonality in extreme value models.
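To make Jeffreys' criterion concrete, the sketch below (a toy illustration of ours, not the paper's code) approximates the observed information, i.e., the negative Hessian of a log-likelihood, by central finite differences; orthogonal parameters yield (near-)zero off-diagonal entries, as in the classical Gaussian (mean, log-standard-deviation) example.

```python
# Illustrative sketch, not code from the paper or its repository.
import numpy as np

def observed_information(log_lik, theta, eps=1e-4):
    """Negative Hessian of `log_lik` at `theta`, by central finite differences.
    Off-diagonal entries near zero indicate (locally) orthogonal parameters."""
    theta = np.asarray(theta, dtype=float)
    d = theta.size
    hess = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            e_i, e_j = np.eye(d)[i] * eps, np.eye(d)[j] * eps
            hess[i, j] = (log_lik(theta + e_i + e_j) - log_lik(theta + e_i - e_j)
                          - log_lik(theta - e_i + e_j) + log_lik(theta - e_i - e_j)
                          ) / (4 * eps ** 2)
    return -hess

# Toy target: Gaussian log-likelihood in (mean, log-sd), a classically orthogonal
# pair -- at the MLE the off-diagonal entry is (numerically) zero.
rng = np.random.default_rng(3)
data = rng.normal(1.0, 2.0, size=500)
ll = lambda th: float(np.sum(-th[1] - 0.5 * ((data - th[0]) / np.exp(th[1])) ** 2))
print(observed_information(ll, [data.mean(), np.log(data.std())]))
```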
1.3 Contributions and outline
In this paper, we study the benefits of reparameterization for the Poisson process characterization of extremes in a Bayesian context. In particular, it is shown that the orthogonal parameterization is useful for several reasons: we argue in Section 2 that it improves the performance of MCMC algorithms in terms of convergence, and we show in Section 3 that it also facilitates the derivation of priors such as Jeffreys and an informative variant on the shape parameter using penalized complexity (PC) priors (Simpson et al., 2017). These results are then illustrated by experiments in Section 4, first on simulations to compare the different parameterizations, and second on a real dataset of Garonne river flows. Proofs as well as additional experiments are provided in the Appendix, and the code corresponding to the experiments is available online at https://github.com/TheoMoins/ExtremesPyMC.
2 Reaching orthogonality for the extreme Poisson process
An attempt to reparameterize the Poisson process for extremes in order to improve MCMC convergence already exists in the literature (Sharkey and Tawn, 2017), but it has several limitations, which are detailed here. Instead, we suggest using the fully orthogonal parameterization of Chavez-Demoulin and Davison (2005).
Near-orthogonality with hyperparameter tuning Based on the relationship between parameters given in Equation (4), Sharkey and Tawn (2017) suggests changing the scaling factor $m$ before using the Metropolis–Hastings algorithm in order to optimize MCMC convergence. To this aim, they minimize the non-diagonal elements of the inverse Fisher information matrix, which correspond to asymptotic covariances, and then retrieve the parameters corresponding to the initial number of blocks from Equation (4). As the calculations cannot be carried out explicitly, the authors found empirically that the values $m_1$ and $m_2$ that cancel respectively the asymptotic covariances $\operatorname{ACov}(\mu, \sigma)$ and $\operatorname{ACov}(\sigma, \xi)$ are such that any $m \in [m_1, m_2]$ improves the MCMC convergence. Approximations