Reparameterization of extreme value framework
for improved Bayesian workflow
Théo Moins∗  Julyan Arbel∗  Stéphane Girard∗  Anne Dutfoy†
June 12, 2023
Abstract
Using Bayesian methods for extreme value analysis offers an alternative to frequentist ones, with several advantages such as easily dealing with parametric uncertainty or studying irregular models. However, computations can be challenging and the efficiency of algorithms can be altered by poor parameterization choices. We focus on the Poisson process characterization of univariate extremes and outline two key benefits of an orthogonal parameterization. First, Markov chain Monte Carlo convergence is improved when applied to orthogonal parameters. This analysis relies on convergence diagnostics computed on several simulations. Second, orthogonalization also helps in deriving Jeffreys and penalized complexity priors, and in establishing posterior propriety thereof. The proposed framework is applied to return level estimation of Garonne flow data (France).
1 Introduction
Studying the long-term behavior of environmental variables is necessary to understand the risks
of hazardous meteorological events such as floods, storms, or droughts. To this end, models from
extreme value theory allow us to extrapolate data in the distribution tails, in order to estimate
extreme quantiles that may not have been observed (see Coles, 2001, for an introduction). In particular, a key quantity to estimate is the return level associated with a given period of $T$ years: the level that is exceeded on average once every $T$ years. Assessing the resistance of facilities to natural disasters, such as dams to floods that occur on average once every 100 or 1,000 years, is critical for companies such as Électricité de France (EDF). Moreover, characterizing the uncertainty in the estimation of this return level is also of interest, which encourages the choice
of the Bayesian paradigm. However, performing Bayesian inference requires multiple steps that
must be managed by the user, from the choice of the model to the evaluation and validation of
computations. This has been recently formalized by Gelman et al. (2020) in the form of a Bayesian
workflow. After introducing models stemming from extreme value theory in Section 1.1, we briefly
review in Section 1.2 one particular step of the workflow, reparameterization, and more specifically
the choice of an orthogonal parameterization.
∗Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France.
†EDF R&D dept. Périclès, 91120 Palaiseau, France.
1.1 Extreme-value models
Three different frameworks exist to model extreme events, leading to different likelihoods: one by
block maxima, one by peaks-over-threshold, and one that unifies both through a Poisson process
characterization.
Block maxima model Let $M_n$ be the maximum of $n$ i.i.d. random variables with cumulative distribution function (cdf) $F$. We assume that $F$ belongs to the maximum domain of attraction of a non-degenerate cdf $G$, meaning that there exist two sequences $a_n > 0$ and $b_n$ such that $(M_n - b_n)/a_n$ converges in distribution to the cdf $G$. The extreme value theorem (e.g., Haan and Ferreira, 2006, Chapter 1) states that $G$ is necessarily a generalized extreme-value (GEV) distribution, with cdf
\[
G(x) = \begin{cases} \exp\left(-\{1 + \xi x\}_+^{-1/\xi}\right) & \text{if } \xi \neq 0, \\ \exp(-\exp(-x)) & \text{if } \xi = 0, \end{cases} \tag{1}
\]
where $\{x\}_+ = \max\{0, x\}$. Consequently, for a finite value of $n$, one can consider the approximation $P(M_n \leq x) \approx G((x - b_n)/a_n) =: G(x \mid b_n, a_n, \xi)$, and focus on the estimation of the three parameters of the GEV distribution. Here, as the dataset is fixed, the dependence on $n$ of the location and scale parameters will be omitted. To obtain a sample of maxima, one can divide the dataset into $m$ blocks of size $n/m$ and extract the maximum from each of them.
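To make the block maxima construction concrete, here is a minimal Python sketch (an illustration of ours, not code from the paper) that splits a series into $m$ blocks and evaluates the GEV cdf of Equation (1) with SciPy; note that scipy.stats.genextreme parameterizes the shape as $c = -\xi$.

```python
# Illustrative sketch, not code from the paper or its repository.
import numpy as np
from scipy.stats import genextreme

def block_maxima(data, m):
    """Split `data` into m blocks of (almost) equal size and return each block's maximum."""
    blocks = np.array_split(np.asarray(data), m)
    return np.array([b.max() for b in blocks])

def gev_cdf(x, mu, sigma, xi):
    """GEV cdf G(x | mu, sigma, xi) of Equation (1); SciPy's shape convention is c = -xi."""
    return genextreme.cdf(x, c=-xi, loc=mu, scale=sigma)

# Toy example: 50 "annual" maxima of standard Gumbel daily data. The maximum of
# 365 standard Gumbel variables is exactly Gumbel(log 365, 1), i.e., GEV with xi = 0.
rng = np.random.default_rng(0)
daily = rng.gumbel(size=50 * 365)
maxima = block_maxima(daily, m=50)
print(gev_cdf(np.median(maxima), mu=np.log(365), sigma=1.0, xi=0.0))  # roughly 0.5
```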
Peaks-over-threshold model Alternatively, one can consider observations that exceed a high threshold $u$. Let $X$ be a random variable with cdf $F$. Pickands' theorem (Pickands, 1975) states that, if $F$ belongs to the maximum domain of attraction of $G$ with $P(M_n \leq x) \approx G(x \mid \mu, \sigma, \xi)$, then the distribution of the exceedances $X - u \mid X > u$ is, as $u$ converges to the upper endpoint of $F$, a generalized Pareto distribution (GPD), with cdf
\[
H(y \mid \tilde{\sigma}, \xi) = \begin{cases} 1 - \left(1 + \dfrac{\xi y}{\tilde{\sigma}}\right)_+^{-1/\xi} & \text{if } \xi \neq 0, \\ 1 - \exp\left(-\dfrac{y}{\tilde{\sigma}}\right) & \text{if } \xi = 0, \end{cases} \tag{2}
\]
where the shape parameter $\xi$ is the same as in (1) and the GPD and GEV scales are linked by $\tilde{\sigma} = \sigma + \xi(u - \mu)$. To obtain a sample of $n_u$ excesses, the peaks-over-threshold method focuses on the $n_u$ largest values of the dataset. It thus requires the estimation of the quantile of order $1 - n_u/n$, which can be seen as the third parameter to estimate, in addition to $\tilde{\sigma}$ and $\xi$. The most classical choice is to estimate this intermediate quantile by the $(n - n_u)$th order statistic.
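As a companion sketch (again illustrative, on hypothetical data), the peaks-over-threshold construction takes the threshold as the $(n - n_u)$th order statistic and fits the GPD of Equation (2) to the excesses; scipy.stats.genpareto uses $\xi$ directly as its shape parameter.

```python
# Illustrative sketch, not code from the paper or its repository.
import numpy as np
from scipy.stats import genpareto

def excesses_over_threshold(data, n_u):
    """Keep the n_u largest observations; the threshold u is the (n - n_u)th order statistic."""
    x = np.sort(np.asarray(data))
    u = x[-n_u - 1]            # (n - n_u)th order statistic, 0-indexed
    return u, x[-n_u:] - u     # threshold and the n_u positive excesses

# Toy example: Student-t data with 4 degrees of freedom, for which xi = 1/4 in theory.
rng = np.random.default_rng(1)
data = rng.standard_t(df=4, size=10_000)
u, y = excesses_over_threshold(data, n_u=200)

# Maximum likelihood GPD fit to the excesses, with location fixed at 0.
xi_hat, _, sigma_hat = genpareto.fit(y, floc=0.0)
print(f"u = {u:.2f}, xi_hat = {xi_hat:.2f}, sigma_hat = {sigma_hat:.2f}")
```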
Poisson process characterization of extremes Finally, these two approaches can be generalized by a third one, using a non-homogeneous Poisson process. We present here an intuitive way of obtaining this model, similarly to Coles (2001, Chapter 7), and refer to Leadbetter et al. (1983, Chapter 5) for theoretical details. We start by observing that, for large $n$, $F^n(x) \approx G(x \mid \mu, \sigma, \xi)$ for $x$ in the support of $G$, denoted by $\operatorname{supp}(G(\cdot \mid \mu, \sigma, \xi)) = \left\{x \in \mathbb{R} \text{ s.t. } 1 + \xi \frac{x - \mu}{\sigma} > 0\right\}$. Hence, considering a large threshold $u \in \operatorname{supp}(G(\cdot \mid \mu, \sigma, \xi))$, a Taylor expansion yields
\[
n \log F(u) \simeq -n(1 - F(u)) \simeq \log G(u \mid \mu, \sigma, \xi),
\]
or, equivalently,
\[
P(X > u) \simeq -\frac{1}{n} \log G(u \mid \mu, \sigma, \xi). \tag{3}
\]
Equation (3) can be seen as the probability for $X$ to belong to $I_u := [u, +\infty)$. In the case of $n$ i.i.d. random variables, one can deduce that the associated point process $N_n$ is such that $N_n(I_u) \sim \mathcal{B}(n, p_n)$ with $p_n$ given by Equation (3). As $n \to \infty$, the binomial distribution $\mathcal{B}(n, p_n)$ converges to the Poisson distribution $\mathcal{P}(\Lambda(I_u))$, with $\Lambda(I_u) = -\log G(u \mid \mu, \sigma, \xi)$. This property being valid for all $I_u$, together with the independence property on non-overlapping sets, implies that $N_n$ converges to a non-homogeneous Poisson process $N$ with intensity measure $\Lambda$:
\[
N_n \xrightarrow{d} N, \quad \text{with } N(I_u) \sim \mathcal{P}(\Lambda(I_u)).
\]
This model generalizes the block maxima one since, as $n \to \infty$,
\[
P(M_n < x) = P(N_n(I_x) = 0) \longrightarrow P(N(I_x) = 0) = \exp(-\Lambda(I_x)) = G(x \mid \mu, \sigma, \xi).
\]
However, an estimation of the parameters $(\mu, \sigma, \xi)$ with this model is related to the overall maximum $M_n$ of the dataset, and it is frequent to study instead the maxima of $m$ smaller blocks $M_{n/m}$, where $m$ is typically the number of years of observations, so that $M_{n/m}$ corresponds to annual maxima. To do so, the intensity measure is multiplied by $m$, which modifies the parameterization and in particular the values of $\mu$ and $\sigma$: Wadsworth et al. (2010) shows that, if $(\mu_{k_i}, \sigma_{k_i}, \xi)$, $i \in \{1, 2\}$, are parameters for $k_i$ GEV observations, then
\[
\mu_{k_2} = \mu_{k_1} - \frac{\sigma_{k_1}}{\xi}\left(1 - \left(\frac{k_2}{k_1}\right)^{-\xi}\right), \qquad \sigma_{k_2} = \sigma_{k_1} \left(\frac{k_2}{k_1}\right)^{-\xi}. \tag{4}
\]
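Equation (4) is straightforward to implement. The following sketch is ours, not the authors'; the Gumbel limit $\xi \to 0$, where the formulas reduce to $\mu_{k_2} = \mu_{k_1} - \sigma_{k_1}\log(k_2/k_1)$ and $\sigma_{k_2} = \sigma_{k_1}$, is handled separately.

```python
# Illustrative sketch, not code from the paper or its repository.
import numpy as np

def rescale_gev(mu_k1, sigma_k1, xi, k1, k2):
    """Rescale GEV location and scale from k1 blocks to k2 blocks, following Equation (4).

    The shape xi is invariant; as xi -> 0 the formulas reduce to
    mu_k2 = mu_k1 - sigma_k1 * log(k2 / k1) and sigma_k2 = sigma_k1.
    """
    r = k2 / k1
    if np.isclose(xi, 0.0):
        return mu_k1 - sigma_k1 * np.log(r), sigma_k1
    mu_k2 = mu_k1 - sigma_k1 / xi * (1.0 - r ** (-xi))
    sigma_k2 = sigma_k1 * r ** (-xi)
    return mu_k2, sigma_k2

# Toy example: parameters given for k1 = 100 GEV observations (e.g., 100 annual
# maxima), rescaled to k2 = 1, the overall maximum of the record.
print(rescale_gev(mu_k1=30.0, sigma_k1=5.0, xi=0.1, k1=100, k2=1))
```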
The threshold excess model can also be derived from the point process representation, since $P(X > y + u \mid X > u) \approx 1 - H(y \mid \tilde{\sigma}, \xi)$, with $\tilde{\sigma} = \sigma + \xi(u - \mu)$. Moreover, in contrast to the peaks-over-threshold model, where an intermediate quantile needs to be estimated, the Poisson model directly includes a third location parameter $\mu$.

In the following, we will focus mainly on this latter model, and treat the peaks-over-threshold method as a special case in Section 4.1.
Bayesian inference Using the Bayesian paradigm in extreme value models is advantageous in comparison to the frequentist approach; see Coles and Powell (1996) for a general review, and Stephenson (2016) or Bousquet (2021) for more recent overviews. For the Poisson process characterization of extremes, Bayesian inference consists in fixing a scaling factor $m$ and a threshold $u$ to get a number $n_u \geq 1$ of observations exceeding $u$, denoted by $\mathbf{x} = (x_1, \ldots, x_{n_u})$. The likelihood of these observations can be written as
\[
L(\mathbf{x}, n_u \mid \mu, \sigma, \xi) = e^{-m\left(1 + \xi\left(\frac{u - \mu}{\sigma}\right)\right)^{-1/\xi}} \sigma^{-n_u} \prod_{i=1}^{n_u} \left(1 + \xi \, \frac{x_i - \mu}{\sigma}\right)^{-\frac{1}{\xi} - 1}. \tag{5}
\]
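For samplers that only need the log-posterior up to an additive constant, Equation (5) can be coded directly. Below is a minimal sketch (our illustration, with made-up example values); it returns $-\infty$ outside the support, so that out-of-support proposals are rejected by an MCMC sampler.

```python
# Illustrative sketch, not code from the paper or its repository.
import numpy as np

def pp_log_likelihood(x, u, m, mu, sigma, xi):
    """Log of Equation (5) for exceedances x of threshold u, with scaling factor m.

    Returns -inf outside the support {sigma > 0 and 1 + xi * (z - mu) / sigma > 0
    for z = u and all exceedances}, which is convenient for MCMC samplers.
    """
    x = np.asarray(x)
    z_u = 1.0 + xi * (u - mu) / sigma
    z_x = 1.0 + xi * (x - mu) / sigma
    if sigma <= 0 or z_u <= 0 or np.any(z_x <= 0):
        return -np.inf
    if np.isclose(xi, 0.0):  # Gumbel limit of the intensity and of the density terms
        return (-m * np.exp(-(u - mu) / sigma)
                - x.size * np.log(sigma) - np.sum((x - mu) / sigma))
    return (-m * z_u ** (-1.0 / xi)
            - x.size * np.log(sigma) - (1.0 / xi + 1.0) * np.sum(np.log(z_x)))

# Example call with made-up values:
print(pp_log_likelihood(x=[31.0, 35.5, 40.2], u=30.0, m=50, mu=38.0, sigma=6.0, xi=0.1))
```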
A complete Bayesian model also requires the specification of a prior $p(\mu, \sigma, \xi)$, to obtain the posterior $p(\mu, \sigma, \xi \mid \mathbf{x}, n_u)$ using Bayes' theorem: $p(\mu, \sigma, \xi \mid \mathbf{x}, n_u) \propto p(\mu, \sigma, \xi) \, L(\mathbf{x}, n_u \mid \mu, \sigma, \xi)$. This posterior summarizes the information on the parameters after observation, and can be used to extract point estimators, build credible intervals, or write the probability of a new observation $\tilde{x}$ given data $\mathbf{x}$ using the posterior predictive:
\[
p(\tilde{x} \mid \mathbf{x}, n_u) = \int p(\tilde{x} \mid \theta) \, p(\theta \mid \mathbf{x}, n_u) \, d\theta, \qquad \theta = (\mu, \sigma, \xi). \tag{6}
\]
These quantities of interest are rarely explicit, and are often derived by sampling approaches. A recent survey of extreme value software (Belzile et al., 2022) contains a Bayesian section and a comparison with frequentist methods. In the general Bayesian case, an overview of the Bayesian workflow is given in Gelman et al. (2020); we focus here on the particular step of reparameterization for the likelihood $L(\mathbf{x}, n_u \mid \mu, \sigma, \xi)$ in the case where Markov chain Monte Carlo (MCMC) methods are used to approximate the posterior distribution.
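For instance, given $S$ posterior draws $\theta^{(1)}, \ldots, \theta^{(S)}$ from an MCMC run, the posterior predictive of Equation (6) is typically approximated by the Monte Carlo average $S^{-1} \sum_s p(\tilde{x} \mid \theta^{(s)})$. A minimal sketch follows, assuming, purely for illustration, GEV observations and an array of fake posterior draws.

```python
# Illustrative sketch, not code from the paper or its repository.
import numpy as np
from scipy.stats import genextreme

def posterior_predictive_pdf(x_new, samples):
    """Monte Carlo estimate of Equation (6): average the GEV density p(x_new | theta)
    over posterior draws theta = (mu, sigma, xi), one draw per row of `samples`."""
    mu, sigma, xi = samples[:, 0], samples[:, 1], samples[:, 2]
    return genextreme.pdf(x_new, c=-xi, loc=mu, scale=sigma).mean()

# Toy example with fake "posterior draws" centered at (mu, sigma, xi) = (30, 5, 0.1):
rng = np.random.default_rng(2)
samples = np.column_stack([
    rng.normal(30.0, 0.5, size=1000),   # mu draws
    rng.normal(5.0, 0.3, size=1000),    # sigma draws
    rng.normal(0.1, 0.05, size=1000),   # xi draws
])
print(posterior_predictive_pdf(40.0, samples))
```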
1.2 Reparameterization
Although the choice of parameterization of a statistical model does not alter the model per se, it
does reshape its geometry, which in turn may impact computational aspects of sampling algorithms
such as efficiency or accuracy. For these methods, a crucial complication for chain convergence is
parameter correlation. This notion of correlation between parameters can be related to a notion of asymptotic orthogonality, which leads to asymptotic independence of the posterior components.
Parameterization and Bayesian inference It has been known for several decades that parameterization is crucial for good mixing of MCMC chains, especially when the correlation between the coordinates is large. See Gilks et al. (1995, Chapter 6) for an introduction in the context of Gibbs sampling and the Metropolis–Hastings algorithm. More general computations are conducted by Roberts and Sahu (1997) in the normal case, but the convergence rate is less explicit in general; see for example Roberts and Polson (1994). For Metropolis–Hastings, if the structure of the kernel is not
similar to that of the target density (which is a typical case if there is a complex dependence
between parameters), then too many candidates generated by the kernel are rejected and the same
problem as for Gibbs sampling occurs. For more recent MCMC algorithms such as Hamiltonian
Monte Carlo (HMC, Neal, 2011) and its variant NUTS (Hoffman and Gelman, 2014), Betancourt
and Girolami (2015) gives an example of the benefit of reparameterization for hierarchical models.
More generally, Betancourt (2019) studies reparameterization from a geometric perspective, in order
to show its equivalence with adapted versions of HMC on Riemannian manifolds.
Due to the difficulty of obtaining general results on reparameterization and MCMC convergence, a significant part of the research focuses on specific models, such as hierarchical models (Papaspiliopoulos et al., 2003; Browne et al., 2009), linear regression (Gilks et al., 1995), or mixed models (Gelfand et al., 1995, 1996).
For extreme value models, Diebolt et al. (2005) uses a continuous mixture of exponential distributions in the GPD case. Opitz et al. (2018) also suggests using the median instead of the usual scale parameter to reduce correlation for the integrated nested Laplace approximation (INLA). An alternative Monte Carlo algorithm, the ratio-of-uniforms method, is also implemented for extreme value models in the revdbayes package (Northrop, 2023). The influence of parameterization is also considered in this framework, as the acceptance rate can be reduced by correlated parameters (see Appendix C.4). Parameter transformations are also studied by Jóhannesson et al. (2022) to make likelihood-based inference tractable in the high-dimensional case. Finally, Belzile et al. (2022) proposes a reparameterization trick that can be used to obtain a suitable initial value for optimization routines.
Orthogonal parameterization As seen before, reducing dependence between coordinates is
desirable for MCMC methods. Dependence can be characterized using asymptotic covariance and
the notion of orthogonality according to Jeffreys (1961): parameters are said to be orthogonal when
the Fisher information is diagonal. From this definition, having orthogonal parameters leads to
asymptotic posterior independence when a Bernstein–von Mises theorem holds (e.g., Van der Vaart, 2000, Chapter 10). However, finding an orthogonal parameterization is seldom feasible when there are more than three parameters, since the number of equations is then greater than the number of unknowns. In the case of three parameters, there are as many equations as there are unknowns, but the nonlinear system does not necessarily admit a solution (Huzurbazar, 1950).
The main use of orthogonal parameterization is to make parameters of interest independent of nuisance parameters (Cox and Reid, 1987). Other definitions of orthogonality have also been proposed, either better adapted to the inferential context (Tibshirani and Wasserman, 1994) or ensuring consistency for the parameter of interest (Woutersen, 2011). For Bayesian inference, Tibshirani and Wasserman (1994) compares different definitions and suggests a strong assumption of normality for the posterior. In the following, we keep the most popular definition of orthogonality, due to Jeffreys (1961), as we are not interested in properties associated with the estimation of a given parameter of interest, but rather in the dependence structure between parameters. However, to the best of our knowledge, there is no clear evidence in the literature of a direct link between parameter orthogonality and mixing properties of the corresponding MCMC chains, such as a better convergence rate. In Section 4, we provide some empirical evidence of the benefits of orthogonality in extreme value models.
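To make Jeffreys' criterion concrete, the sketch below (a toy illustration of ours, not the paper's code) approximates the observed information, i.e., the negative Hessian of a log-likelihood, by central finite differences; orthogonal parameters yield (near-)zero off-diagonal entries, as in the classical Gaussian (mean, log-standard-deviation) example.

```python
# Illustrative sketch, not code from the paper or its repository.
import numpy as np

def observed_information(log_lik, theta, eps=1e-4):
    """Negative Hessian of `log_lik` at `theta`, by central finite differences.
    Off-diagonal entries near zero indicate (locally) orthogonal parameters."""
    theta = np.asarray(theta, dtype=float)
    d = theta.size
    hess = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            e_i, e_j = np.eye(d)[i] * eps, np.eye(d)[j] * eps
            hess[i, j] = (log_lik(theta + e_i + e_j) - log_lik(theta + e_i - e_j)
                          - log_lik(theta - e_i + e_j) + log_lik(theta - e_i - e_j)
                          ) / (4 * eps ** 2)
    return -hess

# Toy target: Gaussian log-likelihood in (mean, log-sd), a classically orthogonal
# pair -- at the MLE the off-diagonal entry is (numerically) zero.
rng = np.random.default_rng(3)
data = rng.normal(1.0, 2.0, size=500)
ll = lambda th: float(np.sum(-th[1] - 0.5 * ((data - th[0]) / np.exp(th[1])) ** 2))
print(observed_information(ll, [data.mean(), np.log(data.std())]))
```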
1.3 Contributions and outline
In this paper, we study the benefits of reparameterization for the Poisson process characterization of extremes in a Bayesian context. In particular, it is shown that the orthogonal parameterization is useful for several reasons: we argue in Section 2 that it improves the performance of MCMC algorithms in terms of convergence, and we show in Section 3 that it also facilitates the derivation of priors such as Jeffreys and an informative variant on the shape parameter using penalized complexity (PC) priors (Simpson et al., 2017). These results are then illustrated by experiments in Section 4, first on simulations to compare the different parameterizations, and second on a real dataset of Garonne river flows. Proofs as well as additional experiments are provided in the Appendix, and the code corresponding to the experiments is available online at https://github.com/TheoMoins/ExtremesPyMC.
2 Reaching orthogonality for the extreme Poisson process
An attempt to reparameterize the Poisson process for extremes in order to improve MCMC convergence already exists in the literature (Sharkey and Tawn, 2017), but it has several limitations, which are detailed here. Instead, we suggest using the fully orthogonal parameterization of Chavez-Demoulin and Davison (2005).
Near-orthogonality with hyperparameter tuning Based on the relationship between parameters given in Equation (4), Sharkey and Tawn (2017) suggests changing the scaling factor $m$ before using the Metropolis–Hastings algorithm in order to optimize MCMC convergence. To this aim, they minimize the non-diagonal elements of the inverse Fisher information matrix, which correspond to asymptotic covariances, and then retrieve the parameters corresponding to the initial number of blocks from Equation (4). As the calculations cannot be carried out explicitly, the authors found empirically that the values $m_1$ and $m_2$ that cancel respectively the asymptotic covariances $\operatorname{ACov}(\mu, \sigma)$ and $\operatorname{ACov}(\sigma, \xi)$ are such that any $m \in [m_1, m_2]$ improves the MCMC convergence. Approximations