An extended generalized Pareto regression model for count data 3
of zeros (meaning no avalanche has been reported) as well as heavy-tailed behavior.
We fit a zero-inflated negative binomial regression model under the framework of
generalized additive models for location, scale, and shape (GAMLSS), where the
parameters are related to additive environmental covariates (see Section 4 for a detailed
description) via suitable link functions (Stasinopoulos et al., 2018). The randomized
quantile residuals (Dunn and Smyth, 1996) are used to check the adequacy of the
fitted model. Figure 1(b) clearly shows that the fitted models do not correctly estimate
the upper tail behavior of avalanche extremes. In addition, the number of zeros is
not correctly predicted in this example.
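As an illustration of the diagnostic used above, the following is a minimal sketch of randomized quantile residuals for a zero-inflated negative binomial model. It is not the GAMLSS fit reported here: the function names, the parameterization by a zero-inflation probability `pi`, mean `mu`, and dispersion `size`, and the simulated data are all assumptions made for the example. For a discrete response with CDF F, each residual is obtained by drawing u uniformly on (F(y - 1), F(y)) and mapping it through the standard normal quantile function; under a correctly specified model the residuals are approximately standard normal.

```python
import numpy as np
from scipy import stats


def zinb_cdf(y, pi, mu, size):
    """CDF of a zero-inflated negative binomial (hypothetical parameterization).

    pi   : probability of a structural zero
    mu   : mean of the negative binomial component
    size : dispersion parameter of the negative binomial component
    """
    y = np.asarray(y, dtype=float)
    p = size / (size + mu)  # SciPy's nbinom(n, p) has mean n * (1 - p) / p = mu
    nb_cdf = stats.nbinom.cdf(y, size, p)  # equals 0 for y < 0
    return np.where(y < 0, 0.0, pi + (1.0 - pi) * nb_cdf)


def randomized_quantile_residuals(y, pi, mu, size, rng):
    """Randomized quantile residuals in the sense of Dunn and Smyth (1996)."""
    y = np.asarray(y)
    lower = zinb_cdf(y - 1, pi, mu, size)  # F(y - 1)
    upper = zinb_cdf(y, pi, mu, size)      # F(y)
    u = rng.uniform(lower, upper)          # one uniform draw per observation
    u = np.clip(u, 1e-12, 1.0 - 1e-12)     # keep the normal quantile finite
    return stats.norm.ppf(u)


# Simulated example: the data are generated from the model being checked,
# so the residuals should look approximately standard normal.
rng = np.random.default_rng(0)
n, pi, mu, size = 5000, 0.3, 4.0, 1.5
counts = rng.negative_binomial(size, size / (size + mu), n)
y = np.where(rng.random(n) < pi, 0, counts)
residuals = randomized_quantile_residuals(y, pi, mu, size, rng)
print(residuals.mean(), residuals.std())
```

In practice the residuals would be compared with the standard normal through a Q-Q plot; departures in the upper tail point to the kind of misfit described above.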
Extreme value theory, originally developed by Fisher and Tippett (1928), provides
a mathematical blueprint for modeling events of very high magnitude and very low
frequency (e.g., extreme temperatures, heavy rainfall intensities, heavy floods, and
extreme winds), and monographs such as Coles (2001) or Beirlant et al. (2004) discuss the main
extreme value models. In particular, under the peak-over-threshold (POT) approach
(Pickands, 1975), the distribution of exceedances of a high threshold is often
approximated by the Generalized Pareto Distribution (GPD). Modifications of the GPD
to discrete data exist in the literature (Krishna and Pundir, 2009; Buddana and
Kozubowski, 2014; Kozubowski et al., 2015), and recently Hitz et al. (2024) discussed
discrete versions of GPD to approximate the tail behavior of integer-valued random
variables. This approach still requires choosing a threshold at a high quantile,
which is difficult because of the discrete nature of the data (Daouia et al., 2023).
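For concreteness, the POT approximation and one common discretization can be written as follows. The notation (threshold $u$, scale $\sigma > 0$, shape $\xi$, survival function $\bar{H}_{\sigma,\xi}$) is assumed here for illustration, and the discretization shown is one standard construction rather than necessarily the exact form used in each of the cited works.

```latex
% Notation assumed for illustration: threshold u, scale \sigma > 0, shape \xi.
\begin{align*}
  % GPD approximation for exceedances of a high threshold u
  \Pr(X - u > x \mid X > u) &\approx \bar{H}_{\sigma,\xi}(x)
    = \left(1 + \frac{\xi x}{\sigma}\right)_{+}^{-1/\xi}, \qquad x \ge 0, \\
  % One discrete analogue for integer-valued exceedances Y
  \Pr(Y = k) &= \bar{H}_{\sigma,\xi}(k) - \bar{H}_{\sigma,\xi}(k+1),
    \qquad k = 0, 1, 2, \dots
\end{align*}
```

Here $(a)_{+} = \max(a, 0)$, and the case $\xi = 0$ is read as the limit $\bar{H}_{\sigma,0}(x) = \exp(-x/\sigma)$.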
Moreover, environmental time series in particular are rarely stationary
and depend on environmental factors. A standard approach to modeling continuous
extremes of a non-stationary process keeps a predetermined threshold
but treats the parameters of the GPD as functions of covariates (Davison and Smith,
1990). An alternative approach (Eastoe and Tawn, 2009) uses preprocessing methods
to model the non-stationarity in the body of the process to produce transformed data
and then uses standard methods to model the extremes of the transformed data. The
first approach has been adapted to the discrete case by Ranjbar et al. (2022). The
second approach seems difficult to adapt, because the distribution of the preprocessed
data cannot be connected to a distribution for count data.
The proposed model addresses a drawback of the POT approach, namely that non-extreme
data below the selected threshold are ignored or modeled separately from the extremes.
The model provides a smooth transition between the bulk and the upper tail of the
distribution over the full range of the data, while bypassing threshold selection. The
discrete extended version
of GPD (DEGPD) is derived by discretizing the cumulative distribution function
(CDF) of an extended GPD (Naveau et al., 2016). The model takes into account the
possible effects of covariates in a non-parametric way. Since it is possible to have a
dataset with an excess of zeros, such as in the motivating example, we also consider
a mixture of the previous distribution with a degenerate distribution at zero. This
results in a distribution named Zero-Inflated DEGPD (ZIDEGPD).
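The construction just described can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the model specification of this paper: it uses the power-type extended GPD, F(x) = H(x)^kappa, which is the simplest family in Naveau et al. (2016); it assumes the discretization P(Y = k) = F(k + 1) - F(k); it restricts the shape to xi > 0; and all function and parameter names (`sigma`, `xi`, `kappa`, `pi`) are hypothetical.

```python
import numpy as np


def gpd_cdf(x, sigma, xi):
    """GPD CDF H(x) = 1 - (1 + xi * x / sigma)^(-1 / xi), assuming xi > 0."""
    x = np.maximum(np.asarray(x, dtype=float), 0.0)  # H(x) = 0 for x <= 0
    return 1.0 - (1.0 + xi * x / sigma) ** (-1.0 / xi)


def egpd_cdf(x, sigma, xi, kappa):
    """Extended GPD CDF with the assumed power form G(u) = u^kappa."""
    return gpd_cdf(x, sigma, xi) ** kappa


def degpd_pmf(k, sigma, xi, kappa):
    """Discretized extended GPD: P(Y = k) = F(k + 1) - F(k) for k = 0, 1, ..."""
    k = np.asarray(k, dtype=float)
    return egpd_cdf(k + 1.0, sigma, xi, kappa) - egpd_cdf(k, sigma, xi, kappa)


def zidegpd_pmf(k, pi, sigma, xi, kappa):
    """Mixture of a point mass at zero (weight pi) and the discretized EGPD."""
    k = np.asarray(k)
    base = (1.0 - pi) * degpd_pmf(k, sigma, xi, kappa)
    return np.where(k == 0, pi + base, base)


# Illustrative parameter values only; they are not estimates from any data.
k = np.arange(0, 10)
print(degpd_pmf(k, sigma=2.0, xi=0.2, kappa=1.5))
print(zidegpd_pmf(k, pi=0.3, sigma=2.0, xi=0.2, kappa=1.5))
```

Because the probabilities telescope, the sum of `degpd_pmf` over k = 0, ..., K equals F(K + 1), which gives a quick check that the discretization defines a proper distribution on the non-negative integers. In a regression setting the parameters would additionally depend on covariates, which this sketch does not attempt.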