Stochastic Precipitation Generation for the Chesapeake Bay Watershed using Hidden Markov Models with Variational Bayes Parameter Estimation

Reetam Majumder1, Nagaraj K. Neerchal2 and Amita Mehta2
December 14, 2022
Abstract
Stochastic precipitation generators (SPGs) are a class of statistical models which gen-
erate synthetic data that can simulate dry and wet rainfall stretches for long du-
rations. Generated precipitation time series data are used in climate projections,
impact assessment of extreme weather events, and water resource and agricultural
management. We construct an SPG for daily precipitation data that is specified as
a semi-continuous distribution at every location, with a point mass at zero for no
precipitation and a mixture of two exponential distributions for positive precipita-
tion. Our generators are obtained as hidden Markov models (HMMs) where the
underlying climate conditions form the states. We fit a 3-state HMM to daily precipitation
data for the Chesapeake Bay watershed on the eastern coast of the USA for
the wet season months of July to September from 2000 to 2019. Data is obtained from
the GPM-IMERG remote sensing dataset, and existing work on variational HMMs is
extended to incorporate semi-continuous emission distributions. In light of the high
spatial dimension of the data, a stochastic optimization implementation allows for
computational speedup. The most likely sequence of underlying states is estimated
using the Viterbi algorithm, and we identify the differences in the weather regimes
associated with the states of the proposed model. Synthetic data generated from the
HMM can reproduce monthly precipitation statistics as well as spatial dependency
present in the historical GPM-IMERG data.
Key words: Variational Bayes; Hidden Markov models; Spatio-temporal statistics;
Stochastic optimization; Semi-continuous distributions
1North Carolina State University
2University of Maryland, Baltimore County
arXiv:2210.04305v2 [stat.AP] 13 Dec 2022
1 Introduction
Precipitation is the major component of the global water cycle and plays an important
role in atmospheric and land surface processes in the climate system. While numerical
weather models study precipitation over regional to global scales, observational data
are used to develop statistical models for precipitation over watershed to local areas
at higher temporal frequencies and higher spatial resolution. The measurement and
modeling of precipitation has historically relied on sparsely located rain gauge measurements
that are spatially irregular. In recent years, precipitation estimates derived from remote
sensing observations with uniform spatial and temporal coverage have become more
easily available. A common class of statistical models of interest for analyzing
meteorological data is known as stochastic weather generators (SWGs). SWGs can be
used to generate multi-year series of synthetic data to simulate weather patterns and
are useful in weather and climate research; for precipitation data, the corresponding
generator is a stochastic precipitation generator (SPG). The modeling and forecasting
of seasonal and inter-annual variations in precipitation is used to determine water allo-
cation and resource management for regions dependent on precipitation as a primary
water source. To this end, SPGs produce time series of synthetic data representative
of the general rainfall patterns within a region. In particular, they aim to replicate key
statistical properties of the historical data like dry and wet stretches, spatial correlations,
and extreme weather events. SPGs are also used to downscale precipitation data from
numerical weather models, and simulations from them are used for climate projections,
impact assessments of extreme weather events, water resources and agricultural man-
agement, and for public and veterinary health. Numerical models used in weather and
climate research tend to be sensitive to initial conditions, and can be augmented by
SWGs. The output from SWGs is stochastic by nature and therefore has uncertainty
built in, and ensemble datasets generated from these models can improve other climate
and weather models. Breinl et al. (2017) provide a review of current SPG approaches
and applications.
Like most meteorological data, precipitation is modeled as a multivariate time series
whose univariate components each correspond to a location. However, modeling it di-
rectly using time series methodology usually requires the estimation of a large number
of parameters and high-dimensional autocovariance matrices. Daily precipitation data is
often modeled as a mixture of a point mass at 0 for no rainfall and one or more Gamma
or exponential distributions for positive rainfall (Hughes and Guttorp, 1994; Wilks, 1998;
Robertson et al., 2004; Mhanna and Bauwens, 2012), introducing an additional layer of
complexity. The statistical analysis of such datasets at scale calls for parameter esti-
mation approaches that are computationally efficient while being able to represent the
dynamics of the underlying processes to a satisfactory degree. Hidden Markov models
(HMMs), introduced and studied since the late 1960s (Rabiner, 1989; Cappé
et al., 2005), are an attractive class of models that have seen widespread use for constructing
SPGs. A hidden Markov model (HMM) is a pair of stochastic processes $\{S_t, Y_t\}_{t \geq 1}$
where $\{S_t\}$ is a Markov chain and, conditional on it, $\{Y_t\}$ is a sequence of independent
random variables such that the distribution of $Y_t$ depends only on $S_t$. $\{S_t\}$ usually
takes values in a finite set; $t$ is often, although not necessarily, an integer index. However,
$\{S_t\}$ is unobservable, and instead we observe only $\{Y_t\}_{t \geq 1}$. $\{Y_t\}$ can be univariate
or multivariate, and can follow a discrete, continuous, or mixture distribution. $\{S_t\}$ is
known as the state process, while $\{Y_t\}$ is called the emission or observation process.
The Markov property of the state process serves to capture the temporal dependency
in the data, and the emission process at each time point describes the spatial patterns
in the data. Much of the groundwork for using HMMs for daily precipitation was laid
in Hughes and Guttorp (1994), with Bellone et al. (2000) proposing different emission
distributions for precipitation amounts and precipitation occurrence models. This was
extended to non-homogeneous hidden Markov models by Robertson et al. (2004, 2006) and
Kirshner (2005), where the transition probabilities of the HMM’s Markov process change
over time.
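To make the generative structure concrete, the following is a minimal Python sketch of sampling from an HMM whose emissions are semi-continuous: a point mass at zero for dry days and a mixture of two exponential distributions for positive rainfall, independent across locations given the state. The function name `simulate_spg` and the parameter names `p_dry`, `mix_w`, and `rates` are hypothetical and chosen only for illustration; this is a sketch under those assumptions, not the fitted model or the authors' code.

```python
import numpy as np

def simulate_spg(T, pi1, A, p_dry, mix_w, rates, seed=0):
    """Simulate T days of precipitation at L locations from a K-state HMM.

    All parameter names are illustrative assumptions:
    pi1:   (K,)      initial state probabilities
    A:     (K, K)    transition matrix, each row sums to 1
    p_dry: (K, L)    probability of zero rainfall per state and location
    mix_w: (K, L)    weight of the first exponential component
    rates: (K, L, 2) rate parameters of the two exponential components
    """
    rng = np.random.default_rng(seed)
    K, L = p_dry.shape
    states = np.empty(T, dtype=int)
    y = np.zeros((T, L))
    states[0] = rng.choice(K, p=pi1)
    for t in range(T):
        if t > 0:
            # Markov step: the next state depends only on the previous state.
            states[t] = rng.choice(K, p=A[states[t - 1]])
        j = states[t]
        # Given the state, locations are sampled independently.
        wet = rng.random(L) >= p_dry[j]
        first = rng.random(L) < mix_w[j]
        rate = np.where(first, rates[j, :, 0], rates[j, :, 1])
        y[t] = np.where(wet, rng.exponential(1.0 / rate), 0.0)
    return states, y
```

Calling `simulate_spg` with, say, `K = 3` states returns the hidden state path together with a `(T, L)` array of daily amounts, from which summaries such as the proportion of dry days can be computed.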
The overwhelming majority of HMM studies use the Baum-Welch algorithm (Baum
and Petrie, 1966; Baum and Eagon, 1967; Baum and Sell, 1968; Baum et al., 1970; Baum,
1972) for parameter estimation. It is a maximum likelihood approach used for efficient
parameter estimation in HMMs while taking into account the Markov assumptions of
the model, and can be considered as a variant of the expectation-maximization (EM)
algorithm (Dempster et al., 1977). The Viterbi algorithm (Viterbi, 1967) can then estimate
the most likely sequence of states that has generated the data. The ability to estimate
and interpret the underlying states of a relatively parsimonious model has made HMMs
a popular approach for sequential data. However, the Baum-Welch algorithm, being a
maximum likelihood based method, can run into problems for large datasets with com-
plex dependencies. In particular, it can lead to model overfitting for graphical models
which tend to have complex dependency structures (Attias, 1999). Holsclaw et al. (2016)
use a Bayesian approach to model daily precipitation, but in general, Bayesian alterna-
tives which use Gibbs sampling (Scott, 2002; Cappé et al., 2005) tend to be computa-
tionally intensive. Historically, the reliance on spatially non-uniform weather stations
for data has prevented these from being practical issues. However, as gridded remote
sensing data which tend to be highly correlated become more easily available, alterna-
tive approaches which are scalable and can incorporate prior information are desirable.
This is where variational Bayes (VB) provides an attractive alternative for parameter
estimation. While Markov chain Monte Carlo (MCMC) methods use sampling to find
the posterior distribution, VB uses optimization to calculate an approximate posterior;
the posteriors are obtained by an iterative EM-like algorithm which always converges
(Attias, 1999). The variational posteriors have analytical forms under certain conditions
(Ghahramani and Beal, 2000) and can be used to perform approximate Bayesian infer-
ence. A review of VB methods can be found in Blei et al. (2017). However, while VB
estimation has been implemented for state space models and HMMs (MacKay, 1997;
Ghahramani and Beal, 2000; Beal, 2003; Ji et al., 2006; McGrory and Titterington, 2009),
studies have usually only focused on cases where emissions are distributed as Normal
or mixtures of Normal distributions.
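[Diagram: a chain of hidden states S1, S2, ..., St, ..., ST-1, ST, with each transition governed by the matrix A and each state St emitting the observation Yt.]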
Figure 1: A graphical model representation of the conditional independence structure
for an HMM.
In this paper, we outline VB estimation for HMMs with semi-continuous emissions,
with the motivation of constructing an SPG for daily precipitation using gridded remote
sensing data from GPM-IMERG (Huffman et al., 2019) for a large spatial domain. Our
model is constructed using precipitation data for the Chesapeake Bay watershed in the
eastern US for the wet season months of July–September of 2000–2019. The SPG aims to
replicate the spatial correlation present in the data, as well as key properties of the orig-
inal data, e.g., the proportion of dry days (with no rainfall) and mean seasonal rainfall.
Estimates for these can be calculated using data simulated from the fitted model.
The rest of this paper is organized as follows: Section 2 provides background for
HMMs and VB. Section 3 introduces the dataset and discusses the HMM for precipita-
tion as well as VB estimation for the model. Section 4 presents a numerical study for
multi-site precipitation, and also presents our case study for daily precipitation over the
Chesapeake Bay watershed. Section 5 concludes with a discussion.
2 Background
We provide some background on parameter estimation for hidden Markov models and
on variational Bayes in this section. A more thorough treatment of learning procedures
for HMMs as well as parameter estimation using variational Bayes can be found in
Majumder (2021, Chapter 2).
2.1 Hidden Markov models
An HMM consists of a sequence of multivariate observations $y_{1:T} = [y_1, \ldots, y_T]$, together
with a sequence of hidden (unobserved) states $s_{1:T} = [s_1, \ldots, s_T]$. The states
are assumed to follow a first order Markov process, and the multivariate observation
$y_t = (y_{t1}, \ldots, y_{tL})'$ is emitted by the corresponding state $s_t \in \{1, \ldots, K\}$. For the purposes
of this study, $L$ can be considered the number of spatial locations. Figure 1 shows
a graphical model representation of an HMM. The state process $s$ is parameterized
by an initial probability $\pi_{1j} = \Pr[s_1 = j]$ and a $K \times K$ matrix $A$, whose elements are
$a_{jk} = \Pr[s_{t+1} = k \mid s_t = j]$ for $j, k = 1, \ldots, K$. The probability density of the emission $y_{tl}$ at
location $l$ and time $t$, given that the system is in state $j$, is:
$$p(y_{tl} \mid s_t = j) = p_j(y_{tl} \mid \theta_{jl}),$$
where $\theta_{jl}$ are the parameters associated with the distribution of $y_{tl}$. The distributions
at the different locations are assumed to be independent conditional on the state, and the full
likelihood can be expressed as:
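$$p(y, s \mid \Theta) = \pi_{1j} \prod_{t=1}^{T-1} \prod_{j=1}^{K} \prod_{k=1}^{K} a_{jk} \prod_{t=1}^{T} \prod_{j=1}^{K} \prod_{l=1}^{L} p_j(y_{tl} \mid s_{tj}),$$
where $s_{tj} = I\{s_t = j\}$. We refer the reader to Rabiner (1989) for a detailed tutorial on
parameter estimation using the Baum-Welch algorithm.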
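As an illustration of how the full likelihood is evaluated for a known state path, and of how the Viterbi algorithm searches over paths, here is a minimal Python sketch. It assumes that the indicators $s_{tj}$ select only the factors for the state actually occupied at each time, so that the log-likelihood is one initial term, $T-1$ transition terms, and $T \times L$ emission terms. The function names and the array layout of `log_emis` are hypothetical, and the emission log-densities are taken as precomputed inputs rather than tied to any particular distribution.

```python
import numpy as np

def complete_data_loglik(states, log_pi1, log_A, log_emis):
    """log p(y, s | Theta) for a known state path (illustrative layout).

    states:   (T,) integer states in {0, ..., K-1}
    log_pi1:  (K,) log initial probabilities
    log_A:    (K, K) with log_A[j, k] = log a_jk
    log_emis: (T, K, L) with log_emis[t, j, l] = log p_j(y_tl | theta_jl)
    """
    states = np.asarray(states)
    T = len(states)
    ll = log_pi1[states[0]]                        # initial state term
    ll += log_A[states[:-1], states[1:]].sum()     # T-1 transition terms
    ll += log_emis[np.arange(T), states, :].sum()  # emissions over t and l
    return ll

def viterbi(log_pi1, log_A, log_emis):
    """State path maximizing complete_data_loglik, by dynamic programming."""
    log_b = log_emis.sum(axis=2)                   # (T, K): sum over locations
    T, K = log_b.shape
    delta = np.empty((T, K))                       # best log-score ending in each state
    back = np.zeros((T, K), dtype=int)             # best predecessor of each state
    delta[0] = log_pi1 + log_b[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A     # scores[j, k]: from j to k
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_b[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):                 # trace back the best path
        path[t] = back[t + 1, path[t + 1]]
    return path
```

Working in log space avoids numerical underflow for long sequences, and summing `log_emis` over the location axis uses the conditional independence of locations given the state.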
2.2 Variational Bayes inference
Variational Bayes (VB) methods aim to approximate the posterior distribution through
optimization. VB tends to be faster than MCMC for intractable likelihoods, but it only
provides approximate inference. VB is suited for large datasets, and can take advantage
of stochastic optimization (Robbins and Monro, 1951) which makes it scalable. VB posits
a family of approximate posterior distributions $Q$ over the latent variables $z$ and parameters
$\theta$, and optimizes within this family to find the member closest to the true posterior
$p(z, \theta \mid y)$. In its most widely applied form, the VB posterior minimizes the
Kullback-Leibler (KL) divergence (Kullback and Leibler, 1951) to the true posterior among all
candidates $q(\cdot) \in Q$, i.e.,
$$\tilde{q}(\cdot) = \arg\min_{q(\cdot) \in Q} \mathrm{KL}\big(q(z, \theta) \,\|\, p(z, \theta \mid y)\big). \tag{1}$$
Optimizing the KL divergence is typically difficult in practice since it involves computing
the log marginal likelihood $\log p(y)$. However, it is possible to find a lower bound for
$\log p(y)$ and equivalently maximize a quantity known as the evidence lower bound
(ELBO) (Jordan et al., 1999), defined as
$$\mathrm{ELBO}(q) = E[\log p(z, \theta, y)] - E[\log q(z, \theta)]. \tag{2}$$
The so-called mean-field assumption is commonly made to factorize $q(z, \theta)$ by assuming
independence between the variational posterior of the parameters and latent variables:
$$q(z, \theta) \approx q(z) q(\theta). \tag{3}$$
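The link between the KL divergence and the ELBO can be checked numerically on a toy problem. The Python sketch below assumes a single binary latent variable $z$, no parameters $\theta$, and made-up joint probabilities for one fixed observation; none of these values come from the paper. With expectations taken under $q$, it illustrates that $\log p(y) - \mathrm{ELBO}(q)$ equals the KL divergence from $q$ to the true posterior, so maximizing the ELBO is the same as minimizing that divergence.

```python
import numpy as np

# Toy joint p(z, y) for one fixed observation y and a binary latent z.
# The numbers are arbitrary illustrative values.
joint = np.array([0.12, 0.28])        # p(z=0, y), p(z=1, y)
log_evidence = np.log(joint.sum())    # log p(y)
posterior = joint / joint.sum()       # true posterior p(z | y)

def elbo(q):
    """E_q[log p(z, y)] - E_q[log q(z)] for a discrete q with positive entries."""
    return np.sum(q * (np.log(joint) - np.log(q)))

def kl(q, p):
    """KL(q || p) for discrete distributions with positive entries."""
    return np.sum(q * (np.log(q) - np.log(p)))

q = np.array([0.5, 0.5])              # a candidate variational posterior
gap = log_evidence - elbo(q)          # matches kl(q, posterior) up to rounding
```

Setting `q` equal to `posterior` closes the gap, which is why the ELBO is a lower bound on $\log p(y)$ that becomes tight at the true posterior.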