Stochastic Precipitation Generation for the Chesapeake Bay Watershed using Hidden Markov Models with Variational Bayes Parameter Estimation


Reetam Majumder¹, Nagaraj K. Neerchal² and Amita Mehta²

December 14, 2022

Abstract

Stochastic precipitation generators (SPGs) are a class of statistical models that generate synthetic data simulating dry and wet rainfall stretches over long durations. Generated precipitation time series are used in climate projections, impact assessment of extreme weather events, and water resource and agricultural management. We construct an SPG for daily precipitation data that is specified as a semi-continuous distribution at every location, with a point mass at zero for no precipitation and a mixture of two exponential distributions for positive precipitation. Our generators are obtained as hidden Markov models (HMMs) where the underlying climate conditions form the states. We fit a 3-state HMM to daily precipitation data for the Chesapeake Bay watershed on the eastern coast of the USA for the wet season months of July to September from 2000 to 2019. Data are obtained from the GPM-IMERG remote sensing dataset, and existing work on variational HMMs is extended to incorporate semi-continuous emission distributions. Given the high spatial dimension of the data, a stochastic optimization implementation allows for computational speedup. The most likely sequence of underlying states is estimated using the Viterbi algorithm, and we identify the differences in the weather regimes associated with the states of the proposed model. Synthetic data generated from the HMM can reproduce monthly precipitation statistics as well as the spatial dependency present in the historical GPM-IMERG data.

Key words: Variational Bayes; Hidden Markov models; Spatio-temporal statistics; Stochastic optimization; Semi-continuous distributions

¹North Carolina State University

²University of Maryland, Baltimore County

arXiv:2210.04305v2 [stat.AP] 13 Dec 2022

1 Introduction

Precipitation is the major component of the global water cycle and plays an important role in atmospheric and land surface processes in the climate system. While numerical weather models study precipitation over regional to global scales, observational data are used to develop statistical models for precipitation over watershed to local areas at higher temporal frequencies and higher spatial resolution. The measurement and modeling of precipitation has historically relied on sparsely located rain gauge measurements that are spatially irregular. In recent years, precipitation estimates derived from remote sensing observations with uniform spatial and temporal coverage have become more easily available. A common class of statistical models for analyzing meteorological data is known as stochastic weather generators (SWGs). SWGs can be used to generate multi-year series of synthetic data to simulate weather patterns and are useful in weather and climate research; for precipitation data, the corresponding generator is a stochastic precipitation generator (SPG). The modeling and forecasting of seasonal and inter-annual variations in precipitation are used to determine water allocation and resource management for regions dependent on precipitation as a primary water source. To this end, SPGs produce time series of synthetic data representative of the general rainfall patterns within a region. In particular, they aim to replicate key statistical properties of the historical data such as dry and wet stretches, spatial correlations, and extreme weather events. SPGs are also used to downscale precipitation data from numerical weather models, and simulations from them are used for climate projections, impact assessments of extreme weather events, water resources and agricultural management, and public and veterinary health. Numerical models used in weather and climate research tend to be sensitive to initial conditions, and can be augmented by SWGs. The output from SWGs is stochastic by nature and therefore has uncertainty built in, and ensemble datasets generated from these models can improve other climate and weather models. Breinl et al. (2017) provide a review of current SPG approaches and applications.

Like most meteorological data, precipitation is modeled as a multivariate time series whose univariate components each correspond to a location. However, modeling it directly using time series methodology usually requires the estimation of a large number of parameters and high-dimensional autocovariance matrices. Daily precipitation data are often modeled as a mixture of a point mass at 0 for no rainfall and one or more Gamma or exponential distributions for positive rainfall (Hughes and Guttorp, 1994; Wilks, 1998; Robertson et al., 2004; Mhanna and Bauwens, 2012), introducing an additional layer of complexity. The statistical analysis of such datasets at scale calls for parameter estimation approaches that are computationally efficient while being able to represent the dynamics of the underlying processes to a satisfactory degree. Hidden Markov models (HMMs), introduced in the late 1960s and studied extensively since (Rabiner, 1989; Cappé et al., 2005), are an attractive class of models that have seen widespread use for constructing SPGs. A hidden Markov model (HMM) is a pair of stochastic processes $\{S_t, Y_t\}_{t \geq 1}$ where $\{S_t\}$ is a Markov chain and, conditional on it, $\{Y_t\}$ is a sequence of independent random variables such that the distribution of $Y_t$ depends only on $S_t$. $\{S_t\}$ usually takes values in a finite set; $t$ is often, although not necessarily, an integer index. However, $\{S_t\}$ is unobservable, and we instead observe only $\{Y_t\}_{t \geq 1}$. $\{Y_t\}$ can be univariate or multivariate, and can follow a discrete, continuous, or mixture distribution. $\{S_t\}$ is known as the state process, while $\{Y_t\}$ is called the emission or observation process. The Markov property of the state process serves to capture the temporal dependency in the data, and the emission process at each time point describes the spatial patterns in the data. Much of the groundwork for using HMMs for daily precipitation was laid in Hughes and Guttorp (1994), with Bellone et al. (2000) proposing different emission distributions for precipitation amounts and precipitation occurrence models. This was extended to non-homogeneous hidden Markov models by Robertson et al. (2004, 2006) and Kirshner (2005), where the transition probabilities of the HMM's Markov process change over time.
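To make the generative definition above concrete, the following is a minimal Python sketch of simulating from an HMM whose emissions are semi-continuous: a point mass at zero plus a mixture of two exponential distributions at each location. The function name `simulate_hmm`, the parameter layout, and all numerical values are hypothetical choices made for illustration; they are not fitted values and need not match the parameterization used later in the paper.

```python
import numpy as np


def simulate_hmm(T, pi1, A, p_dry, mix_w, rates, rng):
    """Draw states s_1..s_T and an L-site emission vector y_t for each day.

    pi1   : (K,)      initial state probabilities
    A     : (K, K)    transition matrix, rows sum to 1
    p_dry : (K, L)    probability of zero rainfall per state and location
    mix_w : (K, L)    weight of the first exponential component
    rates : (K, L, 2) rate parameters of the two exponential components
    """
    K, L = p_dry.shape
    states = np.empty(T, dtype=int)
    y = np.zeros((T, L))
    for t in range(T):
        # Markov chain: the first state comes from pi1, later ones from row s_{t-1} of A.
        probs = pi1 if t == 0 else A[states[t - 1]]
        s = rng.choice(K, p=probs)
        states[t] = s
        # Given the state, locations are independent: first decide wet or dry,
        # then pick a mixture component and draw a positive amount for wet sites.
        wet = rng.random(L) >= p_dry[s]
        comp = np.where(rng.random(L) < mix_w[s], 0, 1)
        rate = rates[s, np.arange(L), comp]
        y[t, wet] = rng.exponential(1.0 / rate[wet])
    return states, y


# Hypothetical two-state, three-location example (illustrative values only).
rng = np.random.default_rng(0)
pi1 = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])
p_dry = np.array([[0.8, 0.7, 0.9], [0.2, 0.3, 0.1]])  # state 0 is mostly dry
mix_w = np.full((2, 3), 0.6)
rates = np.tile(np.array([1.0, 0.1]), (2, 3, 1))
states, y = simulate_hmm(200, pi1, A, p_dry, mix_w, rates, rng)
```

Only `y` would be observed in practice; `states` plays the role of the hidden process that estimation has to recover.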

The overwhelming majority of HMM studies use the Baum-Welch algorithm (Baum and Petrie, 1966; Baum and Eagon, 1967; Baum and Sell, 1968; Baum et al., 1970; Baum, 1972) for parameter estimation. It is a maximum likelihood approach for efficient parameter estimation in HMMs that takes into account the Markov assumptions of the model, and can be considered a variant of the expectation-maximization (EM) algorithm (Dempster et al., 1977). The Viterbi algorithm (Viterbi, 1967) can then estimate the most likely sequence of states to have generated the data. The ability to estimate and interpret the underlying states of a relatively parsimonious model has made HMMs a popular approach for sequential data. However, the Baum-Welch algorithm, being a maximum likelihood based method, can run into problems for large datasets with complex dependencies. In particular, it can lead to overfitting in graphical models, which tend to have complex dependency structures (Attias, 1999). Holsclaw et al. (2016) use a Bayesian approach to model daily precipitation, but in general, Bayesian alternatives which use Gibbs sampling (Scott, 2002; Cappé et al., 2005) tend to be computationally intensive. Historically, the reliance on spatially non-uniform weather stations for data has prevented these from being practical issues. However, as gridded remote sensing data, which tend to be highly correlated, become more easily available, alternative approaches which are scalable and can incorporate prior information are desirable. This is where variational Bayes (VB) provides an attractive alternative for parameter estimation. While Markov chain Monte Carlo (MCMC) methods use sampling to approximate the posterior distribution, VB uses optimization to calculate an approximate posterior; the posteriors are obtained by an iterative EM-like algorithm which always converges (Attias, 1999). The variational posteriors have analytical forms under certain conditions (Ghahramani and Beal, 2000) and can be used to perform approximate Bayesian inference. A review of VB methods can be found in Blei et al. (2017). However, while VB estimation has been implemented for state space models and HMMs (MacKay, 1997; Ghahramani and Beal, 2000; Beal, 2003; Ji et al., 2006; McGrory and Titterington, 2009), studies have usually only focused on cases where emissions are distributed as Normal or mixtures of Normal distributions.
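The Viterbi decoding step mentioned in the preceding paragraph can be sketched in a few lines of Python. This is a generic log-space implementation under stated assumptions, not the authors' code: the function name `viterbi` and the toy discrete emission table `B` are hypothetical, and the toy uses discrete symbols rather than precipitation amounts.

```python
import numpy as np


def viterbi(log_pi1, log_A, log_emis):
    """Most likely state path given log initial probabilities (K,),
    a log transition matrix (K, K), and per-time log emission densities (T, K)."""
    T, K = log_emis.shape
    delta = np.empty((T, K))            # best log-probability of a path ending in each state
    back = np.zeros((T, K), dtype=int)  # best predecessor of each state at each time
    delta[0] = log_pi1 + log_emis[0]
    for t in range(1, T):
        cand = delta[t - 1][:, None] + log_A   # cand[j, k]: arrive in k from j
        back[t] = cand.argmax(axis=0)
        delta[t] = cand.max(axis=0) + log_emis[t]
    # Backtrack from the best final state.
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path


# Hypothetical two-state example with discrete symbols 0 and 1.
pi1 = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.1, 0.9]])
B = np.array([[0.9, 0.1], [0.1, 0.9]])   # B[j, v]: probability that state j emits symbol v
obs = np.array([0, 0, 1, 1, 1])
path = viterbi(np.log(pi1), np.log(A), np.log(B[:, obs]).T)
print(path)  # [0 0 1 1 1]
```

Working in log space avoids numerical underflow for long sequences, which matters for daily data spanning many seasons.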

Figure 1: A graphical model representation of the conditional independence structure for an HMM.

In this paper, we outline VB estimation for HMMs with semi-continuous emissions, with the motivation of constructing an SPG for daily precipitation using gridded remote sensing data from GPM-IMERG (Huffman et al., 2019) for a large spatial domain. Our model is constructed using precipitation data for the Chesapeake Bay watershed in the eastern US for the wet season months of July–September of 2000–2019. The SPG aims to replicate the spatial correlation present in the data, as well as key properties of the original data, e.g., the proportion of dry days (with no rainfall) and mean seasonal rainfall. Estimates for these can be calculated using data simulated from the fitted model.

The rest of this paper is organized as follows. Section 2 provides background on HMMs and VB. Section 3 introduces the dataset and discusses the HMM for precipitation as well as VB estimation for the model. Section 4 presents a numerical study for multi-site precipitation, followed by our case study for daily precipitation over the Chesapeake Bay watershed. Section 5 concludes with a discussion.

2 Background

We provide some background on parameter estimation for hidden Markov models and on variational Bayes in this section. A more thorough treatment of learning procedures for HMMs as well as parameter estimation using variational Bayes can be found in Majumder (2021, Chapter 2).

2.1 Hidden Markov models

An HMM consists of a sequence of multivariate observations $y_{1:T} = [y_1, \ldots, y_T]$, together with a sequence of hidden (unobserved) states $s_{1:T} = [s_1, \ldots, s_T]$. The states are assumed to follow a first-order Markov process, and the multivariate observation $y_t = (y_{t1}, \ldots, y_{tL})'$ is emitted by the corresponding state $s_t \in \{1, \ldots, K\}$. For the purposes of this study, $L$ can be considered the number of spatial locations. Figure 1 shows a graphical model representation of an HMM. The state process $s$ is parameterized by an initial probability $\pi_{1j} = \Pr[s_1 = j]$ and a $K \times K$ matrix $A$, whose elements are $a_{jk} = \Pr[s_{t+1} = k \mid s_t = j]$ for $j, k = 1, \ldots, K$. The probability density of the emission $y_{tl}$ at location $l$ and time $t$, given that the system is in state $j$, is:

$$p(y_{tl} \mid s_t = j) = p_j(y_{tl} \mid \theta_{jl}),$$

where $\theta_{jl}$ are the parameters associated with the distribution of $y_{tl}$. The distributions at each location are assumed to be independent conditional on the state, and the full likelihood can be expressed as:

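$$p(y, s \mid \Theta) = \prod_{j=1}^{K} \pi_{1j}^{s_{1j}} \prod_{t=1}^{T-1} \prod_{j=1}^{K} \prod_{k=1}^{K} a_{jk}^{s_{tj} s_{t+1,k}} \prod_{t=1}^{T} \prod_{j=1}^{K} \prod_{l=1}^{L} p_j(y_{tl} \mid \theta_{jl})^{s_{tj}},$$

where $s_{tj} = I\{s_t = j\}$ is the indicator that the chain is in state $j$ at time $t$. We refer the reader to Rabiner (1989) for a detailed tutorial on parameter estimation using the Baum-Welch algorithm.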
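As a small illustration of how the full likelihood is evaluated for a known state path, the sketch below computes its logarithm in Python. It assumes that the indicators $s_{tj}$ simply select the initial, transition, and emission terms of the states actually visited. The function name `complete_data_loglik` and the placeholder numbers are hypothetical; the emission values are arbitrary positive numbers standing in for $p_j(y_{tl} \mid \theta_{jl})$, not densities from a fitted model.

```python
import numpy as np


def complete_data_loglik(states, log_pi1, log_A, log_emis):
    """log p(y, s | Theta) for a known state path.

    states   : (T,) integer path with values in 0..K-1
    log_pi1  : (K,) log initial probabilities
    log_A    : (K, K) log transition matrix
    log_emis : (T, K, L) array holding log p_j(y_tl | theta_jl)
    """
    T = len(states)
    ll = log_pi1[states[0]]                              # initial-state term
    ll += log_A[states[:-1], states[1:]].sum()           # T-1 transition terms
    ll += log_emis[np.arange(T), states].sum()           # T*L emission terms of visited states
    return ll


# Hypothetical example with K = 2 states, L = 2 locations, and T = 3 days.
pi1 = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
emis = np.arange(1, 13).reshape(3, 2, 2) / 20.0   # placeholder density values
states = np.array([0, 1, 1])
ll = complete_data_loglik(states, np.log(pi1), np.log(A), np.log(emis))
```

Because the locations are conditionally independent given the state, the emission contribution is a plain sum over locations, which is what keeps the computation cheap for large spatial grids.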

2.2 Variational Bayes inference

Variational Bayes (VB) methods aim to approximate the posterior distribution through optimization. VB tends to be faster than MCMC for intractable likelihoods, but it only provides approximate inference. VB is suited for large datasets, and can take advantage of stochastic optimization (Robbins and Monro, 1951), which makes it scalable. VB posits a family of approximate posterior distributions $Q$ over the latent variables $z$ and parameters $\theta$, and optimizes within this family to find the member closest to the true posterior $p(z, \theta \mid y)$. In its most widely applied form, the VB posterior minimizes the Kullback-Leibler (KL) divergence (Kullback and Leibler, 1951) to the true posterior among all candidates $q(\cdot) \in Q$, i.e.,

$$q^*(\cdot) = \arg\min_{q(\cdot) \in Q} \mathrm{KL}\big(q(z, \theta) \,\|\, p(z, \theta \mid y)\big). \tag{1}$$

Optimizing the KL divergence directly is typically difficult in practice since it involves computing the log marginal likelihood $\log p(y)$. However, it is possible to find a lower bound for $\log p(y)$ and equivalently maximize this bound, known as the evidence lower bound (ELBO) (Jordan et al., 1999), defined as

$$\mathrm{ELBO}(q) = \mathbb{E}[\log p(z, \theta, y)] - \mathbb{E}[\log q(z, \theta)], \tag{2}$$

where both expectations are taken with respect to $q(z, \theta)$.
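The equivalence between minimizing the KL divergence and maximizing the ELBO follows from a short, standard identity. It is sketched here in the notation of this section, with all expectations taken under $q(z, \theta)$:

$$
\begin{aligned}
\mathrm{KL}\big(q(z, \theta) \,\|\, p(z, \theta \mid y)\big)
&= \mathbb{E}[\log q(z, \theta)] - \mathbb{E}[\log p(z, \theta \mid y)] \\
&= \mathbb{E}[\log q(z, \theta)] - \mathbb{E}[\log p(z, \theta, y)] + \log p(y) \\
&= \log p(y) - \mathrm{ELBO}(q).
\end{aligned}
$$

Because the KL divergence is non-negative, $\mathrm{ELBO}(q) \leq \log p(y)$, and because $\log p(y)$ does not depend on $q$, the member of $Q$ that maximizes the ELBO is the one that minimizes the KL divergence.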

The so-called mean-field assumption is commonly made to factorize $q(z, \theta)$ by assuming independence between the variational posteriors of the parameters and the latent variables:

$$q(z, \theta) \approx q(z)\, q(\theta). \tag{3}$$
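To illustrate how the mean-field factorization is used in practice, the following Python sketch runs coordinate-ascent updates on a deliberately tiny discrete problem, where $z$ and $\theta$ each take a handful of values and the joint $p(z, \theta, y)$ at the observed $y$ is given as a table. This is a toy under stated assumptions, not the estimation procedure of this paper: the table `joint` and the function names `elbo` and `mean_field_cavi` are hypothetical.

```python
import numpy as np


def elbo(log_joint, qz, qt):
    """ELBO for a factorized q(z, theta) = qz[z] * qt[theta] on a discrete grid."""
    q = np.outer(qz, qt)
    return np.sum(q * (log_joint - np.log(q)))


def mean_field_cavi(log_joint, n_iter=20):
    """Coordinate ascent: update q(z) and q(theta) in turn, holding the other fixed."""
    nz, nt = log_joint.shape
    qz = np.full(nz, 1.0 / nz)
    qt = np.full(nt, 1.0 / nt)
    history = []
    for _ in range(n_iter):
        # q(z) is proportional to exp(E_{q(theta)}[log p(z, theta, y)])
        qz = np.exp(log_joint @ qt)
        qz /= qz.sum()
        # q(theta) is proportional to exp(E_{q(z)}[log p(z, theta, y)])
        qt = np.exp(qz @ log_joint)
        qt /= qt.sum()
        history.append(elbo(log_joint, qz, qt))
    return qz, qt, np.array(history)


# Hypothetical joint p(z, theta, y) at the observed y; rows index z, columns index theta.
joint = np.array([[0.20, 0.05, 0.02],
                  [0.03, 0.10, 0.15]])
log_joint = np.log(joint)
log_evidence = np.log(joint.sum())   # log p(y), computable here only because the grid is tiny

qz, qt, history = mean_field_cavi(log_joint)
```

Each update maximizes the ELBO over one factor, so the values in `history` never decrease and stay below `log_evidence`; the remaining gap is the KL divergence that the factorized family cannot close.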
