dent random variables such that the distribution of Ytdepends only on St.{St}usually
takes values in a finite set; tis often, although not necessarily, an integer index. How-
ever, {St}is unobservable, and instead we observe only {Yt}t≥0.{Yt}can be univariate
or multivariate, and can follow a discrete, continuous, or mixture distribution. {St}is
known as the state process, while {Yt}is called the emission or observation process.
The Markov property of the state process serves to capture the temporal dependency
in the data, and the emission process at each time point describes the spatial patterns
in the data. Much of the groundwork for using HMMs for daily precipitation was laid
in Hughes and Guttorp (1994), with Bellone et al. (2000) proposing different emission
distributions for precipitation amounts and precipitation occurrence models. This was
extended to non-homogeneous hidden Markov models by (Robertson et al., 2004, 2006;
Kirshner, 2005), where the transition probabilities of the HMM’s Markov process change
over time.
The overwhelming majority of HMM studies use the Baum-Welch algorithm (Baum
and Petrie, 1966; Baum and Eagon, 1967; Baum and Sell, 1968; Baum et al., 1970; Baum,
1972) for parameter estimation. It is a maximum likelihood approach used for efficient
parameter estimation in HMMs while taking into account the Markov assumptions of
the model, and can be considered as a variant of the expectation-maximization (EM)
algorithm (Dempster et al., 1977). The Viterbi algorithm (Viterbi, 1967) can then estimate
the most likely sequence of states that has generated the data. The ability to estimate
and interpret the underlying states of a relatively parsimonious model has made HMMs
a popular approach for sequential data. However, the Baum-Welch algorithm, being a
maximum likelihood based method, can run into problems for large datasets with com-
plex dependencies. In particular, it can lead to model overfitting for graphical models
which tend to have complex dependency structures (Attias, 1999). Holsclaw et al. (2016)
use a Bayesian approach to model daily precipitation, but in general, Bayesian alterna-
tives which use Gibbs sampling (Scott, 2002; Cappé et al., 2005) tend to be computa-
tionally intensive. Historically, the reliance on spatially non-uniform weather stations
for data has prevented these from being practical issues. However, as gridded remote
sensing data which tend to be highly correlated become more easily available, alterna-
tive approaches which are scalable and can incorporate prior information are desirable.
This is where variational Bayes (VB) provides an attractive alternative for parameter
estimation. While Markov chain Monte Carlo (MCMC) methods use sampling to find
the posterior distribution, VB uses optimization to calculate an approximate posterior;
the posteriors are obtained by an iterative EM-like algorithm which always converges
(Attias, 1999). The variational posteriors have analytical forms under certain conditions
(Ghahramani and Beal, 2000) and can be used to perform approximate Bayesian infer-
ence. A review of VB methods can be found in Blei et al. (2017). However, while VB
estimation has been implemented for state space models and HMMs (MacKay, 1997;
Ghahramani and Beal, 2000; Beal, 2003; Ji et al., 2006; McGrory and Titterington, 2009),
studies have usually only focused on cases where emissions are distributed as Normal
or mixtures of Normal distributions.
3