using the following terminology: σij is a cross covariance parameter, αij is a cross range parameter,
and νij is a cross smoothness parameter. To be consistent with how the Mat´ern is parameterized in
our existing software, our αij parameters are ranges, whereas they are inverse ranges in Gneiting et al.
(2010), Apanasovich et al. (2012), and Emery et al. (2022). When i=j, we refer to them as marginal
parameters. Note that this model implies the symmetry Cij (h) = Cji(h) which need not hold in
general. Li and Zhang (2011) and Qadir et al. (2021) proposed methods for modeling asymmetries.
The univariate Mat´ern model Cii(h) = σiiM(h|νii, αii) provides a valid (i.e nonnegative definite)
second-order structure for the marginal process Zi(x) as long as the marginal parameters σii, αii, νii
are positive. Additional conditions are needed on the cross parameters σij , αij and νij to ensure
that the multivariate Mat´ern model is valid for the multivariate process Z(x). We discuss these in
the next section.
Kleiber (2017) studied the properties of various multivariate spatial models, including separa-
ble models, kernel convolution models, the linear model of coregionalization, and the multivariate
Mat´ern. He found that the multivariate Mat´ern is sufficiently flexible in that it allows the high
frequency coherence to exhibit a range of behaviors, depending on the parameter settings. Loosely
speaking, the coherence between two components is the correlation between the linear combination
of each component and a sinusoidal function; it measures correlation between components at a par-
ticular frequency of variation. Qadir and Sun (2021) demonstrated that further improvements in
flexibility can be achieved with semiparametric models.
Over the past decades, spatial statisticians have produced a mountain of literature on the topic
of estimating covariance parameters, especially on the problem of computing estimates when the
dataset is very large. This work has led to various software packages, including INLA (Rue et al.,
2009; Lindgren et al., 2011), fields (Nychka et al., 2021), RandomFields (Schlather et al., 2022),
spBayes (Finley et al., 2015), spNNGP (Finley et al., 2022), LatticeKrig (Nychka et al., 2016), FRK
(Zammit-Mangion and Cressie, 2021), exageostat (Abdulah et al., 2018), GpGp (Guinness et al.,
2021), GPvecchia (Katzfuss et al., 2020), and GeoModels (Bevilacqua et al., 2018), to name a few.
Most of the research and software development is focused on the univariate case, with the exception
of RandomFields,exageostat, and spBayes, which are capable of fitting bivariate Mat´ern models,
and GeoModels, which has a number of bivariate spatial models.
Clearly, there is a need for reliable software capable of fitting multivariate spatial models with
two or more components to large datasets. In this work, we report on extending the GpGp R package
(Guinness et al., 2021) to handle the multivariate Mat´ern model, demonstrate its capabilities on sev-
eral datasets, and explore the implications of various proposed sufficient conditions on multivariate
Mat´ern parameters. GpGp implements Vecchia’s Gaussian process approximation (Vecchia, 1988),
along with improvements to the approximation (Guinness, 2018), and likelihood optimization proce-
dures that efficiently compute and make use of the gradient and Fisher information (Guinness, 2021).
The application of Vecchia’s approximation is agnostic to the covariance function, which has allowed
for the implementation of more than 25 covariance models in GpGp at the time of writing (package
version 0.4.0), including isotropic, geometrically anisotropic, nonstationary, and spatial-temporal
models on Euclidean spaces and spheres. GpGp has enjoyed success in spatial data competitions,
including being used by the winning team in the first KAUST spatial data analysis competition
(Huang et al., 2021), and by the winners of the multivariate spatial data analysis section of the
second competition (Abdulah et al., 2022).
Adding the multivariate Mat´ern model to GpGp presents difficulties that go far beyond the normal
challenges of implementing a typical univariate covariance function. As opposed to the univariate
Mat´ern, whose parameters must simply be positive in order for the model to be valid, known sufficient
conditions are more complicated, so some care must be taken when enforcing them. The parameter
space is also large; our formulation of the model, which allows for correlated nuggets, has 2p(p+ 1)
parameters. Depending on the dataset, many of the parameters–or combinations thereof–are not well
identified. In short, it is a nasty optimization problem. In order to quickly maximize the likelihood,
one has to take large steps through a high dimensional space fraught with Errors, Infs, and NaNs.
2