Nonparametric testing of the covariate significance for
spatial point patterns under the presence of nuisance
covariates
Jiří Dvořák¹ and Tomáš Mrkvička²
¹ Faculty of Mathematics and Physics, Charles University, Czech Republic
² Faculty of Economics, University of South Bohemia, Czech Republic
October 12, 2022
Abstract. Determining the relevant spatial covariates is one of the most important
problems in the analysis of point patterns. Parametric methods may lead to incorrect
conclusions, especially when the model of interactions between points is wrong.
Therefore, we propose a fully nonparametric approach to testing the significance of a
covariate, taking into account the possible effects of nuisance covariates. Our tests
match the nominal significance level, and their powers are comparable with the powers
of parametric tests in cases where both the model for the intensity function and the
model for interactions are correct. When the parametric model for the intensity function
is wrong, our tests achieve higher powers. The proposed methods rely on Monte Carlo
testing and take advantage of the newly introduced covariate-weighted residual measure.
We also define a correlation coefficient between a point process and a covariate and a
partial correlation coefficient quantifying the dependence between a point process and a
covariate of interest while removing the influence of nuisance covariates.
Keywords: correlation coefficient, covariate, nonparametric methods, partial
correlation coefficient, point process, random shift test, residual analysis
1 Introduction
1.1 Motivation and overview
Spatial point patterns are often accompanied by spatial covariates. Determining the
relevant covariates that influence the positions of points is certainly one of the most
important questions of point pattern analysis. Applications include spatial epidemiology,
spatial ecology, exploration geology, seismology, and many other fields.
In this paper, we mainly focus on this question. Our proposed methods use
nonparametric tools. The second question that we are interested in is the nonparametric
quantification of the spatial dependence between a point process and a covariate, both
without and in the presence of nuisance covariates. We define a correlation coefficient
and a partial correlation coefficient between a point process and a covariate. To our
knowledge, the second problem has not been studied before.
arXiv:2210.05424v1 [stat.ME] 11 Oct 2022
The first problem is usually solved by parametric methods (Schoenberg, 2005;
Waagepetersen and Guan, 2009; Kutoyants, 1998; Coeurjolly and Lavancier, 2013), see
Section 2.1 for details. However, we show in our simulation study that even when the
parametric model is selected correctly, these tests of covariate significance may be
liberal. The parametric methods have even bigger problems when 1) the parametric model
for the intensity function is incorrect, or 2) the form of interactions between points
is specified incorrectly. We propose here two tests of covariate significance: a fully
nonparametric one, which avoids selecting both the intensity function model and the
interaction model, and a semiparametric one, which does not assume an interaction model
but uses the log-linear intensity function model, as it is the one predominantly used in
practice. These two proposed tests do not exhibit liberality, and their powers are
comparable with the powers of parametric methods in cases with correctly specified
models for the intensity function and the interactions. The proposed tests also have a
higher power than the parametric ones when either the intensity function model or the
interaction model is misspecified.
Since the proposed nonparametric tests do not require choosing a specific model and
exhibit better properties than parametric methods, their use should become standard
practice in the analysis of point patterns.
For determining relevant covariates one can also use the lurking variable plots
(Baddeley and Turner, 2005) or appropriate information criteria (Choiruddin et al.,
2021), but these do not provide formal tests. The only nonparametric method studying the
dependence of a point process and a covariate without nuisance covariates was introduced
in Dvořák et al. (2022).
Throughout the paper, we assume that the spatial covariates are continuous. The
methodology is, to a certain extent, also applicable to categorical covariates, as
discussed in Section 7.
1.2 Motivational examples
To illustrate the relevance of the questions posed above, we consider a part of the tropical
tree data set from the Barro Colorado Island plot (Condit, 1998). We focus on the
positions of 3 604 trees of the Beilschmiedia pendula species in a rectangular 1 000 × 500
metre sampling plot, plotted in the top left panel of Figure 1. This part of the data set
is available in the spatstat package. Below, we call it the BCI data set.
The intensity of point occurrence in the observation window is clearly nonconstant as
the trees tend to prefer specific environmental conditions. The variation in the
intensity of point occurrence may possibly be explained by the accompanying covariate
information. The available covariates include the terrain elevation and gradient
(available in the spatstat package) and the soil contents of mineralised nitrogen,
phosphorus and potassium (Dalling et al., 2022), see Figure 1. It may be that all the
covariates bring important information and should be used for inference. However, it is
equally possible that some of the covariates bring redundant information (as could be
expected from the nitrogen and potassium content in this data set, see the bottom left
and bottom right panels of Figure 1) or that some of the covariates, in fact, do not
influence the point process. It is important to determine with a high degree of
confidence which covariates influence the point process and should be included in the
further steps of the inference.
Figure 1: The Barro Colorado Island data set. From left to right, top to bottom:
locations of trees, terrain elevation, terrain gradient, the soil contents of nitrogen,
phosphorus and potassium.
In certain cases, a relevant parametric model can be specified based on the available
expert knowledge. However, often no such parametric model is available, or we do not
want to take the risk of model misspecification. Then nonparametric methods for
covariate selection need to be used.
Furthermore, we consider the Castilla-La Mancha forest fire data set, again available
in the spatstat package. We study the locations of 689 forest fires that occurred in this
region of Spain in 2007, plotted in the left panel of Figure 2. Below, we call it the CLM
data set. The size of the region is approximately 400 by 400 kilometres. The intensity of
point occurrence is nonconstant and may be influenced by the accompanying covariates
(terrain elevation and gradient, see the middle and right panels of Figure 2). We aim
to quantify the strength of influence of the individual covariates on the point process
and to compare it with the BCI data set.
1.3 Outline of the work
In order to achieve our objectives, we propose to employ the residual analysis (Baddeley
et al., 2005) with respect to the model built from the nuisance covariates. The sample
(Kendall’s) correlation coefficient of the smoothed residual field and the covariate of
interest then quantifies their dependence both without and with nuisance covariates.
The latter defines the partial correlation.
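As a rough illustration of the correlation step, the following Python sketch computes Kendall's rank correlation between two fields sampled at the same pixels. It is not the authors' implementation: the function name `kendall_tau`, the choice of the tau-a variant and the five pixel values are assumptions made only for this example.

```python
def kendall_tau(field_a, field_b):
    """Kendall's tau-a between two equally long sequences of pixel values."""
    if len(field_a) != len(field_b):
        raise ValueError("both fields must be sampled at the same pixels")
    n = len(field_a)
    score = 0  # concordant pairs minus discordant pairs
    for i in range(n):
        for j in range(i + 1, n):
            product = (field_a[i] - field_a[j]) * (field_b[i] - field_b[j])
            if product > 0:
                score += 1
            elif product < 0:
                score -= 1
    return score / (n * (n - 1) / 2)


# Hypothetical values of a smoothed residual field and of a covariate
# at five pixels (made-up numbers, not taken from the BCI or CLM data).
residual_field = [-0.8, -0.1, 0.3, 0.9, 1.4]
covariate = [120.0, 125.0, 131.0, 128.0, 140.0]
tau = kendall_tau(residual_field, covariate)
```

The double loop is quadratic in the number of pixels, so a real pixel grid would call for a faster algorithm or a subsample of pixels.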
The testing of covariate significance is proposed to be performed via a new test
statistic, the covariate-weighted residual measure, and a Monte Carlo test. The residuals
can be computed parametrically, which defines our semiparametric approach, or
nonparametrically, using the nonparametric estimate of the point pattern intensity
(Baddeley et al., 2012), which defines our fully nonparametric approach. The
nonparametric residuals are used for the first time in this work.
Figure 2: The Castilla-La Mancha data set. From left to right: locations of forest fires,
terrain elevation, terrain gradient.
The replications in the Monte Carlo test are obtained through random shifts, both
with torus correction (Lotwick and Silverman, 1982) and with variance correction
(Mrkvička et al., 2021). The torus correction is a standard method, whereas the variance
correction was defined recently; it allows the use of nonrectangular windows and
controls the level of the test better than the torus correction.
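The random shift idea with torus correction can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure from the paper: the window is the rectangle [0, width) × [0, height), the points are shifted relative to a fixed covariate, the statistic (mean covariate value at the points) is a stand-in for the covariate-weighted residual measure, and the test is one-sided with large values treated as extreme. All function names are hypothetical.

```python
import random


def torus_shift(points, shift, width, height):
    """Shift all points by one vector, wrapping around the rectangle."""
    dx, dy = shift
    return [((x + dx) % width, (y + dy) % height) for x, y in points]


def mean_covariate_at_points(points, covariate):
    """Stand-in test statistic: average covariate value at the points."""
    return sum(covariate(x, y) for x, y in points) / len(points)


def random_shift_test(points, covariate, statistic, width, height,
                      n_shifts=999, seed=0):
    """One-sided Monte Carlo p-value from torus-shifted replicates."""
    rng = random.Random(seed)
    observed = statistic(points, covariate)
    count = 1  # the observed pattern is counted among the replicates
    for _ in range(n_shifts):
        shift = (rng.uniform(0, width), rng.uniform(0, height))
        shifted = torus_shift(points, shift, width, height)
        if statistic(shifted, covariate) >= observed:
            count += 1
    return count / (n_shifts + 1)


# Made-up toy pattern on the unit square; the covariate increases with x.
toy_points = [(0.1, 0.2), (0.4, 0.7), (0.8, 0.3)]
p_value = random_shift_test(toy_points, lambda x, y: x,
                            mean_covariate_at_points, 1.0, 1.0,
                            n_shifts=99, seed=1)
```

The wrap-around in `torus_shift` only makes sense for rectangular windows, which is the limitation that the variance correction mentioned above is said to remove.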
The paper is organised as follows. Section 2 recalls all the concepts we need to
define our procedures. Section 3 describes all the new methods we are introducing in
this work, that is, nonparametric residuals, the spatial (partial) correlation
coefficient, the covariate-weighted residual measure, and tests of covariate significance
with nuisance covariates. Section 4 contains a simulation study in which the exactness
and power of our nonparametric methods are compared with those of parametric methods.
Section 5 contains an example of the use of our methods for nonparametric selection of
relevant covariates. Section 6 contains an example of the use of our methods for
comparison of dependence strength. Finally, Section 7 is left for conclusions and
discussion.
The R codes providing an implementation of the proposed methods are available at
https://msekce.karlin.mff.cuni.cz/~dvorak/software.html and will be available
in the planned package NTSS for R.
2 Notation and background
Let $X$ be a point process on $\mathbb{R}^2$ with the intensity function $\lambda(u)$.
Throughout this paper, we assume that the intensity function of $X$ exists. Let
$C_1, C_2, \ldots, C_{m+1}$ be the covariates in $\mathbb{R}^2$. Denote by
$W \subset \mathbb{R}^2$ a compact observation window with area $|W|$ and by $n(X \cap B)$
the number of points of the process $X$ observed in the set $B$. We assume that the values
of the covariates are available at all points of $W$, at least on a fine pixel grid. This
can be achieved from a finite set of observations, e.g. by kriging techniques.
2.1 Covariate selection in parametric point process models
The dependence of the intensity function of a point process on the covariates
$C_1, \ldots, C_m$ is often modelled parametrically, e.g. using the log-linear model
$$\lambda(u; \beta) = \exp\{\beta_0 + \beta_1 C_1(u) + \ldots + \beta_m C_m(u)\}. \tag{1}$$
The standard approach to estimating the model parameters $\beta_i$ is to maximize the Poisson
likelihood (Schoenberg, 2005; Waagepetersen and Guan, 2009). This corresponds to the
maximum likelihood approach for Poisson models, while for non-Poisson models, this
constitutes a first-order composite likelihood approach. For the log-linear model (1)
the estimation is implemented in the ppm function from the popular spatstat package
(Baddeley et al., 2015).
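To make the objective function concrete, here is a small Python sketch of the log-linear intensity (1) and of the Poisson log-likelihood, with the integral over the window approximated by a sum over pixels. It is only an illustration and not the spatstat implementation; the function names, the pixel approximation and the toy values are assumptions.

```python
import math


def loglinear_intensity(beta, covariate_values):
    """Model (1): exp(beta_0 + sum_j beta_j * C_j(u)) at one location u."""
    linear_predictor = beta[0] + sum(
        b * c for b, c in zip(beta[1:], covariate_values))
    return math.exp(linear_predictor)


def poisson_loglik(beta, covariates_at_points, covariates_at_pixels,
                   pixel_area):
    """Sum of log-intensities at the points minus the integrated intensity.

    The integral over the window is approximated by a Riemann sum over
    the pixel grid (an assumption of this sketch).
    """
    log_term = sum(math.log(loglinear_intensity(beta, c))
                   for c in covariates_at_points)
    integral = pixel_area * sum(loglinear_intensity(beta, c)
                                for c in covariates_at_pixels)
    return log_term - integral


# Toy setting: unit square split into four pixels, one covariate (m = 1),
# three observed points. All numbers are made up.
pixels = [[0.0], [0.0], [1.0], [1.0]]
points = [[1.0], [1.0], [0.0]]
value_null = poisson_loglik([0.0, 0.0], points, pixels, 0.25)
value_alt = poisson_loglik([0.0, 1.0], points, pixels, 0.25)
```

Maximizing this function over `beta`, for instance with a general-purpose optimizer, would give the estimates referred to in the text.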
For Poisson or Gibbs processes, the ppm function also provides confidence intervals
for the regression parameters $\beta_i$ and the p-values of the tests of the null hypothesis
that $\beta_i = 0$ for a given $i$, based on the asymptotic variance matrix (Kutoyants, 1998;
Coeurjolly and Rubak, 2013). For cluster processes, the kppm function from the spatstat
package provides means of model fitting. The regression parameters $\beta_i$ from (1) are again
estimated using the ppm function, but the asymptotic variance matrix is determined
according to Waagepetersen (2008), taking into account the attractive interactions between
points.
The methods discussed above provide means for formal testing of the hypothesis that
$\beta_i = 0$ for a given $i \in \{1, \ldots, m\}$, allowing one to select the set of relevant covariates to
be included in the model.
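A test of this kind is commonly carried out as a Wald-type test, which the following Python sketch illustrates. It assumes that the estimate is approximately normal and that its standard error is the square root of the corresponding diagonal entry of the asymptotic variance matrix; the function name and the numbers are hypothetical and are not output of ppm or kppm.

```python
import math


def wald_test(beta_hat, std_error):
    """Two-sided Wald test of the null hypothesis beta_i = 0.

    Returns the z statistic and the p-value 2 * (1 - Phi(|z|)), written
    with the complementary error function.
    """
    z = beta_hat / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Made-up estimate and standard error for one regression parameter.
z_stat, p_val = wald_test(0.042, 0.015)
```

The conclusion depends entirely on the standard error, which is why a misspecified interaction model, and hence a wrong variance matrix, can distort such tests.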
2.2 Parametric residuals for point processes
Residuals can be used to check whether the fitted model for the intensity function is
appropriate, see Baddeley et al. (2005) or Baddeley et al. (2015, Sec. 11.3). In the
following we employ the version of residuals based on the intensity function, as suggested
by R. Waagepetersen in the discussion to the paper Baddeley et al. (2005), rather than
based on the conditional intensity function as discussed in the paper itself. Let
$\hat{\beta}$ be the vector of the estimated regression parameters. The residual measure
is defined as
$$R(B) = n(X \cap B) - \int_B \lambda(u; \hat{\beta}) \, \mathrm{d}u, \tag{2}$$
where $B \subseteq W$ is a Borel set. The smoothed residual field is obtained as
$$s(u) = \frac{1}{e(u)} \left[ \sum_{x_i \in X \cap W} k(u - x_i) - \int_W k(u - v) \lambda(v; \hat{\beta}) \, \mathrm{d}v \right], \tag{3}$$
where $e(u) = \int_W k(u - v) \, \mathrm{d}v$ is the edge-correction factor and $k$ is a
probability density function in $\mathbb{R}^2$. In fact, the first term in (3) gives the
nonparametric kernel estimate