Alternative Mean Square Error Estimators and Confidence Intervals for Prediction of Nonlinear Small Area Parameters

Yanghyeon Cho and Emily Berg
Department of Statistics, Iowa State University, 2438 Osborn Dr., Ames, IA 50011, USA
Abstract
Under an informative sample design, a difficulty in MSE estimation occurs because we do
not specify a full distribution for the survey weights. This complicates the use of fully
parametric bootstrap procedures. To overcome this challenge, we develop a novel MSE
estimator. We estimate the leading term in the MSE, which is the MSE of the best predictor
(constructed with the true parameters), using the same simulated samples used to construct
the basic predictor. We then exploit the asymptotic normal distribution of the parameter
estimators to estimate the second term in the MSE, which reflects variability in the
estimated parameters. We incorporate a correction for the bias of the estimator of the
leading term without the use of computationally intensive double-bootstrap procedures.
We further develop calibrated prediction intervals that rely less on normal theory than
standard prediction intervals. We empirically demonstrate the validity of the proposed
procedures through extensive simulation studies. We apply the methods to predict several
functions of sheet and rill erosion for Iowa counties using data from a complex
agricultural survey.
arXiv:2210.12221v1 [stat.ME] 21 Oct 2022
1 Introduction
Small area estimation refers to the practice of using model-based estimators for domains
where direct estimators are considered unreliable. Many small area parameters are
nonlinear functions of the response variable in the model. Important examples that
occur in the domain of poverty mapping include the Gini coefficient and the proportion
of the population with income below the poverty line. Nonlinear small area parameters
also occur when the parameter of interest is the mean and the model is specified on
a transformed scale. For instance, a log transformation is commonly used for skewed,
positive response variables. Molina & Rao (2010) propose a simulation-based procedure
that approximates the empirical best predictor of general small area parameters that
may be nonlinear functions of the model response variable. We call the method of
Molina & Rao (2010) the EBP method. Molina & Rao (2010) focus on frequentist
inference for the unit-level linear model. The method of Molina & Rao (2010) has been
extended to Bayesian inference (Molina et al. 2014), complex sampling (Guadarrama
et al. 2018), two-level models (Marhuenda et al. 2017), generalized linear mixed models
(Hobza & Morales 2016), data-driven transformations (Rojas-Perilla et al. 2020), and
skew-normal models (Diallo & Rao 2018).
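A minimal Python sketch of the Monte Carlo approximation behind the EBP method follows. It covers a single area under several assumptions: a unit-level nested error model $y_{ij} = x_{ij}'\beta + u_i + e_{ij}$ on the log scale with normal area effects and errors, parameters treated as known (in practice they are replaced by estimates), and the area mean of $\exp(y_{ij})$ as the target parameter. The function name `ebp_area_mean` and its arguments are hypothetical. They do not come from the paper or from any package.

```python
import numpy as np


def ebp_area_mean(y_s, X_s, X_r, beta, sigma_u2, sigma_e2, L=5000, seed=0):
    """Monte Carlo approximation of the best predictor of the area mean of exp(y).

    Hypothetical setting: nested error model y = X beta + u + e on the log scale,
    u ~ N(0, sigma_u2), e ~ N(0, sigma_e2), with sampled units (y_s, X_s) and
    non-sampled covariates X_r for a single area.
    Returns the predictor and the L simulated values of the area parameter.
    """
    rng = np.random.default_rng(seed)
    n, n_r = len(y_s), X_r.shape[0]
    # Shrinkage factor and conditional distribution of u given the sample data.
    gamma = sigma_u2 / (sigma_u2 + sigma_e2 / n)
    u_mean = gamma * np.mean(y_s - X_s @ beta)
    u_var = gamma * sigma_e2 / n
    # Draw L realizations of the non-sampled responses from their conditional distribution.
    u = rng.normal(u_mean, np.sqrt(u_var), size=L)
    e = rng.normal(0.0, np.sqrt(sigma_e2), size=(L, n_r))
    y_r = X_r @ beta + u[:, None] + e          # shape (L, n_r)
    # Area parameter for each draw: mean of exp(y) over sampled and non-sampled units.
    theta_draws = (np.exp(y_s).sum() + np.exp(y_r).sum(axis=1)) / (n + n_r)
    return theta_draws.mean(), theta_draws
```

The returned draws are samples from the conditional distribution of the area parameter given the data. They are the ingredient that the MSE and interval procedures discussed below reuse.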
Molina & Rao (2010) define a parametric bootstrap estimator of the mean square
error (MSE) of their small area predictor. The bootstrap MSE estimator of Molina &
Rao (2010) does not incorporate a correction for the bias of the estimator of the leading
term, where the leading term in the MSE is the conditional variance of the small area
parameter given the data. The double-bootstrap is commonly used to estimate the
bias of the estimator of the leading term (Hall & Maiti 2006a,b). As noted in Molina
& Rao (2010), use of the double-bootstrap is often computationally prohibitive.
We propose an alternative way to construct MSE estimators for predictors
obtained using the EBP method of Molina & Rao (2010). Our MSE estimators incor-
porate a correction for the bias of the estimator of the leading term, without requiring
the double bootstrap. This is possible because the EBP method furnishes
samples from the conditional distribution of the population parameter given the data.
These samples from the conditional distribution can be used to obtain an estimator of
the leading term in the MSE without implementing a bootstrap. We then use boot-
strap procedures to estimate the second term in the MSE, which reflects the variation
due to parameter estimation. We also use the parametric bootstrap to estimate the
bias of the estimator of the leading term. The parametric bootstrap that we propose
is less computationally expensive than the parametric bootstrap method of Molina
& Rao (2010) because we only require generating bootstrap versions of elements in
the sample. In contrast, the procedure of Molina & Rao (2010) requires generating a
bootstrap version of the entire population.
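The decomposition described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The argument `simulate_theta(phi, L, rng)` is a hypothetical user-supplied function that returns L draws of the area parameter from its conditional distribution given the data, evaluated at parameter value `phi`. The argument `phi_star` holds B bootstrap parameter replicates obtained elsewhere. The sketch estimates each piece as follows:

- The leading term is the variance of the conditional draws.
- The second term is the spread of the predictor across the replicates.
- The bias of the leading-term estimator uses the usual additive bootstrap correction. This is only one of several possible corrections, and it can produce negative values.

```python
import numpy as np


def mse_components(simulate_theta, phi_hat, phi_star, L=5000, seed=0):
    """Sketch of a two-term MSE estimator built from conditional Monte Carlo draws.

    simulate_theta(phi, L, rng): hypothetical function returning L draws of the
        small area parameter from its conditional distribution given the data.
    phi_hat: parameter estimate; phi_star: iterable of B bootstrap replicates.
    """
    def draws_at(phi):
        # Common random numbers: the same seed for every phi, so differences
        # between predictors reflect the change in phi rather than Monte Carlo noise.
        return simulate_theta(phi, L, np.random.default_rng(seed))

    base = draws_at(phi_hat)
    pred_hat = base.mean()                 # EBP approximation
    m1_hat = base.var(ddof=1)              # leading term: conditional variance given the data
    pred_star, m1_star = [], []
    for phi in phi_star:
        d = draws_at(phi)
        pred_star.append(d.mean())
        m1_star.append(d.var(ddof=1))
    pred_star, m1_star = np.asarray(pred_star), np.asarray(m1_star)
    # Second term: variability of the predictor due to parameter estimation.
    m2_hat = np.mean((pred_star - pred_hat) ** 2)
    # Additive bootstrap bias correction of the leading-term estimator (may be negative).
    m1_bc = 2.0 * m1_hat - m1_star.mean()
    return {"pred": pred_hat, "m1": m1_hat, "m1_bc": m1_bc, "m2": m2_hat,
            "mse": m1_bc + m2_hat}
```

Take a toy model in which $\theta \mid \text{data} \sim N(\phi, 1)$. Because of the common random numbers, the estimated second term there equals the mean squared deviation of the replicates from $\hat\phi$.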
A further benefit of our proposed MSE estimation procedure is that it lends itself
naturally to the construction of calibrated prediction intervals. Molina & Rao (2010)
do not consider prediction intervals explicitly. One can construct a normal-theory
prediction interval as $\hat{\theta}_i \pm 1.96\sqrt{\widehat{\mathrm{mse}}_i}$, where $\hat{\theta}_i$ denotes the predictor and
$\widehat{\mathrm{mse}}_i$ denotes the MSE estimator. The normal-theory prediction interval may have poor
coverage if the standardized statistic $T_i = \widehat{\mathrm{mse}}_i^{-1/2}(\hat{\theta}_i - \theta_i)$ does not have an
approximately normal distribution, where $\theta_i$ denotes the true parameter. We use the basic ingredients
defining the MSE estimator to construct calibrated prediction intervals that do not re-
quire normal theory. The proposed prediction intervals adapt the calibration procedure
of Carlin & Gelfand (1991) to the small area context. We use the same simulated sam-
ples used to estimate the leading term in the MSE to define a preliminary confidence
interval. The preliminary interval is then calibrated using the bootstrap. The calibra-
tion procedure is similar to a small area prediction interval proposed in Section 2.8 of
Hall & Maiti (2006b). Our procedure is tailored more specifically toward construction
of intervals for nonlinear parameters under unit-level models than the method of Hall
& Maiti (2006b). The prediction interval of Hall & Maiti (2006b) requires a bootstrap
version of the population parameter. For our procedure, we only generate bootstrap
versions of sampled elements and do not construct a bootstrap version of the popula-
tion parameter. Therefore, the procedure of Hall & Maiti (2006b) is not tenable for
use in combination with our proposed bootstrap procedure.
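The sketch below gives a rough illustration of calibrating a preliminary interval with the bootstrap. It uses a deliberately simple toy model that is an assumption of this example, not the paper's setting. Under that model, $\log\theta \mid \text{data} \sim N(\mu, \sigma^2)$, $\mu$ is estimated by $\hat\mu$, and bootstrap estimates follow $\hat\mu^* \sim N(\hat\mu, \tau^2)$. The preliminary interval takes equal-tailed quantiles of Monte Carlo draws at a candidate level $\alpha'$. Coverage is estimated from draws of $\theta$ at $\hat\mu$, and the calibrated level is the $\alpha'$ whose estimated coverage is closest to the nominal level. The function name `calibrate_alpha` and the grid of levels are hypothetical choices.

```python
import numpy as np


def calibrate_alpha(mu_hat, sigma, tau, nominal=0.95, B=400, L=4000, seed=1):
    """Choose the level alpha' whose preliminary interval has bootstrap coverage closest to nominal.

    Toy assumptions: log(theta) | data ~ N(mu, sigma^2); mu is estimated by mu_hat,
    and bootstrap estimates satisfy mu_star ~ N(mu_hat, tau^2).
    """
    rng = np.random.default_rng(seed)
    alphas = np.linspace(0.002, 0.10, 50)                      # candidate levels alpha'
    theta = np.exp(mu_hat + sigma * rng.standard_normal(L))    # draws at the estimate
    mu_star = mu_hat + tau * rng.standard_normal(B)            # bootstrap parameter estimates
    z = rng.standard_normal(L)                                 # common random numbers
    coverage = np.zeros_like(alphas)
    for m in mu_star:
        draws = np.exp(m + sigma * z)          # conditional draws at the bootstrap estimate
        lo = np.quantile(draws, alphas / 2)    # preliminary interval endpoints, one per alpha'
        hi = np.quantile(draws, 1 - alphas / 2)
        inside = (theta >= lo[:, None]) & (theta <= hi[:, None])
        coverage += inside.mean(axis=1)
    coverage /= B
    best = np.argmin(np.abs(coverage - nominal))
    return alphas[best], coverage[best], alphas, coverage
```

In this toy model the preliminary interval ignores the uncertainty in $\hat\mu$. With $\hat\mu = 0$, $\sigma = 1$, and $\tau = 0.5$, its coverage at $\alpha' = 0.05$ is analytically $2\Phi(1.96/\sqrt{1.25}) - 1 \approx 0.92$. The calibration therefore selects a smaller $\alpha'$, which widens the interval.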
A further innovation of our work is that we consider MSE estimation and confi-
dence interval construction in the context of an informative design. The estimator of
the leading term in the MSE that we propose extends directly to an informative sam-
ple design. Estimation of the variance due to parameter estimation presents unique
challenges in the context of informative sampling. In our framework, we specify only
the first moment of the sample distribution of the survey weight, instead of postulating
a full distribution for the survey weight. As a result, the parametric bootstrap used for
the noninformative design does not immediately apply in the context of informative sampling.
To overcome this challenge, we simulate bootstrap parameter estimates from a
nonparametric estimate of the asymptotic covariance matrix of the vector of parameter
estimators. Our use of the large sample distribution of the parameter estimators allows
us to circumvent the problem of specifying a full distribution for the survey weight. We
also evaluate the proposed prediction intervals in the context of informative sampling.
In contrast, Hall & Maiti (2006b) restrict attention to noninformative designs. Extend-
ing the procedure of Hall & Maiti (2006b) to informative sampling is nontrivial because
the procedure of Hall & Maiti (2006b) requires a bootstrap version of the population
parameter.
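The step of simulating bootstrap parameter estimates from a nonparametric estimate of the asymptotic covariance matrix can be sketched as below. The estimator here is an assumption for illustration only: survey-weighted least squares for a linear regression, paired with an HC0-type sandwich covariance. The sandwich requires no distributional assumption for the weights, but this simplified version also ignores clustering and stratification. The paper's model and covariance estimator are more involved, and the function names are hypothetical.

```python
import numpy as np


def weighted_ls_sandwich(y, X, w):
    """Survey-weighted least squares with a sandwich (HC0-type) covariance estimate."""
    XtW = X.T * w                               # columns of X' scaled by the weights
    H = XtW @ X                                 # sum_i w_i x_i x_i'
    beta = np.linalg.solve(H, XtW @ y)
    resid = y - X @ beta
    scores = X * (w * resid)[:, None]           # estimating-equation contribution of each unit
    H_inv = np.linalg.inv(H)
    V = H_inv @ (scores.T @ scores) @ H_inv     # bread-meat-bread
    return beta, V


def parameter_replicates(beta_hat, V_hat, B=1000, seed=0):
    """Draw B bootstrap parameter replicates from N(beta_hat, V_hat)."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(beta_hat, V_hat, size=B)
```

Each row of the returned array can be passed as one replicate to a predictor, in place of a bootstrap estimate refitted to regenerated data.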
Variations of the proposed procedures have been used elsewhere. Sun et al. (2021)
implement a version of the proposed MSE estimator in the specific context of a bivariate
small area model with discrete and continuous components. Berg (2022) adapts
the proposed procedure for the purpose of constructing a database for small area es-
timation. The studies of Sun et al. (2021) and Berg (2022) are very specific to the
frameworks that they consider and are not easily generalizable. Further, Sun et al.
(2021) and Berg (2022) do not consider estimation of the bias of the estimator of the
leading term. In this work, we generalize the procedures with the aim of reaching a
broad audience. We also provide empirical and theoretical support for the methodol-
ogy. We conduct a thorough empirical evaluation of several estimators of the bias of
the estimator of the leading term in the MSE.
Upon completing this work, we learned that the estimator of the leading term that
we propose is in current use for production of poverty indicators at the World Bank.
The World Bank MSE estimator, however, does not appropriately reflect variability due
to parameter estimation. One of the contributions of our study is to provide rigorous
support for the estimator of the leading term in the MSE that is currently in use at
the World Bank (Isabelle Molina, Personal Communication, 7-6-22). The estimator
of the second term in the MSE that we propose has potential use for inference about
poverty measures at the World Bank. The connection between the proposed procedures and
current practice at the World Bank demonstrates the practical importance of the
methodology in this paper.
Many other works propose MSE estimators that incorporate corrections for the
bias of the estimator of the leading term. Hall & Maiti (2006a) and Hall & Maiti
(2006b) propose parametric and non-parametric double-bootstrap based MSE estima-
tors that are very computationally intensive to implement. We do not consider the
bootstrap procedures of Hall & Maiti (2006a) or Hall & Maiti (2006b) as a result of
the computational burden. To reduce the computational demands, Erciulescu & Fuller
(2016) develop a fast double bootstrap. The fast double bootstrap MSE estimator of
Erciulescu & Fuller (2016) can result in negative estimates. In a study of small area
estimation based on the gamma distribution, Cho & Berg (in preparation) find that
the prevalence of negative estimates from the method of Erciulescu & Fuller (2016) is
nontrivial. Because one cannot construct a confidence interval from a negative MSE
estimate, we do not consider the method of Erciulescu & Fuller (2016). Erciulescu &
Fuller (2019) develop calibrated confidence intervals for small area means. It is not
immediately obvious to us that the method of Erciulescu & Fuller (2019) extends to
nonlinear small area parameters. Lahiri et al. (2007) develop positive MSE estimates
that incorporate a correction for the bias of the estimator of the leading term in the
context of an area-level model. An extension of their method to prediction of nonlinear
parameters in the context of a unit-level model is not straightforward and is beyond the
scope of our work. Further, their MSE estimator is much more difficult to implement
than the MSE estimator that we propose. An alternative to the bootstrap is to use
the jackknife to estimate the bias of the estimator of the leading term (Lohr & Rao
2009). The jackknife MSE estimator of Lohr & Rao (2009) is developed for an area-
level model. Because we focus on unit-level models, we do not consider the jackknife
MSE estimator of Lohr & Rao (2009). The SUMCA method (Jiang & Torabi 2020)
is an alternative way to construct a bias correction. We do not consider the SUMCA
method because it is not clear to us that SUMCA appropriately reflects the variance
of parameter estimators for nonlinear parameters, as we explain in Appendix A of the
supplementary material (SM).
We propose inference procedures that are computationally simple to implement
when used in combination with the EBP procedure of Molina & Rao (2010). The
procedures lend themselves naturally to the construction of calibrated prediction intervals
and extend to informative sampling. In Section 2, we define the proposed method for
noninformative and informative sample designs. We also define the confidence intervals and
corrections to the bias of the estimator of the leading term in Section 2. In Section 3,
we evaluate the proposed procedure through simulations that use both noninformative
and informative designs. In Section 4, we present two data analyses: one for non-
informative sampling and a second for informative sampling.