Small Area Estimation using EBLUPs under the Nested Error Regression Model Ziyang Lyu

Small Area Estimation using EBLUPs under the
Nested Error Regression Model
Ziyang Lyu *
UNSW Data Science Hub, School of Mathematics and Statistics
University of New South Wales
A.H. Welsh
Research School of Finance, Actuarial Studies and Statistics,
Australian National University
October 19, 2022
Abstract
Estimating characteristics of domains (referred to as small areas) within a population
from sample surveys of the population is an important problem in survey statistics. In this
paper, we consider model-based small area estimation under the nested error regression
model. We discuss the construction of mixed model estimators (empirical best linear
unbiased predictors, EBLUPs) of small area means and the conditional linear predictors of
small area means. Under the asymptotic framework of increasing numbers of small areas
and increasing numbers of units in each area, we establish asymptotic linearity results
and central limit theorems for these estimators which allow us to establish asymptotic
equivalences between estimators, approximate their sampling distributions, obtain simple
expressions for and construct simple estimators of their asymptotic mean squared errors,
and justify asymptotic prediction intervals. We present model-based simulations that show
that in quite small, finite samples, our mean squared error estimator performs as well as or
better than the widely used Prasad & Rao (1990) estimator and is much simpler, so is easier
to interpret. We also carry out a design-based simulation using real data on consumer
expenditure on fresh milk products to explore the design-based properties of the mixed
model estimators. We explain and interpret some surprising simulation results through
analysis of the population and further design-based simulations. The simulations highlight
important differences between the model- and design-based properties of mixed model
estimators in small area estimation.
Keywords: increasing area size asymptotics, indirect estimator, mean squared error estimation,
mixed model estimator, model-based prediction, prediction intervals
* Ziyang Lyu is a postdoctoral research fellow in UNSW Data Science Hub, School of Mathematics and
Statistics, University of New South Wales, NSW, 2033, Australia. Email: lvziyang08@gmail.com. A.H. Welsh is
E.J. Hannan Professor of Statistics in the Research School of Finance, Actuarial Studies and Statistics, Australian
National University, ACT, 2601, Australia. Email: Alan.Welsh@anu.edu.au.
arXiv:2210.09502v1 [stat.ME] 18 Oct 2022
1 Introduction
Estimates of area-level characteristics of interest (such as means, totals, and quantiles) for
areas, domains or clusters within a population (all intended to be included whenever we refer
to areas) obtained from sample survey data are widely used for resource allocation in social,
education and environmental programs, and as the basis for commercial decisions. Direct
estimates, which use only data specific to an area, can have large standard errors because of
relatively small area-specific sample sizes. Small area estimation is concerned with producing
more reliable estimates with valid measures of uncertainty for the characteristics of interest;
recent reviews have been given by, for example, Rao (2005, 2008), Lehtonen & Veijanen (2009),
Pfeffermann (2013), and Rao & Molina (2015).
A popular method for obtaining reliable estimates (Fay & Herriot 1979, Battese et al. 1988)
is to introduce a mixed model for the population which includes fixed effects (to describe
either unit-level or area-level effects) and random effects (to capture additional between area
variation), fit the model using data from multiple areas, and then use the fitted model to
construct the desired estimates. When we have unit-level data, a simple and widely used model
is the nested error regression or random intercept model (Battese et al. 1988), and a widely
used method for estimating means or totals is to use empirical best linear unbiased predictors
(EBLUPs) obtained by minimising the (prediction) mean squared error and then estimating
the unknown quantities by maximum likelihood or restricted maximum likelihood (REML)
estimation; see for example Saei & Chambers (2003a,b), Jiang & Lahiri (2006) and Haslett &
Welsh (2019). In addition to the issue of the level at which the data are available, there is
also a subtlety about the target of estimation. The most commonly studied characteristics of
interest are small area means (or equivalently totals) and, when we assume a mixed model,
the conditional expectations of the small area means given the random effects. These two
targets, the small area means and their conditional expectations, are different and have different
EBLUPs with potentially different mean squared errors, but are treated as interchangeable in
small area estimation. They are both random variables under the model-based framework, so
technically they need to be predicted rather than estimated. However, it is common to use both
“prediction” and “estimation” in small area estimation and we refer to their EBLUPs as mixed
model estimators that are distinguished by their different targets (Tzavidis et al. 2010); they can
also be described as composite and synthetic estimators respectively.
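To make the distinction between the two targets concrete, the following is a minimal sketch, not the authors' implementation. It assumes a nested error regression model with an intercept and a single covariate, y_ij = beta0 + beta1 x_ij + alpha_i + e_ij, and treats the variance components as known, so it computes BLUPs; replacing them with maximum likelihood or REML estimates would give the EBLUPs. The function names, the uniform covariate and all parameter values are hypothetical choices made for illustration.

```python
import random


def shrinkage(n, sigma_a, sigma_e):
    """Weight gamma_i = sigma_a^2 / (sigma_a^2 + sigma_e^2 / n_i)."""
    return sigma_a**2 / (sigma_a**2 + sigma_e**2 / n)


def simulate_area(N, n, beta, sigma_a, sigma_e, rng):
    """One area: y_ij = beta0 + beta1 * x_ij + alpha_i + e_ij for j = 1..N.

    The first n units play the role of the sample; because units are
    generated i.i.d., this mimics simple random sampling within the area.
    """
    alpha = rng.gauss(0.0, sigma_a)
    x = [rng.uniform(0.0, 10.0) for _ in range(N)]
    y = [beta[0] + beta[1] * xj + alpha + rng.gauss(0.0, sigma_e) for xj in x]
    return {"x": x, "y": y, "n": n, "alpha": alpha}


def gls_beta(areas, sigma_a, sigma_e):
    """GLS estimate of (beta0, beta1), variance components assumed known.

    Uses X'V^{-1}X = (X'X - gamma * n * xbar xbar') / sigma_e^2 per area;
    the common factor 1 / sigma_e^2 cancels when solving the 2x2 system.
    """
    a00 = a01 = a11 = b0 = b1 = 0.0
    for ar in areas:
        n = ar["n"]
        xs, ys = ar["x"][:n], ar["y"][:n]
        g = shrinkage(n, sigma_a, sigma_e)
        xbar, ybar = sum(xs) / n, sum(ys) / n
        a00 += n - g * n
        a01 += sum(xs) - g * n * xbar
        a11 += sum(v * v for v in xs) - g * n * xbar * xbar
        b0 += sum(ys) - g * n * ybar
        b1 += sum(u * v for u, v in zip(xs, ys)) - g * n * xbar * ybar
    det = a00 * a11 - a01 * a01
    return ((a11 * b0 - a01 * b1) / det, (a00 * b1 - a01 * b0) / det)


def predict_area(ar, beta_hat, sigma_a, sigma_e):
    """Return (predictor of conditional expectation, predictor of area mean)."""
    n, N = ar["n"], len(ar["x"])
    xs, ys = ar["x"][:n], ar["y"][:n]
    g = shrinkage(n, sigma_a, sigma_e)
    # Predicted random effect: shrunken mean of the sample residuals.
    alpha_hat = g * (sum(ys) / n - beta_hat[0] - beta_hat[1] * sum(xs) / n)
    # Target 1: conditional expectation Xbar_i' beta + alpha_i.
    cond_pred = beta_hat[0] + beta_hat[1] * sum(ar["x"]) / N + alpha_hat
    # Target 2: area mean; keep observed y's and predict only unsampled units.
    unsampled = [beta_hat[0] + beta_hat[1] * xj + alpha_hat for xj in ar["x"][n:]]
    mean_pred = (sum(ys) + sum(unsampled)) / N
    return cond_pred, mean_pred


if __name__ == "__main__":
    rng = random.Random(1)
    beta, sigma_a, sigma_e = (1.0, 2.0), 1.0, 1.5  # illustrative values only
    areas = [simulate_area(200, 20, beta, sigma_a, sigma_e, rng) for _ in range(40)]
    beta_hat = gls_beta(areas, sigma_a, sigma_e)
    print(beta_hat, predict_area(areas[0], beta_hat, sigma_a, sigma_e))
```

In this sketch the two predictors differ by (n_i / N_i)(1 - gamma_i) times the mean sample residual, so they nearly coincide when the sampling fraction n_i / N_i is small, which is one way to see why the two targets are so often treated as interchangeable.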
The model-based variability of small area estimators is usually described by reporting
estimates of their (prediction) mean squared errors or by prediction intervals, often based on
these estimates. Estimation of (prediction) mean squared errors for mixed model estimators
is complicated, even for simple linear mixed models like the nested error regression model:
estimates of (prediction) mean squared error for mixed model estimators based on treating the
variance parameters as known (i.e. for the BLUPs rather than the EBLUPs) are underestimates
when linear mixed models are fitted to real data; and simple, analytic expressions for the
(prediction) mean squared errors of mixed model estimators are not available, complicating
their estimation. Under normal linear mixed models (including the nested error regression
model), when the number of areas is allowed to increase while the area sizes are held fixed
(or bounded), Kackar & Harville (1981) and Prasad & Rao (1990) used second order Taylor
expansions to obtain approximations to the (prediction) mean squared error of the EBLUPs
of the conditional expectation of the small area means, and then constructed mean squared
error estimators by replacing the unknown quantities in these approximations by estimators.
The Prasad-Rao approximation and estimator have been extended to more general models
and to allow additional estimators of the model parameters by Datta & Lahiri (2000) and Das
et al. (2004); see also Żądło (2009) and Torabi & Rao (2013). Alternatives to estimators based
on analytic approximations include estimators obtained using resampling methods. Jiang
et al. (2002) proposed and investigated cluster-level jackknife methods (unusually, treating the
small area means as the characteristics of interest), Hall & Maiti (2006a) proposed a parametric
bootstrap approach for constructing bias-corrected estimates of the prediction mean squared
error and prediction regions and Chatterjee et al. (2008) used a different approach to construct
parametric bootstrap prediction intervals. For the considerably more complicated non-normal
case, Hall & Maiti (2006b) proposed a moment-matching, double-bootstrap procedure to
estimate the prediction mean squared error.
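For orientation, the second-order approximation and the estimator built from it are usually written in the form sketched below. The notation is the common textbook notation and is an assumption here, not taken from this paper: psi = (sigma_alpha^2, sigma_e^2) collects the variance components, n_i is the sample size in area i, and hats denote estimators. The estimator shown is the form associated with the moment-type variance component estimators considered by Prasad & Rao (1990).

```latex
\begin{align*}
\mathrm{MSE}\bigl(\hat\mu_i^{\mathrm{EBLUP}}\bigr)
  &\approx g_{1i}(\psi) + g_{2i}(\psi) + g_{3i}(\psi), \\
\mathrm{mse}\bigl(\hat\mu_i^{\mathrm{EBLUP}}\bigr)
  &= g_{1i}(\hat\psi) + g_{2i}(\hat\psi) + 2\, g_{3i}(\hat\psi), \\
g_{1i}(\psi) &= (1 - \gamma_i)\,\sigma_\alpha^2,
  \qquad
  \gamma_i = \frac{\sigma_\alpha^2}{\sigma_\alpha^2 + \sigma_e^2 / n_i}.
\end{align*}
```

Here g_{1i} is the prediction mean squared error when all parameters are known, g_{2i} accounts for estimating the regression coefficients, and g_{3i} accounts for estimating the variance components. Treating the variance parameters as known drops g_{3i}, which is the source of the underestimation noted above; the factor 2 in the estimator compensates for the bias of g_{1i} evaluated at the estimated variance components.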
Many extensions of the basic approach have been developed. These include, for example,
introducing outlier robust estimators (Sinha & Rao 2009, Tzavidis et al. 2010), spatial models to
allow correlation between small areas (Saei & Chambers 2005, Torabi & Jiang 2020), and different
response distributions through using generalized linear mixed models (Saei & Chambers 2003a)
and models that allow responses with extra zeros (Chandra & Chambers 2016). Other work has
incorporated design-based considerations (Jiang & Lahiri 2006, Jiang et al. 2011).
The standard asymptotic framework for model-based small area estimation under the
nested error regression model follows Kackar & Harville (1981) and Prasad & Rao (1990) in
allowing the number of areas to increase while holding the area sizes fixed (or bounded).
This framework has several disadvantages, most notably that the small area estimators are
not consistent and their asymptotic distributions are not known. In turn, this hinders the
derivation of approximate distributions (none are available), complicates the construction
of mean squared error estimates and means that prediction intervals based on the estimated
mean squared errors cannot be shown to achieve their nominal level even asymptotically. To
overcome these difficulties, we need both the number of areas and the sample size in each
area to increase. This appears to contradict the “small” in “small area estimation”, but the
approximations derived within this framework perform well (the ultimate purpose of deriving
asymptotic approximations) even when some areas have quite small sample size. In addition,
many practical applications include a number of large “small areas” and no tiny ones, so it is
often a practically relevant framework. Examples occur in clinical research (clustered trials)
when we study records on large groups (areas) of patients (units) with each group treated by a
different medical practitioner or at a different hospital, in educational research when we look at
records on college students (units) grouped within schools (areas), and in sample surveys when
we observe people or households (units) grouped in defined clusters (areas). For example,
Arora & Lahiri (1997) gave an instance with 43 areas ranging in size from 95 to 633 units, and
such examples are common in poverty data (Pratesi 2016).
We use recent increasing number of areas and increasing area size asymptotic results
obtained by Lyu & Welsh (2022) for estimating the parameters of the nested error regression
model by maximum likelihood or REML estimation and by Lyu & Welsh (2021) for the EBLUPs
for the random effects in the nested error regression model, to study small area estimators in
the same framework. Without having to assume normality in the model, we obtain simple
approximations to the distributions of the mixed model estimators that involve the distribution
of the characteristic of interest and a normal distribution. We further obtain strikingly simple
expressions for the asymptotic mean squared errors of the estimators which are easy to estimate
and can be used in prediction intervals which are demonstrated to have correct asymptotic
coverage. Such results are not available under the standard asymptotic framework. They are
achieved by approximating the estimators directly and taking the (prediction) mean squared
error of the approximation, rather than directly approximating the (prediction) mean squared
error. Our results fill the practical and theoretical gap around the most widely used mixed
model estimators in small area estimation and suggest how other similar gaps can be treated.
We describe the nested error regression model and discuss the targets of estimation and the
mixed model estimators we consider in Section 2. We present our increasing number of areas
and increasing area size asymptotic results in Section 3 and use (model-based) simulation
to demonstrate the relevance of these results to finite samples in Section 4. We include a
design-based simulation using real data on consumer expenditure on fresh milk products, and
then use additional design-based simulations to explore some unexpected findings in Section
5. We conclude with a brief discussion in Section 6.