
Higher-Order Asymptotic Properties of Kernel Density Estimator
with Global Plug-In and Its Accompanying Pilot Bandwidth
Shunsuke Imai* Yoshihiko Nishiyama†
October 5, 2022
Abstract
This study investigates the effect of bandwidth selection via a plug-in method on the asymptotic structure of
the nonparametric kernel density estimator. We generalise the result of Hall and Kang (2001) and find that the
plug-in method has no effect on the asymptotic structure of the estimator up to the order of
$O\{(nh_0)^{-1/2}+h_0^{L}\}=O(n^{-L/(2L+1)})$ for a bandwidth $h_0$ and any kernel order $L$ when the kernel order
for pilot estimation $L_p$ is high enough. We also provide the valid Edgeworth expansion up to the order of
$O\{(nh_0)^{-1}+h_0^{2L}\}$ and find that, as long as $L_p$ is high enough, the plug-in method first has an effect
on the term whose convergence rate is $O\{(nh_0)^{-1/2}h_0+h_0^{L+1}\}=O(n^{-(L+1)/(2L+1)})$. In other words, we
derive the exact achievable convergence rate of the deviation between the distribution functions of the estimator
with a deterministic bandwidth and with the plug-in bandwidth. In addition, we weaken the conditions on the kernel
order $L_p$ for pilot estimation by considering the effect of the pilot bandwidth associated with the plug-in
bandwidth. We also show that bandwidth selection via the global plug-in method possibly has an effect on the
asymptotic structure even up to the order of $O\{(nh_0)^{-1/2}+h_0^{L}\}$. Finally, Monte Carlo experiments are
conducted to see whether our approximation improves on previous results.
Keywords: nonparametric statistics, kernel density estimator, plug-in bandwidth, Edgeworth expansion, coverage
probability
*Graduate School of Economics, Kyoto University, Yoshidahonmachi, Sakyoku, Kyoto, 606–8501, JAPAN, imai.shunsuke.57n@st.kyoto-u.ac.jp
†Institute of Economic Research, Kyoto University, Yoshidahonmachi, Sakyoku, Kyoto, 606–8501, JAPAN, nishiyama@kier.kyoto-u.ac.jp
arXiv:2210.01411v1 [math.ST] 4 Oct 2022
1 Introduction
In nonparametric statistics, the target of statistical inference is a function or an infinite-dimensional vector
$f$ that is not specifically modelled itself (see Wasserman (2006) for an introductory overview, Giné and Nickl
(2016) for a mathematically unified treatment, and Ichimura and Todd (2007) and Chen (2007) for overviews in the
context of the economic literature). One important instance of the function $f$ is the density function because, in
statistics and its related fields, there are cases where we are interested in the distribution itself, such as a
wage distribution (see e.g. DiNardo et al. (1996)), or where the target of statistical inference depends on the
density function, such as a conditional expectation function. Although there are different methods for estimating a
density function, we focus on the estimator based on the kernel method, namely the kernel density estimator (KDE),
also called the Rosenblatt estimator or the Rosenblatt-Parzen estimator after the pioneering works of Rosenblatt
(1956) and Parzen (1962).
The first-order asymptotic properties of KDE have been studied over a long period, and it has been proven that,
under certain conditions, KDE has pointwise consistency and asymptotic normality (see e.g. Parzen (1962) and the
monograph by Li and Racine (2007, pp. 28-30)). As we will review in Section 2, the rate of convergence of KDE is
slower than the parametric rate and, furthermore, becomes slower as the dimension increases. This property is called
the curse of dimensionality. We can understand this as being the cost of using local data to avoid misspecification.
Hall (1991) clarified the higher-order asymptotic properties of the estimator in both non-Studentised and
Studentised cases. The asymptotic expansion of KDE is no longer a series in powers of $n^{-1/2}$, as for parametric
estimators, but a series in powers of $(nh)^{-1/2}$, even in the non-Studentised case; it is a more complicated
series in the Studentised case. Here $n$ and $h$ are the sample size and bandwidth, respectively.
Bandwidth $h$ determines the flexibility of the statistical model and is chosen to balance the bias-variance
trade-off: a more flexible model decreases the bias but increases the variance, while a less flexible model
decreases the variance but increases the bias. It is well known that the performance of kernel-based estimators
depends greatly on the bandwidth and much less on the kernel function. By defining a loss function, one can compute
the theoretically optimal bandwidth $h_0$ that minimises the loss. For example, mean integrated squared error (MISE)
is the most commonly used global loss measure. However, in practice, such a bandwidth is typically infeasible
because it depends on the unknown density. Therefore, one has to choose the bandwidth in a data-driven manner. Among
the many bandwidth selection methods, two well-known ones are cross-validation and the plug-in method. In this
paper, we focus on the latter.
It is natural to ask whether the choice of bandwidth affects the asymptotic structure of the estimator. Ichimura
(2000) and Li and Li (2010) have considered the asymptotic distribution of kernel-based non/semiparametric
estimators with data-driven bandwidth. They argue that, under certain conditions, the bandwidth selection has no
effect on the first-order asymptotic structure of the estimators. Hall and Kang (2001) showed that bandwidth
selection by the global plug-in method also has no effect on the asymptotic structure of KDE up to the order of
$O(n^{-2/5})$ for $L=2$ and $L_p=6$, where $L$ and $L_p$ are the kernel orders for the density estimation and for
the estimation of an unknown part of the optimal bandwidth, respectively.
Our contributions are fivefold. First, we provide the Edgeworth expansion of KDE with global plug-in bandwidth up to
the order of $O\{(nh_0)^{-1}+h_0^{2L}\}=O(n^{-2L/(2L+1)})$ and show that bandwidth selection by the plug-in method
begins to affect the term whose convergence rate is $O\{(nh_0)^{-1/2}h_0+h_0^{L+1}\}=O(n^{-(L+1)/(2L+1)})$ under the
condition that $L_p$ is large enough. Second, we generalise Theorem 3.2 of Hall and Kang (2001), which states that
bandwidth selection via the global plug-in method has no effect on the asymptotic structure of KDE up to the order
of $O\{(nh_0)^{-1/2}+h_0^{L}\}=O(n^{-L/(2L+1)})$. Their results limit the orders of the kernel functions $K(u)$ and
$H(u)$ to $L=2$ and $L_p=6$, respectively, but we show that they are valid for general orders $L$ as well, under the
condition that $L_p$ is large enough. Third, we explore the Edgeworth expansion of KDE with deterministic bandwidth
in more detail than Hall (1991). We show that the Edgeworth expansion of the standardised KDE with deterministic
bandwidth has a term of order $O\{(nh_0)^{-1/2}+h_0^{L}\}=O(n^{-L/(2L+1)})$ right after the term $\Phi(z)$, with a
gap between them. After that, however, the terms decrease at the rate of $O(h_0)=O(n^{-1/(2L+1)})$.
However, the result of Hall and Kang (2001) and our results above need the kernel order $L_p$ for the estimation of
unknown parts of the optimal bandwidth to be high enough. We have two motivations to avoid imposing this condition
on $L_p$. One is that although a higher-order kernel is theoretically justified, it has undesirable properties in
terms of implementation on a computer. The other is that the condition forces the pilot bandwidth to be relatively
large, but the admissible range is restrictive, especially in multidimensional settings. For details of the latter
motivation, see the seminal works of Cattaneo et al. (2010, 2013, 2014a,b) and Cattaneo and Jansson (2018). Then, as
a fourth contribution, we weaken this condition on $L_p$ assumed by Hall and Kang (2001) and our Theorem 3.1, and
provide the Edgeworth expansion including the effect of the pilot bandwidth up to the order of
$O\{(nh_0)^{-1}+h_0^{2L}\}$. In this situation, bandwidth selection via the global plug-in method possibly has an
effect on the asymptotic structure of KDE even up to the order of $O\{(nh_0)^{-1/2}+h_0^{L}\}$ (for example, when
$L=2$ and $L_p=2$). Finally, we consider the joint effect of bandwidth selection via the global plug-in method, its
accompanying pilot bandwidth, and Studentisation. The proof of our main theorem owes much to Nishiyama and Robinson
(2000). They established the valid Edgeworth expansion for the semiparametric density-weighted averaged derivatives
estimator of the single index model, which has an exact second-order U-statistic form. Although the higher-order
asymptotic structure of U-statistics had been studied before Nishiyama and Robinson (2000) (see e.g. Callaert et al.
(1980)), their estimator differs from standard U-statistics in that it is a U-statistic whose kernel depends on the
sample size $n$ through the bandwidth. Since KDE with plug-in bandwidth can also be approximated by a sum of first-
and second-order U-statistics whose kernels depend on the sample size $n$ through the bandwidth, we can benefit from
their proof.
The remainder of this paper is organised as follows. In the next section, we introduce KDE and review its known
properties. Section 3 provides the main results, namely the Edgeworth expansion of the estimator with the global
plug-in bandwidth. In Section 4, we employ Monte Carlo studies to compare our results with those of previous works.
Section 5 concludes and discusses future research directions.
2 Review of the Estimator’s Properties
2.1 Estimator and Its First Order Properties
Assumption 1. Let $\{X_i\}_{i=1}^{n}$ be a random sample with an absolutely continuous distribution with Lebesgue density $f$.
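First, we introduce the nonparametric KDE $\hat f$ for the unknown density $f$. The estimator $\hat f$ at a point
$x$ with a bandwidth $h$ is defined as follows:
$$
\hat f_h(x) \equiv \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{X_i-x}{h}\right) \equiv \frac{1}{nh}\sum_{i=1}^{n} K_{i,h}(x),
$$
where $K$ is a kernel function, and we say that $K$ is an $L$-th order kernel, for a positive integer $L$, if
$$
\int u^{l}K(u)\,du =
\begin{cases}
1 & (l=0) \\
0 & (1 \le l \le L-1) \\
C \neq 0,\ <\infty & (l=L).
\end{cases}
$$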
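The definition above can be sketched in a few lines of Python. This is only an illustrative sketch: the Epanechnikov kernel, the function names, and the midpoint-rule integrator are assumptions made here, not choices taken from the paper. The code evaluates $\hat f_h(x)$ and numerically checks the moment conditions that define the kernel order.

```python
import math

def epanechnikov(u):
    """Second-order kernel with compact support [-1, 1] (an illustrative choice)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def kde(data, x, h, kernel=epanechnikov):
    """Kernel density estimate (1 / (n h)) * sum_i K((X_i - x) / h)."""
    n = len(data)
    return sum(kernel((xi - x) / h) for xi in data) / (n * h)

def kernel_moment(kernel, l, lower=-1.0, upper=1.0, steps=20000):
    """Midpoint-rule approximation of the integral of u^l K(u) over [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        u = lower + (k + 0.5) * width
        total += (u ** l) * kernel(u)
    return total * width

if __name__ == "__main__":
    sample = [-0.3, 0.1, 0.4, 1.2]  # hypothetical data
    print(kde(sample, x=0.0, h=0.5))
    # Moments l = 0, 1, 2 of the kernel; the first nonzero moment with l >= 1 gives the order.
    print([round(kernel_moment(epanechnikov, l), 6) for l in range(3)])
```

For this kernel the zeroth, first, and second moments are 1, 0, and 1/5, so it is a kernel of order $L=2$ in the sense defined above.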
Assumption 2. In a neighbourhood of $x$, $f$ is $L$ times continuously differentiable and its first $L$ derivatives are bounded.
Assumption 3. Kernel function $K$ is a bounded, even function with a compact support, of order $L \ge 2$ and $\int K(u)\,du = 1$.
Assumption 4. $x$ is an interior point in the support of $X$.
Assumption 5. $h \to 0$, $nh \to \infty$ as $n \to \infty$.
KDE has pointwise consistency and asymptotic normality at an interior point of the support of $X$. Although it also
converges uniformly over interior points of the support of $X$, we only review pointwise properties because we
investigate the pointwise higher-order asymptotics of KDE with global plug-in bandwidth. Under Assumptions 1-3, we
can expand the mean squared error (MSE) of $\hat f_h(x)$ as follows:
$$
\mathrm{MSE}[\hat f(x)] \equiv E[\{\hat f(x)-f(x)\}^2] = \{C_L f^{(L)}(x)h^{L}\}^2 + \frac{R(K)f(x)}{nh} + o\{h^{2L}+(nh)^{-1}\}, \tag{2.1}
$$
where $R(K) = \int K(u)^2\,du$ and $C_L = \frac{1}{L!}\int u^{L}K(u)\,du$. Therefore, Markov's inequality,
Assumptions 1-5, and (2.1) imply pointwise consistency, $\hat f_h(x) \to_p f(x)$. Moreover, we can show that KDE has
asymptotic normality by applying the Lindeberg-Feller central limit theorem:
$$
\sqrt{nh}\,\{\hat f_h(x) - E\hat f_h(x)\} \to_d N\left(0, R(K)f(x)\right).
$$
Remark 1. Since $E[\hat f_h(x)] \approx f(x) + C_L f^{(L)}(x)h^{L}$, the statistic centred by $f(x)$ asymptotically
follows a zero-mean normal distribution if $nh^{2L+1} \to 0$ holds. However, the theoretically optimal bandwidth
does not satisfy this condition, as we will discuss later. Therefore, we consider the statistic centred by
$E[\hat f_h(x)]$, not $f(x)$. For recent studies on the asymptotic bias of KDE, see, for example, Hall and Horowitz
(2013) and Calonico et al. (2018). For other nonparametric estimators, recent related studies are those by Armstrong
and Kolesár (2018), Calonico et al. (2014), Calonico et al. (2020, 2022) and Schennach (2020).
2.2 Plug-In Method
Bandwidth $h$ is a parameter that analysts need to choose in advance. One of the criteria for bandwidth selection is
the mean integrated squared error (MISE):
$$
\mathrm{MISE}(h) = \int E[\{\hat f_h(x)-f(x)\}^2]\,dx.
$$
The theoretically optimal bandwidth is the one that minimises MISE and, from the MISE expansion, this bandwidth is
defined as follows:
$$
h_0 = \left\{\frac{R(K)}{2L\,C_L^{2}\,I_L}\right\}^{\frac{1}{2L+1}} n^{-\frac{1}{2L+1}},
$$
where $I_L = \int f^{(L)}(x)^2\,dx$. Although $h_0$ would perform the best, it is infeasible because $I_L$ is
unknown, so one has to select the bandwidth from the available data. We examine the effect of a certain plug-in
method on the distribution of the estimator.
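To make the formula for $h_0$ concrete, the following sketch evaluates it for one illustrative configuration. The Epanechnikov kernel (for which $R(K)=3/5$ and $C_2=1/10$) and the standard normal reference density (for which $I_2 = 3/(8\sqrt{\pi})$) are assumptions chosen for this example only; the function and constant names are hypothetical.

```python
import math

def optimal_bandwidth(n, L, roughness_K, C_L, I_L):
    """MISE-optimal h0 = {R(K) / (2 L C_L^2 I_L)}^{1/(2L+1)} * n^{-1/(2L+1)}."""
    constant = roughness_K / (2.0 * L * C_L ** 2 * I_L)
    return constant ** (1.0 / (2 * L + 1)) * n ** (-1.0 / (2 * L + 1))

# Epanechnikov kernel constants (L = 2): R(K) = 3/5 and C_2 = (1/2!) * (1/5) = 1/10.
R_EPANECHNIKOV = 0.6
C2_EPANECHNIKOV = 0.1

# Assumed reference density: standard normal, for which I_2 = 3 / (8 sqrt(pi)).
I2_STANDARD_NORMAL = 3.0 / (8.0 * math.sqrt(math.pi))

if __name__ == "__main__":
    for n in (100, 1000, 10000):
        h0 = optimal_bandwidth(n, 2, R_EPANECHNIKOV, C2_EPANECHNIKOV, I2_STANDARD_NORMAL)
        # n * h0^(2L+1) is the same constant for every n.
        print(n, h0, n * h0 ** 5)
```

Because $n h_0^{2L+1}$ does not depend on $n$, the condition $nh^{2L+1} \to 0$ mentioned in Remark 1 fails at $h = h_0$, which is why the statistic is centred by $E[\hat f_h(x)]$.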
Several plug-in methods have been proposed so far (see e.g. Hall et al. (1991), Sheather and Jones (1991)). In this
paper, we adopt, as in Hall and Kang (2001), a simple plug-in method that estimates $I_L$ directly and
nonparametrically using the estimator proposed by Hall and Marron (1987). Their estimator, $\hat I_L$ for $I_L$, is
given as follows:
$$
\hat I_L = \binom{n}{2}^{-1}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} b^{-(2L+1)}H^{(2L)}\left(\frac{X_i-X_j}{b}\right)
\equiv \binom{n}{2}^{-1}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\hat I_{Lij},
$$
where $b$ (called the pilot bandwidth) is a bandwidth for the estimation of $I_L$, different from $h$, and $H$ is a
kernel function of order $L_p$.
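A minimal sketch of the pairwise estimator $\hat I_L$ for the case $L=2$ follows. It assumes a standard Gaussian density for $H$ (of order $L_p=2$), chosen only because its fourth derivative has a simple closed form; note that a Gaussian $H$ does not have the compact support required later in Assumption 8, so this is an illustration rather than the paper's configuration. The function names and the simulated data are hypothetical.

```python
import math
import random

def gaussian_fourth_derivative(u):
    """Fourth derivative of the standard normal density: (u^4 - 6 u^2 + 3) * phi(u)."""
    phi = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return (u ** 4 - 6.0 * u ** 2 + 3.0) * phi

def estimate_I2(data, b):
    """Average over pairs i < j of b^{-5} H^{(4)}((X_i - X_j) / b), i.e. I_L-hat with L = 2."""
    n = len(data)
    total = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            total += gaussian_fourth_derivative((data[i] - data[j]) / b)
    n_pairs = n * (n - 1) / 2.0
    return total / (n_pairs * b ** 5)

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed; simulated N(0, 1) data for illustration
    sample = [rng.gauss(0.0, 1.0) for _ in range(300)]
    # For reference, the true I_2 of the N(0, 1) density is 3 / (8 sqrt(pi)).
    print(estimate_I2(sample, b=0.6), 3.0 / (8.0 * math.sqrt(math.pi)))
```

The summand is negative whenever $|X_i - X_j|/b$ falls where $u^4 - 6u^2 + 3 < 0$, so the estimate itself can be negative in small samples.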
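Another estimator for $I_L$ proposed by Hall and Marron (1987) is
$$
\int\left\{\hat f^{(L)}(x)\right\}^2 dx = \frac{1}{nb^{2L+1}}\bar H^{(L)}(0) + \frac{1}{n^2 b^{2L+1}}\sum_{i=1}^{n}\sum_{j\neq i}^{n}\bar H^{(L)}\left(\frac{X_i-X_j}{b}\right), \tag{2.2}
$$
where $\hat f^{(L)}(x) \equiv \frac{1}{nb^{L+1}}\sum_{i=1}^{n}K^{(L)}\left(\frac{X_i-x}{b}\right)$ and
$\bar H^{(L)}(v) \equiv \int H^{(L)}(u)H^{(L)}(v-u)\,du$. Hall and Marron (1987) state that 'the first term does not
make use of the data, and hence may be thought of as adding a type of bias in the estimator. This motivates the
estimator'
$$
\hat I_L^{convo} \equiv \frac{1}{n(n-1)b^{2L+1}}\sum_{i=1}^{n}\sum_{j\neq i}^{n}\bar H^{(L)}\left(\frac{X_i-X_j}{b}\right). \tag{2.3}
$$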
Remark 2. $\hat I_L$ and $\hat I_L^{convo}$ can be negative in small samples. Although they are asymptotically
justified, this can cause problems in empirical applications. Hall and Kang (2001) avoid this problem by using
$|\hat I_L|$ instead of $\hat I_L$. Another way is to use $\int\{\hat f^{(L)}(x)\}^2 dx$ instead of
$\hat I_L^{convo}$. In Section 4, we conduct the Monte Carlo study in these two ways.
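The sign issue in Remark 2 can be illustrated with a small sketch for $L=2$. It assumes a standard Gaussian $H$, for which $\bar H^{(2)}$ is the fourth derivative of the $N(0,2)$ density; this kernel is an illustrative assumption (it is not compactly supported), and all function names and the two-point sample are hypothetical.

```python
import math

SIGMA = math.sqrt(2.0)  # H convolved with itself is the N(0, 2) density when H is standard normal

def conv_second_derivative(v):
    """bar-H^{(2)}(v) for Gaussian H: the fourth derivative of the N(0, 2) density at v."""
    t = v / SIGMA
    phi = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    return (t ** 4 - 6.0 * t ** 2 + 3.0) * phi / SIGMA ** 5

def off_diagonal_sum(data, b):
    """Sum over ordered pairs i != j of bar-H^{(2)}((X_i - X_j) / b)."""
    n = len(data)
    return sum(conv_second_derivative((data[i] - data[j]) / b)
               for i in range(n) for j in range(n) if i != j)

def estimate_I2_convo(data, b):
    """Estimator (2.3): the off-diagonal sum scaled by 1 / {n (n - 1) b^5}."""
    n = len(data)
    return off_diagonal_sum(data, b) / (n * (n - 1) * b ** 5)

def integrated_squared_derivative(data, b):
    """Expression (2.2): diagonal term plus off-diagonal term; an integral of a square."""
    n = len(data)
    diagonal = conv_second_derivative(0.0) / (n * b ** 5)
    return diagonal + off_diagonal_sum(data, b) / (n ** 2 * b ** 5)

if __name__ == "__main__":
    two_points = [0.0, math.sqrt(2.0)]  # hypothetical sample for which (2.3) is negative
    print(estimate_I2_convo(two_points, 1.0))              # the raw estimator (2.3)
    print(abs(estimate_I2_convo(two_points, 1.0)))         # absolute-value safeguard
    print(integrated_squared_derivative(two_points, 1.0))  # expression (2.2)
```

The absolute-value safeguard mirrors the device Hall and Kang (2001) apply to $\hat I_L$, while (2.2) stays nonnegative because it is the integral of a squared function.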
Assumption 6. $b = cn^{-2/(4L+2L_p+1)}$.
Proposition 2.1 provides the expansion of the plug-in bandwidth (denoted by $\hat h$) and plays an essential role in
the derivation of the asymptotic expansion of KDE with the plug-in bandwidth. We assume additional conditions for
Proposition 2.1:
Assumption 7. In a neighbourhood of $x$, $f$ is $(2L+L_p)$-times continuously differentiable and its first
$(2L+L_p)$ derivatives are bounded.
Assumption 8. Kernel function $H$ is a bounded, even function with compact support, of order $L_p \ge 2$,
$(2L)$-times continuously differentiable, and for all integers $k$ such that $1 \le k \le 2L-1$,
$|H^{(k)}(u)| \to 0$ as $u \to \pm\infty$.
Assumption 7 gives regularity conditions on the smoothness of the estimand, which implies Assumption 2. Assumption 8
is on the kernel function $H$ for the estimation of $I_L$, and the condition on the behaviour of $H^{(k)}(u)$ at
infinity is necessary for integration by parts in the expansion of $(\hat h - h_0)/h_0$. These assumptions can be
interpreted as a generalisation of assumption $(A_{gpi})$ of Hall and Kang (2001) to $K$ of order $L$ and $H$ of
order $L_p$.
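As a rough sketch of the quantity $(\hat h - h_0)/h_0$, the code below assumes that the plug-in bandwidth $\hat h$ is obtained by substituting an estimate of $I_L$ (in absolute value, following Remark 2) into the formula for $h_0$. That explicit formula, the function names, and the numerical inputs are assumptions made for illustration. Under this assumption, a first-order Taylor step gives the leading term $-\frac{1}{2L+1}(\hat I_L - I_L)/I_L$ for the relative deviation.

```python
import math

def pilot_bandwidth(n, L, L_p, c=1.0):
    """Pilot bandwidth b = c * n^{-2 / (4L + 2L_p + 1)}; c is a user-chosen constant."""
    return c * n ** (-2.0 / (4 * L + 2 * L_p + 1))

def bandwidth_from_roughness(n, L, roughness_K, C_L, I_value):
    """{R(K) / (2 L C_L^2 |I|)}^{1/(2L+1)} * n^{-1/(2L+1)}, using |I| so the root is defined."""
    constant = roughness_K / (2.0 * L * C_L ** 2 * abs(I_value))
    return constant ** (1.0 / (2 * L + 1)) * n ** (-1.0 / (2 * L + 1))

def relative_deviation(n, L, roughness_K, C_L, I_hat, I_true):
    """(h_hat - h0) / h0 for a given estimate I_hat and true value I_true."""
    h_hat = bandwidth_from_roughness(n, L, roughness_K, C_L, I_hat)
    h0 = bandwidth_from_roughness(n, L, roughness_K, C_L, I_true)
    return (h_hat - h0) / h0

if __name__ == "__main__":
    I_true = 3.0 / (8.0 * math.sqrt(math.pi))  # assumed: standard normal density, L = 2
    I_hat = 1.01 * I_true                      # hypothetical estimate, 1% too large
    # R(K) = 0.6 and C_2 = 0.1 are the Epanechnikov constants (an illustrative kernel).
    exact = relative_deviation(1000, 2, 0.6, 0.1, I_hat, I_true)
    first_order = -(I_hat - I_true) / ((2 * 2 + 1) * I_true)
    print(pilot_bandwidth(1000, L=2, L_p=2), exact, first_order)
```

An overestimate of $I_L$ shrinks the bandwidth, and the relative deviation is damped by the factor $1/(2L+1)$, so a small error in $\hat I_L$ produces an even smaller error in $\hat h$.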
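Proposition 2.1 (Expansion of Plug-In Bandwidth). Under Assumptions 1, 3, 4, 6, 7 and 8, and additionally 13 for
Theorem 2.4, 15 for Theorem 3.1, and 16 and 17 for Theorems 3.5 and 3.6, we can expand $(\hat h - h_0)/h_0$ as
follows:
$$
\frac{\hat h - h_0}{h_0} = -\frac{C_{PI}}{n}\sum_{i=1}^{n}V_i - \frac{C_{PI}}{2}\binom{n}{2}^{-1}\sum_{i=1}^{n-1}\sum_{j\neq i}^{n}W_{ij} + o_p\{(nh_0)^{-1}+h_0^{2L}\}, \tag{2.4}
$$
where
$$
C_{PI} = \frac{2}{2L+1}I_L^{-1},
$$
$$
V_i \equiv \{f^{(2L)}(X_i) - Ef^{(2L)}(X_i)\} + \frac{\int u^{L_p}H(u)\,du}{(L_p)!}\,b^{L_p}\left\{f^{(2L+L_p)}(X_i) - Ef^{(2L+L_p)}(X_i)\right\} + o_p(n^{-1/2}b^{L_p}),
$$
$$
W_{ij} \equiv \left\{\hat I_{Lij} - E[\hat I_{Lij}\mid X_i] - E[\hat I_{Lij}\mid X_j] + E[\hat I_{Lij}]\right\}.
$$
The proof is in A.1.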
Remark 3. The first term on the right-hand side of (2.4) reflects the projection term of the Hoeffding decomposition
of $\hat I_L$, whose convergence rate is $O_p(n^{-1/2})$. The second term reflects the quadratic term of the
decomposed $\hat I_L$, whose convergence rate is $O_p(n^{-1}b^{-(4L+1)/2})$.
Remark 4. Since the MSE-optimal rate of $b$ is $O_p(n^{-2/(4L+2L_p+1)})$ from Hall and Marron (1987), for example,
when one chooses the pilot bandwidth via the rule of thumb (see Silverman (1986)) or a second-stage plug-in method,
the convergence rate of the second term in (2.4) is
$O_p(n^{-1/2}b^{L_p}) = O_p(n^{-(4L+6L_p+1)/\{2(4L+2L_p+1)\}})$. We can make the second term in $V_i$ as small as we
like up to the order of $O(n^{-3/2})$ by letting the kernel order $L_p$ be large enough. This is not an unrealistic
statement; for example, when one uses a second-order kernel function $K$, adopting a second-order kernel function
$H$ is sufficient to make the effect of this second term negligible in the sense that it does not affect the
asymptotic structure of KDE up to the order of $O\{(nh_0)^{-1}+h_0^{2L}\}$.
Remark 5. Since the MSE-optimal rate of $b$ is $O_p(n^{-2/(4L+2L_p+1)})$ from Hall and Marron (1987), for example,
when one chooses the pilot bandwidth via the rule of thumb (see Silverman (1986)), the convergence rate of the second
term in (2.4) is $O_p(n^{-1}b^{-(4L+1)/2}) = O_p(n^{-2L_p/(4L+2L_p+1)})$. This implies that we can also make the
third term of (2.4) as small as we like up to the order of $O(n^{-1})$ by letting the kernel order $L_p$ be large
enough. Although we cannot immediately identify how large $L_p$ needs to be to make the effect of the pilot
bandwidth negligible without deriving the Edgeworth expansion with pilot bandwidth, as we will see later, one has to
adopt a considerably large $L_p$.
Remark 6. Since the convergence rate of the second term is
$O_p(n^{-1}b^{-(4L+1)/2}) = O_p(n^{-2L_p/(4L+2L_p+1)})$, unless $L_p > (4L+1)/2$, the convergence rate of the second
term is slower than that of the first term. In order to ignore the effect of the second term, Hall and Kang (2001)
provide the expansion under the condition that $L=2$ and $L_p=6$. The generalised version of this assumption is
provided as Assumption 13. In addition, we weaken the condition by considering the effect of the pilot bandwidth. We
provide such results as Theorems 3.5 and 3.6.
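The rate comparisons in Remarks 4 to 6 reduce to simple arithmetic on exponents of $n$, which the following sketch checks with exact fractions. The function names are hypothetical, and the pilot bandwidth rate $b \propto n^{-2/(4L+2L_p+1)}$ is taken from Assumption 6.

```python
from fractions import Fraction

def quadratic_term_exponent(L, L_p):
    """Exponent a in n^a for n^{-1} b^{-(4L+1)/2} with b proportional to n^{-2/(4L+2L_p+1)}."""
    return Fraction(-2 * L_p, 4 * L + 2 * L_p + 1)

def v_second_term_exponent(L, L_p):
    """Exponent a in n^a for n^{-1/2} b^{L_p} with the same pilot bandwidth rate."""
    return Fraction(-(4 * L + 6 * L_p + 1), 2 * (4 * L + 2 * L_p + 1))

def quadratic_term_is_dominated(L, L_p):
    """True when the quadratic term is of smaller order than n^{-1/2}, i.e. L_p > (4L + 1) / 2."""
    return quadratic_term_exponent(L, L_p) < Fraction(-1, 2)

if __name__ == "__main__":
    for L, L_p in [(2, 2), (2, 4), (2, 6), (4, 10)]:
        print(L, L_p, quadratic_term_exponent(L, L_p),
              v_second_term_exponent(L, L_p), quadratic_term_is_dominated(L, L_p))
```

For $L=2$, the choice $L_p=6$ of Hall and Kang (2001) gives exponent $-4/7 < -1/2$, whereas $L_p=2$ gives $-4/13 > -1/2$, so in the latter case the quadratic term converges more slowly than the projection term.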