Higher-Order Asymptotic Properties of Kernel Density Estimator with Global Plug-In and Its Accompanying Pilot Bandwidth

Shunsuke Imai*, Yoshihiko Nishiyama†

October 5, 2022

Abstract

This study investigates the effect of bandwidth selection via a plug-in method on the asymptotic structure of the nonparametric kernel density estimator. We generalise the result of Hall and Kang (2001) and find that the plug-in method has no effect on the asymptotic structure of the estimator up to the order of $O\{(nh_0)^{-1/2} + h_0^{L}\} = O(n^{-L/(2L+1)})$ for a bandwidth $h_0$ and any kernel order $L$ when the kernel order for pilot estimation $L_p$ is high enough. We also provide the valid Edgeworth expansion up to the order of $O\{(nh_0)^{-1} + h_0^{2L}\}$ and find that, as long as $L_p$ is high enough, the plug-in method first affects the term whose convergence rate is $O\{(nh_0)^{-1/2}h_0 + h_0^{L+1}\} = O(n^{-(L+1)/(2L+1)})$. In other words, we derive the exact achievable convergence rate of the deviation between the distribution functions of the estimator with a deterministic bandwidth and with the plug-in bandwidth. In addition, we weaken the conditions on the kernel order $L_p$ for pilot estimation by considering the effect of the pilot bandwidth associated with the plug-in bandwidth. We also show that bandwidth selection via the global plug-in method can affect the asymptotic structure even up to the order of $O\{(nh_0)^{-1/2} + h_0^{L}\}$. Finally, Monte Carlo experiments are conducted to see whether our approximation improves previous results.

Keywords: nonparametric statistics, kernel density estimator, plug-in bandwidth, Edgeworth expansion, coverage probability

*Graduate School of Economics, Kyoto University, Yoshidahonmachi, Sakyoku, Kyoto, 606–8501, JAPAN, imai.shunsuke.57n@st.kyoto-u.ac.jp

†Institute of Economic Research, Kyoto University, Yoshidahonmachi, Sakyoku, Kyoto, 606–8501, JAPAN, nishiyama@kier.kyoto-u.ac.jp

arXiv:2210.01411v1 [math.ST] 4 Oct 2022

1 Introduction

In nonparametric statistics, the target of statistical inference is a function or an infinite-dimensional vector $f$ that is not itself specifically modelled (see Wasserman (2006) for an introductory overview, Giné and Nickl (2016) for a mathematically unified treatment, and Ichimura and Todd (2007) and Chen (2007) for overviews in the context of the economics literature). One important example of such a function $f$ is the density function because, in statistics and related fields, there are cases where we are interested in the distribution itself, such as a wage distribution (see e.g. DiNardo et al. (1996)), or where a target of statistical inference depends on the density function, such as a conditional expectation function. Although there are different methods for estimating a density function, we focus on the estimator based on the kernel method, namely the kernel density estimator (KDE), also called the Rosenblatt estimator or Rosenblatt-Parzen estimator after the pioneering works of Rosenblatt (1956) and Parzen (1962).

The first-order asymptotic properties of KDE have been studied over a long period, and it has been proven that, under certain conditions, KDE is pointwise consistent and asymptotically normal (see e.g. Parzen (1962) and the monograph by Li and Racine (2007, pp. 28-30)). As we will review in Section 2, the rate of convergence of KDE is slower than the parametric rate and, furthermore, becomes slower as the dimension increases. This property is called the curse of dimensionality. We can understand this as being the cost of using local data to avoid misspecification. Hall (1991) has clarified the higher-order asymptotic properties of the estimator in both the non-Studentised and Studentised cases. The asymptotic expansion of KDE is no longer a series in $n^{-1/2}$, as for parametric estimators, but a series in $(nh)^{-1/2}$, even in the non-Studentised case; it is a more complicated series in the Studentised case. Here $n$ and $h$ are the sample size and bandwidth, respectively.

The bandwidth $h$ determines the flexibility of the statistical model and is chosen to balance the bias-variance trade-off: a more flexible model decreases the bias but increases the variance, while a less flexible model decreases the variance but increases the bias. It is well known that the performance of kernel-based estimators depends greatly on the bandwidth and not so much on the kernel function. By defining a loss function, one can compute the theoretically optimal bandwidth $h_0$ that minimises the loss. For example, mean integrated squared error (MISE) is the most commonly used global loss measure. However, in practice, such a bandwidth is typically infeasible because it depends on the unknown density. Therefore, one has to choose the bandwidth in a data-driven manner. Among the many bandwidth selection methods, two well-known ones are cross-validation and the plug-in method. In this paper, we focus on the latter.

It is natural to ask whether the choice of bandwidth affects the asymptotic structure of the estimator. Ichimura (2000) and Li and Li (2010) have considered the asymptotic distribution of kernel-based non/semiparametric estimators with data-driven bandwidth. They argue that, under certain conditions, the bandwidth selection has no effect on the first-order asymptotic structure of the estimators. Hall and Kang (2001) showed that bandwidth selection by the global plug-in method also has no effect on the asymptotic structure of KDE up to the order of $O(n^{-2/5})$ for $L = 2$ and $L_p = 6$, where $L$ and $L_p$ are the kernel orders for the density estimation and for the estimation of an unknown part of the optimal bandwidth, respectively.

Our contributions are fivefold. First, we provide the Edgeworth expansion of KDE with global plug-in bandwidth up to the order of $O\{(nh_0)^{-1} + h_0^{2L}\} = O(n^{-2L/(2L+1)})$ and show that bandwidth selection by the plug-in method begins to affect the term whose convergence rate is $O\{(nh_0)^{-1/2}h_0 + h_0^{L+1}\} = O(n^{-(L+1)/(2L+1)})$, under the condition that $L_p$ is large enough. Second, we generalise Theorem 3.2 of Hall and Kang (2001), which states that bandwidth selection via the global plug-in method has no effect on the asymptotic structure of KDE up to the order of $O\{(nh_0)^{-1/2} + h_0^{L}\} = O(n^{-L/(2L+1)})$. Their results restrict the orders of the kernel functions $K(u)$ and $H(u)$ to $L = 2$ and $L_p = 6$, respectively, but we show that they are also valid for general orders $L$ under the condition that $L_p$ is large enough. Third, we explore the Edgeworth expansion of KDE with deterministic bandwidth in more detail than Hall (1991). We show that the Edgeworth expansion of the standardised KDE with deterministic bandwidth has a term of order $O\{(nh_0)^{-1/2} + h_0^{L}\} = O(n^{-L/(2L+1)})$ right after the term $\Phi(z)$, with a gap between them. After that, however, the terms decrease at the rate of $O(h_0) = O(n^{-1/(2L+1)})$.

However, the result of Hall and Kang (2001) and our results above require the kernel order $L_p$ for the estimation of the unknown parts of the optimal bandwidth to be high enough. We have two motivations for avoiding this condition on $L_p$. One is that, although higher-order kernels are theoretically justified, they have undesirable properties when implemented on a computer. The other is that the condition forces the pilot bandwidth to be relatively large, but the admissible range is restrictive, especially in multidimensional settings. For details of the latter motivation, see the seminal works of Cattaneo et al. (2010, 2013, 2014a,b) and Cattaneo and Jansson (2018). Thus, as a fourth contribution, we weaken this condition on $L_p$ assumed by Hall and Kang (2001) and our Theorem 3.1 and provide the Edgeworth expansion including the effect of the pilot bandwidth up to the order of $O\{(nh_0)^{-1} + h_0^{2L}\}$. In this situation, bandwidth selection via the global plug-in method can affect the asymptotic structure of KDE even up to the order of $O\{(nh_0)^{-1/2} + h_0^{L}\}$ (for example, when $L = 2$ and $L_p = 2$). Finally, we consider the interaction between bandwidth selection via the global plug-in method, its accompanying pilot bandwidth, and Studentisation. The proof of our main theorem owes much to Nishiyama and Robinson (2000). They established the valid Edgeworth expansion for the semiparametric density-weighted averaged derivatives estimator of the single index model, which has an exact second-order U-statistic form. Although the higher-order asymptotic structure of U-statistics had been studied before Nishiyama and Robinson (2000) (see e.g. Callaert et al. (1980)), their estimator differs from standard U-statistics in that its kernel depends on the sample size $n$ through the bandwidth. Since KDE with plug-in bandwidth can also be approximated by a sum of first- and second-order U-statistics whose kernels depend on the sample size $n$ through the bandwidth, we can benefit from their proof.

The remainder of this paper is organised as follows. In the next section, we introduce KDE and review its known properties. Section 3 provides the main results, namely the Edgeworth expansion of the estimator with the global plug-in bandwidth. In Section 4, we use Monte Carlo studies to compare our results with those of previous works. Section 5 concludes and discusses future research directions.

2 Review of the Estimator’s Properties

2.1 Estimator and Its First Order Properties

Assumption 1. Let $\{X_i\}_{i=1}^{n}$ be a random sample with an absolutely continuous distribution with Lebesgue density $f$.

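First, we introduce the nonparametric KDE $\hat{f}$ for the unknown density $f$. The estimator $\hat{f}$ at a point $x$ with a bandwidth $h$ is defined as follows:

$$\hat{f}_h(x) \equiv \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{X_i - x}{h}\right) \equiv \frac{1}{nh}\sum_{i=1}^{n} K_{i,h}(x),$$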

where $K$ is a kernel function, and we say that $K$ is an $L$-th order kernel, for a positive integer $L$, if

$$\int u^{l} K(u)\,du = \begin{cases} 1 & (l = 0) \\ 0 & (1 \le l \le L-1) \\ C \neq 0,\ < \infty & (l = L). \end{cases}$$
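As a minimal sketch of the two definitions above, the following Python code evaluates the estimator and checks the moment conditions numerically. The Epanechnikov kernel is used purely as an illustrative second-order kernel ($L = 2$); it is an assumption of this sketch and not a choice made in the paper, and the function names `epanechnikov`, `kde` and `kernel_moment` are hypothetical.

```python
import numpy as np


def epanechnikov(u):
    """Illustrative second-order kernel: K(u) = 0.75 (1 - u^2) on [-1, 1], else 0."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def kde(x, sample, h, kernel=epanechnikov):
    """Evaluate (1 / (n h)) * sum_i K((X_i - x) / h) at every point in x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    sample = np.asarray(sample, dtype=float)
    u = (sample[None, :] - x[:, None]) / h      # shape (len(x), n)
    return kernel(u).sum(axis=1) / (sample.size * h)


def kernel_moment(kernel, l, grid_size=200_001):
    """Approximate int u^l K(u) du for a kernel supported on [-1, 1] (trapezoid rule)."""
    u = np.linspace(-1.0, 1.0, grid_size)
    y = u**l * kernel(u)
    du = u[1] - u[0]
    return float(np.sum(0.5 * (y[:-1] + y[1:])) * du)


if __name__ == "__main__":
    # Moments l = 0, 1, 2: the first is 1, the second vanishes and the third is non-zero.
    print([round(kernel_moment(epanechnikov, l), 6) for l in range(3)])
```

For this kernel the zeroth moment is one, the first moment vanishes by symmetry, and the second moment is non-zero, which is exactly the pattern required of a kernel of order $L = 2$.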

Assumption 2. In a neighbourhood of $x$, $f$ is $L$ times continuously differentiable and its first $L$ derivatives are bounded.

Assumption 3. Kernel function $K$ is a bounded, even function with a compact support, of order $L \ge 2$ and $\int K(u)\,du = 1$.

Assumption 4. $x$ is an interior point in the support of $X$.

Assumption 5. $h \to 0$, $nh \to \infty$ as $n \to \infty$.

KDE is pointwise consistent and asymptotically normal at an interior point in the support of $X$. Although it also converges uniformly over interior points of the support of $X$, we only review pointwise properties because we investigate the pointwise higher-order asymptotics of KDE with global plug-in bandwidth. Under Assumptions 1–3, we can expand the mean squared error (MSE) of $\hat{f}_h(x)$ as follows:

$$\mathrm{MSE}[\hat{f}(x)] \equiv E[\{\hat{f}(x) - f(x)\}^2] = \{C_L f^{(L)}(x) h^{L}\}^2 + \frac{R(K) f(x)}{nh} + o\{h^{2L} + (nh)^{-1}\}, \tag{2.1}$$

where $R(K) = \int K(u)^2\,du$ and $C_L = \frac{1}{L!}\int u^{L} K(u)\,du$. Therefore, Markov's inequality, Assumptions 1–5, and (2.1) imply pointwise consistency, $\hat{f}_h(x) \xrightarrow{p} f(x)$. Moreover, we can show that KDE is asymptotically normal by applying the Lindeberg-Feller central limit theorem: $\sqrt{nh}\,\{\hat{f}_h(x) - E\hat{f}_h(x)\} \xrightarrow{d} N(0, R(K) f(x))$.

Remark 1. Since $E[\hat{f}_h(x)] \approx f(x) + C_L f^{(L)}(x) h^{L}$, the statistic centred at $f(x)$ asymptotically follows a zero-mean normal distribution if $nh^{2L+1} \to 0$ holds. However, the theoretically optimal bandwidth does not satisfy this condition, as we will discuss later. Therefore, we consider the statistic centred at $E[\hat{f}_h(x)]$, not $f(x)$. For recent studies on the asymptotic bias of KDE, see, for example, Hall and Horowitz (2013) and Calonico et al. (2018). For other nonparametric estimators, recent related studies are those by Armstrong and Kolesár (2018), Calonico et al. (2014), Calonico et al. (2020, 2022) and Schennach (2020).

2.2 Plug-In Method

Bandwidth $h$ is a parameter that analysts need to choose in advance. One of the criteria for bandwidth selection is the mean integrated squared error (MISE):

$$\mathrm{MISE}(h) = \int E[\{\hat{f}_h(x) - f(x)\}^2]\,dx.$$

The theoretically optimal bandwidth is the one that minimises MISE and, from the MISE expansion, this bandwidth is defined as follows:

$$h_0 = \left\{\frac{R(K)}{2L\,C_L^{2}\,I_L}\right\}^{\frac{1}{2L+1}} n^{-\frac{1}{2L+1}},$$

where $I_L = \int f^{(L)}(x)^2\,dx$. Although $h_0$ would perform the best, it is infeasible because $I_L$ is unknown, so one has to select the bandwidth from the available data. We examine the effect of a certain plug-in method on the distribution of the estimator.
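The formula for $h_0$ can be evaluated directly in the rare case where $I_L$ is known. The sketch below does so under assumptions made only for illustration: $L = 2$, the Epanechnikov kernel ($R(K) = 3/5$, $C_2 = 1/10$) and a standard normal density, for which $I_2 = 3/(8\sqrt{\pi})$. The function name `mise_optimal_bandwidth` is hypothetical.

```python
import math


def mise_optimal_bandwidth(n, R_K, C_L, I_L, L=2):
    """h_0 = {R(K) / (2 L C_L^2 I_L)}^{1/(2L+1)} * n^{-1/(2L+1)}."""
    constant = (R_K / (2 * L * C_L**2 * I_L)) ** (1.0 / (2 * L + 1))
    return constant * n ** (-1.0 / (2 * L + 1))


# Assumed example: I_2 = int f''(x)^2 dx for the standard normal density.
I_2_NORMAL = 3.0 / (8.0 * math.sqrt(math.pi))

if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, mise_optimal_bandwidth(n, 0.6, 0.1, I_2_NORMAL))
```

In this assumed setting the leading constant is roughly 2.34, the familiar normal-reference value for the Epanechnikov kernel. With real data $I_L$ is unknown, which is why it must be estimated and plugged in.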

Several plug-in methods have been proposed so far (see e.g. Hall et al. (1991), Sheather and Jones (1991)). In this paper, following Hall and Kang (2001), we adopt a simple plug-in method that estimates $I_L$ directly and nonparametrically using the estimator proposed by Hall and Marron (1987). Their estimator $\hat{I}_L$ of $I_L$ is given as follows:

IL=n

2−1n−1

∑

i=1

∑

j=i+1

b−(2L+1)H(2L)Xi−Xj

b≡n

2−1n−1

∑

i=1

∑

j=i+1

ILi j,

where $b$ (called the pilot bandwidth) is a bandwidth for the estimation of $I_L$, different from $h$, and $H$ is a kernel function of order $L_p$.
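A minimal sketch of $\hat{I}_L$ for $L = 2$ is given below. The pilot kernel $H(u) = \frac{693}{512}(1 - u^2)^5$ on $[-1, 1]$ is a hypothetical choice made for this illustration only: it is bounded, even, compactly supported, of order $L_p = 2$ and four times continuously differentiable, but it is not taken from the paper. The names `H_POLY` and `hall_marron_I2` are likewise hypothetical, and the pilot bandwidth in the usage lines is an arbitrary value rather than one chosen by Assumption 6.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical pilot kernel H(u) = (693/512) (1 - u^2)^5 on [-1, 1], zero elsewhere.
H_POLY = (693.0 / 512.0) * Polynomial([1.0, 0.0, -1.0]) ** 5
H4_POLY = H_POLY.deriv(4)   # H^{(2L)} with L = 2


def hall_marron_I2(sample, b):
    """Average of b^{-5} H^{(4)}((X_i - X_j) / b) over all pairs i < j (case L = 2)."""
    x = np.asarray(sample, dtype=float)
    n = x.size
    i, j = np.triu_indices(n, k=1)          # all index pairs with i < j
    u = (x[i] - x[j]) / b
    vals = np.where(np.abs(u) <= 1.0, H4_POLY(u), 0.0)   # H vanishes outside [-1, 1]
    return float(vals.sum() / b**5 / (n * (n - 1) / 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.standard_normal(2000)
    # For the standard normal density the target is I_2 = 3 / (8 sqrt(pi)), about 0.21.
    print(hall_marron_I2(sample, b=1.5))
```

All pairs are materialised at once, so memory grows quadratically in $n$; for large samples one would process the pairs in blocks instead.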

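Another estimator for $I_L$ proposed by Hall and Marron (1987) is

$$\int \{\hat{f}^{(L)}(x)\}^2\,dx = \frac{1}{nb^{2L+1}}\bar{H}^{(L)}(0) + \frac{1}{n^{2}b^{2L+1}}\sum_{i=1}^{n}\sum_{j\neq i} H^{(L)}\left(\frac{X_i - X_j}{b}\right), \tag{2.2}$$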

where $\hat{f}^{(L)}(x) \equiv \frac{1}{nb^{L+1}}\sum_{i=1}^{n} K^{(L)}\left(\frac{X_i - x}{b}\right)$ and $\bar{H}^{(L)}(v) \equiv \int H^{(L)}(u) H^{(L)}(v-u)\,du$. Hall and Marron (1987) state that 'the first term does not make use of the data, and hence may be thought of as adding a type of bias in the estimator. This motivates the estimator'.

$$\hat{I}_L^{\mathrm{convo}} \equiv \frac{1}{n(n-1)b^{2L+1}}\sum_{i=1}^{n}\sum_{j\neq i} H^{(L)}\left(\frac{X_i - X_j}{b}\right). \tag{2.3}$$

Remark 2. $\hat{I}_L$ and $\hat{I}_L^{\mathrm{convo}}$ can be negative in small samples. Although they are asymptotically justified, this can cause problems in empirical applications. Hall and Kang (2001) avoid this problem by using $|\hat{I}_L|$ instead of $\hat{I}_L$. Another way is to use $\int \{\hat{f}^{(L)}(x)\}^2\,dx$ instead of $\hat{I}_L^{\mathrm{convo}}$. In Section 4, we conduct the Monte Carlo study in both of these ways.

Assumption 6. $b = c\,n^{-2/(4L+2L_p+1)}$.

Proposition 2.1 provides the expansion of the plug-in bandwidth (denoted by $\hat{h}$) and plays an essential role in the derivation of the asymptotic expansion of KDE with the plug-in bandwidth. We assume additional conditions for Proposition 2.1:

Assumption 7. In a neighbourhood of $x$, $f$ is $(2L+L_p)$-times continuously differentiable and its first $(2L+L_p)$ derivatives are bounded.

Assumption 8. Kernel function $H$ is a bounded, even function with compact support, of order $L_p \ge 2$, $(2L)$-times continuously differentiable and, for all integers $k$ such that $1 \le k \le 2L-1$, $\lim_{u\to\pm\infty}|H^{(k)}(u)| = 0$.

Assumption 7 gives regularity conditions on the smoothness of the estimand and implies Assumption 2. Assumption 8 concerns the kernel function $H$ for the estimation of $I_L$, and the condition as $u \to \pm\infty$ is necessary for integration by parts when expanding $(\hat{h} - h_0)/h_0$. These assumptions can be interpreted as a generalisation of assumption $(A_{gpi})$ of Hall and Kang (2001) to $K$ of order $L$ and $H$ of order $L_p$.

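Proposition 2.1 (Expansion of Plug-In Bandwidth). Under Assumptions 1, 3, 4, 6, 7 and 8, and additionally 13 for Theorem 2.4, 15 for Theorem 3.1, and 16 and 17 for Theorems 3.5 and 3.6, we can expand $(\hat{h} - h_0)/h_0$ as follows:

$$\frac{\hat{h} - h_0}{h_0} = -\frac{C_{PI}}{n}\sum_{i=1}^{n} V_i - \frac{C_{PI}}{2}\binom{n}{2}^{-1}\sum_{i=1}^{n-1}\sum_{j\neq i} W_{ij} + o_p\{(nh_0)^{-1} + h_0^{2L}\}, \tag{2.4}$$

where

$$C_{PI} = \frac{2}{2L+1}\,I_L^{-1},$$

$$V_i \equiv \{f^{(2L)}(X_i) - E f^{(2L)}(X_i)\} + \frac{\int u^{L_p} H(u)\,du}{(L_p)!}\, b^{L_p} \left\{f^{(2L+L_p)}(X_i) - E f^{(2L+L_p)}(X_i)\right\} + o_p(n^{-1/2} b^{L_p}),$$

$$W_{ij} \equiv \left\{\hat{I}_{Lij} - E[\hat{I}_{Lij} \mid X_i] - E[\hat{I}_{Lij} \mid X_j] + E[\hat{I}_{Lij}]\right\}.$$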

The proof is in A.1.

Remark 3. The first term on the right-hand side of (2.4) reflects the projection term of the Hoeffding decomposition of $\hat{I}_L$, whose convergence rate is $O_p(n^{-1/2})$. The second term reflects the quadratic term of the decomposed $\hat{I}_L$, whose convergence rate is $O_p(n^{-1}b^{-(4L+1)/2})$.

Remark 4. Since the MSE-optimal rate of $b$ is $O_p(n^{-2/(4L+2L_p+1)})$ from Hall and Marron (1987), for example, when one chooses the pilot bandwidth via the rule of thumb (see Silverman (1986)) or a second-stage plug-in method, the contribution of the second term in $V_i$ to (2.4) has convergence rate $O_p(n^{-1/2}b^{L_p}) = O_p(n^{-(4L+6L_p+1)/\{2(4L+2L_p+1)\}})$. We can make the second term in $V_i$ as small as we like up to the order of $O(n^{-3/2})$ by letting the kernel order $L_p$ be large enough. This is not an unrealistic requirement; for example, when one uses a second-order kernel function $K$, adopting a second-order kernel function $H$ is sufficient to make the effect of this second term negligible, in the sense that it does not affect the asymptotic structure of KDE up to the order of $O\{(nh_0)^{-1} + h_0^{2L}\}$.
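The rate comparison in Remark 4 is plain exponent arithmetic and can be checked exactly with rational numbers. The sketch below assumes $b \propto n^{-2/(4L+2L_p+1)}$ and $h_0 \propto n^{-1/(2L+1)}$, as stated in the text; the function names are hypothetical.

```python
from fractions import Fraction


def pilot_bias_exponent(L, Lp):
    """Exponent a with n^{-1/2} b^{Lp} = n^{-a}, for b proportional to n^{-2/(4L+2Lp+1)}."""
    return Fraction(1, 2) + Fraction(2 * Lp, 4 * L + 2 * Lp + 1)


def expansion_exponent(L):
    """Exponent a with (n h0)^{-1} + h0^{2L} = O(n^{-a}), for h0 proportional to n^{-1/(2L+1)}."""
    return Fraction(2 * L, 2 * L + 1)


if __name__ == "__main__":
    L, Lp = 2, 2
    print(pilot_bias_exponent(L, Lp), expansion_exponent(L))
    # The term is negligible at the stated order when its exponent is strictly larger.
    print(pilot_bias_exponent(L, Lp) > expansion_exponent(L))
```

For $L = L_p = 2$ the exponents are $21/26$ and $4/5$, so the term vanishes strictly faster than $n^{-4/5}$, in line with the example in the remark. As $L_p$ grows the first exponent approaches $3/2$.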

Remark 5. Since the MSE-optimal rate of $b$ is $O_p(n^{-2/(4L+2L_p+1)})$ from Hall and Marron (1987), for example, when one chooses the pilot bandwidth via the rule of thumb (see Silverman (1986)), the convergence rate of the second term in (2.4) is $O_p(n^{-1}b^{-(4L+1)/2}) = O_p(n^{-2L_p/(4L+2L_p+1)})$. This implies that we can also make this term as small as we like up to the order of $O(n^{-1})$ by letting the kernel order $L_p$ be large enough. Although we cannot immediately identify how large $L_p$ needs to be to make the effect of the pilot bandwidth negligible without deriving the Edgeworth expansion with pilot bandwidth, as we will see later, one has to adopt a considerably large $L_p$.

Remark 6. Since the convergence rate of the second term is $O_p(n^{-1}b^{-(4L+1)/2}) = O_p(n^{-2L_p/(4L+2L_p+1)})$, unless $L_p > (4L+1)/2$, the convergence rate of the second term is slower than that of the first term. In order to ignore the effect of the second term, Hall and Kang (2001) provide the expansion under the condition that $L = 2$ and $L_p = 6$. The generalised version of this assumption is provided as Assumption 13. In addition, we weaken the condition by considering the effect of the pilot bandwidth. We provide such results as Theorems 3.5 and 3.6.
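The threshold $L_p > (4L+1)/2$ in Remark 6 follows from comparing the exponent $2L_p/(4L+2L_p+1)$ of the second term with the exponent $1/2$ of the first. A minimal sketch of that comparison, again assuming $b \propto n^{-2/(4L+2L_p+1)}$ and using hypothetical function names, is:

```python
from fractions import Fraction


def quadratic_term_exponent(L, Lp):
    """Exponent a with n^{-1} b^{-(4L+1)/2} = n^{-a}, for b proportional to n^{-2/(4L+2Lp+1)}."""
    return Fraction(2 * Lp, 4 * L + 2 * Lp + 1)


def quadratic_term_is_faster(L, Lp):
    """True when the second term of (2.4) vanishes faster than the n^{-1/2} first term."""
    return quadratic_term_exponent(L, Lp) > Fraction(1, 2)


if __name__ == "__main__":
    for Lp in (2, 4, 6):
        print(Lp, quadratic_term_exponent(2, Lp), quadratic_term_is_faster(2, Lp))
```

With $L = 2$ the threshold is $L_p > 4.5$, so $L_p = 2$ and $L_p = 4$ fail the comparison while $L_p = 6$, the order used by Hall and Kang (2001), passes it.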
