
1 Introduction
In nonparametric statistics, the target of statistical inference is a function or an infinite-dimensional vector $f$
that is not itself specified by a parametric model (see Wasserman (2006) for an introductory overview, Giné and
Nickl (2016) for a mathematically unified treatment, and Ichimura and Todd (2007) and Chen (2007) for overviews in
the context of the economics literature). One important example of such a function $f$ is the density function because,
in statistics and related fields, there are cases where we are interested in the distribution itself, such as a wage
distribution (see e.g. DiNardo et al. (1996)), or where the target of statistical inference depends on the density
function, such as a conditional expectation function. Although there are different methods for estimating a density
function, we focus on the estimator based on the kernel method, namely the kernel density estimator (KDE), also called
the Rosenblatt estimator or the Rosenblatt-Parzen estimator after the pioneering works of Rosenblatt (1956) and
Parzen (1962).
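
To make the object of study concrete, the following is a minimal sketch of a univariate KDE, written under the usual definition $\hat f(x) = (nh)^{-1} \sum_{i=1}^{n} K\{(x - X_i)/h\}$. The Gaussian kernel and the function names `gaussian_kernel` and `kde` are illustrative assumptions, not notation taken from this paper.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used here as an assumed second-order kernel K."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde(x, data, h):
    """Evaluate f_hat(x) = (n h)^{-1} * sum_i K((x - X_i) / h) at each point of x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    data = np.asarray(data, dtype=float)
    # Pairwise scaled differences, shape (len(x), n).
    u = (x[:, None] - data[None, :]) / h
    return gaussian_kernel(u).sum(axis=1) / (data.size * h)
```

A call such as `kde(np.linspace(-3, 3, 61), sample, h=0.4)` returns the estimated density on a grid; the bandwidth `h` is supplied by the user in this sketch.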
The first-order asymptotic properties of KDE have been studied over a long period, and it has been proven that,
under certain conditions, KDE is pointwise consistent and asymptotically normal (see e.g. Parzen (1962) and the
monograph by Li and Racine (2007, pp. 28-30)). As we will review in Section 2, the rate of convergence of KDE is
slower than the parametric rate and, furthermore, becomes slower as the dimension increases. This property is called
the curse of dimensionality. We can understand this as the cost of using only local data to avoid misspecification.
Hall (1991) clarified the higher-order asymptotic properties of the estimator in both the non-Studentised and
Studentised cases. The asymptotic expansion of KDE is no longer a series in $n^{-1/2}$, as it is for parametric
estimators, but a series in $(nh)^{-1/2}$, even in the non-Studentised case; it is a more complicated series in the
Studentised case. Here $n$ and $h$ are the sample size and the bandwidth, respectively.
The bandwidth $h$ controls the flexibility of the statistical model and is chosen to balance the bias-variance
trade-off: a more flexible model (smaller $h$) reduces the bias but increases the variance, while a less flexible
model (larger $h$) reduces the variance but increases the bias. It is well known that the performance of kernel-based
estimators depends greatly on the bandwidth and much less on the kernel function. By defining a loss function, one can
compute the theoretically optimal bandwidth $h_0$ that minimises the loss. For example, the mean integrated squared
error (MISE) is the most commonly used global loss measure. In practice, however, such a bandwidth is typically
infeasible because it depends on the unknown density. Therefore, one has to choose the bandwidth in a data-driven
manner. Among the many bandwidth selection methods, two well-known ones are cross-validation and the plug-in
method. In this paper, we focus on the latter.
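
The plug-in idea can be sketched as follows. This is an illustration under stated assumptions, not the estimator analysed in this paper or in Hall and Kang (2001): it uses a Gaussian (second-order) kernel for the density, the standard asymptotic approximation to the MISE-optimal bandwidth, $h = [R(K) / \{\mu_2(K)^2 R(f'') n\}]^{1/5}$ with $R(g) = \int g^2$ and $\mu_2(K) = \int u^2 K(u)\,du$, and a Gaussian pilot kernel to estimate the unknown quantity $R(f'')$. The pilot bandwidth `g` is left to the user, and all function names are hypothetical.

```python
import numpy as np

SQRT_2PI = np.sqrt(2.0 * np.pi)

def phi4(u):
    """Fourth derivative of the standard normal density: (u^4 - 6u^2 + 3) * phi(u)."""
    return (u**4 - 6.0 * u**2 + 3.0) * np.exp(-0.5 * u**2) / SQRT_2PI

def estimate_roughness(data, g):
    """Kernel estimate of R(f'') = int f''(x)^2 dx with pilot bandwidth g.

    Uses R(f'') = int f''''(x) f(x) dx (integration by parts) and averages the
    fourth derivative of a Gaussian pilot kernel over all pairs (i, j).
    """
    data = np.asarray(data, dtype=float)
    n = data.size
    diffs = (data[:, None] - data[None, :]) / g
    return phi4(diffs).sum() / (n**2 * g**5)

def plug_in_bandwidth(data, g):
    """Approximate MISE-optimal bandwidth for a Gaussian kernel, with R(f'') plugged in."""
    n = np.asarray(data).size
    r_k = 1.0 / (2.0 * np.sqrt(np.pi))  # R(K) for the Gaussian kernel
    mu2 = 1.0                           # second moment of the Gaussian kernel
    r_f2 = estimate_roughness(data, g)  # the only unknown part of the formula
    return (r_k / (mu2**2 * r_f2 * n)) ** 0.2
```

The only data-dependent ingredient is the estimate of $R(f'')$, which is why the properties of the pilot kernel and pilot bandwidth matter for the resulting bandwidth.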
It is natural to ask whether the choice of bandwidth affects the asymptotic structure of the estimator. Ichimura
(2000) and Li and Li (2010) considered the asymptotic distribution of kernel-based non/semiparametric estimators
with data-driven bandwidths. They argue that, under certain conditions, bandwidth selection has no effect on the
first-order asymptotic structure of the estimators. Hall and Kang (2001) showed that bandwidth selection by the
global plug-in method also has no effect on the asymptotic structure of KDE up to the order of $O(n^{-2/5})$ for
$L = 2$ and $L_p = 6$, where $L$ and $L_p$ are the kernel orders used for the density estimation and for the
estimation of the unknown part of the optimal bandwidth, respectively.
Our contributions are fivefold. First, we provide the Edgeworth expansion of KDE with global plug-in bandwidth
up to the order of $O\{(nh_0)^{-1} + h_0^{2L}\} = O(n^{-2L/(2L+1)})$ and show that bandwidth selection by the plug-in
method begins to affect the term whose convergence rate is
$O\{(nh_0)^{-1/2} h_0 + h_0^{L+1}\} = O(n^{-(L+1)/(2L+1)})$ under the condition that $L_p$ is large enough. Second, we
generalise Theorem 3.2 of Hall and Kang (2001), which states that bandwidth selection via the global plug-in method
has no effect on the asymptotic structure of KDE up to the order of
$O\{(nh_0)^{-1/2} + h_0^{L}\} = O(n^{-L/(2L+1)})$. Their results restrict the orders of the kernel functions $K(u)$
and $H(u)$ to $L = 2$ and $L_p = 6$, respectively, but we show that they are valid for general orders $L$ as well,
under the condition that $L_p$ is large enough. Third, we explore the Edgeworth expansion of KDE with deterministic
bandwidth in more detail than Hall (1991). We show that the Edgeworth expansion of the Standardised KDE with
deterministic bandwidth has a term of order $O\{(nh_0)^{-1/2} + h_0^{L}\} = O(n^{-L/(2L+1)})$ right after the term
$\Phi(z)$, with a gap between them. After that, however, the terms decrease at the rate of
$O(h_0) = O(n^{-1/(2L+1)})$.
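
The two forms of each rate above can be checked with a short calculation. The sketch below assumes the univariate case and that $h_0$ is of exact order $n^{-1/(2L+1)}$, the standard order of the MISE-optimal bandwidth for a kernel of order $L$; this order is stated here as an assumption rather than quoted from the text.

```latex
\begin{align*}
h_0 \asymp n^{-1/(2L+1)}
  \quad &\Longrightarrow \quad nh_0 \asymp n^{2L/(2L+1)}, \\
(nh_0)^{-1/2} \asymp n^{-L/(2L+1)},
  \qquad & h_0^{L} \asymp n^{-L/(2L+1)}, \\
(nh_0)^{-1/2} h_0 \asymp n^{-(L+1)/(2L+1)},
  \qquad & h_0^{L+1} \asymp n^{-(L+1)/(2L+1)}, \\
(nh_0)^{-1} \asymp n^{-2L/(2L+1)},
  \qquad & h_0^{2L} \asymp n^{-2L/(2L+1)}.
\end{align*}
```

Under this assumption the two summands in each rate are balanced, and successive orders differ by a factor of $h_0 \asymp n^{-1/(2L+1)}$. For $L = 2$ the three orders are $n^{-2/5}$, $n^{-3/5}$ and $n^{-4/5}$, the first of which agrees with the $O(n^{-2/5})$ order in the result of Hall and Kang (2001).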
However, the result of Hall and Kang (2001) and our results above require the kernel order $L_p$ used for estimating
the unknown parts of the optimal bandwidth to be high enough. We have two motivations for avoiding this condition
on $L_p$. One is that, although a higher-order kernel is theoretically justified, it has undesirable properties when
implemented on a computer. The other is that the condition forces the pilot bandwidth to be relatively large, but
the admissible range is restrictive, especially in multidimensional settings. For details of the latter motivation,
see the seminal works of Cattaneo et al. (2010, 2013, 2014a,b) and Cattaneo and Jansson (2018). Then, as a fourth
contribution, we weaken this condition on $L_p$ assumed by Hall and Kang (2001) and our Theorem 3.1 and provide the
Edgeworth