
to compare subsample local periodograms against a full sample version. The maximum is taken over
the $L_2$-distance between periodograms over all time points. An asymptotic theory for the max-statistic,
however, is not provided, although an approximation theory is (see their Lemmas 1 and 3). Furthermore,
conforming with many offerings in the literature, under the null $X_t$ is a linear process with iid Gaussian
innovations. Dette, Preuß and Vetter (2011) study locally stationary processes, and impose linearity
with iid Gaussian innovations. Their statistic is based on the minimum $L_2$-distance between a spectral
density and its version under stationarity, and local power is non-trivial against $T^{1/4}$-alternatives.
Aue et al. (2009) propose a nonparametric test for a break in covariance for multivariate time series
based on a version of a cumulative sum statistic.
Wavelet methods have arisen in various forms recently. von Sachs and Neumann (2000), using
technical wavelet decomposition components from Neumann and von Sachs (1997), propose a Haar
wavelet based localized periodogram test of covariance stationarity for locally stationary processes (cf.
Dahlhaus, 1997, 2009), but neglect to characterize power. Haar wavelet functions form an orthonormal
basis on $L_2[0,1)$, but the proposed frequency domain tests are complicated, a local power analysis
is not feasible, and empirical power may be weak (see simulation evidence from Jin, Wang and Wang
(2015)).
Dwivedi and Subba Rao (2011) and Jentsch and Rao (2015) use the discrete Fourier transform
[DFT] $J_T(\omega_k) = (2\pi T)^{-1/2} \sum_{t=1}^{T} X_t \exp\{it\omega_k\}$ at canonical frequencies
$\omega_k = 2\pi k/T$ and $1 \leq k \leq T$. Dwivedi and Subba Rao (2011) generate a portmanteau statistic
from a normalized sample DFT covariance, exploiting the fact that an uncorrelated DFT implies second
order stationarity. Nason (2013) presents a covariance stationarity test based on Haar wavelet coefficients
of the wavelet periodogram, assumes linear local stationarity, and does not treat local power. See also
Nason, von Sachs and Kroisandt (2000).
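
To make the DFT definition above concrete, the following is a minimal Python sketch under stated assumptions. It evaluates $J_T(\omega_k)$ directly from the formula at the canonical frequencies, and computes a raw sample DFT covariance at a frequency lag $r$. The function names `dft_canonical` and `dft_lag_covariance` are hypothetical, and the covariance is left unnormalized: the normalization used by Dwivedi and Subba Rao (2011) is not specified here, so the sketch should not be read as their statistic.

```python
import numpy as np


def dft_canonical(x):
    """Return J_T(omega_k), k = 1..T, using the sign convention exp{+i t omega_k}."""
    x = np.asarray(x, dtype=float)
    T = x.size
    t = np.arange(1, T + 1)                 # time index t = 1..T
    omega = 2.0 * np.pi * np.arange(1, T + 1) / T   # canonical frequencies
    # Row k of `basis` holds exp(i * t * omega_k) for t = 1..T.
    basis = np.exp(1j * np.outer(omega, t))
    return basis @ x / np.sqrt(2.0 * np.pi * T)


def dft_lag_covariance(J, r):
    """Raw (unnormalized) average of J(omega_k) * conj(J(omega_{k+r})), indices mod T."""
    J = np.asarray(J)
    return np.mean(J * np.conj(np.roll(J, -r)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(256)            # iid noise: a stationary benchmark
    J = dft_canonical(x)
    for r in (1, 2, 3):
        print(r, abs(dft_lag_covariance(J, r)))
```

The explicit $T \times T$ basis matrix keeps the code term-by-term aligned with the formula; for long series an FFT routine would be the usual choice, at the cost of translating between sign and index conventions.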
In a promising offering in the wavelet literature, Jin, Wang and Wang (2015) [JWW] exploit so-called
Walsh functions (akin to “global square waves”, although not truly wavelets; cf. Walsh (1923))
and their implied systematic samples for comparing sub-sample covariances with the full sample one.
They utilize a sample-size dependent maximum lag $H_T$ and maximum systematic sample counter $K_T$,
and show their Wald test exhibits non-negligible local power against $\sqrt{T}$-alternatives. They do not
consider any other orthonormal transformation because Walsh functions, they argue, have “desirable
properties” based primarily on simulation evidence, asymptotic independence of a sub-sample and
sample covariance difference $(\sqrt{T}(\hat{\gamma}_h^{(k_1)} - \hat{\gamma}_h), \sqrt{T}(\hat{\gamma}_h^{(k_2)} - \hat{\gamma}_h))$
across systematic samples $k_1 \neq k_2$,
and joint asymptotic normality (JWW, p. 897). It seems, however, that such theoretical properties are
available irrespective of the orthonormal basis used, although we do not provide a proof. See Section
2.1, below, for definitions and notation. We do, however, find in the sequel that the Walsh basis has
superlative properties vis-à-vis a Haar wavelet basis.
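
The contrast between the two bases can be illustrated with a short Python sketch, offered only as an illustration under stated assumptions. It builds discretized Walsh and Haar functions on a dyadic grid of $2^m$ points in $[0,1)$ and checks that both are orthonormal under the discretized inner product $n^{-1}\sum$. The function names `walsh_matrix` and `haar_matrix` are hypothetical, the Walsh rows are obtained from a Sylvester-type Hadamard matrix sorted by number of sign changes (sequency ordering), and this ordering is an assumption that may differ from the convention in JWW or in Section 2.1.

```python
import numpy as np


def walsh_matrix(m):
    """Rows: first 2**m Walsh functions on a grid of 2**m points, sequency-ordered."""
    H = np.array([[1]])
    for _ in range(m):
        # Sylvester construction of a Hadamard matrix of order 2**m.
        H = np.kron(H, np.array([[1, 1], [1, -1]]))
    sign_changes = np.sum(H[:, 1:] != H[:, :-1], axis=1)
    return H[np.argsort(sign_changes)]


def haar_matrix(m):
    """Rows: first 2**m Haar functions on the same grid, unit norm under (1/n) * sum."""
    n = 2 ** m
    rows = [np.ones(n)]                      # scaling (constant) function
    for j in range(m):
        width = n // 2 ** j                  # support length at resolution level j
        for k in range(2 ** j):
            row = np.zeros(n)
            start = k * width
            row[start:start + width // 2] = 2 ** (j / 2)
            row[start + width // 2:start + width] = -(2 ** (j / 2))
            rows.append(row)
    return np.array(rows)


if __name__ == "__main__":
    m = 3
    n = 2 ** m
    W, Hr = walsh_matrix(m), haar_matrix(m)
    print(np.allclose(W @ W.T / n, np.eye(n)))     # Walsh rows are orthonormal
    print(np.allclose(Hr @ Hr.T / n, np.eye(n)))   # Haar rows are orthonormal
    print(np.all(np.abs(W) == 1), np.any(Hr == 0))  # global vs. local support
```

Every Walsh row takes values $\pm 1$ over the whole grid, whereas Haar rows at finer levels vanish outside a shrinking interval, which is the sense in which Walsh functions behave like global square waves.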
JWW’s asymptotic analysis is driven by local stationarity and linearity $X_t = \sum_{i=0}^{\infty} \psi_i Z_{t-i}$,
with zero mean iid $Z_t$, and $E|Z_t|^{4+\delta} < \infty$, $\delta > 0$, which expedites characterizing
a parametric asymptotic covariance matrix estimator. The iid and linearity assumptions, however, rule out
many important processes,
including nonlinear models like regime switching and random coefficient processes, and any process
with a non-iid error (e.g. nonlinear ARMA-GARCH). JWW’s Wald-type test statistic requires an
inverted parametric variance estimator that itself requires five tuning parameters and a choice of two
kernels.¹ Indeed, most of the tuning parameters only make sense under linearity given how they approach
asymptotic covariance matrix estimation.
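
To illustrate why the iid assumption is restrictive, the following minimal Python sketch simulates a GARCH(1,1) error, which is covariance stationary and serially uncorrelated in levels yet dependent through its squares, hence not iid. The function names `simulate_garch11` and `lag1_autocorr` and all parameter values are hypothetical choices for illustration (with $\alpha + \beta < 1$ so that the unconditional variance exists), not taken from JWW or any cited paper.

```python
import numpy as np


def simulate_garch11(n, omega=0.1, alpha=0.2, beta=0.7, burn=500, seed=0):
    """Simulate x_t = sigma_t * z_t, sigma_t^2 = omega + alpha*x_{t-1}^2 + beta*sigma_{t-1}^2."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n + burn)        # iid N(0,1) shocks
    sigma2 = omega / (1.0 - alpha - beta)    # start at the unconditional variance
    x = np.empty(n + burn)
    for t in range(n + burn):
        x[t] = np.sqrt(sigma2) * z[t]
        sigma2 = omega + alpha * x[t] ** 2 + beta * sigma2
    return x[burn:]                          # drop the burn-in segment


def lag1_autocorr(y):
    """Sample lag-1 autocorrelation of y."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    return float(np.dot(y[1:], y[:-1]) / np.dot(y, y))


if __name__ == "__main__":
    x = simulate_garch11(20000)
    print("levels :", lag1_autocorr(x))       # close to zero: uncorrelated
    print("squares:", lag1_autocorr(x ** 2))  # clearly positive: not independent
```

A process of this kind satisfies the null of covariance stationarity while violating the iid innovation assumption, so a variance estimator derived under linearity with iid errors need not be valid for it.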
¹ One tuning parameter $\lambda \in (0, .5)$ governs the number $Q_T = [T^{\lambda}]$ of sample covariances that enter the asymptotic covariance
matrix estimator (see their p. 899). The remaining four $(c_1, c_2; \xi_1, \xi_2)$ are used for kernel bandwidths $b_j = c_j T^{-\xi_j}$, $j = 1, 2$,
for computing the kurtosis of the iid process $Z_t$ under linearity (see pp. 902-903). The authors set $c_j$ equal to 1.2 times a so-called
"crude scale estimate" which is nowhere defined.