
in deriving finite-sample CLT errors for technical reasons.
Our bounds shed light on the fact that, at least for this wide class of models, using a small number of resamples can achieve good coverage even in a dimension $p$ growing closely with $n$.
Bootstrap Bounds on Linear Models Independent of $p$: We further specialize our bounds to linear functions with weaker tail conditions, which have orders independent of $p$ under certain conditions on the $L_p$ norm or Orlicz norm of the linearly scaled random variable.
In addition to theoretical bounds, we investigate the empirical performance of bootstraps using few resamples on large-scale problems, including high-dimensional linear regression, high-dimensional logistic regression, computational simulation modeling, and a real-world data set, RCV1-v2 (Lewis et al., 2004). To give a sense of our comparisons that support using the cheap bootstrap in high dimension, here is a general conclusion observed in our experiments:
Figure 1(a) shows the coverage probabilities of 95%-level confidence intervals for three regression coefficients with corresponding true values $0, 2, -1$ in a 9000-dimensional linear regression (see Section 4). The cheap bootstrap coverage probabilities are close to the nominal level 95% even with one resample, but the basic and percentile bootstraps only attain around 80% coverage with ten resamples. In this example, one Monte Carlo replication to obtain each resample estimate takes around 4 minutes on an e2-highmem-2 virtual machine on Google Cloud Platform. Therefore, the cheap bootstrap requires only 4 minutes to obtain a statistically valid interval, while the standard bootstrap methods are still far from the nominal coverage even after more than a 40-minute run. Figure 1(b) shows the average interval widths. This reveals the price of a wider interval for the cheap bootstrap when the Monte Carlo budget is very small, but considering the low coverages of the other two methods and the fast decay of the cheap bootstrap width over the first few resamples, this price appears secondary.
Notation. For a random vector $X$, we write $X^k$ for the tensor power $X^{\otimes k}$. The vector norm is taken as the usual Euclidean norm. Matrix and tensor norms are taken as the operator norm. For a square matrix $M$, $\mathrm{tr}(M)$ denotes the trace of $M$. $I_{p \times p}$ denotes the identity matrix in $\mathbb{R}^{p \times p}$ and $\mathbf{1}_p$ denotes the vector in $\mathbb{R}^p$ whose components are all $1$. $\Phi$ denotes the cumulative distribution function of the standard normal. $C^2(\mathbb{R}^p)$ denotes the set of twice continuously differentiable functions on $\mathbb{R}^p$. Throughout the whole paper, we use $C > 0$ (without subscripts) to denote a universal constant which may vary each time it appears. We use $C_1, C_2, \ldots$ to denote constants that could depend on other parameters, and we will clarify their dependence when using them.
2. Background on Bootstrap Methods
We briefly review standard bootstrap methods and from there the recent cheap bootstrap. Suppose we are interested in estimating a target statistical quantity $\psi := \psi(P_X)$, where $\psi(\cdot): \mathcal{P} \mapsto \mathbb{R}$ is a functional defined on the probability measure space $\mathcal{P}$. Given i.i.d. data $X_1, \ldots, X_n \in \mathbb{R}^p$ following the unknown distribution $P_X$, we denote the empirical distribution as $\hat{P}_{X,n}(\cdot) := (1/n) \sum_{i=1}^n I(X_i \in \cdot)$. A natural point estimator is $\hat{\psi}_n := \psi(\hat{P}_{X,n})$.
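As a concrete illustration of this plug-in principle, here is a minimal Python sketch (our own, not from the paper); the functional `psi` is a hypothetical stand-in, taken here to be the mean of the first coordinate:

```python
import numpy as np

def psi(sample):
    # A functional psi acting on a distribution, represented here by a
    # sample whose empirical distribution plays the role of P. The mean
    # of the first coordinate is an illustrative choice, not the paper's.
    return sample[:, 0].mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))  # i.i.d. data X_1, ..., X_n in R^p
psi_hat_n = psi(X)              # plug-in estimator psi(P_hat_{X,n})
```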
To construct a confidence interval from $\hat{\psi}_n$, a typical starting point is the distribution of $\hat{\psi}_n - \psi$, from which we can pivotize. As this distribution is unknown in general, the bootstrap idea is to approximate it using the resample counterpart, as if the empirical distribution were the true distribution. More concretely, conditional on $X_1, \ldots, X_n$, we repeatedly, say $B$ times, resample (i.e., sample with replacement) the data $n$ times to obtain resamples $\{X^{*b}_1, \ldots, X^{*b}_n\}$, $b = 1, \ldots, B$. Denoting $\hat{P}^{*b}_{X,n}$ as the resample empirical distributions, we construct $B$ resample estimates $\hat{\psi}^{*b}_n := \psi(\hat{P}^{*b}_{X,n})$. Then we use the $\alpha/2$- and $(1-\alpha/2)$-th quantiles of $\hat{\psi}^{*b}_n - \hat{\psi}_n$, called $q_{\alpha/2}$ and $q_{1-\alpha/2}$, to construct $[\hat{\psi}_n - q_{1-\alpha/2}, \hat{\psi}_n - q_{\alpha/2}]$ as a $(1-\alpha)$-level confidence interval, which is known as the basic bootstrap (Davison & Hinkley (1997), Section 5.2). Alternatively, we could also use the $\alpha/2$- and $(1-\alpha/2)$-th quantiles of $\hat{\psi}^{*b}_n$, say $q'_{\alpha/2}$ and $q'_{1-\alpha/2}$, to form $[q'_{\alpha/2}, q'_{1-\alpha/2}]$, which is known as the percentile bootstrap (Davison & Hinkley (1997), Section 5.3). There are numerous other variants in the literature, such as studentization (Hall, 1988), calibration or iterated bootstrap (Hall, 1986a; Beran, 1987), and bias correction and acceleration (Efron, 1987; DiCiccio et al., 1996; DiCiccio & Tibshirani, 1987), with the general goal of obtaining more accurate coverage.
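For concreteness, here is a minimal sketch of the basic and percentile bootstrap intervals just described, reusing the illustrative `psi` above; the function name and its defaults are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def basic_and_percentile_ci(X, psi, B=1000, alpha=0.05, seed=0):
    # Basic and percentile bootstrap (1 - alpha)-level confidence intervals.
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    psi_hat = psi(X)
    # B resample estimates psi*_b, each from resampling the n rows
    # of the data with replacement
    psi_star = np.array([psi(X[rng.integers(0, n, size=n)])
                         for _ in range(B)])
    # Basic bootstrap: quantiles of psi*_b - psi_hat, flipped around psi_hat
    q_lo, q_hi = np.quantile(psi_star - psi_hat, [alpha / 2, 1 - alpha / 2])
    basic = (psi_hat - q_hi, psi_hat - q_lo)
    # Percentile bootstrap: quantiles of psi*_b used directly
    percentile = tuple(np.quantile(psi_star, [alpha / 2, 1 - alpha / 2]))
    return basic, percentile
```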
All the above methods rely on the principle that $\hat{\psi}_n - \psi$ and $\hat{\psi}^*_n - \hat{\psi}_n$ (conditional on $X_1, \ldots, X_n$) are close in distribution. Typically, this means that, with a $\sqrt{n}$-scaling, they both converge to the same normal distribution. In contrast, the cheap bootstrap proposed in Lam (2022a;b) constructs a $(1-\alpha)$-level confidence interval via
$$\left[\hat{\psi}_n - t_{B,1-\alpha/2} S_{n,B},\ \hat{\psi}_n + t_{B,1-\alpha/2} S_{n,B}\right], \qquad (1)$$
where $S^2_{n,B} = (1/B) \sum_{b=1}^B (\hat{\psi}^{*b}_n - \hat{\psi}_n)^2$, and $t_{B,1-\alpha/2}$ is the $(1-\alpha/2)$-th quantile of $t_B$, the $t$-distribution with degree of freedom $B$.
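A minimal sketch of the interval (1), in the same illustrative setup as the earlier snippets (again our own sketch under those assumptions, rather than the authors' code):

```python
import numpy as np
from scipy import stats

def cheap_bootstrap_ci(X, psi, B=1, alpha=0.05, seed=0):
    # Cheap bootstrap (1 - alpha)-level confidence interval, Eq. (1).
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    psi_hat = psi(X)
    psi_star = np.array([psi(X[rng.integers(0, n, size=n)])
                         for _ in range(B)])
    # S^2_{n,B} = (1/B) sum_b (psi*_b - psi_hat)^2: centered at psi_hat,
    # not at the resample mean, so not exactly a sample variance
    S = np.sqrt(np.mean((psi_star - psi_hat) ** 2))
    # t-quantile with B degrees of freedom; B can be as small as 1
    t_q = stats.t.ppf(1 - alpha / 2, df=B)
    return psi_hat - t_q * S, psi_hat + t_q * S
```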
The quantity $S^2_{n,B}$ resembles the sample variance of the resample estimates $\hat{\psi}^{*b}_n$'s, in the sense that as $B \to \infty$, $S^2_{n,B}$ approaches the bootstrap variance $\mathrm{Var}_*(\hat{\psi}^*_n)$ (where $\mathrm{Var}_*(\cdot)$ denotes the variance of a resample conditional on the data). In this way, (1) reduces to the normality interval with a "plug-in" estimator of the standard error term when $B$ and $n$ are both large. However, intriguingly, $B$ does not need to be large, and $S^2_{n,B}$ is not necessarily close to