(strong law of large numbers) theorem for this process, which is followed by a second order
analysis establishing convergence of probability measures.
To get a feel for potential applicability of partial sorting with bubble sort consider
Figure 1 which is based on an iid sequence of
n
= 1
,
000 observations from a standard normal
distribution. The figure presents the original data, followed by a display of the partially
sorted data for
β
= 0
.
25,
β
= 0
.
5, and finally
β
= 1 (fully sorted). The nature of bubble
sort is that with sorting level
β
the highest valued
βn
(rounded to an integer) observations
are perfectly sorted in the respective top positions of the array but the remaining (1
−β
)
n
observations are only partially sorted. Nevertheless as is evident from the figure a pattern
emerges. Specifically observe the right frontier in the case of
β
= 0
.
25 and
β
= 0
.
5 and note
that it follows a regular shape different from the cdf of the observations. We use this regular
pattern to define the bubble sort curve in the next section and make use of it throughout the
paper. In fact, we are able to characterize the bubble sort curve in terms of F0and β.
Our statistical method compares the bubble sort curve expected under
H0
with an
empirical bubble sort curve obtained from the right frontier of the data. This operates in a
manner that resembles the KS test. Remarkably, like in the case of the KS test we are able
to characterize the deviations between the empirical bubble sort curve and the theoretical
curve under
H0
. Specifically, as
n→ ∞
, the scaled deviations between the curves converge
to a well defined Gaussian process. We use this process to develop a test statistic with a
distribution under
H0
which is agnostic to
F0
and thus directly generalizes the Kolmogorov
distribution associated with the KS test.
To get a feel for the large sample asymptotics consider Figure 2 which is based on standard
iid uniform observations, each partially sorted with
β
= 0
.
25. The figure hints that as
n→ ∞
the empirical bubble sort curve (the right frontier of the data) converges. In Theorem 3.1
we establish the associated first order convergence and further in Theorem 3.2 we study
the fluctuations around this frontier and characterize a limiting stochastic process of these
fluctuations. Finally in Theorem 3.4 we compute the distribution of a statistic arising from
the fluctuations, similar to the KS statistic.
Similarly to the analysis that surrounds the KS test, our analysis is based both on a
law of large numbers type theorem and a second order theorem dealing with convergence
of probability measures. The former is an analogue of the celebrated Glivenko–Cantelli
theorem, see [
27
, Ch. 19] and [
12
]. The latter is analogous to Donsker’s theorem for cdfs; see
[
7
]. Hence our work is an extension of the basic development of the asymptotics of the KS
test as in [
7
], [
8
], and [
9
]. Specifically we develop process level convergence that is similar in
spirit to the process level convergence which appears in the analysis of the KS test.
A good historical account of Kolmogorov’s original paper [
20
] can be found in [
26
]. Since
that initial work, there have been dozens of extensions and generalizations, different in nature
to our work here. We now mention a few notable publications in this space. In [
21
] the
authors consider R´enyi-type statistics (see [
23
]) and a adapt them to create modified KS
tests that are sensitive to tail alternatives. In [
22
] the authors continue investigation of
fitting tails of the distribution and study exact properties of the so-called
Mn
statistics of
Berk and Jones [
4
]. Related is the recent work [
10
] dealing with two-sample KS type tests,
using local levels. In [
16
] the authors consider goodness of fit tests via
φ
-divergences. Beyond
these works, many other papers have explored variations and adaptations of KS tests; see
the notable papers [
6
,
11
,
13
,
14
,
15
,
17
,
18
]. Nevertheless, to the best of our knowledge, the
use of partial sorting to enhance goodness of fit procedures, as we present here, is new.
The remainder of this paper is structured as follows. In Section 2 we define the bubble sort
algorithm and the goodness of fit test that uses the partially sorted sample. This also includes
the definition of the generalized Kolmogorov distribution. In Section 3 the main theoretical
results of this paper our presented. Then sections 4, 5, and 6 contain the derivations and
proofs of the results. In Section 4 we establish the law of large numbers type convergence
results. In Section 5 we establish the weak convergence results. In Section 6 these results are
assembled to obtain the distribution of
Dβ
. In Section 7 we present numerical examples of
4