
dataset output by LALInference in Section 4 and present the results on the GW events in Section 5. Concluding remarks are given in Section 6.
2. Insertion Order Statistics in Nested Sampling
2.1. Nested Sampling
For a given GW event associated with the coalescence of a compact binary, we can describe its source properties by a parameter vector θ ∈ Θ, where Θ denotes the corresponding parameter space, including the mass and spin of each component, the distance to the source, its sky location and orientation angles, the time and phase of coalescence, as well as any additional parameters relating to matter properties in the case of a neutron star, orbital eccentricity, etc. Given the observed data D, our aim is to infer the parameters θ of the source, i.e., to estimate the posterior distribution P(θ|D, I) under the assumption that our background information I about the nature of the source, the behavior of our detectors, and the validity of GR as the underlying theory is correct.
In Bayesian statistics, this amounts to updating our prior expectations quantified by P(θ|I) by making appropriate use of Bayes' theorem,

$$ P(D \mid \theta, I) \times P(\theta \mid I) = P(D \mid I) \times P(\theta \mid D, I) \tag{1} $$
$$ L(\theta) \times \pi(\theta)\, d\theta = Z \times p(\theta)\, d\theta $$

where L(θ) = P(D|θ, I), known as the likelihood function, and π(θ) = P(θ|I), the prior, give the desired quantities Z = P(D|I), the evidence, and p = P(θ|D, I), the posterior; equivalently, p(θ) = L(θ) π(θ)/Z.
Computing the likelihood function (the probability density for observing data D, given the model and the true values of the parameters) requires models for both the detector signal and noise. In LIGO's case, LALSimulation can generate a waveform model for the signal, while the noise for each detector is assumed to be Gaussian and is characterized by a power spectral density (PSD) that is pre-estimated from a stretch of data around the time of the event.[15] Information from all detectors in operation is combined into a coherent network likelihood, which is the product of the individual detector likelihoods.[10] The task of efficiently sampling the parameter space to map the likelihood function is carried out by the nested sampling algorithm.
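In symbols, with a detector index k introduced here purely for illustration (the product form follows ref. [10]):

$$ L_{\mathrm{net}}(\theta) = \prod_{k} L_k(\theta) = \prod_{k} P(D_k \mid \theta, I) $$

where D_k denotes the data from detector k.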
The evidence Z—the probability of observing the measured
data, given the model—is defined as
$$ Z = \int_{\Theta} L(\theta)\, \pi(\theta)\, d\theta \tag{2} $$
This is an important quantity in Bayesian data analysis, as the evidences produced by different models can be directly compared. Hence, the evidence can be used to rank competing hypotheses and quantify how much a given model is supported by the data. The quantity dX = π(θ) dθ is known as the element of prior mass. If the prior mass contained by a likelihood contour,

$$ X(\lambda) = \int_{L(\theta) > \lambda} \pi(\theta)\, d\theta \tag{3} $$
is known, the evidence can be written as a 1D integral,
$$ Z = \int_0^1 L(X)\, dX \tag{4} $$
which is more computationally manageable than integrating
across a high-dimensional parameter space Θ.
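As a concrete illustration, the following minimal Python sketch checks this change of variables on a toy problem; the uniform prior on [0, 1] and the Gaussian-shaped likelihood are illustrative choices made here, not models from this paper:

```python
import numpy as np

# Toy check of Equations (2)-(4): uniform prior pi(theta) = 1 on [0, 1]
# and an (illustrative) Gaussian-shaped likelihood peaking at theta = 0.5.
sigma = 0.1

def likelihood(theta):
    return np.exp(-0.5 * ((theta - 0.5) / sigma) ** 2)

theta = np.linspace(0.0, 1.0, 2001)   # grid over the parameter space
L_vals = likelihood(theta)

# Equation (2): with a uniform prior, Z is the prior average of L.
Z_direct = L_vals.mean()

# Equation (3): X(lambda) = fraction of prior mass with L(theta) > lambda.
lam = np.linspace(0.0, L_vals.max(), 2001)
X = (L_vals[None, :] > lam[:, None]).mean(axis=1)

# Equation (4): Z = int_0^1 L(X) dX; integrating by parts, this equals
# the area under X(lambda) between 0 and L_max.
Z_mass = X.mean() * lam.max()

print(Z_direct, Z_mass)   # both close to sigma * sqrt(2*pi) ~ 0.2507
```

Both estimates agree, illustrating why the one-dimensional prior-mass integral is an equivalent, and much cheaper, route to the evidence.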
Nested sampling is a method for computing the evidence that takes advantage of this formulation, relying on the statistical properties of prior sampling to provide a fast and accurate estimate of the prior mass at each integration step.
2.2. Summary of the Nested Sampling Algorithm
Nested sampling relies on sampling from the constrained prior: points from the prior with likelihood higher than some minimum value. As points from the constrained prior are sampled and discarded throughout the algorithm, the samples used at each step are called live points.
The nested sampling algorithm proceeds as follows:
1. Choose the number of live points n_live and sample n_live initial points from the constrained prior. Also, set an evidence threshold ε.
2. Identify the live point with the lowest likelihood L*_i. Discard the live point and record its likelihood.
3. Sample a new live point from π(θ) with L > L*_i. At this stage, the prior volume compresses exponentially, giving prior volume X_i ≈ exp(−i/n_live) on the ith step (the proof is nontrivial; see ref. [5]).
4. Integrate the evidence Z_i using L*_i and X_i.
5. Repeat steps (2)-(4) until a stopping condition is reached: L_max X_i / Z_i < e^ε, where L_max is the highest likelihood discovered so far, X_i is the prior volume inside the current iso-likelihood contour L*_i, and Z_i is the current estimate of the evidence. For LALInference, ε = 0.1; essentially, if all the live points were to have the maximum discovered likelihood, the evidence would only change by a factor of less than 0.1.[10] A minimal code sketch of this loop is given after the list.
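The sketch below is a minimal, self-contained Python rendering of these five steps. The callables log_L, sample_prior, and sample_constrained are hypothetical placeholders for a user's model; LALInference's actual implementation differs in many details:

```python
import numpy as np

def nested_sampling(log_L, sample_prior, sample_constrained,
                    n_live=500, eps=0.1, rng=None):
    # log_L(point)                     : log-likelihood of a parameter vector
    # sample_prior(rng)                : one draw from the prior
    # sample_constrained(logL_min, rng): one prior draw with log_L > logL_min
    rng = rng if rng is not None else np.random.default_rng()

    # Step 1: draw the initial live points and evaluate their likelihoods.
    live = [sample_prior(rng) for _ in range(n_live)]
    log_Ls = np.array([log_L(p) for p in live])

    log_Z = -np.inf   # running evidence estimate, kept in log space
    X_prev = 1.0      # total prior volume before any compression
    i = 0
    while True:
        i += 1
        # Step 2: identify and record the worst live point.
        worst = int(np.argmin(log_Ls))
        log_L_star = log_Ls[worst]

        # Step 3: deterministic estimate of the compressed prior volume.
        X_i = np.exp(-i / n_live)

        # Step 4: accumulate Z += L* x (shell of prior mass just removed).
        log_Z = np.logaddexp(log_Z, log_L_star + np.log(X_prev - X_i))
        X_prev = X_i

        # Replace the discarded point with a fresh constrained-prior draw.
        live[worst] = sample_constrained(log_L_star, rng)
        log_Ls[worst] = log_L(live[worst])

        # Step 5: stop once L_max * X_i / Z_i < e^eps (the criterion above).
        if np.max(log_Ls) + np.log(X_i) - log_Z < eps:
            return log_Z
```

Working in log space with np.logaddexp avoids numerical underflow, since GW likelihoods span many orders of magnitude.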
Nested sampling requires faithful sampling from the constrained prior to produce accurate evidences and posteriors. In practice, sampling from the entire prior and accepting only points with high enough likelihood is impractically slow, because the volume of acceptable points decreases exponentially in time. So, most implementations of nested sampling sample from a restricted region of parameter space drawn around the live points. LALInference, in particular, generates samples by running an MCMC chain from a randomly chosen existing live point, and choosing the length of the MCMC chain is a tradeoff between speed and accuracy.[10]
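A toy version of this strategy might look as follows; the Gaussian random-walk proposal and its scale are illustrative guesses made here, and LALInference's actual jump proposals are considerably more sophisticated:

```python
import numpy as np

def sample_constrained(live, log_L, log_prior, log_L_star,
                       n_mcmc=50, step=0.1, rng=None):
    # Short Metropolis chain, started from a randomly chosen live point,
    # targeting the prior restricted to log_L > log_L_star (hard wall).
    rng = rng if rng is not None else np.random.default_rng()
    x = np.array(live[rng.integers(len(live))], dtype=float)
    for _ in range(n_mcmc):  # chain length trades speed against accuracy
        y = x + step * rng.standard_normal(x.shape)
        # Accept with the prior ratio, but only inside the likelihood contour.
        if log_L(y) > log_L_star and np.log(rng.uniform()) < log_prior(y) - log_prior(x):
            x = y
    return x
```

If the chain is too short to decorrelate from its starting live point, the returned sample is not an independent draw from the constrained prior, which is exactly the failure mode the insertion order crosscheck of Section 2.3 is designed to detect.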
If the restricted region is too small or the MCMC chains too short, the constrained prior may not fully cover the iso-likelihood contour, violating the fundamental assumptions of nested sampling. Plateaus, i.e., regions of constant L(θ), also violate the assumptions of nested sampling, causing live points to be nonuniformly distributed in X.
2.3. Insertion Order Crosscheck
The insertion index is the position where an element must be inserted in a sorted list to preserve its ordering. More concretely, if x is a sorted list and there exists a sample y such that

$$ x_{i-1} < y < x_i \tag{5} $$

then the insertion index of y in x is i.
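In code, this is a one-line binary search; a minimal sketch using Python's standard bisect module:

```python
from bisect import bisect_left

def insertion_index(x, y):
    # Index i at which y must be inserted into the sorted list x to
    # preserve order, i.e. the i satisfying x[i-1] < y < x[i].
    return bisect_left(x, y)

x = [0.1, 0.4, 0.7, 0.9]
print(insertion_index(x, 0.5))  # 2, since x[1] = 0.4 < 0.5 < 0.7 = x[2]
```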