arXiv:2210.11641v2 [astro-ph.IM] 22 Feb 2023
Model exploration in gravitational-wave astronomy with the
maximum population likelihood
Ethan Payne1, 2, 3, 4, a and Eric Thrane3, 4, b (a: epayne@caltech.edu; b: eric.thrane@monash.edu)
1Department of Physics, California Institute of Technology, Pasadena, California 91125, USA
2LIGO Laboratory, California Institute of Technology, Pasadena, California 91125, USA
3School of Physics and Astronomy, Monash University, VIC 3800, Australia
4OzGrav: The ARC Centre of Excellence for Gravitational-Wave Discovery, Clayton, VIC 3800, Australia
Hierarchical Bayesian inference is an essential tool for studying the population properties of compact binaries with gravitational waves. The basic premise is to infer the unknown prior distribution of binary black hole and/or neutron star parameters such as component masses, spin vectors, and redshift. These distributions shed light on the fate of massive stars, how and where binaries are
assembled, and the evolution of the Universe over cosmic time. Hierarchical analyses model the
binary black hole population using a prior distribution conditioned on hyper-parameters, which are
inferred from the data. However, a misspecified model can lead to faulty astrophysical inferences.
In this paper we answer the question: given some data, which prior distribution, from the set of all possible prior distributions, produces the largest possible population likelihood? This distribution (which is not a true prior) is π̸ (pronounced “pi stroke”), and the associated maximum population likelihood is Ƚ (pronounced “L stroke”). The structure of π̸ is a linear superposition of delta functions, a result which follows from Carathéodory’s theorem. We show how π̸ and Ƚ can be used for model exploration/criticism. We apply this Ƚ formalism to study the population of binary black hole mergers observed in LIGO–Virgo–KAGRA’s third Gravitational-Wave Transient Catalog. Based on our results, we discuss possible improvements for gravitational-wave population models.
I. MOTIVATION
Bayesian inference has become a mainstay of modern scientific data analysis as a means of analysing signals in noisy observations. This procedure determines the posterior distributions for parameters given one or more models. In order to study the population properties of a set of uncertain observations, a hierarchical Bayesian framework can be employed. The basic idea is to model the population using a conditional prior π(θ|Λ, M), which describes, for example, the distribution of black hole masses {m_1, m_2} ∈ θ given some hyper-parameters Λ, which determine the shape of the prior distribution. Here, M denotes the choice of model. One
then carries out Bayesian inference using a “population
likelihood”

\[
\mathcal{L}(d|\Lambda, M) = \prod_{i}^{N} \frac{1}{\xi(\Lambda)} \int d\theta_i\, \mathcal{L}(d_i|\theta_i)\, \pi(\theta_i|\Lambda, M), \tag{1}
\]
where L(d_i|θ_i) is the likelihood for data associated with event i given parameters θ_i, and ξ(Λ) is the detected fraction for a choice of hyper-parameters. Meanwhile, N is the total number of observations. For an overview of hierarchical modeling in gravitational-wave astronomy including selection effects, see Refs. [1–3].
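To make Eq. (1) concrete, the per-event integrals are typically estimated by importance-reweighting posterior samples drawn under a fiducial prior. The toy sketch below (with a hypothetical one-dimensional parameter θ, a flat fiducial prior, and selection effects switched off) illustrates the mechanics; none of the function names come from the paper, and per-event evidence constants are dropped.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical setup: theta is one-dimensional, and each event's posterior is
# represented by samples drawn under a flat fiducial prior on [-10, 10].
def fiducial_prior(theta):
    return np.full_like(theta, 1.0 / 20.0)

def pop_prior(theta, mu, sigma):
    # Candidate population model pi(theta | Lambda) with Lambda = (mu, sigma).
    return np.exp(-0.5 * ((theta - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def log_population_likelihood(event_samples, mu, sigma, log_xi=0.0):
    # Monte Carlo version of Eq. (1), up to event-independent constants:
    # each per-event integral is estimated by reweighting that event's
    # posterior samples from the fiducial prior to the population prior;
    # xi(Lambda) enters through log_xi (0 here, i.e. no selection effects).
    logL = 0.0
    for theta in event_samples:
        weights = pop_prior(theta, mu, sigma) / fiducial_prior(theta)
        logL += np.log(np.mean(weights)) - log_xi
    return logL

# Fake posteriors for N = 3 events with maximum-likelihood values -1.0, 0.2, 1.3
event_samples = [rng.normal(m, 0.5, size=2000) for m in (-1.0, 0.2, 1.3)]

# A population model centred on the data outperforms a badly offset one
logL_good = log_population_likelihood(event_samples, mu=0.0, sigma=1.5)
logL_bad = log_population_likelihood(event_samples, mu=5.0, sigma=1.5)
```

In real analyses the posterior samples come from per-event parameter estimation, and the reweighting is done in the full multi-dimensional parameter space; the structure of the computation is the same.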
The LIGO-Virgo-KAGRA (LVK) Collaboration’s
third gravitational-wave transient catalog (GWTC-3) [4]
contains the cumulative set of observations of N = 69
confident binary black-hole mergers [5] detected by the LVK [6–8]. Additional detection candidates have been put forward by independent groups [9–13]. Hierarchical inference is employed to study the population properties of these merging binary black holes; see, e.g., Refs. [14–32].
These analyses have revealed a number of exciting results, such as the surprising excess rate of mergers with a primary black hole mass of 35 M⊙ [15], and the evolution of the binary merger rate with redshift [16], to name just two.
However, Bayesian inference has its limitations. One
can use Eq. (1) in order to infer the distribution of binary black hole parameters—given some model; and one can compare the marginal likelihoods of two models to see which one better describes the data. However, Bayesian
inference does not tell us if any of the models we are using
are suitable descriptions of the data. While all models for
the distribution of binary black hole parameters are likely
to be imperfect, some may be adequate for describing our
current dataset [33]. When a model fails to capture some
salient feature of the data, it is said to be “misspecified”
[34,35]. Some effort has been made to assess the suit-
ability of gravitational-wave models, both qualitatively
and quantitatively; see, e.g., [15,16,34,36]. However,
the idea of “model criticism”—testing the suitability of
Bayesian models—is still being developed within the con-
text of gravitational-wave astronomy and beyond.
Hierarchical Bayesian inference studies often depend
upon parametric models. Modelers design parameteri-
zations in order to capture the key features of the as-
trophysical distributions. However, one must still worry
about “unknown unknowns”—features which do not oc-
cur to the modeler to add. For example, recent stud-
ies [15,16,37,38] find a sub-population of binary black
holes merge with spin vectors that are misaligned with respect to the orbital angular momentum axis. However, the degree to which the spins are misaligned might be model dependent. In Refs. [15,16,37], the inferred minimum spin tilt is confidently ≳ 90°. In contrast, Refs. [17,28,38] argue this signature could be due to a lack of flexibility in LVK models to account for a sub-population of black holes with negligible spin magnitude, finding support for misalignment at smaller minimum tilt angles. The inferred population distribution of spin misalignment has important consequences for understanding the formation channels of binary black holes.
This debate highlights how astrophysical inferences can
be affected by model design.
In order to help alleviate some of the issues arising from
model misspecification in Bayesian inference, we present
a framework for assessing the suitability of a model.
This framework is built around the concept of the maximum population likelihood Ƚ (pronounced “L stroke”): the largest possible value of L(d|Λ) in Eq. (1), maximized over all possible choices of population model π(θ|Λ), independent of the choice of parameterization. The “prior” distribution which yields this maximum is π̸(θ) (pronounced “pi stroke”). It is not a true prior because it is determined by the data. The theory behind the maximization of population likelihoods has been studied previously in the optimization and statistics literature [39–44]. This work is underpinned by Carathéodory’s theorem [45] and the mathematics of convex hulls [43]. However, its application to observational science has been somewhat limited as far as we can tell.
The Ƚ framework is useful for several reasons. First, the numerical value of Ƚ is an upper bound on the population likelihood. We can compare the maximum likelihood for a specific model

\[
\mathcal{L}_{\max}(M) = \max_{\Lambda \sim p(\Lambda|d)} \mathcal{L}(d|\Lambda, M) \tag{2}
\]

to Ƚ. Often in Bayesian model selection, the Bayesian evidence values (Z_i) of two hypotheses can be used to determine the extent to which one model is preferred over the other. A typical threshold chosen to rule out one model in favor of another is that ln(Z_1/Z_2) > 8 [46]. In a similar vein, if ln(Ƚ/L_max(M)) ≲ 8, we can be sure the model M is not badly misspecified, since there is no second model M′ that can be written down that will yield a statistically significant improvement. We emphasize that a model which does not satisfy this condition is not necessarily misspecified.
Second, the Ƚ framework can be used to quantitatively assess if a model M is misspecified. By generating synthetic data from M, one can generate the expected distribution of (Ƚ, L_max(M)). In this paper, we show how one can compare the observed values of (Ƚ, L_max(M)) to the expected distribution in order to determine the extent to which M is misspecified—and the way in which it is misspecified.
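This recipe is a posterior-predictive check. The sketch below uses a simple stand-in statistic (the sample variance of a catalog) in place of the paper's (Ƚ, L_max(M)) pair, purely to show the mechanics; the model, catalog size, and numbers are all invented.

```python
import numpy as np

rng = np.random.default_rng(3)

def statistic(catalog):
    # Stand-in for the paper's statistic, e.g. ln(L-stroke) - ln(Lmax(M));
    # any scalar summary of a catalog works for illustrating the mechanics.
    return np.var(catalog)

# Pretend the fitted model M says theta ~ N(0, 1), but the "observed"
# catalog of 69 events was actually drawn with twice the spread.
observed = rng.normal(0.0, 2.0, size=69)

# Expected distribution of the statistic under M, from synthetic catalogs
synthetic_stats = np.array(
    [statistic(rng.normal(0.0, 1.0, size=69)) for _ in range(500)]
)

# Empirical p-value: how often does M produce a statistic at least as
# extreme as the observed one? A tiny value flags misspecification.
p_value = np.mean(synthetic_stats >= statistic(observed))
```

In the real analysis, generating each synthetic catalog involves simulating events from M, running parameter estimation, and recomputing both Ƚ and L_max(M), which is far more expensive than this toy loop.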
Third, the Ƚ framework can be used for “model exploration”—providing clues of where in parameter space unmodeled features might be lurking. By comparing π̸(θ) with the prior from our phenomenological model π(θ|M), one can see if the phenomenological model is capturing key structure present in π̸ and use the comparison to design new models to test on forthcoming datasets.
The remainder of this paper is organized as follows.
In Sec. II, we introduce the Ƚ formalism, illustrating key features with a simple toy model. In Sec. III, we show how the formalism can be used for model criticism. In Sec. IV, we apply the formalism to study the population properties of merging binary black holes observed by the LVK. Our concluding remarks are presented in Sec. V.
II. THE MAXIMUM POPULATION LIKELIHOOD Ƚ
A. Preliminaries
We begin with a brief review of Bayesian hierarchical
inference with a parametric model. Our starting point is
the population likelihood (copied here from Eq. (1)):

\[
\mathcal{L}(d|\Lambda, M) = \prod_{i}^{N} \frac{1}{\xi(\Lambda)} \int d\theta_i\, \mathcal{L}(d_i|\theta_i)\, \pi(\theta_i|\Lambda, M). \tag{3}
\]
Here, L(d_i|θ_i) is the likelihood of event-i data d_i given parameters θ_i. The quantity π(θ_i|Λ, M) is a conditional prior for θ_i given hyper-parameters for some population model M, which describes the shape of the prior distribution. The term ξ(Λ) accounts for selection effects; for example, high-mass systems are typically easier to detect than low-mass systems. It is the detectable fraction of the population given the model M with hyper-parameters Λ:

\[
\xi(\Lambda) = \int d\theta\, p_{\rm det}(\theta)\, \pi(\theta|\Lambda, M). \tag{4}
\]
Here, pdet(θ) is the detection probability of an observa-
tion with parameters θ.
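In practice, ξ(Λ) is usually estimated by Monte Carlo over simulated signals. A minimal sketch of Eq. (4), with a made-up logistic detection probability standing in for p_det(θ):

```python
import numpy as np

rng = np.random.default_rng(0)

def p_det(theta):
    # Made-up detection probability: larger theta (e.g. heavier systems)
    # is easier to detect, mimicking the mass dependence noted in the text.
    return 1.0 / (1.0 + np.exp(-2.0 * theta))

def xi(mu, sigma, n_draws=100_000):
    # Monte Carlo estimate of Eq. (4): draw theta ~ pi(theta | Lambda) for a
    # Gaussian population Lambda = (mu, sigma) and average p_det over draws.
    theta = rng.normal(mu, sigma, size=n_draws)
    return np.mean(p_det(theta))

# A population shifted toward larger theta has a larger detectable fraction
xi_low, xi_high = xi(-1.0, 1.0), xi(1.0, 1.0)
```

Real analyses estimate ξ(Λ) by reweighting a fixed set of found injections rather than redrawing θ for every Λ, but the estimator has the same form.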
B. The maximum population likelihood Ƚ

The maximum population likelihood Ƚ is obtained by taking Eq. (3) and maximizing over all possible prior distributions π(θ). Thus, Ƚ is an upper bound (or supremum) on the set of likelihoods from all possible choices of models for π(θ) such that

\[
Ƚ \equiv \mathcal{L}(d\,|\,M̸) \ge \mathcal{L}(d|\Lambda, M), \tag{5}
\]

for all models M. The “prior” distribution that yields Ƚ is denoted

\[
π̸(\theta) \tag{6}
\]
(pronounced “pi stroke”). It is not a true prior because the distribution which maximizes the population likelihood in Eq. (3) depends on the data. One should therefore refer to π̸ as a pseudo-prior. The associated model is denoted M̸ (pronounced “M stroke”). Combining this notation into a single equation, we have

\[
Ƚ \equiv \prod_{i=1}^{N} \frac{1}{\xi(M̸)} \int d\theta_i\, \mathcal{L}(d_i|\theta_i)\, π̸(\theta_i). \tag{7}
\]
C. Calculating π̸: special cases

Having introduced the concepts of Ƚ and π̸, the natural next question is: given data d, how does one calculate these quantities? Before answering this question, we study three special cases where we can work out π̸ from intuition. This discussion will help sharpen our instincts for the more general solution that follows. Readers eager to get to the punchline may wish to skip this subsection.
1. A single measurement

For the first case, we consider a single measurement (N = 1) with a unimodal likelihood function L(d|θ), which is maximal when the parameter θ is equal to the maximum-likelihood value θ̂. For the sake of simplicity, we ignore selection effects so that ξ(M̸) = 1. In this case, Ƚ in Eq. (7) is clearly maximized if the prior support is entirely concentrated at θ̂. Thus, π̸ is a delta function

\[
π̸(\theta) = \delta(\theta - \hat{\theta}), \tag{8}
\]

which yields

\[
Ƚ = \int d\theta\, \mathcal{L}(d|\theta)\, \delta(\theta - \hat{\theta}) = \mathcal{L}(d|\hat{\theta}). \tag{9}
\]

This result is intuitive: the prior that maximizes the population likelihood is the one that concentrates all its support at the maximum-likelihood value of θ.
2. N signals in the high-SNR limit

For the second case, we consider a scenario in which the data consist of N observations carried out in the high-SNR limit. In this limit, the likelihood of the data for each measurement d_i given some parameter θ approaches a delta function

\[
\mathcal{L}(d_i|\theta_i) = \delta(\theta_i - \hat{\theta}_i), \tag{10}
\]

located at the maximum-likelihood value θ̂_i. We assume that each measurement is distinct so that no two maximum-likelihood values θ̂_i are exactly the same. Again, for the sake of simplicity, we ignore selection effects so that ξ(M̸) = 1, though the argument here holds even if we relax this assumption. Equation (7) becomes

\[
Ƚ = \prod_{i=1}^{N} \int d\theta_i\, \delta(\theta_i - \hat{\theta}_i)\, π̸(\theta_i). \tag{11}
\]

The population likelihood is maximized when π̸ is a sum of delta functions peaking at the set of {θ̂_i}:

\[
π̸(\theta) = \sum_{k=1}^{N} w_k\, \delta(\theta - \hat{\theta}_k), \tag{12}
\]

\[
w_k = 1/N. \tag{13}
\]

This solution for π̸ ensures that there is maximal prior support at every likelihood peak. Obviously, the population likelihood is not maximized if any prior probability density is wasted on values of θ where all the likelihood functions are zero. Choosing an equal weight for each delta function, w_k = 1/N, produces the largest possible population likelihood [47].
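The equal-weight claim of Eq. (13) is easy to verify numerically: with N distinct high-SNR events, Eq. (11) collapses to Ƚ = ∏_i w_i, so all that remains is to maximize Σ_i ln(w_i) on the simplex. A quick check (our own toy, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 10

def log_L_highsnr(w):
    # In the high-SNR limit with distinct events, Eq. (11) collapses to
    # L = prod_i w_i, i.e. log L = sum_i log(w_i).
    return np.sum(np.log(w))

uniform = np.full(N, 1.0 / N)  # Eq. (13): w_k = 1/N
best = log_L_highsnr(uniform)

# No random weight vector on the simplex beats the uniform weights
trials = rng.dirichlet(np.ones(N), size=1000)
assert all(log_L_highsnr(w) <= best + 1e-12 for w in trials)
```

This is just the AM–GM inequality in disguise: Σ_i ln(w_i) subject to Σ_i w_i = 1 is maximized when all the w_i are equal.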
We illustrate this case in Fig. 1(a) using high-SNR, toy-model data drawn from a mean-zero, unit-variance Gaussian distribution. In the top panel, we plot the set of N = 10 maximum-likelihood points {θ̂_i} and the position of the delta functions (blue). In the lower panel, we “plot” π̸(θ) for these ten data points. We put the word “plot” in quotation marks because, technically, we are not plotting π̸(θ), which goes to infinity, but rather we are plotting the weights w_k (Eq. (13)), which allows us to see the relative weight given to each delta function—something that will prove useful below. Throughout the paper, when we refer to plots of π̸(θ), it should be understood that we are actually plotting representations of π̸(θ) using the weights w_k. Finally, note that each peak in the distribution of π̸(θ) matches up with one of the maximum-likelihood points in the upper panel.
3. N identical measurements

For the third case, we consider a set of N observations. This time, we do not assume the high-SNR limit, but we assume that every measurement has the same maximum-likelihood value of θ̂. This case is highly contrived—one does not typically work with multiple identical measurements—but the example is nonetheless helpful for illustrative purposes. In this case, the integral in Eq. (7) is maximized when the prior support is entirely concentrated at θ̂ (where all of the likelihood functions peak), so that π̸ is a single delta function:

\[
π̸(\theta) = \delta(\theta - \hat{\theta}), \tag{14}
\]

while

\[
Ƚ = \prod_{i=1}^{N} \mathcal{L}(d_i|\hat{\theta}). \tag{15}
\]
[Figure 1: three columns of panels, (a), (b), and (c); horizontal axis θ from −2 to 2, vertical axis “Delta function weights, w_k”.]
FIG. 1. Examples of the distribution π̸(θ) described in Subsections II B–II D. Each column represents a different dataset. The top-panel dots show the set of N = 10 maximum-likelihood estimates {θ̂_i}. The top-panel horizontal lines represent error bars (in the first column they are too small to see), and the vertical lines (blue) indicate the inferred delta function locations. The bottom panels show the distribution of π̸(θ) associated with each data set. The left-hand column (a) represents data in the high-SNR limit so that the likelihood functions for each measurement approach delta functions (this is why the error bars are not visible). In this case, π̸(θ) consists of N delta functions, each associated with one of the maximum-likelihood points θ̂_i. In the middle column (b), we are no longer in the high-SNR limit, but the maximum-likelihood points are all assumed to be identical with θ̂_i = 0. In this case, π̸(θ) consists of one delta function peaking at θ = 0. In the right-hand column (c), the data are not in the high-SNR limit, and each θ̂_i is random. In this case, π̸(θ) consists of n = 3 delta functions, each with a different height.
This scenario is demonstrated in Fig. 1(b). The top panel shows the set of N = 10 maximum-likelihood points {θ̂_i}, all with the same value. The horizontal lines represent the error bars for each measurement, which we draw from a uniform distribution on the interval (0.01, 1). In the lower panel, we plot π̸(θ) for these ten data points. This time, since every measurement is identical, π̸(θ) is a single delta function peaking at θ = 0.
From these three examples, we observe a pattern: in each case, π̸(θ) can be written as a weighted sum of delta functions. Indeed, it has been proven that this is in fact the case [39–44]. We refer readers interested in an explanation of the delta-function structure of π̸ to Appendix A, where we summarize the key concepts surrounding the proof outlined in Ref. [43] using the mathematics of convex hulls. We do not reproduce the proof in its entirety, but rather we use visualisations to explain how it works with N = 2 observations, before providing a qualitative explanation for how it generalizes to arbitrary values of N. We explore this general structure and the consequences thereof in the next subsection.
D. The general form of π̸

We proceed with the knowledge that Eq. (7) is true in general, regardless of the form of the likelihood L(d|θ) and the selection-effect term p_det(θ). For any set of observations, π̸(θ) is always of the form

\[
π̸(\theta) = \sum_{k=1}^{n} w_k\, \delta(\theta - \theta_k), \tag{16}
\]

where the w_k are weights which sum to unity:

\[
\sum_{k=1}^{n} w_k = 1. \tag{17}
\]

The number of delta functions is always less than or equal to the number of measurements, and the solution is unique in all but the most pathological of cases (e.g., multimodal distributions with regions of equivalent maximum likelihood), so that

\[
n \le N. \tag{18}
\]
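Away from the limiting cases above, π̸ must be computed numerically. One standard approach, sketched here under our own assumptions (a dense grid of candidate delta-function locations, selection effects ignored; not necessarily the algorithm used in this paper), is an expectation-maximization fixed point for the weights, which provably never decreases the population likelihood:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data (cf. Fig. 1(c)): N events with Gaussian likelihoods of width 0.5
# whose peaks are drawn from a standard normal "true" population.
N = 10
theta_hat = rng.normal(0.0, 1.0, size=N)
sigma = 0.5

# Candidate delta-function locations theta_k on a grid; L[i, k] = L(d_i | theta_k).
grid = np.linspace(-3.0, 3.0, 601)
L = np.exp(-0.5 * ((theta_hat[:, None] - grid[None, :]) / sigma) ** 2)

def log_pop_likelihood(w):
    # log of Eq. (7) with xi = 1 and pi-stroke = sum_k w_k delta(theta - theta_k)
    return np.sum(np.log(L @ w))

w = np.full(grid.size, 1.0 / grid.size)  # start from uniform weights
logL_start = log_pop_likelihood(w)

for _ in range(2000):
    # EM fixed point: w_k <- (1/N) sum_i w_k L[i, k] / sum_j w_j L[i, j]
    w = (w * L / (L @ w)[:, None]).mean(axis=0)

logL_end = log_pop_likelihood(w)
```

Consistent with Eqs. (16)–(18), the converged weights concentrate onto a small number of grid points, although a finite number of EM iterations only approaches the exact solution, whose support contains at most N points.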
The ratio

\[
I \equiv n/N \tag{19}
\]

is a measure of the “informativeness” of the data. It compares the typical likelihood width to the scatter in the astrophysical distribution. In the high-SNR limit, I = 1, since a delta function is required for every data point.