arXiv:2210.11641v2 [astro-ph.IM] 22 Feb 2023
Model exploration in gravitational-wave astronomy with the
maximum population likelihood
Ethan Payne1, 2, 3, 4, a and Eric Thrane3, 4, b (a: epayne@caltech.edu; b: eric.thrane@monash.edu)
1Department of Physics, California Institute of Technology, Pasadena, California 91125, USA
2LIGO Laboratory, California Institute of Technology, Pasadena, California 91125, USA
3School of Physics and Astronomy, Monash University, VIC 3800, Australia
4OzGrav: The ARC Centre of Excellence for Gravitational-Wave Discovery, Clayton, VIC 3800, Australia
Hierarchical Bayesian inference is an essential tool for studying the population properties of compact binaries with gravitational waves. The basic premise is to infer the unknown prior distribution of binary black hole and/or neutron star parameters such as component masses, spin vectors, and redshift. These distributions shed light on the fate of massive stars, how and where binaries are
assembled, and the evolution of the Universe over cosmic time. Hierarchical analyses model the
binary black hole population using a prior distribution conditioned on hyper-parameters, which are
inferred from the data. However, a misspecified model can lead to faulty astrophysical inferences.
In this paper we answer the question: given some data, which prior distribution, from the set of all possible prior distributions, produces the largest possible population likelihood? This distribution (which is not a true prior) is π̸ (pronounced “pi stroke”), and the associated maximum population likelihood is Ƚ (pronounced “L stroke”). The structure of π̸ is a linear superposition of delta functions, a result which follows from Carathéodory’s theorem. We show how π̸ and Ƚ can be used for model exploration/criticism. We apply this Ƚ formalism to study the population of binary black hole mergers observed in LIGO–Virgo–KAGRA’s third Gravitational-Wave Transient Catalog. Based on our results, we discuss possible improvements for gravitational-wave population models.
I. MOTIVATION
Bayesian inference has become a mainstay of modern scientific data analysis as a means of analysing signals in noisy observations. This procedure determines the posterior distributions for parameters given one or more models. In order to study the population properties of a set of uncertain observations, a hierarchical Bayesian framework can be employed. The basic idea is to model the population using a conditional prior π(θ|Λ, M), which describes, for example, the distribution of black hole masses {m_1, m_2} ∈ θ given some hyper-parameters Λ, which determine the shape of the prior distribution. Here, M denotes the choice of model. One
then carries out Bayesian inference using a “population
likelihood”

\[
\mathcal{L}(d|\Lambda, M) = \prod_{i}^{N} \frac{1}{\xi(\Lambda)} \int d\theta_i\, \mathcal{L}(d_i|\theta_i)\, \pi(\theta_i|\Lambda, M), \tag{1}
\]
where L(d_i|θ_i) is the likelihood for data associated with event i given parameters θ_i, and ξ(Λ) is the detected fraction for a choice of hyper-parameters. Meanwhile, N is the total number of observations. For an overview of hierarchical modeling in gravitational-wave astronomy including selection effects, see Refs. [1–3].
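To make Eq. (1) concrete, the per-event integrals are typically estimated by importance-reweighting posterior samples drawn under a fiducial prior. The toy sketch below (with a hypothetical one-dimensional parameter θ, a flat fiducial prior, and selection effects switched off) illustrates the mechanics; none of the function names come from the paper, and per-event evidence constants are dropped.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical setup: theta is one-dimensional, and each event's posterior is
# represented by samples drawn under a flat fiducial prior on [-10, 10].
def fiducial_prior(theta):
    return np.full_like(theta, 1.0 / 20.0)

def pop_prior(theta, mu, sigma):
    # Candidate population model pi(theta | Lambda) with Lambda = (mu, sigma).
    return np.exp(-0.5 * ((theta - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def log_population_likelihood(event_samples, mu, sigma, log_xi=0.0):
    # Monte Carlo version of Eq. (1), up to event-independent constants:
    # each per-event integral is estimated by reweighting that event's
    # posterior samples from the fiducial prior to the population prior;
    # xi(Lambda) enters through log_xi (0 here, i.e. no selection effects).
    logL = 0.0
    for theta in event_samples:
        weights = pop_prior(theta, mu, sigma) / fiducial_prior(theta)
        logL += np.log(np.mean(weights)) - log_xi
    return logL

# Fake posteriors for N = 3 events with maximum-likelihood values -1.0, 0.2, 1.3
event_samples = [rng.normal(m, 0.5, size=2000) for m in (-1.0, 0.2, 1.3)]

# A population model centred on the data outperforms a badly offset one
logL_good = log_population_likelihood(event_samples, mu=0.0, sigma=1.5)
logL_bad = log_population_likelihood(event_samples, mu=5.0, sigma=1.5)
```

In real analyses the posterior samples come from per-event parameter estimation, and the reweighting is done in the full multi-dimensional parameter space; the structure of the computation is the same.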
The LIGO-Virgo-KAGRA (LVK) Collaboration’s
third gravitational-wave transient catalog (GWTC-3) [4]
contains the cumulative set of observations of N = 69
confident binary black-hole mergers [5] detected by the LVK [6–8]. Additional detection candidates have been put forward by independent groups [9–13]. Hierarchical inference is employed to study the population properties of these merging binary black holes; see, e.g., Refs. [14–32].
These analyses have revealed a number of exciting results, such as the surprising excess rate of mergers with a primary black hole mass of 35 M⊙ [15], and the evolution of the binary merger rate with redshift [16], to name just two.
However, Bayesian inference has its limitations. One
can use Eq. (1) in order to infer the distribution of binary black hole parameters—given some model; and one can compare the marginal likelihoods of two models to see which one better describes the data. However, Bayesian
inference does not tell us if any of the models we are using
are suitable descriptions of the data. While all models for
the distribution of binary black hole parameters are likely
to be imperfect, some may be adequate for describing our
current dataset [33]. When a model fails to capture some
salient feature of the data, it is said to be “misspecified”
[34,35]. Some effort has been made to assess the suit-
ability of gravitational-wave models, both qualitatively
and quantitatively; see, e.g., [15,16,34,36]. However,
the idea of “model criticism”—testing the suitability of
Bayesian models—is still being developed within the con-
text of gravitational-wave astronomy and beyond.
Hierarchical Bayesian inference studies often depend
upon parametric models. Modelers design parameteri-
zations in order to capture the key features of the as-
trophysical distributions. However, one must still worry
about “unknown unknowns”—features which do not oc-
cur to the modeler to add. For example, recent stud-
ies [15,16,37,38] find a sub-population of binary black
holes merge with spin vectors that are misaligned with respect to the orbital angular momentum axis. However, the degree to which the spins are misaligned might be model dependent. In Refs. [15,16,37], the inferred minimum spin tilt is confidently ≳ 90°. In contrast, Refs. [17,28,38] argue this signature could be due to a lack of flexibility in LVK models to account for a sub-population of black holes with negligible spin magnitude, finding support for misalignment at smaller minimum tilt angles. The inferred population distribution of spin misalignment has important consequences for understanding the formation channels of binary black holes.
This debate highlights how astrophysical inferences can
be affected by model design.
In order to help alleviate some of the issues arising from
model misspecification in Bayesian inference, we present
a framework for assessing the suitability of a model.
This framework is built around the concept of the maximum population likelihood Ƚ (pronounced “L stroke”): the largest possible value of L(d|Λ) in Eq. (1), maximized over all possible choices of population model π(θ|Λ), independent of the choice of parameterization. The “prior” distribution which yields this maximum is π̸(θ) (pronounced “pi stroke”). It is not a true prior because it is determined by the data. The theory behind the maximization of population likelihoods has been studied previously in the optimization and statistics literature [39–44]. This work is underpinned by Carathéodory’s theorem [45] and the mathematics of convex hulls [43]. However, its application to observational science has been somewhat limited as far as we can tell.
The Ƚ framework is useful for several reasons. First, the numerical value of Ƚ is an upper bound on the population likelihood. We can compare the maximum likelihood for a specific model

\[
\mathcal{L}_{\max}(M) = \max_{\Lambda \sim p(\Lambda|d)} \mathcal{L}(d|\Lambda, M) \tag{2}
\]

to Ƚ. Often in Bayesian model selection, the Bayesian evidence values (Z_i) of two hypotheses can be used to determine the extent to which one model is preferred over the other. A typical threshold chosen to rule out one model in favor of another is that ln(Z_1/Z_2) > 8 [46]. In a similar vein, if ln(Ƚ/L_max(M)) ≲ 8, we can be sure the model M is not badly misspecified, since there is no second model M′ that can be written down that will yield a statistically significant improvement. We emphasize that a model which does not satisfy this condition is not necessarily misspecified.
Second, the Ƚ framework can be used to quantitatively assess if a model M is misspecified. By generating synthetic data from M, one can generate the expected distribution of (Ƚ, L_max(M)). In this paper, we show how one can compare the observed values of (Ƚ, L_max(M)) to the expected distribution in order to determine the extent to which M is misspecified—and the way in which it is misspecified.
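This recipe is a posterior-predictive check. The sketch below uses a simple stand-in statistic (the sample variance of a catalog) in place of the paper's (Ƚ, L_max(M)) pair, purely to show the mechanics; the model, catalog size, and numbers are all invented.

```python
import numpy as np

rng = np.random.default_rng(3)

def statistic(catalog):
    # Stand-in for the paper's statistic, e.g. ln(L-stroke) - ln(Lmax(M));
    # any scalar summary of a catalog works for illustrating the mechanics.
    return np.var(catalog)

# Pretend the fitted model M says theta ~ N(0, 1), but the "observed"
# catalog of 69 events was actually drawn with twice the spread.
observed = rng.normal(0.0, 2.0, size=69)

# Expected distribution of the statistic under M, from synthetic catalogs
synthetic_stats = np.array(
    [statistic(rng.normal(0.0, 1.0, size=69)) for _ in range(500)]
)

# Empirical p-value: how often does M produce a statistic at least as
# extreme as the observed one? A tiny value flags misspecification.
p_value = np.mean(synthetic_stats >= statistic(observed))
```

In the real analysis, generating each synthetic catalog involves simulating events from M, running parameter estimation, and recomputing both Ƚ and L_max(M), which is far more expensive than this toy loop.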
Third, the Ƚ framework can be used for “model exploration”—providing clues of where in parameter space unmodeled features might be lurking. By comparing π̸(θ) with the prior from our phenomenological model π(θ|M), one can see if the phenomenological model is capturing key structure present in π̸ and use the comparison to design new models to test on forthcoming datasets.
The remainder of this paper is organized as follows.
In Sec. II, we introduce the Ƚ formalism, illustrating key features with a simple toy model. In Sec. III, we show how the formalism can be used for model criticism. In Sec. IV, we apply the formalism to study the population properties of merging binary black holes observed by the LVK. Our concluding remarks are presented in Sec. V.
II. THE MAXIMUM POPULATION LIKELIHOOD Ƚ
A. Preliminaries
We begin with a brief review of Bayesian hierarchical
inference with a parametric model. Our starting point is
the population likelihood (copied here from Eq. (1)):

\[
\mathcal{L}(d|\Lambda, M) = \prod_{i}^{N} \frac{1}{\xi(\Lambda)} \int d\theta_i\, \mathcal{L}(d_i|\theta_i)\, \pi(\theta_i|\Lambda, M). \tag{3}
\]
Here, L(d_i|θ_i) is the likelihood of event-i data d_i given parameters θ_i. The quantity π(θ_i|Λ, M) is a conditional prior for θ_i given hyper-parameters for some population model M, which describes the shape of the prior distribution. The term ξ(Λ) accounts for selection effects; for example, high-mass systems are typically easier to detect than low-mass systems. It is the detectable fraction of the population given the model M with hyper-parameters Λ:

\[
\xi(\Lambda) = \int d\theta\, p_{\rm det}(\theta)\, \pi(\theta|\Lambda, M). \tag{4}
\]
Here, pdet(θ) is the detection probability of an observa-
tion with parameters θ.
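In practice, ξ(Λ) is usually estimated by Monte Carlo over simulated signals. A minimal sketch of Eq. (4), with a made-up logistic detection probability standing in for p_det(θ):

```python
import numpy as np

rng = np.random.default_rng(0)

def p_det(theta):
    # Made-up detection probability: larger theta (e.g. heavier systems)
    # is easier to detect, mimicking the mass dependence noted in the text.
    return 1.0 / (1.0 + np.exp(-2.0 * theta))

def xi(mu, sigma, n_draws=100_000):
    # Monte Carlo estimate of Eq. (4): draw theta ~ pi(theta | Lambda) for a
    # Gaussian population Lambda = (mu, sigma) and average p_det over draws.
    theta = rng.normal(mu, sigma, size=n_draws)
    return np.mean(p_det(theta))

# A population shifted toward larger theta has a larger detectable fraction
xi_low, xi_high = xi(-1.0, 1.0), xi(1.0, 1.0)
```

Real analyses estimate ξ(Λ) by reweighting a fixed set of found injections rather than redrawing θ for every Λ, but the estimator has the same form.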
B. The maximum population likelihood Ƚ

The maximum population likelihood Ƚ is obtained by taking Eq. (3) and maximizing over all possible prior distributions π(θ). Thus, Ƚ is an upper bound (or supremum) on the set of likelihoods from all possible choices of models for π(θ) such that

\[
Ƚ \equiv \mathcal{L}(d\,|\,M̸) \ge \mathcal{L}(d|\Lambda, M), \tag{5}
\]

for all models M. The “prior” distribution that yields Ƚ is denoted

\[
π̸(\theta) \tag{6}
\]
(pronounced “pi stroke”). It is not a true prior because the distribution which maximizes the population likelihood in Eq. (3) depends on the data. One should therefore refer to π̸ as a pseudo-prior. The associated model is denoted M̸ (pronounced “M stroke”). Combining this notation into a single equation, we have

\[
Ƚ \equiv \prod_{i=1}^{N} \frac{1}{\xi(M̸)} \int d\theta_i\, \mathcal{L}(d_i|\theta_i)\, π̸(\theta_i). \tag{7}
\]
C. Calculating π̸: special cases

Having introduced the concepts of Ƚ and π̸, the natural next question is: given data d, how does one calculate these quantities? Before answering this question, we study three special cases where we can work out π̸ from intuition. This discussion will help sharpen our instincts for the more general solution that follows. Readers eager to get to the punchline may wish to skip this subsection.
1. A single measurement

For the first case, we consider a single measurement (N = 1) with a unimodal likelihood function L(d|θ), which is maximal when the parameter θ is equal to the maximum-likelihood value θ̂. For the sake of simplicity, we ignore selection effects so that ξ(M̸) = 1. In this case, Ƚ in Eq. (7) is clearly maximized if the prior support is entirely concentrated at θ̂. Thus, π̸ is a delta function

\[
π̸(\theta) = \delta(\theta - \hat{\theta}), \tag{8}
\]

which yields

\[
Ƚ = \int d\theta\, \mathcal{L}(d|\theta)\, \delta(\theta - \hat{\theta}) = \mathcal{L}(d|\hat{\theta}). \tag{9}
\]

This result is intuitive: the prior that maximizes the population likelihood is the one that concentrates all its support at the maximum-likelihood value of θ.
2. N signals in the high-SNR limit

For the second case, we consider a scenario in which the data consist of N observations carried out in the high-SNR limit. In this limit, the likelihood of the data for each measurement d_i given some parameter θ approaches a delta function

\[
\mathcal{L}(d_i|\theta_i) = \delta(\theta_i - \hat{\theta}_i), \tag{10}
\]

located at the maximum-likelihood value θ̂_i. We assume that each measurement is distinct so that no two maximum-likelihood values θ̂_i are exactly the same. Again, for the sake of simplicity, we ignore selection effects so that ξ(M̸) = 1, though the argument here holds even if we relax this assumption. Equation (7) becomes

\[
Ƚ = \prod_{i=1}^{N} \int d\theta_i\, \delta(\theta_i - \hat{\theta}_i)\, π̸(\theta_i). \tag{11}
\]

The population likelihood is maximized when π̸ is a sum of delta functions peaking at the set of {θ̂_i}:

\[
π̸(\theta) = \sum_{k=1}^{N} w_k\, \delta(\theta - \hat{\theta}_k), \tag{12}
\]

\[
w_k = 1/N. \tag{13}
\]

This solution for π̸ ensures that there is maximal prior support at every likelihood peak. Obviously, the population likelihood is not maximized if any prior probability density is wasted on values of θ where all the likelihood functions are zero. Choosing an equal weight for each delta function, w_k = 1/N, produces the largest possible population likelihood [47].
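The equal-weight claim of Eq. (13) is easy to verify numerically: with N distinct high-SNR events, Eq. (11) collapses to Ƚ = ∏_i w_i, so all that remains is to maximize Σ_i ln(w_i) on the simplex. A quick check (our own toy, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 10

def log_L_highsnr(w):
    # In the high-SNR limit with distinct events, Eq. (11) collapses to
    # L = prod_i w_i, i.e. log L = sum_i log(w_i).
    return np.sum(np.log(w))

uniform = np.full(N, 1.0 / N)  # Eq. (13): w_k = 1/N
best = log_L_highsnr(uniform)

# No random weight vector on the simplex beats the uniform weights
trials = rng.dirichlet(np.ones(N), size=1000)
assert all(log_L_highsnr(w) <= best + 1e-12 for w in trials)
```

This is just the AM–GM inequality in disguise: Σ_i ln(w_i) subject to Σ_i w_i = 1 is maximized when all the w_i are equal.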
We illustrate this case in Fig. 1(a) using high-SNR, toy-model data drawn from a mean-zero, unit-variance Gaussian distribution. In the top panel, we plot the set of N = 10 maximum-likelihood points {θ̂_i} and the position of the delta functions (blue). In the lower panel, we “plot” π̸(θ) for these ten data points. We put the word “plot” in quotation marks because, technically, we are not plotting π̸(θ), which goes to infinity, but rather we are plotting the weights w_k (Eq. (13)), which allows us to see the relative weight given to each delta function—something that will prove useful below. Throughout the paper, when we refer to plots of π̸(θ), it should be understood that we are actually plotting representations of π̸(θ) using the weights w_k. Finally, note that each peak in the distribution of π̸(θ) matches up with one of the maximum-likelihood points in the upper panel.
3. N identical measurements

For the third case, we consider a set of N observations. This time, we do not assume the high-SNR limit, but we assume that every measurement has the same maximum-likelihood value of θ̂. This case is highly contrived—one does not typically work with multiple identical measurements—but the example is nonetheless helpful for illustrative purposes. In this case, the integral in Eq. (7) is maximized when the prior support is entirely concentrated at θ̂ (where all of the likelihood functions peak), so that π̸ is a single delta function:

\[
π̸(\theta) = \delta(\theta - \hat{\theta}), \tag{14}
\]

while

\[
Ƚ = \prod_{i=1}^{N} \mathcal{L}(d_i|\hat{\theta}). \tag{15}
\]
[Figure 1: three columns of panels, (a), (b), and (c); horizontal axis θ from −2 to 2, vertical axis “Delta function weights, w_k”.]
FIG. 1. Examples of the distribution π̸(θ) described in Subsections II B–II D. Each column represents a different dataset. The top-panel dots show the set of N = 10 maximum-likelihood estimates {θ̂_i}. The top-panel horizontal lines represent error bars (in the first column they are too small to see), and the vertical lines (blue) indicate the inferred delta function locations. The bottom panels show the distribution of π̸(θ) associated with each data set. The left-hand column (a) represents data in the high-SNR limit so that the likelihood functions for each measurement approach delta functions (this is why the error bars are not visible). In this case, π̸(θ) consists of N delta functions, each associated with one of the maximum-likelihood points θ̂_i. In the middle column (b), we are no longer in the high-SNR limit, but the maximum-likelihood points are all assumed to be identical with θ̂_i = 0. In this case, π̸(θ) consists of one delta function peaking at θ = 0. In the right-hand column (c), the data are not in the high-SNR limit, and each θ̂_i is random. In this case, π̸(θ) consists of n = 3 delta functions, each with a different height.
This scenario is demonstrated in Fig. 1(b). The top panel shows the set of N = 10 maximum-likelihood points {θ̂_i}, all with the same value. The horizontal lines represent the error bars for each measurement, which we draw from a uniform distribution on the interval (0.01, 1). In the lower panel, we plot π̸(θ) for these ten data points. This time, since every measurement is identical, π̸(θ) is a single delta function peaking at θ = 0.
From these three examples, we observe a pattern: in each case, π̸(θ) can be written as a weighted sum of delta functions. Indeed, it has been proven that this is in fact the case [39–44]. We refer readers interested in an explanation of the delta-function structure of π̸ to Appendix A, where we summarize the key concepts surrounding the proof outlined in Ref. [43] using the mathematics of convex hulls. We do not reproduce the proof in its entirety, but rather we use visualisations to explain how it works with N = 2 observations, before providing a qualitative explanation for how it generalizes to arbitrary values of N. We explore this general structure and the consequences thereof in the next subsection.
D. The general form of π̸

We proceed with the knowledge that Eq. (7) is true in general, regardless of the form of the likelihood L(d|θ) and the selection-effect term p_det(θ). For any set of observations, π̸(θ) is always of the form

\[
π̸(\theta) = \sum_{k=1}^{n} w_k\, \delta(\theta - \theta_k), \tag{16}
\]

where the w_k are weights which sum to unity:

\[
\sum_{k=1}^{n} w_k = 1. \tag{17}
\]

The number of delta functions is always less than or equal to the number of measurements, and the solution is unique in all but the most pathological of cases (e.g., multimodal distributions with regions of equivalent maximum likelihood), so that

\[
n \le N. \tag{18}
\]
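Away from the limiting cases above, π̸ must be computed numerically. One standard approach, sketched here under our own assumptions (a dense grid of candidate delta-function locations, selection effects ignored; not necessarily the algorithm used in this paper), is an expectation-maximization fixed point for the weights, which provably never decreases the population likelihood:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data (cf. Fig. 1(c)): N events with Gaussian likelihoods of width 0.5
# whose peaks are drawn from a standard normal "true" population.
N = 10
theta_hat = rng.normal(0.0, 1.0, size=N)
sigma = 0.5

# Candidate delta-function locations theta_k on a grid; L[i, k] = L(d_i | theta_k).
grid = np.linspace(-3.0, 3.0, 601)
L = np.exp(-0.5 * ((theta_hat[:, None] - grid[None, :]) / sigma) ** 2)

def log_pop_likelihood(w):
    # log of Eq. (7) with xi = 1 and pi-stroke = sum_k w_k delta(theta - theta_k)
    return np.sum(np.log(L @ w))

w = np.full(grid.size, 1.0 / grid.size)  # start from uniform weights
logL_start = log_pop_likelihood(w)

for _ in range(2000):
    # EM fixed point: w_k <- (1/N) sum_i w_k L[i, k] / sum_j w_j L[i, j]
    w = (w * L / (L @ w)[:, None]).mean(axis=0)

logL_end = log_pop_likelihood(w)
```

Consistent with Eqs. (16)–(18), the converged weights concentrate onto a small number of grid points, although a finite number of EM iterations only approaches the exact solution, whose support contains at most N points.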
The ratio

\[
I \equiv n/N \tag{19}
\]

is a measure of the “informativeness” of the data. It compares the typical likelihood width to the scatter in the astrophysical distribution. In the high-SNR limit, I = 1, since a delta function is required for every data point.