
parameter estimation [24]. For parameter estimation, these methods have included variational inference [25, 26], likelihood ratio estimation [27], and posterior estimation with normalizing flows [10, 26, 28, 29]. Aside from directly estimating parameters, normalizing flows have also been used to accelerate classical samplers, with significant efficiency improvements [30].
Neural density estimation and importance sampling have previously been combined [31] under the guise of “neural importance sampling” [32], and similar approaches have been applied in several contexts [33–36]. Our contributions are to (1) extend this to amortized simulation-based inference, (2) use it to improve results generated with classical inference methods such as MCMC, and (3) highlight how the use of a forward Kullback-Leibler (KL) loss improves reliability. We also apply it to the challenging real-world problem of GW inference.² We demonstrate results that far outperform classical methods in terms of sample efficiency and parallelizability, while maintaining accuracy and including simple diagnostics. We therefore expect this work to accelerate the development and verification of probabilistic deep learning approaches across science.

² A similar approach using convolutional networks to parametrize Gaussian and von Mises proposals was used to estimate the sky position alone [37]. Using the normalizing flow proposal (as we do here) significantly improves the flexibility of the conditional density estimator and enables inference of all parameters.
Method.—Dingo trains a conditional density-estimation neural network $q(\theta|d)$ to approximate $p(\theta|d)$ based on simulated data sets $(\theta, d)$ with $\theta \sim p(\theta)$, $d \sim p(d|\theta)$—an approach called neural posterior estimation (NPE) [38]. Once trained, Dingo can rapidly produce (approximate) posterior samples for any measured data $d$. In practice, results may deviate from the true posterior due to insufficient training, lack of network expressivity, or out-of-distribution (OOD) data (i.e., data inconsistent with the training distribution). Although it was shown in [10] that these deviations are often negligible, verification of results requires comparing against expensive standard samplers.
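For illustration only, the following minimal Python sketch shows the structure of an NPE training loop on a toy one-dimensional problem. The simulator, the Gaussian density estimator, and all names are assumptions made for the example; Dingo itself uses a conditional normalizing flow and a GW-specific simulator.

    import torch
    import torch.nn as nn

    prior = torch.distributions.Uniform(-1.0, 1.0)

    def simulate(theta):
        # Toy likelihood p(d|theta): data are the parameter plus Gaussian noise.
        return theta + 0.1 * torch.randn_like(theta)

    # q(theta|d): a Gaussian whose mean and log-std are predicted from d.
    # (A conditional flow, as in Dingo, would replace this; a Gaussian keeps the sketch short.)
    net = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 2))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)

    for step in range(5000):
        theta = prior.sample((256, 1))    # theta ~ p(theta)
        d = simulate(theta)               # d ~ p(d|theta)
        mean, log_std = net(d).chunk(2, dim=1)
        q = torch.distributions.Normal(mean, log_std.exp())
        loss = -q.log_prob(theta).mean()  # maximum likelihood = forward KL up to a constant
        opt.zero_grad()
        loss.backward()
        opt.step()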
Here, we describe an efficient method to verify and correct Dingo results using importance sampling (IS) [16]. Starting from a collection of $n$ samples $\theta_i \sim q(\theta|d)$ (the “proposal”) we assign to each one an importance weight $w_i = p(d|\theta_i)\,p(\theta_i)/q(\theta_i|d)$. For a perfect proposal, $w_i = \text{constant}$, but more generally the number of effective samples is related to the variance, $n_\text{eff} = \left(\sum_i w_i\right)^2 / \sum_i w_i^2$ [39]. The sample efficiency $\epsilon = n_\text{eff}/n \in (0, 1]$ arises naturally as a quality measure of the proposal.
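In code, the weights and sample efficiency amount to a few lines. This sketch assumes the log prior, log likelihood, and log proposal density have already been evaluated on the samples; the function names are illustrative.

    import numpy as np

    def importance_weights(log_prior, log_likelihood, log_q):
        # log w_i = log p(d|theta_i) + log p(theta_i) - log q(theta_i|d)
        return log_likelihood + log_prior - log_q

    def sample_efficiency(log_w):
        # n_eff = (sum_i w_i)^2 / sum_i w_i^2, computed stably in log space.
        w = np.exp(log_w - log_w.max())  # the overall scale cancels in the ratio
        n_eff = w.sum() ** 2 / (w ** 2).sum()
        return n_eff, n_eff / len(w)     # (n_eff, epsilon)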
Importance sampling requires evaluation of $p(d|\theta)\,p(\theta)$ rather than the normalized posterior. The Bayesian evidence can then be estimated from the normalization of the weights as $p(d) = \frac{1}{n}\sum_i w_i$. The standard deviation of the log evidence, $\sigma_{\log p(d)} = \sqrt{(1-\epsilon)/(n\,\epsilon)}$ (see Supplemental Material), scales with $1/\sqrt{n}$, enabling very precise estimates. The evidence is furthermore unbiased if the support of the posterior is fully covered by the proposal distribution [40]. The log evidence does have a bias, but this scales as $1/n$, and in all cases considered here is completely negligible (see Supplemental Material). If $q(\theta|d)$ fails to cover the entire posterior, the evidence itself would also be biased, toward lower values.
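A numerically stable evaluation of these two quantities, again as an illustrative sketch operating on the log weights from above:

    import numpy as np
    from scipy.special import logsumexp

    def log_evidence(log_w):
        # log p(d) = log((1/n) sum_i w_i), via logsumexp for numerical stability.
        return logsumexp(log_w) - np.log(len(log_w))

    def log_evidence_std(log_w):
        # sigma_{log p(d)} = sqrt((1 - eps) / (n * eps)), with eps the sample efficiency.
        n = len(log_w)
        w = np.exp(log_w - log_w.max())
        eps = w.sum() ** 2 / ((w ** 2).sum() * n)
        return np.sqrt((1.0 - eps) / (n * eps))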
NPE is particularly well-suited for IS because of two key properties. First, by construction the proposal has a tractable density, such that we can not only sample from $q(\theta|d)$, but also evaluate it. Second, the NPE proposal is expected to always cover the entire posterior support. This is because, during training, NPE minimizes the forward KL divergence $D_\text{KL}(p(\theta|d)\,\|\,q(\theta|d))$. This diverges unless $\text{supp}(p(\theta|d)) \subseteq \text{supp}(q(\theta|d))$, making the loss “probability-mass covering”. Probability mass coverage is not guaranteed for finite sets of samples generated with stochastic samplers like MCMC (which can miss distributional modes), or machine learning methods with other training objectives like variational inference [12, 41, 42].
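The mass-covering property can be seen numerically by Monte-Carlo-estimating the forward KL for proposals that do and do not cover a bimodal target. The distributions in this toy sketch are purely illustrative:

    import numpy as np
    from scipy.stats import norm

    # Bimodal target p(theta): equal-weight Gaussians at -5 and +5.
    rng = np.random.default_rng(0)
    theta = np.concatenate([rng.normal(-5, 1, 5000), rng.normal(5, 1, 5000)])
    log_p = np.log(0.5 * norm.pdf(theta, -5, 1) + 0.5 * norm.pdf(theta, 5, 1))

    for q in (norm(0, 6), norm(5, 1)):           # mass-covering vs. mode-missing proposal
        print(np.mean(log_p - q.logpdf(theta)))  # forward KL blows up for norm(5, 1)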
Neural importance sampling can in fact be used to improve posterior samples from any inference method, provided the likelihood is tractable. If the method provides only samples (without density), then one must first train an (unconditional) density estimator $q(\theta)$ (e.g., a normalizing flow [12, 13, 43]) to use as proposal. This is generally fast for an unconditional flow, and using the forward KL loss guarantees that the proposal will cover the samples. Success, however, relies on the quality of the initial samples: if they are light-tailed, sample efficiency will be poor, and if they are not mass-covering, the evidence will be biased. Nevertheless, for initial samples that represent the posterior well, this technique can provide quick verification and improvement.
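A hedged sketch of that recipe, assuming the open-source nflows package for the flow; the function name, hyperparameters, and training schedule are illustrative choices, not a prescription from this work:

    # Turn existing (e.g., MCMC) samples into an importance-sampling proposal.
    import torch
    from nflows.flows import MaskedAutoregressiveFlow

    def fit_proposal(mcmc_samples, steps=2000, batch=512):
        # mcmc_samples: (N, dim) float tensor of posterior draws from any sampler.
        flow = MaskedAutoregressiveFlow(
            features=mcmc_samples.shape[1],
            hidden_features=64,
            num_layers=5,
            num_blocks_per_layer=2,
        )
        opt = torch.optim.Adam(flow.parameters(), lr=1e-3)
        for _ in range(steps):
            idx = torch.randint(len(mcmc_samples), (batch,))
            loss = -flow.log_prob(mcmc_samples[idx]).mean()  # forward KL: mass-covering
            opt.zero_grad()
            loss.backward()
            opt.step()
        return flow

    # theta, log_q = fit_proposal(samples).sample_and_log_prob(10_000)
    # then weight by w_i = p(d|theta_i) p(theta_i) / q(theta_i) as above.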
In the context of GWs, we refer to neural importance sampling with Dingo as Dingo-IS. Although this technique requires likelihood evaluations at inference time, in practice it is much faster than other likelihood-based methods because of its high sample efficiency and parallelizability. Indeed, Dingo samples are independent and identically distributed, trivially enabling full parallelization of likelihood evaluations. This is a crucial advantage compared to inherently sequential methods such as MCMC.
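Because the proposal samples are i.i.d., the likelihood loop parallelizes with nothing more than a process pool. A minimal sketch with a stand-in likelihood (the actual GW likelihood and its data inputs are not shown here):

    import numpy as np
    from multiprocessing import Pool

    def log_likelihood(theta):
        # Stand-in for the expensive GW likelihood p(d|theta); toy Gaussian here.
        return -0.5 * float(np.dot(theta, theta))

    if __name__ == "__main__":
        theta_samples = np.random.randn(10_000, 15)  # i.i.d. draws from q(theta|d)
        with Pool() as pool:
            # Evaluations are independent, so they map cleanly onto many workers,
            # unlike the inherently sequential proposals of an MCMC chain.
            log_l = pool.map(log_likelihood, theta_samples)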
Results.—For our experiments, we prepare Dingo networks as described in [10], with several modifications. First, we extend the priors over component masses to $m_1, m_2 \in [10, 120]\,M_\odot$ and dimensionless spin magnitudes to $a_1, a_2 \in [0, 0.99]$. We also use the waveform models IMRPhenomXPHM [22] and SEOBNRv4PHM [23], which include higher radiative multipoles and more realistic precession. Finally, in addition to networks for the first