Neural Importance Sampling for Rapid and Reliable Gravitational-Wave Inference
Maximilian Dax,1,∗ Stephen R. Green,2,† Jonathan Gair,2 Michael Pürrer,2,3,4 Jonas Wildberger,1 Jakob H. Macke,1,5 Alessandra Buonanno,2,6 and Bernhard Schölkopf1

1 Max Planck Institute for Intelligent Systems, Max-Planck-Ring 4, 72076 Tübingen, Germany
2 Max Planck Institute for Gravitational Physics (Albert Einstein Institute), Am Mühlenberg 1, 14476 Potsdam, Germany
3 Department of Physics, East Hall, University of Rhode Island, Kingston, RI 02881, USA
4 URI Research Computing, Tyler Hall, University of Rhode Island, Kingston, RI 02881, USA
5 Machine Learning in Science, University of Tübingen, 72076 Tübingen, Germany
6 Department of Physics, University of Maryland, College Park, MD 20742, USA
We combine amortized neural posterior estimation with importance sampling for fast and accurate
gravitational-wave inference. We first generate a rapid proposal for the Bayesian posterior using
neural networks, and then attach importance weights based on the underlying likelihood and prior.
This provides (1) a corrected posterior free from network inaccuracies, (2) a performance diagnostic
(the sample efficiency) for assessing the proposal and identifying failure cases, and (3) an unbiased
estimate of the Bayesian evidence. By establishing this independent verification and correction
mechanism we address some of the most frequent criticisms against deep learning for scientific
inference. We carry out a large study analyzing 42 binary black hole mergers observed by LIGO and
Virgo with the SEOBNRv4PHM and IMRPhenomXPHM waveform models. This shows a median
sample efficiency of ≈ 10% (two orders of magnitude better than standard samplers) as well as a
ten-fold reduction in the statistical uncertainty in the log evidence. Given these advantages, we
expect a significant impact on gravitational-wave inference, and for this approach to serve as a
paradigm for harnessing deep learning methods in scientific applications.
Introduction.—Bayesian inference is a key paradigm for scientific discovery. In the context of gravitational waves (GWs), it underlies analyses including individual-event parameter estimation [1], tests of gravity [2], neutron-star physics [3], populations [4], and cosmology [5]. Given a prior p(θ) and a model likelihood p(d|θ), the Bayesian posterior

p(θ|d) = p(d|θ) p(θ) / p(d),    (1)

summarises, as a probability distribution, our knowledge of the model parameters θ after observing data d. When p(d|θ) is tractable (as in the case of GWs), likelihood-based samplers such as Markov chain Monte Carlo (MCMC) [6, 7] or nested sampling [8] are typically used to draw samples from the posterior. If it is possible to sample d ∼ p(d|θ) (i.e., simulate data) one can alternatively use amortized simulation-based (or likelihood-free) inference methods [9]. These approaches are based on deep neural networks and can be several orders of magnitude faster at inference time. For GW inference, they have also been shown to achieve similar accuracy to MCMC [10]. In general, however, it is not clear how well such networks generalize to out-of-distribution data, and they lack diagnostics to be confident in results [11]. These powerful approaches are therefore rarely used in applications where accuracy is important and likelihoods are tractable.
In this Letter, we achieve the best of both worlds by combining likelihood-free and likelihood-based methods for GW parameter estimation. We take samples from Dingo (Deep INference for Gravitational-wave Observations) [10]—a fast and accurate likelihood-free method using normalizing flows [12–15]—and treat these as a proposal for importance sampling [16]. The combined method (“Dingo-IS”) generates samples from the exact posterior and now provides an estimate of the Bayesian evidence p(d). Moreover, the importance sampling efficiency arises as a powerful and objective performance metric, which flags potential failure cases. Importance sampling is fully parallelizable.
After describing the method more fully in the following section, we verify on two real events that Dingo-IS produces results consistent with standard inference codes [17–20]. Our main result is an analysis of 42 events from the Second and Third Gravitational-Wave Transient Catalogs (GWTC-2 and GWTC-3) [1, 21], using two waveform models, IMRPhenomXPHM [22] and SEOBNRv4PHM [23]. Due to the long waveform simulation times, SEOBNRv4PHM inference would take several months per event with stochastic samplers. However, Dingo-IS with 64 CPU cores takes just ≈ 10 hours for these waveforms. (Initial Dingo samples are typically available in under a minute.) Our results indicate that Dingo(-IS) performs well for the majority of events, and that failure cases are indeed flagged by low sample efficiency. We also find that the log evidence is recovered with statistical uncertainty reduced by a factor of 10 compared to standard samplers.
Machine learning methods have seen numerous applications in GW astronomy, including detection and parameter estimation [24]. For parameter estimation, these methods have included variational inference [25, 26], likelihood ratio estimation [27], and posterior estimation with normalizing flows [10, 26, 28, 29]. Aside from directly estimating parameters, normalizing flows have also been used to accelerate classical samplers, with significant efficiency improvements [30].
Neural density estimation and importance sampling have previously been combined [31] under the guise of “neural importance sampling” [32], and similar approaches have been applied in several contexts [33–36]. Our contributions are to (1) extend this to amortized simulation-based inference, (2) use it to improve results generated with classical inference methods such as MCMC, and (3) highlight how the use of a forward Kullback-Leibler (KL) loss improves reliability. We also apply it to the challenging real-world problem of GW inference.[2] We demonstrate results that far outperform classical methods in terms of sample efficiency and parallelizability, while maintaining accuracy and including simple diagnostics. We therefore expect this work to accelerate the development and verification of probabilistic deep learning approaches across science.

[2] A similar approach using convolutional networks to parametrize Gaussian and von Mises proposals was used to estimate the sky position alone [37]. Using the normalizing flow proposal (as we do here) significantly improves the flexibility of the conditional density estimator and enables inference of all parameters.
Method.—Dingo trains a conditional density-estimation neural network q(θ|d) to approximate p(θ|d) based on simulated data sets (θ, d) with θ ∼ p(θ), d ∼ p(d|θ)—an approach called neural posterior estimation (NPE) [38]. Once trained, Dingo can rapidly produce (approximate) posterior samples for any measured data d. In practice, results may deviate from the true posterior due to insufficient training, lack of network expressivity, or out-of-distribution (OOD) data (i.e., data inconsistent with the training distribution). Although it was shown in [10] that these deviations are often negligible, verification of results requires comparing against expensive standard samplers.
Here, we describe an efficient method to verify and correct Dingo results using importance sampling (IS) [16]. Starting from a collection of n samples θ_i ∼ q(θ|d) (the “proposal”), we assign to each one an importance weight w_i = p(d|θ_i) p(θ_i) / q(θ_i|d). For a perfect proposal, w_i = constant, but more generally the number of effective samples is related to the variance, n_eff = (Σ_i w_i)² / Σ_i w_i² [39]. The sample efficiency ϵ = n_eff / n ∈ (0, 1] arises naturally as a quality measure of the proposal.
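To make this concrete, the following minimal Python sketch computes the weights and the sample efficiency in log space for numerical stability. The functions log_likelihood, log_prior, and log_q are hypothetical stand-ins for the GW likelihood, the prior, and the flow proposal density; they are not part of any specific library.

```python
import numpy as np

def importance_weights(log_likelihood, log_prior, log_q, theta, d):
    """Log importance weights: log w_i = log p(d|theta_i) + log p(theta_i) - log q(theta_i|d)."""
    return log_likelihood(theta, d) + log_prior(theta) - log_q(theta, d)

def sample_efficiency(log_w):
    """Sample efficiency eps = n_eff / n, with n_eff = (sum_i w_i)^2 / sum_i w_i^2.

    Weights are shifted by their maximum before exponentiating; the shift
    cancels in the ratio, so the result is safe against overflow."""
    w = np.exp(log_w - log_w.max())
    n_eff = w.sum() ** 2 / (w ** 2).sum()
    return n_eff / len(log_w)
```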
Importance sampling requires evaluation of p(d|θ) p(θ) rather than the normalized posterior. The Bayesian evidence can then be estimated from the normalization of the weights as p(d) = (1/n) Σ_i w_i. The standard deviation of the log evidence, σ_log p(d) = √((1 − ϵ)/(n ϵ)) (see Supplemental Material), scales with 1/√n, enabling very precise estimates. The evidence is furthermore unbiased if the support of the posterior is fully covered by the proposal distribution [40]. The log evidence does have a bias, but this scales as 1/n, and in all cases considered here is completely negligible (see Supplemental Material). If q(θ|d) fails to cover the entire posterior, the evidence itself would also be biased, toward lower values.
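Continuing the sketch above, the evidence estimate and its uncertainty follow directly from the same log weights; logsumexp keeps the computation stable, and the expressions implement p(d) = (1/n) Σ_i w_i and σ_log p(d) = √((1 − ϵ)/(n ϵ)).

```python
import numpy as np
from scipy.special import logsumexp

def log_evidence(log_w):
    """Estimate log p(d) = log((1/n) sum_i w_i) and its standard deviation."""
    n = len(log_w)
    log_p_d = logsumexp(log_w) - np.log(n)
    # Sample efficiency in log space: eps = n_eff / n, n_eff = (sum w)^2 / sum w^2.
    eps = np.exp(2 * logsumexp(log_w) - logsumexp(2 * log_w)) / n
    sigma = np.sqrt((1 - eps) / (n * eps))
    return log_p_d, sigma
```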
NPE is particularly well-suited for IS because of two key properties. First, by construction the proposal has a tractable density, such that we can not only sample from q(θ|d), but also evaluate it. Second, the NPE proposal is expected to always cover the entire posterior support. This is because, during training, NPE minimizes the forward KL divergence D_KL(p(θ|d) || q(θ|d)). This diverges unless supp(p(θ|d)) ⊆ supp(q(θ|d)), making the loss “probability-mass covering”. Probability mass coverage is not guaranteed for finite sets of samples generated with stochastic samplers like MCMC (which can miss distributional modes), or machine learning methods with other training objectives like variational inference [12, 41, 42].
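For illustration, the NPE objective can be written in a few lines of PyTorch-style code. Averaging −log q(θ|d) over simulated pairs (θ, d) is a Monte Carlo estimate of the forward KL up to a constant independent of q. The flow object and the simulate function are assumed interfaces (following the common log_prob(inputs, context) convention of flow libraries such as nflows), not Dingo's actual code.

```python
import torch

def npe_loss(flow, theta, d):
    """-E_{p(theta,d)}[log q(theta|d)]: minimizing this minimizes the forward KL
    D_KL(p(theta|d) || q(theta|d)) averaged over data sets, up to a constant."""
    return -flow.log_prob(theta, context=d).mean()

# Schematic training loop; simulate(batch_size) is a hypothetical stand-in
# for drawing theta ~ p(theta) and d ~ p(d|theta):
# for step in range(num_steps):
#     theta, d = simulate(batch_size)
#     loss = npe_loss(flow, theta, d)
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```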
Neural importance sampling can in fact be used to improve posterior samples from any inference method, provided the likelihood is tractable. If the method provides only samples (without a density), then one must first train an (unconditional) density estimator q(θ) (e.g., a normalizing flow [12, 13, 43]) to use as the proposal. This is generally fast for an unconditional flow, and using the forward KL loss guarantees that the proposal will cover the samples. Success, however, relies on the quality of the initial samples: if they are light-tailed, sample efficiency will be poor, and if they are not mass-covering, the evidence will be biased. Nevertheless, for initial samples that represent the posterior well, this technique can provide quick verification and improvement.
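As a minimal illustration of this sample-based variant, the sketch below fits a Gaussian kernel density estimate to existing MCMC samples and uses it as the proposal. A KDE has full support and a tractable density; it stands in here for the forward-KL-trained unconditional flow described in the text, and the likelihood and prior callables are hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde

def reweight_existing_samples(mcmc_samples, log_likelihood, log_prior, n=100_000):
    """Fit a tractable density q(theta) to posterior samples of shape
    (n_samples, n_dim), then draw a fresh proposal and importance sample it.
    The data d is held fixed inside log_likelihood."""
    q = gaussian_kde(mcmc_samples.T)   # gaussian_kde expects shape (n_dim, n_samples)
    theta = q.resample(n).T            # proposal draws, shape (n, n_dim)
    log_w = log_likelihood(theta) + log_prior(theta) - q.logpdf(theta.T)
    return theta, log_w
```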
In the context of GWs, we refer to neural importance
sampling with Dingo as Dingo-IS. Although this tech-
nique requires likelihood evaluations at inference time,
in practice it is much faster than other likelihood-based
methods because of its high sample efficiency and par-
allelizability. Indeed, Dingo samples are independent
and identically distributed, trivially enabling full par-
allelization of likelihood evaluations. This is a crucial
advantage compared to inherently sequential methods
such as MCMC.
Results.—For our experiments, we prepare Dingo networks as described in [10], with several modifications. First, we extend the priors over component masses to m_1, m_2 ∈ [10, 120] M⊙ and dimensionless spin magnitudes to a_1, a_2 ∈ [0, 0.99]. We also use the waveform models IMRPhenomXPHM [22] and SEOBNRv4PHM [23], which include higher radiative multipoles and more realistic precession. Finally, in addition to networks for the first observing run of LIGO and Virgo (O1), we also train networks based on O3 noise.
Table I. Performance for GW150914 (upper block) and GW151012 (lower block) with waveform model IMRPhenomXPHM. The Jensen-Shannon divergence (JSD) quantifies the deviation from LALInference-MCMC for one-dimensional marginal posteriors (all values in 10⁻³ nat). The mean is taken across all parameters. Posteriors with a maximum JSD ≤ 2 × 10⁻³ nat are considered indistinguishable [19]; here, maxima occur for right ascension α, luminosity distance d_L, and chirp mass M_c. We also report Bilby-dynesty results.

             Mean JSD   Max JSD       log p(d)
  GW150914
    Dingo       2.2      7.2 (α)      —
    Dingo-IS    0.5      1.4 (d_L)    −15831.87 ± 0.01
    Bilby       1.8      4.0 (d_L)    −15831.78 ± 0.10
  GW151012
    Dingo       9.0     53.4 (M_c)    —
    Dingo-IS    0.7      2.2 (α)      −16412.88 ± 0.01
    Bilby       1.1      4.1 (α)      −16412.73 ± 0.09
For the O3 analyses, we found performance improved by training separate Dingo models with distance priors [0.1, 3] Gpc, [0.1, 6] Gpc, and [0.1, 12] Gpc. We continue to use frequency-domain strain data in the range [20, 1024] Hz with Δf = 0.125 Hz and identical data conditioning as in [10]. The network architecture, hyperparameters, and training algorithm are also unchanged. We consider the two LIGO [44] detectors for all analyses, and leave inclusion of Virgo [45] data to a future publication of a complete catalog.
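For orientation, the settings just described can be collected in one place. This is a hypothetical summary dictionary for illustration only, not Dingo's actual configuration schema.

```python
analysis_settings = {
    "waveform_models": ["IMRPhenomXPHM", "SEOBNRv4PHM"],
    "prior": {
        "component_masses_msun": (10.0, 120.0),   # m_1, m_2
        "spin_magnitudes": (0.0, 0.99),           # a_1, a_2
        # Separate O3 networks are trained, one per distance prior:
        "luminosity_distance_gpc": [(0.1, 3.0), (0.1, 6.0), (0.1, 12.0)],
    },
    "data": {
        "detectors": ["H1", "L1"],                # the two LIGO detectors
        "frequency_range_hz": (20.0, 1024.0),
        "delta_f_hz": 0.125,
    },
}
```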
In our experiments, we found that Dingo often has difficulty resolving the phase parameter ϕ_c. Although ϕ_c itself is of little physical interest, it is nevertheless needed to evaluate the likelihood for importance sampling. We therefore sample ϕ_c synthetically, by first evaluating the likelihood across a ϕ_c grid and caching the waveform modes for efficiency (see Supplemental Material). This approach is similar to standard phase marginalization [17, 46, 47], but it is valid even with higher modes; it can therefore also be adapted to stochastic samplers.
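A minimal sketch of this synthetic phase step is shown below. The callable log_likelihood_phi is a hypothetical interface that evaluates the likelihood on a phase grid while reusing cached waveform modes, so the grid scan is cheap; ϕ_c is then drawn from the resulting discrete distribution.

```python
import numpy as np

def sample_phase(log_likelihood_phi, n_grid=1024, rng=None):
    """Draw phi_c from its gridded conditional posterior (flat prior on phi_c)."""
    if rng is None:
        rng = np.random.default_rng()
    phi_grid = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    log_post = log_likelihood_phi(phi_grid)   # vectorized over the grid
    p = np.exp(log_post - log_post.max())     # shift for numerical stability
    p /= p.sum()
    return rng.choice(phi_grid, p=p)
```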
For Dingo-IS, with 10⁵ proposal samples per event, the total time for inference using one NVIDIA A100 GPU and 64 CPU cores is typically less than 1 hour for IMRPhenomXPHM and ≈ 10 hours for SEOBNRv4PHM. In both cases, the computation time is dominated by waveform simulations, which could be further reduced using more CPUs. The rest of the time is spent generating the initial Dingo proposal samples.[3]

[3] It takes longer to generate the proposal than to produce low-latency Dingo samples (∼ 20 s) because of the group-equivariant NPE (GNPE) algorithm [10, 48] (which breaks direct access to the proposal density) and the synthetic phase recovery. See Supplemental Material for details.
[Figure 1 (corner plots not reproduced; caption retained): Chirp mass (M_c), mass ratio (q), and sky position (α, δ) parameters for GW151012, comparing inference with Dingo and LALInference-MCMC. Even when initial Dingo results deviate from LALInference posteriors (upper panel), IS leads to almost perfect agreement (lower panel). For comparison, the lower panel also shows results for SEOBNRv4PHM.]

We first validate Dingo-IS against standard inference codes for two real events, GW150914 and GW151012, using IMRPhenomXPHM. (For SEOBNRv4PHM it is
not feasible to run classical samplers, and one would instead need to use faster methods such as RIFT [49, 50].) We generate reference posteriors using LALInference-MCMC [17], and compare one-dimensional marginalized posteriors for each parameter using the Jensen-Shannon divergence (Table I). For both events, the initial small deviations of Dingo samples from the reference are made negligible by importance sampling.