Strong Gravitational Lensing Parameter
Estimation with Vision Transformer
Kuan-Wei Huang1,*, Geoff Chih-Fan Chen2,*, Po-Wen Chang3, Sheng-Chieh
Lin4, Chia-Jung Hsu5, Vishal Thengane6, and Joshua Yao-Yu Lin7,*
1Carnegie Mellon University
2University of California, Los Angeles
3Ohio State University
4University of Kentucky
5Chalmers University of Technology
6Mohamed bin Zayed University of Artificial Intelligence
7University of Illinois at Urbana-Champaign
* Equal contribution
yaoyuyl2@illinois.edu
Abstract. Quantifying the parameters and corresponding uncertainties
of hundreds of strongly lensed quasar systems holds the key to resolving
one of the most important scientific questions: the Hubble constant
(H0) tension. The commonly used Markov chain Monte Carlo (MCMC)
method has been too time-consuming to achieve this goal, yet recent work
has shown that convolutional neural networks (CNNs) can be an alternative
with seven orders of magnitude improvement in speed. With 31,200
simulated strongly lensed quasar images, we explore the use of the Vision
Transformer (ViT) for simulated strong gravitational lensing for the first
time. We show that ViT can reach results competitive with CNNs, and is
particularly good at some lensing parameters, including the most important
mass-related parameters such as the center of the lens θ1 and θ2, the
ellipticities e1 and e2, and the radial power-law slope γ′. With this
promising preliminary result, we believe the ViT (or attention-based)
network architecture can be an important tool for strong lensing science
for the next generation of surveys. Our code and data are open-sourced
at https://github.com/kuanweih/strong_lensing_vit_resnet.
1 Introduction
The discovery of the accelerated expansion of the Universe [1,2] and observations
of the Cosmic Microwave Background (CMB; e.g., [3]) established the standard
cosmological paradigm: the so-called Λ cold dark matter (ΛCDM) model, where
Λ represents a constant dark energy density. Intriguingly, the recent direct 1.7%
H0 measurement from Type Ia supernovae (SNe), calibrated by the traditional
Cepheid distance ladder (H0 = 73.2 ± 1.3 km s⁻¹ Mpc⁻¹; SH0ES collaboration
[4]), shows a 4.2σ tension with the Planck result (H0 = 67.4 ± 0.5 km s⁻¹ Mpc⁻¹
[5]). However, a recent measurement of H0 from SNe Ia calibrated by the Tip of
the Red Giant Branch (H0 = 69.8 ± 0.8 (stat) ± 1.7 (sys) km s⁻¹ Mpc⁻¹; CCHP
collaboration [6]) agrees with both the Planck and SH0ES results. The spread
in these results, whether due to systematic effects or not, clearly demonstrates
that it is crucial to reveal unknown systematics through different methodologies.
Strongly lensed quasar systems provide such a technique: they constrain H0
at low redshift in a way that is completely independent of the traditional distance-ladder
approach (e.g., [7,8,9]). When a quasar is strongly lensed by a foreground galaxy,
its multiple images have light curves that are offset by a well-defined time delay,
which depends on the mass profile of the lens and the cosmological distances to
the galaxy and the quasar [10]. However, the bottleneck of using strongly lensed
quasar systems is the expensive cost in computational resources and manpower.
With the commonly used Markov chain Monte Carlo (MCMC) procedure, modeling
a single strongly lensed quasar system requires experienced modelers and a few
months of effort to obtain robust uncertainty estimates, and up to years
to check the systematics (e.g., [11,12,13,14,15,16]). This is infeasible, as about 2,600
such systems with well-measured time delays are expected to be discovered in
the upcoming survey with the Large Synoptic Survey Telescope [17,18].
Fig. 1. Left panel: simulated strong-lensing images with real point spread functions
(top two: space-based telescope images; bottom: ground-based adaptive-optics image).
Each image contains the lensing galaxy in the middle, the multiple lensed quasar
images, and the lensed background host galaxy (arc). Right panel: Vision Transformer
attention maps. The overall averaged attention focuses on the strong-lens system,
while each individual head attends to a different feature: head #2 focuses on the
center of the lens, heads #1 and #3 look at particular lensed quasar images, and
head #4 deals with the arc.
Deep learning provides a workaround for the time-consuming lens-modeling
task by directly mapping the underlying relationships between the input lensing
images and the corresponding lensing parameters and their uncertainties.
Hezaveh et al. [19] and Perreault Levasseur et al. [20] first demonstrated that
convolutional neural networks (CNNs) can be an alternative to maximum
likelihood procedures with seven orders of magnitude improvement in speed.
Since then, other works have adopted CNNs for strong-lensing-related inference
[21,22,23,24,25,26,27,28,29,30].
In this work, instead of using traditional CNN-based models, we explore the
attention-based Vision Transformer (ViT; [31,32]), which has been shown to be
more robust than CNN-based models [33]. Furthermore, ViT retains
more spatial information than ResNet [34] and is therefore well suited to
strong-lensing imaging, as the quasar configuration and the spatially extended
background lensed galaxy provide rich information on the foreground mass
distribution (see Figure 1).
2 Data and Models
In Section 2.1, we describe the strong-lensing simulation used to generate the
datasets in this work. In Section 2.2, we describe the deep learning models we
train on the simulated dataset for strong-lensing parameter and uncertainty
estimation.
2.1 Simulation and Datasets
Simulating strong-lensing imaging requires four major components: the mass
distribution of the lensing galaxy, the source light distribution, the lens light
distribution, and the point spread function (PSF), which convolves the images
according to atmospheric distortion and telescope structures. We use the
lenstronomy package [35,36] to generate 31,200 strong-lensing images with the
corresponding lensing parameters for our image multi-regression task. For the
mass distribution, we adopt the commonly used (e.g., [37,15]) elliptically symmetric
power-law distribution [38] to model the dimensionless surface mass density of
lens galaxies,
\kappa_{\rm pl}(\theta_1, \theta_2) = \frac{3 - \gamma'}{1 + q} \left( \frac{\theta_{\rm E}}{\sqrt{\theta_1^2 + \theta_2^2 / q^2}} \right)^{\gamma' - 1} ,    (1)
where γ′ is the radial power-law slope (γ′ = 2 corresponds to isothermal), θE
is the Einstein radius, and q is the axis ratio of the elliptical isodensity contours.
The light distributions of the lens galaxy and the source galaxy are described by
an elliptical Sérsic profile,
I_{\rm S}(\theta_1, \theta_2) = I_{\rm s} \exp\left\{ -k \left[ \left( \frac{\sqrt{\theta_1^2 + \theta_2^2 / q_{\rm L}^2}}{R_{\rm eff}} \right)^{1/n_{\rm sersic}} - 1 \right] \right\} ,    (2)
where Is is the amplitude, k is a constant such that Reff is the effective radius, qL
is the minor-to-major axis ratio, and nsersic is the Sérsic index [39]. For the PSFs,
we use six different PSF structures, including three real Hubble Space Telescope
(HST) PSFs generated by TinyTim [40] and corrected with real HST imaging [15],
and three adaptive-optics (AO) PSFs reconstructed from ground-based Keck
AO imaging [41,42,43]. Three example images are shown in Figure 1.
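To make these components concrete, the following is a minimal lenstronomy sketch of rendering a single mock image with a power-law mass profile (Eq. 1) and elliptical Sérsic light profiles (Eq. 2). The pixel grid, noise levels, and parameter values are illustrative placeholders rather than our actual settings, a simple Gaussian PSF stands in for the real HST/AO PSFs, and the lensed quasar point images are omitted for brevity.

```python
import lenstronomy.Util.simulation_util as sim_util
from lenstronomy.Data.imaging_data import ImageData
from lenstronomy.Data.psf import PSF
from lenstronomy.LensModel.lens_model import LensModel
from lenstronomy.LightModel.light_model import LightModel
from lenstronomy.ImSim.image_model import ImageModel

# Illustrative image grid and noise settings: 224x224 pixels, pixel scale,
# exposure time, and background rms (not the values used for our dataset).
kwargs_data = sim_util.data_configure_simple(224, 0.05, 1000, 0.01)
data_class = ImageData(**kwargs_data)

# Placeholder Gaussian PSF; the paper uses real HST (TinyTim) and Keck AO PSFs.
psf_class = PSF(psf_type='GAUSSIAN', fwhm=0.1, pixel_size=0.05)

# Elliptical power-law mass profile (Eq. 1).
lens_model = LensModel(lens_model_list=['EPL'])
kwargs_lens = [{'theta_E': 1.2, 'gamma': 2.0, 'e1': 0.1, 'e2': -0.05,
                'center_x': 0.0, 'center_y': 0.0}]

# Elliptical Sérsic light profiles (Eq. 2) for the source and the lens galaxy.
source_model = LightModel(light_model_list=['SERSIC_ELLIPSE'])
kwargs_source = [{'amp': 10.0, 'R_sersic': 0.3, 'n_sersic': 1.5,
                  'e1': 0.05, 'e2': 0.0, 'center_x': 0.05, 'center_y': 0.02}]
lens_light_model = LightModel(light_model_list=['SERSIC_ELLIPSE'])
kwargs_lens_light = [{'amp': 20.0, 'R_sersic': 0.8, 'n_sersic': 4.0,
                      'e1': 0.1, 'e2': -0.05, 'center_x': 0.0, 'center_y': 0.0}]

# Render the mock lensed image (lens light + lensed source, convolved with the PSF).
image_model = ImageModel(data_class, psf_class, lens_model_class=lens_model,
                         source_model_class=source_model,
                         lens_light_model_class=lens_light_model)
image = image_model.image(kwargs_lens=kwargs_lens, kwargs_source=kwargs_source,
                          kwargs_lens_light=kwargs_lens_light)
```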
We split the whole simulated dataset of 31,200 images into a training set of
27,000 images, a validation set of 3,000 images, and a test set of 1,200 images.
We rescale each image to 3 × 224 × 224 and normalize the pixel values in each
color channel by the means [0.485, 0.456, 0.406] and the standard deviations
[0.229, 0.224, 0.225] of the datasets. Each image has eight target variables to
be predicted in this task: the Einstein radius θE, the ellipticities e1 and e2, the
radial power-law slope γ′, the coordinates of the mass center θ1 and θ2, the
effective radius Reff, and the Sérsic index nsersic.
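A minimal torchvision sketch of this preprocessing step is shown below; the transform pipeline is our assumption of a typical implementation, not code taken from the released repository.

```python
from torchvision import transforms

# Rescale to 3 x 224 x 224 and normalize each color channel
# with the per-channel means and standard deviations quoted above.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),  # HxWxC PIL image -> CxHxW float tensor in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```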
2.2 Models
We use the Vision Transformer (ViT) as the main model for our image multi-regression
task of strong-lensing parameter estimation. Inspired by the original
Transformer model [31] for natural language processing tasks, Google Research
proposed the ViT models [32] for computer vision tasks. In this paper,
we leverage the base-sized ViT model (ViT-Base), which was pre-trained on the
ImageNet-21k dataset and fine-tuned on the ImageNet 2012 dataset [44].
Taking advantage of transfer learning, we start with the pre-trained ViT-Base
model downloaded from HuggingFace's Transformers library [45], and replace
the last layer with a fully connected layer whose number of outputs matches the
number of target variables in our regression task. The ViT model we use thus
has 85,814,036 trainable parameters, a patch size of 16, a depth of 12, and 12
attention heads.
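A minimal sketch of this adaptation is shown below. The LensViT wrapper and the choice to emit 2 × 8 = 16 outputs (a parameter estimate plus a log-variance per target, as required by the loss in Eq. 3) are our assumptions about the head layout, and the checkpoint name follows the standard HuggingFace ViT-Base release rather than the authors' exact configuration.

```python
import torch.nn as nn
from transformers import ViTModel

class LensViT(nn.Module):
    """Hypothetical ViT-Base regression head: 8 lensing parameters + 8 log-variances."""

    def __init__(self, num_targets: int = 8):
        super().__init__()
        # ViT-Base (patch size 16, depth 12, 12 attention heads), pre-trained backbone.
        self.backbone = ViTModel.from_pretrained("google/vit-base-patch16-224")
        hidden_size = self.backbone.config.hidden_size  # 768 for ViT-Base
        # Replace the classification head with a fully connected regression layer.
        self.head = nn.Linear(hidden_size, 2 * num_targets)

    def forward(self, pixel_values):
        outputs = self.backbone(pixel_values=pixel_values)
        cls_token = outputs.last_hidden_state[:, 0]        # [CLS] token embedding
        y_hat, s_hat = self.head(cls_token).chunk(2, dim=-1)
        return y_hat, s_hat                                # estimates and log-variances
```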
Alongside the ViT model, we also train a ResNet152 model [46] on the same
task as a comparison between ViT and a classic benchmark CNN-based model.
We leverage the pre-trained ResNet152 model from the torchvision package
[47] and modify its last layer accordingly for our multi-regression purpose.
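A corresponding sketch for the ResNet152 baseline might look as follows; as with the ViT head, the 16-output layer reflects our assumed head layout, and newer torchvision versions prefer the weights= argument over pretrained=True.

```python
import torch.nn as nn
from torchvision import models

# Pre-trained ResNet152 with its final fully connected layer swapped out
# for a regression head (8 parameter estimates + 8 log-variances, assumed).
resnet = models.resnet152(pretrained=True)
resnet.fc = nn.Linear(resnet.fc.in_features, 2 * 8)
```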
For regression tasks, the log-likelihood can be written as a Gaussian log-likelihood
[48]. Thus, for our task of K targets, we use the negative log-likelihood
as the loss function [20]:
\mathrm{Loss}_n = -\mathcal{L}(y_n, \hat{y}_n, \hat{s}_n) = \frac{1}{2} \sum_{k=1}^{K} \left( e^{-\hat{s}_{n,k}} \, \lVert y_{n,k} - \hat{y}_{n,k} \rVert^2 + \hat{s}_{n,k} + \ln 2\pi \right) ,    (3)
where (yn, ŷn, ŝn) are the (target, parameter estimation, uncertainty estimation)
for the n-th sample, and (yn,k, ŷn,k, ŝn,k) are the (target, parameter estimation,
uncertainty estimation) for the n-th sample of the k-th target. We note that, in
practice, working with the log-variance ŝn = ln σ̂n² instead of the variance σ̂n²
improves the numerical stability of training.
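A direct PyTorch transcription of Eq. (3) is sketched below, under the assumption that the network returns the parameter estimates ŷ and log-variances ŝ as separate tensors; the function name and the batch reduction (mean over samples) are our choices.

```python
import math
import torch

def gaussian_nll_loss(y, y_hat, s_hat):
    """Negative Gaussian log-likelihood of Eq. (3), with s_hat = ln(sigma_hat^2).

    y, y_hat, s_hat: tensors of shape (batch_size, K) for K regression targets.
    """
    per_target = torch.exp(-s_hat) * (y - y_hat) ** 2 + s_hat + math.log(2.0 * math.pi)
    # Sum over the K targets, then average over the batch.
    return 0.5 * per_target.sum(dim=-1).mean()
```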