What can we learn about a generated image
corrupting its latent representation?
Agnieszka Tomczak1,2, Aarushi Gupta3, Slobodan Ilic1,2, Nassir Navab1,4, and
Shadi Albarqouni4,5,1 [0000-0003-2157-2211]
1Faculty of Informatics, Technical University of Munich, Munich, Germany
2Siemens AG, Munich, Germany
3Indian Institute of Technology Delhi, New Delhi, India
4Clinic for Diagnostic and Interventional Radiology, University Hospital Bonn,
Germany
5Helmholtz AI, Helmholtz Zentrum Munich, Munich, Germany
Corresponding author: Agnieszka Tomczak: a.tomczak@tum.de
Abstract. Generative adversarial networks (GANs) offer an effective
solution to the image-to-image translation problem, thereby allowing for
new possibilities in medical imaging. They can translate images from
one imaging modality to another at a low cost. For unpaired datasets,
they rely mostly on cycle-consistency loss. Despite its effectiveness in learning the
underlying data distribution, it can lead to a discrepancy between input
and output data. The purpose of this work is to investigate the hypothesis
that we can predict image quality based on its latent representation in the
GAN's bottleneck. We achieve this by corrupting the latent representation
with noise and generating multiple outputs. The degree of differences
between them is interpreted as the strength of the representation: the
more robust the latent representation, the fewer changes in the output
image the corruption causes. Our results demonstrate that our proposed
method has the ability to i) predict uncertain parts of synthesized images,
and ii) identify samples that may not be reliable for downstream tasks,
e.g., liver segmentation task.
Keywords: GANs · Image Synthesis · Uncertainty · Image quality.
1 Introduction
Generative Adversarial Networks (GANs) [8] are state-of-the-art methods for
image-to-image translation problems. It has been found that GANs are a promis-
ing technique for generating images of one modality based on another. Creating
such images in a clinical setting could be highly effective, but only if the im-
ages retain anatomical details and serve the downstream task. As Cohen et
al. [5] described, GANs based on cycle consistency can "hallucinate" features
(for example tumors) causing potentially wrong diagnoses. A clinically useful
example of modality translation would be generating Computed Tomography
(CT) images from Magnetic Resonance Imaging (MRI) scans, and vice versa.
arXiv:2210.06257v1 [cs.CV] 12 Oct 2022
This problem has already been investigated multiple times [3,23,6,4,7,20] as an
image-to-image translation task with multiple approaches: some of them focus-
ing on shape consistency to preserve anatomical features [21,23,6,7,11], others
proposing a multimodal approach to deal with the scalability issues [19,12,16].
Nevertheless, the main challenge remains: how to determine when a generative
adversarial network can be trusted? The issue has a considerable impact on the
medical field, where generated images with fabricated features have no clinical
value. Recently, Upadhyay et al. [18] tackled this problem by predicting not only
the output images but also the corresponding aleatoric uncertainty and then
using it to guide the GAN to improve the final output. Their method requires
changes in the optimization process (additional loss terms) and network archi-
tecture (additional output). They showed that for paired datasets it results in
improved image quality, but did not clearly address the point that in medical
imaging the visual quality of images does not always transfer to the performance
on a downstream task. Our goal was to examine this problem from a different
perspective and test the hypothesis: the more robust the image representation,
the better the quality of the generated output and the end result. To this end,
we present a noise injection technique that allows generating multiple outputs,
thus quantifying the differences between these outputs and providing a confi-
dence score that can be used to determine the uncertain parts of the generated
image, the quality of the generated sample, and to some extent their impact on
a downstream task.
2 Methodology
We design a method to test the assumption that the stronger the latent rep-
resentation the better the quality of a generated image. In order to check the
validity of this statement, we corrupt the latent representation of an image with
noise drawn from normal distribution and see how it influences the generated
output image. In other words, given an image $x \in X$, a target domain $y \in Y$, and a Generative Adversarial Network $G$, we assume a hidden representation $h = E(x)$ with dimensions $n \times m \times l$, where $E$ stands for the encoding part of $G$, and $D$ for the decoding part. We denote the generated image as $\hat{x}$. Next, we construct $k$ corrupted latent codes $\hat{h}$ by adding a noise vector $\eta$ to $h$:

$$\eta_{1,\dots,k} \sim \mathcal{N}\left(0,\ \alpha\,\sigma^2_{h_{1,\dots,l}}\right) \tag{1}$$

where $\sigma^2_{h_{1,\dots,l}}$ is the channel-wise variance of the input representation $h$. We can control the noise level with the factor $\alpha$. Before we add the noise vector, we eliminate the background noise with the operation $\mathrm{bin}$, masking with zeros, across all channels, the positions where the output pixels are equal to zero and therefore contain no information:

$$\mathrm{bin}(h) = h[\hat{x} > 0]$$
$$\hat{h}_{1,\dots,k} = h_1 + \mathrm{bin}(h_1)\,\eta_1,\ \dots,\ h_k + \mathrm{bin}(h_k)\,\eta_k \tag{2}$$
Having now multiple representations for a single input image, we can pass them to the decoder $D$ and generate multiple outputs:

$$\hat{x}_{1,\dots,k} = D(\hat{h}_1),\ \dots,\ D(\hat{h}_k) \tag{3}$$
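The corruption step of Eqs. (1)–(3) can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' code: the function and variable names are ours, and we assume the background mask derived from $\hat{x} > 0$ has already been resized to the latent's spatial dimensions.

```python
import torch

def corrupt_latents(h, mask, k=10, alpha=3.0):
    """Build k noise-corrupted copies of a latent h (Eqs. 1-2).

    h:    latent tensor of shape (C, H, W)
    mask: binary background mask of shape (1, H, W); 1 where the
          primary output x_hat > 0, 0 at background positions
    """
    # Channel-wise variance of h sets the noise scale (Eq. 1).
    sigma2 = h.var(dim=(1, 2), keepdim=True)            # shape (C, 1, 1)
    corrupted = []
    for _ in range(k):
        eta = torch.randn_like(h) * (alpha * sigma2).sqrt()
        corrupted.append(h + mask * eta)                # Eq. (2)
    return corrupted

# Decoding each corrupted latent then yields the k outputs of Eq. (3):
#   outputs = [decoder(h_i) for h_i in corrupt_latents(h, mask)]
```

Because the noise is multiplied by the mask, background positions of the latent are left untouched and only foreground structure is perturbed.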
We use the multiple outputs to quantify the uncertainty connected with the representation of a given image. We calculate two scores: the variance (the average of the squared deviations from the mean) $\gamma$ of our $k$ generated images,

$$\gamma = \mathrm{Var}(\hat{x}_1, \dots, \hat{x}_k) \tag{4}$$

and the Mutual Information (MI) between the multiple outputs and our primary output $\hat{x}$ produced without noise injection:

$$\mathrm{MI}(X;Y) = \sum_{y \in Y}\sum_{x \in X} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}, \qquad \delta = \frac{1}{k}\sum_{i=1}^{k} \mathrm{MI}\left(\hat{x}, \hat{x}_i\right) \tag{5}$$
We interpret $\gamma$ and $\delta$ as measures of the representation quality. The variance $\gamma$ can be considered an uncertainty score: the higher the variance of the outputs generated from the corrupted representations, the more uncertain the encoder is about the produced representation. On the other hand, the MI score $\delta$ can be interpreted as a confidence score, quantifying how much information is preserved between the original output $\hat{x}$ and the outputs produced from corrupted representations $\hat{x}_1, \dots, \hat{x}_k$. We calculate the MI based on a joint (2D) histogram with the number of bins equal to $\lfloor\sqrt{n/5}\rfloor$, where $n$ is the number of pixels per image, as proposed by [2].
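Both scores admit a short NumPy sketch using the histogram-based MI estimate described above; the function names are ours, not the paper's, and this is an illustration rather than the authors' implementation.

```python
import numpy as np

def variance_score(outputs):
    """gamma (Eq. 4): pixel-wise variance across the k corrupted outputs."""
    return np.var(np.stack(outputs, axis=0), axis=0)

def mutual_information(a, b):
    """MI between two images via their joint (2D) histogram (Eq. 5)."""
    n = a.size
    bins = int(np.floor(np.sqrt(n / 5)))      # bin rule proposed by [2]
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_xy = joint / joint.sum()                # joint distribution p(x, y)
    p_x = p_xy.sum(axis=1, keepdims=True)     # marginal p(x)
    p_y = p_xy.sum(axis=0, keepdims=True)     # marginal p(y)
    nz = p_xy > 0                             # skip empty bins (log 0)
    return float((p_xy[nz] * np.log(p_xy[nz] / (p_x * p_y)[nz])).sum())

def confidence_score(x_hat, outputs):
    """delta (Eq. 5): mean MI between the primary and corrupted outputs."""
    return float(np.mean([mutual_information(x_hat, o) for o in outputs]))
```

Here `variance_score` returns a per-pixel uncertainty map, while `confidence_score` collapses the comparison with $\hat{x}$ into a single scalar per image.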
3 Experiments and Results
We conducted a number of experiments using state-of-the-art architectures with
the goal of demonstrating the effectiveness of our proposed method and confirm-
ing our hypothesis that the stronger the latent representation, the better and
more reliable the image quality. Our proposed method was evaluated on two
publicly available datasets, namely CHAOS [14] and LiTS [1].
3.1 Network Architectures and implementation details
TarGAN. As our main baseline we use the TarGAN [3] network, which uses a shape-consistency loss to preserve the shape of the liver. We trained the model for 100 epochs. We kept all the parameters unchanged with respect to the official implementation provided by the authors of TarGAN. We use PyTorch 1.10 to implement all the models and experiments. During inference, we constructed $k = 10$ corrupted representations with noise level $\alpha = 3$ and used them for the evaluation of our method.