What can we learn about a generated image
corrupting its latent representation?
Agnieszka Tomczak1,2, Aarushi Gupta3, Slobodan Ilic1,2, Nassir Navab1,4, and
Shadi Albarqouni4,5,1 [0000-0003-2157-2211]
1Faculty of Informatics, Technical University of Munich, Munich, Germany
2Siemens AG, Munich, Germany
3Indian Institute of Technology Delhi, New Delhi, India
4Clinic for Diagnostic and Interventional Radiology, University Hospital Bonn,
Germany
5Helmholtz AI, Helmholtz Zentrum Munich, Munich, Germany
Corresponding author: Agnieszka Tomczak: a.tomczak@tum.de
Abstract. Generative adversarial networks (GANs) offer an effective
solution to the image-to-image translation problem, thereby allowing for
new possibilities in medical imaging. They can translate images from
one imaging modality to another at a low cost. For unpaired datasets,
they rely mostly on cycle-consistency loss. Despite its effectiveness in learning the
underlying data distribution, it can lead to a discrepancy between input
and output data. The purpose of this work is to investigate the hypothesis
that we can predict image quality based on its latent representation in the
GAN's bottleneck. We achieve this by corrupting the latent representation
with noise and generating multiple outputs. The degree of differences
between them is interpreted as the strength of the representation: the
more robust the latent representation, the fewer changes in the output
image the corruption causes. Our results demonstrate that our proposed
method has the ability to i) predict uncertain parts of synthesized images,
and ii) identify samples that may not be reliable for downstream tasks,
e.g., liver segmentation task.
Keywords: GANs · Image Synthesis · Uncertainty · Image quality.
1 Introduction
Generative Adversarial Networks (GANs) [8] are state-of-the-art methods for
image-to-image translation problems. It has been found that GANs are a promis-
ing technique for generating images of one modality based on another. Creating
such images in a clinical setting could be highly effective, but only if the im-
ages retain anatomical details and serve the downstream task. As Cohen et
al. [5] described, GANs based on cycle consistency can "hallucinate" features
(for example tumors) causing potentially wrong diagnoses. A clinically useful
example of modality translation would be generating Computed Tomography
(CT) images from Magnetic Resonance Imaging (MRI) scans, and vice versa.
arXiv:2210.06257v1 [cs.CV] 12 Oct 2022
This problem has already been investigated multiple times [3,23,6,4,7,20] as an
image-to-image translation task with multiple approaches: some of them focus-
ing on shape consistency to preserve anatomical features [21,23,6,7,11], others
proposing a multimodal approach to deal with the scalability issues [19,12,16].
Nevertheless, the main challenge remains: how to determine when a generative
adversarial network can be trusted? The issue has a considerable impact on the
medical field, where generated images with fabricated features have no clinical
value. Recently, Upadhyay et al. [18] tackled this problem by predicting not only
the output images but also the corresponding aleatoric uncertainty and then
using it to guide the GAN to improve the final output. Their method requires
changes in the optimization process (additional loss terms) and network archi-
tecture (additional output). They showed that for paired datasets it results in
improved image quality, but did not clearly address the point that in medical
imaging the visual quality of images does not always transfer to the performance
on a downstream task. Our goal was to examine this problem from a different
perspective and test the hypothesis: the more robust the image representation,
the better the quality of the generated output and the end result. To this end,
we present a noise injection technique that allows generating multiple outputs,
thus quantifying the differences between these outputs and providing a confi-
dence score that can be used to determine the uncertain parts of the generated
image, the quality of the generated sample, and to some extent their impact on
a downstream task.
2 Methodology
We design a method to test the assumption that the stronger the latent rep-
resentation the better the quality of a generated image. In order to check the
validity of this statement, we corrupt the latent representation of an image with
noise drawn from normal distribution and see how it influences the generated
output image. In other words, given an image $x \in X$, a target domain $y \in Y$, and a Generative Adversarial Network $G$, we assume a hidden representation $h = E(x)$ with dimensions $n \times m \times l$, where $E$ stands for the encoding part of $G$, and $D$ for the decoding part. We denote the generated image as $\hat{x}$. Next, we construct $k$ corrupted latent codes $\hat{h}$ by adding a noise vector $\eta$ to $h$:

$$\eta_{1,\dots,k} \sim \mathcal{N}\left(0,\ \alpha\,\sigma^2_{h_{1,\dots,l}}\right) \tag{1}$$

where $\sigma^2_{h_{1,\dots,l}}$ is the channel-wise variance of the input representation $h$. We can control the noise level with the factor $\alpha$. Before we add the noise vector, we eliminate the background noise with the operation $\mathrm{bin}$, masking with zeros, across all channels, the positions where the output pixels are equal to zero and therefore contain no information:

$$\mathrm{bin}(h) = h[\hat{x} > 0]$$
$$\hat{h}_{1,\dots,k} = h_1 + \mathrm{bin}(h_1)\,\eta_1,\ \dots,\ h_k + \mathrm{bin}(h_k)\,\eta_k \tag{2}$$
Having now multiple representations for a single input image, we can pass them to the decoder $D$ and generate multiple outputs:

$$\hat{x}_{1,\dots,k} = D(\hat{h}_1),\ \dots,\ D(\hat{h}_k) \tag{3}$$
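The corruption step of Eqs. (1)–(3) can be sketched as follows. This is a minimal illustration under our own assumptions, not the authors' code: the function and variable names are ours, and we assume the background mask derived from $\hat{x} > 0$ has already been resized to the latent's spatial dimensions.

```python
import torch

def corrupt_latents(h, mask, k=10, alpha=3.0):
    """Build k noise-corrupted copies of a latent h (Eqs. 1-2).

    h:    latent tensor of shape (C, H, W)
    mask: binary background mask of shape (1, H, W); 1 where the
          primary output x_hat > 0, 0 at background positions
    """
    # Channel-wise variance of h sets the noise scale (Eq. 1).
    sigma2 = h.var(dim=(1, 2), keepdim=True)            # shape (C, 1, 1)
    corrupted = []
    for _ in range(k):
        eta = torch.randn_like(h) * (alpha * sigma2).sqrt()
        corrupted.append(h + mask * eta)                # Eq. (2)
    return corrupted

# Decoding each corrupted latent then yields the k outputs of Eq. (3):
#   outputs = [decoder(h_i) for h_i in corrupt_latents(h, mask)]
```

Because the noise is multiplied by the mask, background positions of the latent are left untouched and only foreground structure is perturbed.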
We use the multiple outputs to quantify the uncertainty connected with the representation of a given image. We calculate two scores: the variance (the average of the squared deviations from the mean) $\gamma$ of our $k$ generated images,

$$\gamma = \mathrm{Var}(\hat{x}_1, \dots, \hat{x}_k) \tag{4}$$

and the Mutual Information (MI) between the multiple outputs and our primary output $\hat{x}$ produced without noise injection:

$$\mathrm{MI}(X;Y) = \sum_{y \in Y}\sum_{x \in X} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}, \qquad \delta = \frac{1}{k}\sum_{i=1}^{k} \mathrm{MI}\left(\hat{x}, \hat{x}_i\right) \tag{5}$$
We interpret $\gamma$ and $\delta$ as measures of the representation quality. The variance $\gamma$ can be considered an uncertainty score: the higher the variance of the outputs generated from the corrupted representations, the more uncertain the encoder is about the produced representation. On the other hand, the MI score $\delta$ can be interpreted as a confidence score, quantifying how much information is preserved between the original output $\hat{x}$ and the outputs produced from corrupted representations $\hat{x}_1, \dots, \hat{x}_k$. We calculate the MI based on a joint (2D) histogram with the number of bins equal to $\lfloor\sqrt{n/5}\rfloor$, where $n$ is the number of pixels per image, as proposed by [2].
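Both scores admit a short NumPy sketch using the histogram-based MI estimate described above; the function names are ours, not the paper's, and this is an illustration rather than the authors' implementation.

```python
import numpy as np

def variance_score(outputs):
    """gamma (Eq. 4): pixel-wise variance across the k corrupted outputs."""
    return np.var(np.stack(outputs, axis=0), axis=0)

def mutual_information(a, b):
    """MI between two images via their joint (2D) histogram (Eq. 5)."""
    n = a.size
    bins = int(np.floor(np.sqrt(n / 5)))      # bin rule proposed by [2]
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    p_xy = joint / joint.sum()                # joint distribution p(x, y)
    p_x = p_xy.sum(axis=1, keepdims=True)     # marginal p(x)
    p_y = p_xy.sum(axis=0, keepdims=True)     # marginal p(y)
    nz = p_xy > 0                             # skip empty bins (log 0)
    return float((p_xy[nz] * np.log(p_xy[nz] / (p_x * p_y)[nz])).sum())

def confidence_score(x_hat, outputs):
    """delta (Eq. 5): mean MI between the primary and corrupted outputs."""
    return float(np.mean([mutual_information(x_hat, o) for o in outputs]))
```

Here `variance_score` returns a per-pixel uncertainty map, while `confidence_score` collapses the comparison with $\hat{x}$ into a single scalar per image.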
3 Experiments and Results
We conducted a number of experiments using state-of-the-art architectures with
the goal of demonstrating the effectiveness of our proposed method and confirm-
ing our hypothesis that the stronger the latent representation, the better and
more reliable the image quality. Our proposed method was evaluated on two
publicly available datasets, namely CHAOS [14] and LiTS [1].
3.1 Network Architectures and implementation details
TarGAN. As our main baseline we use the TarGAN [3] network, which uses a shape-consistency loss to preserve the shape of the liver. We trained the model for 100 epochs. We kept all the parameters unchanged with respect to the official implementation provided by the authors of TarGAN. We use PyTorch 1.10 to implement all the models and experiments. During inference, we constructed $k = 10$ corrupted representations with noise level $\alpha = 3$ and used them for the evaluation of our method.