T2CI-GAN: Text to Compressed Image generation using Generative Adversarial Network

Bulla Rajesh1,2[0000-0002-5731-9755], Nandakishore Dusa1, Mohammed Javed1[0000-0002-3019-7401], Shiv Ram Dubey1[0000-0002-4532-8996], and P. Nagabhushan1,2

1 Department of IT, IIIT Allahabad, Prayagraj, U.P., 211015, India
2 Department of CSE, Vignan University, Guntur, A.P., 522213, India
{rsi2018007, iwm2016002, javed, srdubey, pnagabhushan}@iiita.ac.in
Abstract. The problem of generating textual descriptions for visual data has gained research attention in recent years. In contrast, the problem of generating visual data from textual descriptions is still very challenging, because it requires a combination of Natural Language Processing (NLP) and Computer Vision techniques. Existing methods utilize Generative Adversarial Networks (GANs) to generate uncompressed images from textual descriptions. However, in practice, most visual data are processed and transmitted in a compressed representation. Hence, the proposed work attempts to generate visual data directly in the compressed representation using Deep Convolutional GANs (DCGANs), in order to achieve storage and computational efficiency. We propose two GAN models for compressed image generation from text. The first model is trained directly with JPEG compressed DCT images (compressed domain) to generate compressed images from text descriptions. The second model is trained with RGB images (pixel domain) to generate JPEG compressed DCT representations from text descriptions. The proposed models are tested on the open-source benchmark dataset of Oxford-102 Flower images, using both the RGB and JPEG compressed versions, and achieve state-of-the-art performance in the JPEG compressed domain. The code will be publicly released on GitHub after acceptance of the paper.
Keywords: Compressed Domain · Deep Learning · DCT Coefficients · T2CI-GAN · JPEG Compression · Compressed Domain Pattern Recognition · Text to Compressed Image.
1 Introduction
Generating visually realistic images from natural text descriptions is an interesting research problem that warrants knowledge of both language processing and computer vision. Unlike the problem of image captioning, which generates text descriptions from an image, the challenge here is to generate semantically suitable images based on a proper understanding of the text descriptions. Many interesting techniques have been proposed in the literature to explore the problem of generating pixel images from given input texts [20], [27], [26], [16]. Moreover, a very recent attempt [11] aims to generate images directly in the compressed format; the whole idea is to avoid the synthesis of RGB images and the subsequent compression stage. In fact, in the current digital scenario, more and more images and image frames (videos) are being stored and transmitted in compressed representation, and compressed data now accounts for more than 90% of internet traffic [19]. On the other hand, different compressed domain technologies that can directly process and analyse compressed data without decompression and re-compression are being explored both by software giants, such as Uber [4] and Xerox [17], and by academia [9], [13], [23], [2]. Some of the prominent works on compressed document images are discussed in [7,8,10] and [19,18]. This gives us strong motivation for exploring the idea of generating compressed images directly from natural text descriptions, which is attempted in this research paper.

Fig. 1. JPEG compression and decompression architecture, and extraction of the JPEG compressed DCT image used in the proposed approach.
Recently, Generative Adversarial Network (GAN) models have been successfully used for generating realistic images from diverse inputs such as layouts [5], texts [25], and scenes [1]. However, early GAN models [20] generated images of low resolution from the input text. In [20], the GAN model was used to generate an image from a single sentence. This method was implemented in two stages. Initially, the text sentence was encoded into a feature matrix using deep CNNs and RNNs to extract the significant features. Then those features were utilized to generate a picture. In order to improve the quality, a stacked GAN was reported in [27]. It generated the output picture using two GANs. In the first stage, GAN-1 produced a low resolution image with basic shape and colors, along with the background, generated from a random noise vector. In the second stage, GAN-2 refined the produced image by adding details and making the required corrections. MirrorGAN was reported in [16] for text-to-image translation through re-description; this model reported improved semantic consistency between the text and the produced output image. In [26], the authors proposed a Semantics Disentangling Generative Adversarial Network (SD-GAN) which exploited the semantics of the text description. However, all the GAN based techniques discussed above were trained using RGB pixel images and are meant to generate RGB images. Hence, our work is focused on employing the significant features of GANs for generating compressed images directly from the given text descriptions.
In the recent literature, a GAN model was proposed for generating compressed images directly from a noise vector [11]. Since JPEG is the most widely used compression format, the authors attempted to generate JPEG compressed images directly, rather than generating RGB images and compressing them separately. Their GAN framework consists of Generator, Decoder and Discriminator sub-networks. The Generator consists of locally connected layers, quantization layers, and chroma subsampling layers. The locally connected layers perform block based operations, similar to the JPEG compression method, to generate JPEG compressed images. In between the Generator and the Discriminator, a Decoder is used to decompress the image to facilitate the comparison with the ground truth RGB image by the Discriminator network. Specifically, this decoder performs de-quantization and the Inverse Discrete Cosine Transformation (IDCT), followed by a YCbCr to RGB transformation, on the compressed images generated by the Generator. Unlike [11], which generates compressed images from noise, our model generates compressed images based on the given input text descriptions.
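To make this decoding step concrete, below is a minimal NumPy/SciPy sketch of such a decoder for a single 8×8 block. The quantization table and function names are illustrative assumptions for exposition, not the implementation of [11].

```python
import numpy as np
from scipy.fft import idctn

# Illustrative assumption: the standard JPEG luminance quantization table.
Q_LUMA = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99]], dtype=np.float32)

def decode_block(quantized: np.ndarray, q_table: np.ndarray) -> np.ndarray:
    """De-quantize one 8x8 block of DCT coefficients and map it back to pixels."""
    dct_block = quantized * q_table              # de-quantization
    pixels = idctn(dct_block, norm="ortho")      # 2-D inverse DCT
    return np.clip(pixels + 128.0, 0, 255)       # undo the JPEG level shift

def ycbcr_to_rgb(y, cb, cr):
    """Standard YCbCr -> RGB transform (inverse of Eqs. (1)-(3));
    cb and cr are assumed already upsampled to the luma resolution."""
    r = y + 1.402 * (cr - 128.0)
    g = y - 0.344136 * (cb - 128.0) - 0.714136 * (cr - 128.0)
    b = y + 1.772 * (cb - 128.0)
    return np.clip(np.stack([r, g, b], axis=-1), 0, 255).astype(np.uint8)
```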
Overall, this research paper proposes two novel GAN models for generating compressed images from text descriptions. The first GAN model is trained directly with JPEG compressed DCT images to generate compressed images from text descriptions. The second GAN model is trained with RGB images to generate compressed images from text descriptions. The proposed models have been tested on the Oxford-102 Flower images benchmark dataset using both the RGB and JPEG compressed versions, reporting state-of-the-art performance in the compressed domain. The rest of the paper is organized as follows: Section 2 presents the preliminaries of the concepts used. Section 3 discusses the proposed methodology and GAN architectures. Section 4 reports the detailed experimental results and analysis. Finally, Section 5 concludes the paper with a summary.
2 Preliminaries
In this section, a brief description of JPEG compression, the GAN model, and the GloVe model is presented.
2.1 JPEG Compression
The JPEG compression algorithm achieves compression by discarding high frequency components. First, the RGB channels of the image are converted into the YCbCr format to separate the luminance (Y) and chrominance (Cb, Cr) channels as follows:
$Y = 0.299\,r + 0.587\,g + 0.114\,b$   (1)

$Cb = -0.1687\,r - 0.3313\,g + 0.5\,b + 128$   (2)

$Cr = 0.5\,r - 0.4187\,g - 0.0813\,b + 128$   (3)
Then each channel is divided into non-overlapping 8×8 pixel blocks. The forward Discrete Cosine Transform (DCT) is applied on each block of each channel to convert the 8×8 pixel block, say P(x, y), from the spatial domain to the frequency domain. Each DCT block, i.e., F(u, v), is quantized to keep only the low frequency coefficients. Then Differential Pulse Code Modulation (DPCM) is applied on the DC components and Run Length Encoding (RLE) on the AC components. Huffman coding is used to encode the DC and AC components in a smaller number of bits. In order to perform decompression, entropy decoding, de-quantization, and the inverse DCT (IDCT) are applied, in that order, on the compressed image to obtain the uncompressed image. The compression and decompression stages are illustrated in Fig. 1. In the proposed work, the JPEG compressed DCT images are directly extracted from the JPEG compressed stream and used for training the deep learning model. Decompression is performed only for the performance analysis; it is not required in practice.
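To illustrate the representation consumed by our models, the following is a minimal NumPy/SciPy sketch of the forward path up to the quantized DCT coefficients. Note that in this work the coefficients are extracted directly from the JPEG stream; this sketch recomputes them from pixels purely for exposition, the quantization table is supplied by the caller, and the entropy coding stages (DPCM, RLE, Huffman) are omitted.

```python
import numpy as np
from scipy.fft import dctn

def rgb_to_y(rgb: np.ndarray) -> np.ndarray:
    """Luminance channel per Eq. (1); rgb is an HxWx3 uint8 array."""
    r = rgb[..., 0].astype(np.float32)
    g = rgb[..., 1].astype(np.float32)
    b = rgb[..., 2].astype(np.float32)
    return 0.299 * r + 0.587 * g + 0.114 * b

def blockwise_quantized_dct(channel: np.ndarray, q_table: np.ndarray) -> np.ndarray:
    """Split a channel into 8x8 blocks, apply the 2-D DCT, then quantize.

    Returns the quantized DCT coefficients with the same shape as the
    input channel (height and width assumed to be multiples of 8).
    """
    h, w = channel.shape
    coeffs = np.empty((h, w), dtype=np.int32)
    shifted = channel - 128.0                    # level shift to [-128, 127]
    for i in range(0, h, 8):
        for j in range(0, w, 8):
            block = dctn(shifted[i:i+8, j:j+8], norm="ortho")
            coeffs[i:i+8, j:j+8] = np.round(block / q_table).astype(np.int32)
    return coeffs

# Example usage with a flat quantization table (illustrative only):
# y = rgb_to_y(image); dct_img = blockwise_quantized_dct(y, np.full((8, 8), 16.0))
```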
2.2 Generative Adversarial Network (GAN)
A Generative Adversarial Network (GAN) [3] is a deep learning model built with two sub-networks, a Generator and a Discriminator. The Generator (G) generates new images following the distribution of the training images, and the Discriminator (D) classifies actual and generated images into real and fake categories, respectively. These two sub-models are trained alternately, such that the Generator tries to fool the Discriminator by generating data similar to the real domain, whereas the Discriminator is optimized to distinguish the generated images from the real images. Overall, the Generator and the Discriminator play a two-player min-max game. The objective function of the GAN is given as follows:
$\min_G \max_D F(G, D) = \mathbb{E}_{y \sim k_d}[\log D(y)] + \mathbb{E}_{z \sim k_z}[\log(1 - D(G(z)))]$   (4)

where $y$ indicates a real image sampled from $k_d$ (the true data distribution) and $z$ indicates a noise vector sampled from $k_z$ (a uniform or Gaussian distribution).
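To make the min-max game of Eq. (4) concrete, here is a minimal PyTorch sketch of one alternating training step. The tiny fully connected networks, dimensions, and learning rates are illustrative assumptions, not the architecture used in this paper; the generator update uses the common non-saturating variant of the objective.

```python
import torch
import torch.nn as nn

# Illustrative assumption: tiny fully connected G and D for demonstration.
G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real: torch.Tensor):
    """One alternating update implementing the game of Eq. (4)."""
    n = real.size(0)
    z = torch.randn(n, 100)                      # z ~ k_z (Gaussian)

    # --- Discriminator step: maximize log D(y) + log(1 - D(G(z))) ---
    opt_d.zero_grad()
    loss_d = bce(D(real), torch.ones(n, 1)) + \
             bce(D(G(z).detach()), torch.zeros(n, 1))
    loss_d.backward()
    opt_d.step()

    # --- Generator step: fool D into predicting "real" (non-saturating) ---
    opt_g.zero_grad()
    loss_g = bce(D(G(z)), torch.ones(n, 1))
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```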
The Conditional GAN model [12] makes use of some additional information along with the noise. Both the Generator (G) and the Discriminator (D) use this additional information, referred to as the conditioning variable 'c', which can be text or any other data. Thus, the Generator of the Conditional GAN generates images conditioned on the variable 'c', as depicted in Fig. 2.
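A common way to implement such conditioning, sketched below under the assumption that c is a fixed-size text embedding, is to concatenate c with the noise vector z at the Generator input; all dimensions here are illustrative.

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Generator conditioned on an embedding c (illustrative dimensions)."""

    def __init__(self, z_dim: int = 100, c_dim: int = 128, out_dim: int = 784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim + c_dim, 256), nn.ReLU(),
            nn.Linear(256, out_dim), nn.Tanh(),
        )

    def forward(self, z: torch.Tensor, c: torch.Tensor) -> torch.Tensor:
        # Conditioning: concatenate noise and condition, then generate.
        return self.net(torch.cat([z, c], dim=1))

# Usage: g = ConditionalGenerator(); x = g(torch.randn(4, 100), torch.randn(4, 128))
```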