
PalGAN: Image Colorization with Palette
Generative Adversarial Networks
Yi Wang1, Menghan Xia2, Lu Qi3, Jing Shao4, and Yu Qiao1
1Shanghai AI Laboratory 2Tencent AI Lab 3CUHK 4SenseTime Research
{wangyi,qiaoyu}@pjlab.org.cn menghanxyz@gmail.com luqi@cse.cuhk.edu.hk
shaojing@senseauto.com
Fig. 1: Our colorization results. 1st row: inputs; 2nd row: our predictions.
Abstract. Multimodal ambiguity and color bleeding remain challenging in colorization. To tackle these problems, we propose a new GAN-based colorization approach, PalGAN, integrated with palette estimation and chromatic attention. To circumvent the multimodality issue, we present a new colorization formulation that first estimates a probabilistic palette from the input gray image, then conducts color assignment conditioned on the palette through a generative model. Further, we handle color bleeding with chromatic attention, which studies color affinities by considering both semantic and intensity correlations. In extensive experiments, PalGAN outperforms state-of-the-art methods in quantitative evaluation and visual comparison, delivering notably diverse, contrastive, and edge-preserving appearances. With the palette design, our method enables color transfer between images, even ones with unrelated content.
Keywords: Image Colorization, Generative Adversarial Networks, Attention, Color Transfer
1 Introduction
Colorization aims to predict the missing chroma information of a given gray image. It is an interesting and practical task in computer vision, widely used in legacy footage processing [27], color transfer [1,39], and other visual editing applications [3,52]. It is also exploited as a proxy task for self-supervised learning [25], since predicting perceptually natural colors from a grayscale image heavily relies on scene understanding. However, even when the ground-truth color is available for supervision, predicting pixel colors from gray images remains very challenging, due to the ill-posed nature of the task: one grayscale input could correspond to multiple possible color variants.
Most current methods [54,56,26,12,23,38,49,17,3] formulate colorization as a pixel-level regression task, and more or less suffer from the multimodality issue. With large-scale training data and end-to-end learning models, they can conveniently learn color distribution priors, e.g., greenish tones for vegetation, human skin colors, etc. However, when it comes to objects with inherent color ambiguity (e.g., human clothes, cars, and other man-made objects), these approaches tend to predict brownish average colors. To tackle such multimodality, some studies [54,56,24] formulate color prediction as pixel-level color classification, which allows multiple colors to be assigned to each pixel based on posterior probability. Unfortunately, these methods suffer from regional color inconsistency due to the independent pixel-wise sampling mechanism. In this regard, sequential modeling [12,23] only partially alleviates the sampling issue, because the unidirectional sequential dependence of 2D flattened pixel primitives causes error accumulation and hinders learning efficiency.
Apart from the multimodality issue, color bleeding is another common problem in colorization, caused by inaccurate identification of semantic boundaries. To suppress such visual artifacts, most works [54,56,26,38,49,17,3] resort to generative adversarial networks (GANs) to encourage the generated chroma distribution to be indistinguishable from that of real-life color images. So far, no dedicated algorithms or modules for deep models have been proposed to improve this aspect, which considerably affects visual quality.
To avoid modeling the color multimodality at the pixel level, we propose a new colorization framework, PalGAN, that predicts pixel colors in a coarse-to-fine paradigm. The key idea is to first predict a global palette probability (e.g., a palette histogram) from the grayscale input. It does not collapse into a single specific colorization solution but represents a color distribution over the potential color variants. Then, the uncertainty about the per-pixel color assignment is modeled with a generative model in the GAN framework, conditioned on the grayscale input and the palette histogram. Therefore, multiple colorization results can be achieved by changing the palette histogram input.
To guarantee color assignment with semantic correctness and regional consistency, we study color affinities with a proposed chromatic attention module. It explicitly aligns color affinity with both semantics and low-level characteristics. Structurally, chromatic attention includes global interaction and local delineation. The former enables global context utilization for color inference by using semantic features in the attention mechanism. The latter preserves regional details by mapping the gray input to color through a local affine transformation, explicitly parameterized by the correlation between the gray input and the color feature.
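As a rough illustration of these two branches, the sketch below pairs a global attention block (queries and keys from semantic features of the gray input, values from color features) with a guided-filter-style local affine mapping. The layer shapes, the 5x5 window, and the module names are our own illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalColorAttention(nn.Module):
    """Sketch of global interaction: re-mix color features by semantic affinity."""

    def __init__(self, sem_dim, color_dim, key_dim=64):
        super().__init__()
        self.to_q = nn.Conv2d(sem_dim, key_dim, 1)      # queries from semantic features
        self.to_k = nn.Conv2d(sem_dim, key_dim, 1)      # keys from semantic features
        self.to_v = nn.Conv2d(color_dim, color_dim, 1)  # values from color features

    def forward(self, sem, color):
        b, _, h, w = sem.shape
        q = self.to_q(sem).flatten(2).transpose(1, 2)    # (b, hw, key_dim)
        k = self.to_k(sem).flatten(2)                    # (b, key_dim, hw)
        v = self.to_v(color).flatten(2).transpose(1, 2)  # (b, hw, color_dim)
        attn = F.softmax(q @ k / k.shape[1] ** 0.5, dim=-1)  # (b, hw, hw) affinity
        out = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return out  # each pixel's color aggregated from semantically similar pixels


def local_affine(gray, color_feat, eps=1e-4):
    """Sketch of local delineation as a guided-filter-style local affine map."""
    mean = lambda x: F.avg_pool2d(x, 5, stride=1, padding=2)  # 5x5 local windows
    mg, mc = mean(gray), mean(color_feat)
    cov = mean(gray * color_feat) - mg * mc  # gray-color correlation per window
    var = mean(gray * gray) - mg * mg        # gray variance per window
    a = cov / (var + eps)                    # local affine scale
    b = mc - a * mg                          # local affine offset
    return a * gray + b  # chroma follows intensity edges, limiting color bleeding
```

The local branch ties chroma changes to intensity changes inside each window, which is one plausible reading of an affine transformation parameterized by gray-color correlation.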
Experiments illustrate the effectiveness of our method. It achieves impressive visual results (Fig. 1) and quantitative superiority over state-of-the-art approaches on ImageNet [9] and COCO-Stuff [5]. Our method also works well with a user-specified palette histogram from a reference image, which may even have no content correlation with the input grayscale. By nature, our method thus supports diverse coloring results with a degree of controllability. Our code and pretrained models are available at https://github.com/shepnerd/PalGAN.
Generally, our contributions are three-fold: i) We propose a new colorization framework, PalGAN, that decomposes colorization into palette estimation and pixel-wise assignment. It effectively circumvents the challenges of color ambiguity and regional homogeneity, and naturally supports diverse and controllable colorization. ii) We explore the less-studied color affinities and propose an effective module named chromatic attention. It considers both semantic and local-detail correspondence, applying such correlations to color generation. This notably alleviates color bleeding artifacts. iii) Our method notably surpasses state-of-the-art methods in perceptual quality (FID [16] and LPIPS [55]). It is known that a trade-off exists between perceptual quality and fidelity in multiple low-level vision tasks. We argue that perceptual quality matters more than fidelity here, as colorization aims to produce realistic colorized results rather than restore pixel colors identical to the ground truth. Regardless, with proper tuning our method achieves the best performance in both fidelity (PSNR and SSIM) and perceptual metrics.
2 Related Work
2.1 Colorization
User-Guided Colorization Some early works [7,8,18,29,39,47,21,6,34] in colorization turn to a reference image and transfer its color statistics to the given gray one. With the prevalence of deep learning, such color transfer is characterized in a neural feature space to introduce semantic consistency [15]. These works perform decently when the reference and the input share similar semantics, but their applicability is limited by the quality of reference retrieval, especially for complicated scenes.
Besides reference images, several systems require users to give sufficient local color hints (usually in scribble form) before colorizing inputs [27,37,52,21]. These approaches then propagate the given colors based on local affinities. Additionally, some attempts [3] explore other modalities like language to instruct what colors are used and how they are distributed.
Learning-based Colorization This line of work [54,56,10,17,24,19] produces colorful images from gray inputs alone, learning a pixel-to-pixel mapping. Large-scale datasets are exploited in a self-supervised fashion by converting colorful pictures to gray ones for pairwise training. Iizuka et al. [17] utilize image-level labels to associate predicted colors with global semantics, using a global-and-local convolutional neural network. Larsson et al. [24] and Zhang et al. [54] introduce pixel-level color distribution matching by classification, alleviating color imbalance and averaged multimodal outputs. Besides, extra input hints are integrated into learning systems by simulation in [56], providing automatic and semi-automatic ways to colorize images. Recently, transformer architectures have been explored for this task, considering their expressiveness in non-local modeling [23].
Fig. 2: Our colorization system framework. (Diagram: the gray input L flows forward through the palette generator and the palette assignment generator, which stacks Conv, chromatic attention (CA), and PalNorm blocks conditioned on the palette via concatenation; a discriminator judges outputs as true or false.)
Some works explicitly exploit additional priors from pretrained models for colorization. Su et al. [38] leverage instance-level annotations (e.g., instance bounding boxes and classes) by using an off-the-shelf detector. This lets the colorization model focus on color rendering without the need to recognize high-level semantics. In addition to such pretrained discriminative models, pretrained generative ones are also exploitable for improving colorization diversity. Wu et al. [49] incorporate a generative color prior from a pretrained BigGAN [4] to help a deep model produce diverse colored results. They design an extra encoder to project the given gray image into a latent code, then estimate colorful images with BigGAN. With such primary predictions, they further refine the color results using the intermediate features of BigGAN. Afifi et al. [1] propose employing a pretrained StyleGAN [20] for image recoloring, where color is controlled by histogram features.
2.2 GAN-based Image-to-image Translation
Image-to-image translation aims to learn the transformation between an input and an output image. Colorization can be formulated as such a task and handled by approaches based on Generative Adversarial Networks (GANs) [11] [19,41,35,30,44]. They employ an adversarial loss that learns to discriminate between real and generated images, and then minimize this loss by updating the generator so that the produced results look realistic [57,28,31,50,36,51,45,46,42].
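For reference, a minimal sketch of the standard adversarial loss pair described above is given below; `disc` is a hypothetical discriminator network, and PalGAN's actual objective may add further terms.

```python
import torch
import torch.nn.functional as F

def d_loss(disc, real, fake):
    """Discriminator loss: label real color images 1 and generated ones 0."""
    real_logits = disc(real)
    fake_logits = disc(fake.detach())  # block gradients into the generator
    return (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
            + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))

def g_loss(disc, fake):
    """Generator loss: push generated images to be classified as real."""
    fake_logits = disc(fake)
    return F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))
```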
3 Method
PalGAN aims to colorize grayscale images. It formulates colorization as a palette prediction and assignment problem. Compared with directly learning the pixel-to-pixel mapping from gray to color, as adopted by most learning-based methods, this disentangled formulation not only brings empirical colorization improvements (Section 4), but also enables us to manipulate global color distributions by adjusting or regularizing palettes.
For PalGAN, its input is a grayscale image (i.e., the luminance channel of a color image) $L \in \mathbb{R}^{h \times w \times 1}$, and the output is the estimated chromatic map $\hat{C} \in \mathbb{R}^{h \times w \times 2}$ that will be used as the complementary ab channels together with $L$ in the CIE Lab color space. PalGAN consists of a palette generator $T_E$, a palette assignment generator $T_G$, and a color discriminator $D$. In inference, only $T_E$ and $T_G$ are employed. The whole framework is given in Fig. 2.
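To illustrate this output convention, the sketch below recombines the luminance channel $L$ with a predicted chroma map $\hat{C}$ into a Lab image and converts it to RGB; skimage is used here purely for the color-space conversion, and the value ranges are the usual CIE Lab ones, assumed rather than taken from the paper.

```python
import numpy as np
from skimage import color

def assemble_rgb(L, ab):
    """Combine luminance L (h, w, 1) and predicted chroma ab (h, w, 2) into RGB.

    Assumes L is scaled to [0, 100] and ab to roughly [-110, 110].
    """
    lab = np.concatenate([L, ab], axis=-1)  # (h, w, 3) Lab image
    return color.lab2rgb(lab)               # float RGB in [0, 1]
```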